Your face is all you need.

AI News for 9/29/2025-9/30/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (196 channels, and 7053 messages) for you. Estimated reading time saved (at 200wpm): 509 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

It’s been 1.5 years since the Sora announcement and 10 months since Sora.com was released to the public, and 4 days after Meta announced their controversial Vibes app, and Sora 2 (as leaked) released today to good fanfare (about 7x smaller by HN upvotes).

Sora 2 has good improvements on all the physical world issues that were quickly found with Sora 1-era video models - including gymnastics and figure skating routines:

The blog post mentioned “implicit” models - no explicit world model yet as many have speculated from Genie research. But there is definitely some training on video games and browser output.

Apart from the “video with native audio” feature that Veo 3 has had for months now, one standout new feature is the ability to “inject elements of the real world into Sora 2” from a single demonstration video, which OpenAI employees are clearly having a ton of fun with:

In his personal blogpost, Sama calls this “character consistency”.

This feature and the model are now productized into a new Sora iOS app and website experience, gated by invite code for now, as a “cameos” feature, which is central to how the new Sora social network functions.

And yes, we emphasize the literal “social network” - as Sama promised earlier this year - the new Sora app has profiles, follower counts, DMs, and already has its first viral video.

The team (and former members) took some pains on the livestream to talk about the safeguards put in place, e.g. an anti-doomscrolling timeout.

The cameos are all self uploaded videos as part of the onboarding, and you can set permissions in place for others to use your likeness (or not). Notably, Sam Altman’s likeness is available for everyone to use, so that’s why you’ll be seeing a lot of deepfakes of Sam in your social feeds over the coming days.

AI Twitter Recap

Anthropic’s Claude Sonnet 4.5: capabilities, coding, and early evals

Claude 4.5 Sonnet (200K ctx, 64K max output): Anthropic’s upgrade brings higher intelligence at the same price as Sonnet 4 ($3/$15 per 1M input/output), with improved token efficiency even in “Thinking” mode. Independent evals from Artificial Analysis place it behind GPT‑5-high but ahead of Gemini 2.5 Pro and Grok 4 Fast, while remaining notably frugal with output tokens; they also note larger gains in agentic tool use and safety/alignment behaviors than in prior benchmarks (thread). On ARC‑AGI, Sonnet 4.5 tracks GPT‑5 closely with performance scaling meaningfully at higher thinking budgets (@GregKamradt; commentary). Users report standout “state management” and context compaction, making long agentic workflows more reliable (@nickbaumann_; @skirano). Ecosystem support landed quickly: LangSmith cost tracking/playground (@Hacubu), ARC Prize results (@scaling01), and community measurements on LiveBench and Deep Research Bench with strong coding/math placements (1, 2).
Claude Code 2 and agent stack: Anthropic shipped Claude Code v2, VS Code extension updates, context editing and memory tools (launch roundup). Replit reports Sonnet 4.5 improves reliable code edits and autonomy in Agent 3 (@pirroh). Anthropic also published an engineering blog on “context engineering” (beyond prompt engineering) for agent systems (@AnthropicAI).

Zhipu’s GLM‑4.6 (open weights) and agentic coding focus

GLM‑4.6 release (MIT license): Zhipu extends the GLM‑4.5 line with 200K context, stronger coding, improved reasoning/tool use, and better agent task success, while using ~15% fewer tokens per trajectory vs 4.5. Zhipu published CC‑Bench‑V1.1 (74 real-world agentic coding tasks with full trajectories) showing GLM‑4.6 near-parity to Claude Sonnet 4 in coding and leading domestic peers, with all eval details open (@Zai_org, bench; analysis by @gm8xx8). Open weights and API are live; hosting on HF/ModelScope incoming.
Ecosystem uptake: Available on OpenRouter (@OpenRouterAI), Yupp (@yupp_ai), YouWare (@YouWareAI), Roo Code (@roo_code), Cline (@cline), and Anycoder (@_akhaliq). Locally, MLX runs GLM‑4.6 at ~17 tok/s on M3 Ultra (5.5 bpw quant; 5.3K tokens) (@awnihannun).

Frontier video models: Sora 2 launch and early comparisons

OpenAI Sora 2 and app: OpenAI released Sora 2 with an iOS app (US/Canada invite-only at launch), cameo features (consent controls, watermarks), and a system card; Android and API are planned. OpenAI emphasizes “world simulation” demos with improved physics/steerability and audio, while acknowledging the risks of algorithmic feeds and deepfakes (product post, teaser, Sam Altman’s note). Reactions are mixed: some highlight standout realism/consistency; others point to artifacts and note Google’s Veo 3 as competitive in certain cases (pro, skeptic, physics demo).
Luma Ray 3: Luma’s new Ray 3 ranks #2 in Artificial Analysis’ T2V Video Arena, introducing an iterative chain-of-thought generation loop and 16-bit HDR support (I2V/T2V up to 10s 1080p). API not yet available (@ArtificialAnlys).

Training efficiency and post-training: FP4, QAT, and RL during pretraining

NVFP4 (NVIDIA): 4‑bit pretraining with 2‑level scaling, RHT, and stochastic rounding matches FP8 baselines on a 12B model trained on 10T tokens (MMLU‑Pro 62.58 vs 62.62), promising ~6.8× efficiency and ~50% lower memory; Blackwell supports FP4 matmul and required rounding modes (paper/code, summary). Open-source TE support is in progress.
Compute-Optimal QAT (Apple): A scaling law for budgeting quantization-aware training vs full-precision given tokens/memory; practical guidance for planning QAT as a first-class citizen in training schedules (@aldrmv, @awnihannun).
RLP (NVIDIA): Reinforcement Learning Pre‑training teaches models to “think before predicting” with a verifier‑free, dense information-gain reward on web text, yielding sizable boosts over base models (e.g., +19% Qwen3‑1.7B, +35% Nemotron‑Nano‑12B on math/science suites) and compounding with post‑training (paper/blog).

Learning from users and agent memory

RLHI (Meta): Reinforcement Learning from Human Interaction trains directly from organic user conversations (user-guided rewrites and user-based rewards), outperforming baselines on personalization and instruction following while retaining standard benchmark performance (@jaseweston, paper).
ReasoningBank (agents): A memory system that stores distilled strategies from both successes and failures to improve reuse and efficiency in web/SWE tasks, reporting +34.2% efficiency and –16% steps vs prior memory methods (tweet).
Efficient sequence models: SWAX combines sliding-window attention with xLSTM and stochastic window sizes to boost both short/long recall (tweet). For diffusion LMs, SparseD proposes sparse attention (1.3–1.5× faster near‑lossless) and LLaDA‑MoE (sparse MoE dLLM) reports SOTA among diffusion LLMs with smaller active params (SparseD, LLaDA‑MoE). Finally, MobileLLM‑R1 shows sub‑billion parameter reasoning models (950M) hitting AIME 15.5 with ~2T tokens of curated data and standard post‑training (tweet).

Agentic coding stacks and infra

Local and hosted agent stacks: AMD endorsed local “vibe coding” with Cline + LM Studio, recommending Qwen3‑Coder‑30B (4/8‑bit) and GLM‑4.5‑Air for higher RAM tiers (@cline). AI SDK now routes to any HF model (@nishimiya). Cursor 1.7 adds prompt suggestions and org-wide rules (@cursor_ai). Sim launched a fully local, open-source drag‑and‑drop agentic workflow builder with MCP integrations (thread).
Codex vs Claude Code operational choices: Reverse‑engineering notes emphasize OpenAI Codex CLI’s shell‑first loop (think→tool→observe), unified diffs to reduce error surface, and OS‑level sandboxing vs heavier tool orchestration (analysis). Meanwhile, GitHub MCP Registry and Claude extensions continue to mature in VS Code (@code, @gallabytes).

Periodic Labs: AI scientists + autonomous labs

Led by Liam Fedus and Doğuş Ekin, Periodic raised a $300M founding round led by a16z to build AI scientists paired with autonomous labs for verifiable, experiment‑driven science—targeting materials (e.g., superconductors) and semiconductor advances; team includes alumni behind ChatGPT, GNoME, attention, MatterGen, and scaled autonomous physics labs (launch, a16z). The thesis: internet text is finite; progress needs new, high‑signal experimental data and closed‑loop verification.

Top tweets (by engagement)

“Sound on.” Sora 2 teaser by @OpenAI (~34K)
Sora 2 launch by @OpenAI (~12.7K)
“10am PT” pre‑launch tease by @OpenAI (~6.6K)
“We are launching a new app called Sora.” by @sama (~6.7K)
Sora app demo by @OpenAI (~4.6K)
“Built with Claude Sonnet 4.5” challenge by @alexalbert__ (~1.2K)
Bolt v2 “vibe coding goes pro” by @boltdotnew (~1.3K)
Periodic Labs launch by @LiamFedus (~2.9K)

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. China AI model launches: Qwen roadmap and Hunyuan Image 3.0

Alibaba just unveiled their Qwen roadmap. The ambition is staggering! (Activity: 954): Alibaba’s Qwen roadmap (likely a slide in the image) lays out aggressive scaling targets: unified multimodal models; context length from 1M → 100M tokens; parameters from ~1T → 10T; test‑time compute scaling from 64k → 1M; and data from 10T → 100T tokens—paired with synthetic data generation “without scale limits” and broader agent capabilities (complexity, interaction, learning modes). This signals a full embrace of the “scaling is all you need” strategy for China’s flagship LLM stack (see Qwen project: https://github.com/QwenLM/Qwen). Commenters express awe at 100M context, skepticism it will remain open-source, and practical concerns about running >1Tparameter models locally (hardware feasibility).
- Roadmap mentions a 100Mtoken context window (slide), raising feasibility questions. Naive quadratic attention would require ~1e14 attention scores per layer at 100M tokens—tens to hundreds of TB just to store them—so this would demand sparse/linear attention, recurrence, or external memory techniques. Even then, KV-cache growth (O(n)) and memory bandwidth become bottlenecks; practical deployments would likely combine windowed attention with retrieval.
- Several note the likelihood that larger Qwen checkpoints will be closed-source, limiting local finetuning and reproducibility. That would push benchmarking to API-based evaluations only and constrain community optimization.
- On running >1Tparameter models locally: a dense 1T model needs ~2 TB just for FP16 weights (~1 TB INT8, ~0.5 TB 4-bit), before KV cache and activations; multi-node tensor/pipeline parallelism over NVLink/InfiniBand would be mandatory. By contrast, MoE designs with, e.g., 1T total and ~8/64 experts active yield ~125B active params; at 4-bit that’s ~62.5 GB of weights and is actually deployable across several GPUs, though KV cache can still add 50–100+ GB at long contexts. Throughput would be constrained by interconnect bandwidth and cache efficiency.
Tencent is teasing the world’s most powerful open-source text-to-image model, Hunyuan Image 3.0 Drops Sept 28 (Activity: 225): Tencent teased Hunyuan Image 3.0 as an open‑source text‑to‑image model dropping Sept 28, billed as the “most powerful” of its kind. The teaser appears to show VRAM: 96 (likely GB), hinting at a large inference memory footprint, but provides no benchmarks, training details, or weight-release specifics yet; claims remain unverified until release. Commenters question hype-before-release, noting such launches often underperform, and point out the 96 GB VRAM hint may make local inference impractical for typical users. Others argue “most powerful open-source” is unproven given the lack of comparable, truly open models to benchmark against.
- A commenter asserts the model may require 96 GB VRAM for inference (“vram 96?” → “yes”). If accurate, that would push it beyond single 24–48 GB consumer GPUs without sharding/quantization, implying data-center class GPUs or multi-GPU setups for full‑precision runs.
- Several users are skeptical of heavy pre‑release hype correlating with underwhelming results, contrasting stronger, less‑teased drops like Qwen with more hyped releases (e.g., Stable Diffusion 3 vs FLUX). The consensus is to wait for independent benchmarks and sample galleries before judging capability.
- Claims of being the “most powerful open‑source” T2I are questioned due to lack of current, comparable open models to benchmark against. One practical bar mentioned is whether it surpasses Qwen Image—a threshold that would drive immediate adoption/experimentation.

2. Local AI stack: post-abliteration finetuning and Fenghua No.3 GPU

IMPORTANT: Why Abliterated Models SUCK. Here is a better way to uncensor LLMs. (Activity: 433): OP reports that “abliterated” LLMs (weights surgically altered to remove refusal/safety behavior without a training objective) consistently lose reasoning, tool-use, and factuality—especially MoE models like Qwen3‑30B‑A3B—showing higher hallucination and worse MCP tool-calling. Post‑abliteration fine‑tuning appears to “heal” models: e.g., mradermacher/Qwen3-30B-A3B-abliterated-erotic-i1-GGUF (tested at i1-Q4_K_S) and DPO‑tuned mlabonne/NeuralDaredevil-8B-abliterated (from Llama‑3‑8B) retain or surpass baseline capabilities while remaining uncensored, outperforming several Huihui abliterated Qwen3‑30B‑A3B variants in tool routing and hallucination tests alongside MCP (Model Context Protocol). OP attributes the gains to post‑edit training that restores broken weight interactions; they note slight remaining deficits vs the original in agentic tasks but markedly better factuality and tool selection vs other abliterated releases. Comments call for non‑NSFW, standardized benchmarks to quantify “abliteration” impact; characterize observed recovery as known “model healing” (further training after unconstrained weight edits); and argue that if fine‑tuning fixes things, abliteration may be unnecessary or inferior to a straightforward fine‑tune, with concerns that removing “negative biases” can destabilize outputs.
- Technical consensus warns that unconstrained weight edits (aka “abliteration”) predictably degrade or destroy capabilities; commenters frame post-edit training as “model healing” where further fine-tuning helps the network re-learn connections broken by manual weight changes. The key point is that edits not guided by a loss function disrupt distributed representations, whereas subsequent supervised optimization can partially restore them—though not necessarily to baseline quality.
- Several call for benchmarks beyond NSFW to assess collateral damage from abliteration on general reasoning and utility. The Uncensored General Intelligence (UGI) Leaderboard is cited as addressing this need by evaluating broader capability rather than porn-only outcomes: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard.
- Empirical reports argue that abliteration + fine-tune “never” beats a straight fine-tune from the base, and removing “negative biases” often yields unusable models. This challenges the value of abliteration as a preprocessing step if standard fine-tuning can achieve uncensoring with fewer regressions and better retention of base competence.
China already started making CUDA and DirectX supporting GPUs, so over of monopoly of NVIDIA. The Fenghua No.3 supports latest APIs, including DirectX 12, Vulkan 1.2, and OpenGL 4.6. (Activity: 702): Post claims China’s Innosilicon-like “Fenghua No.3” discrete GPU now supports major graphics/compute APIs: DirectX 12, Vulkan 1.2, OpenGL 4.6, and purported CUDA compatibility, implying potential erosion of NVIDIA’s CUDA lock‑in. If true, this would mean driver/runtime layers implementing DX12 feature levels and Vulkan 1.2, plus a CUDA runtime/driver shim or translation to the GPU’s native compute ISA; however, no independent benchmarks or developer stack details (compiler toolchain, PTX/SASS compatibility, or conformance test results) are provided. Top comments note AMD’s existing CUDA‑compat via HIP and translators like ZLUDA, arguing CUDA support outside NVIDIA typically relies on translation and legal workarounds; skepticism remains (“I’ll believe it when I see it”) and some expect regulatory pushback or sanctions.
- Multiple commenters point out that AMD already provides a CUDA-adjacent path via HIP, which mirrors CUDA APIs under renamed symbols to sidestep licensing/trademark issues (with source-port tools like hipify); projects like ZLUDA aim for drop-in translation of CUDA calls to run on non-NVIDIA backends (ZLUDA repo). This implies Chinese vendors could forgo the legal indirection and implement direct CUDA support, whereas AMD/others typically use compatibility layers. References: HIP, CUDA.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. OpenAI Sora 2 Launch and Demo Showcases

This is Sora 2. (Activity: 985): OpenAI announces Sora 2, a next-gen video generation system showcasing longer, higher-fidelity clips with markedly improved spatiotemporal coherence, material/lighting consistency, and physically plausible motion, plus more controllable camera movement and multi-subject interactions. The page highlights stronger text-to-video capabilities and end-to-end editing workflows (e.g., prompt-driven revisions and masked edits/continuations), but offers no architecture, training data, or quantitative benchmark details, so performance is demonstrated via curated examples rather than peer-reviewed metrics. Technical commenters anticipate rapid progression to full-length AI-generated films and even personalized, biometrically responsive media, while others caution about the “demo-to-product” gap and raise safety concerns about misuse, surveillance-style personalization, and potential child-targeted content.
- Skepticism about demo-to-product parity: glossy reels are likely cherry-picked, so the released Sora 2 may lag on prompt adherence and long-range temporal consistency versus previews. Expected production constraints include capped clip length (e.g., <=60s), resolution/FPS limits, motion jitter, text/hand rendering artifacts, and aggressive safety filters—typical gaps for video diffusion/transformer systems moving from research to serving.
- Access/pricing uncertainty: a commenter paying roughly $200 for a “Pro” tier questions whether Sora 2 access is included, highlighting confusion around tiered/waitlisted rollout. Given that video generation serving costs scale with frames × resolution × diffusion steps, providers often gate via allowlists or per‑minute credits; the debate centers on whether Pro should confer priority/API quotas versus complete exclusion due to high GPU cost.
- Speculation on “personalized” films using body‑language feedback implies a closed‑loop pipeline: real‑time webcam/biometric capture (pose/affect via models like MediaPipe or OpenPose) driving conditioning signals (keyframes, masks, or camera paths) into the generator. This raises technical challenges around privacy/telemetry, on‑device vs cloud inference, streaming latency, and aligning generation cadence with viewer reaction windows.
Surfing on a subway (Activity: 597): A demo titled “Surfing on a subway” labeled “Sora 2” showcases an AI‑generated video (likely from OpenAI’s Sora overview) with high visual fidelity that elicits a visceral reaction, but exhibits non‑physical collision dynamics—highlighting that current text‑to‑video models rely on learned visual priors rather than explicit physics simulation. The external asset v.redd.it/vxuq3sjt8csf1 returns HTTP 403 Forbidden (Reddit edge auth block), requiring account authentication or a developer token to access. For context, Sora is a diffusion‑transformer text‑to‑video system designed for temporally coherent, high‑resolution sequences (on the order of ~60s), but it does not guarantee physically accurate interactions. Top comments raise two risks: (1) visually convincing yet physics‑implausible scenes may miscalibrate laypeople’s intuition about real‑world impacts; (2) once audio generation improves, synthetic clips may become indistinguishable from real, amplifying deepfake concerns. Even skeptics report strong startle responses despite knowing the clip is synthetic, underscoring the persuasive power of current visuals versus lagging audio realism.
- Concern that increasingly photorealistic generative video can depict physically impossible survivability, eroding intuition about forces/impacts; technical mitigations discussed include physics-consistency checks (e.g., acceleration continuity, momentum conservation, contact dynamics) and learned “physics priors.” Relevant benchmarks for detecting implausible events include IntPhys (https://arxiv.org/abs/1806.01203) and PHYRE (https://ai.facebook.com/research/publications/phyre-a-new-benchmark-for-physical-reasoning/), which probe whether models can flag violations of intuitive physics as video quality and temporal coherence improve.
- Audio deepfakes are flagged as the next inflection point: modern few-shot TTS/voice cloning (e.g., Microsoft VALL-E: https://arxiv.org/abs/2301.02111, Google AudioLM: https://arxiv.org/abs/2209.03143, commercial ElevenLabs) can mimic a speaker from seconds of audio, while automatic speaker verification remains fragile to synthetic attacks. ASVspoof’21 shows detectors generalize poorly to unseen synthesis methods (elevated EER under distribution shift), so liveness/active-challenge protocols are preferred over passive voice matching as diffusion-based TTS closes prosody and breath-noise gaps.
- Safety risk from viral synthetic stunts encouraging copycat behavior: proposed mitigations include cryptographic content credentials via C2PA (https://c2pa.org/) and model/provider-level watermarking, though current watermarks are brittle to re-encoding/cropping. Platform defenses should combine user-visible provenance signals with classifier backstops tuned for calibrated precision/recall to minimize both false positives on real footage and misses on fakes.
Sora 2 creates anime (Activity: 610): OP highlights that “Sora 2” (successor to OpenAI’s video model) can synthesize anime-style sequences; a livestream demo included an anime scene that viewers say rivals broadcast quality. The shared asset is a v.redd.it clip that currently returns HTTP 403 Forbidden without authentication (link), and an edit claims the scene may closely match a shot from KyoAni’s “Hibike! Euphonium” (series info), raising originality/memorization questions that cannot be confirmed from the blocked link. Commenters debate potential training-data memorization (if the clip is a near shot-for-shot recreation) and note the rapid fidelity gains compared to early 2023 failures (e.g., the notorious “Will Smith eats spaghetti” videos).
- Potential memorization/style replication: multiple users claim the showcased anime shot closely mirrors a scene from Kyoto Animation’s Hibike! Euphonium (https://en.wikipedia.org/wiki/Sound!_Euphonium). If accurate, it raises technical questions about training data provenance, near-duplicate deduplication, and video model memorization; auditing would involve copy-distance metrics, near-duplicate detection across the training corpus, and prompt-leak tests to measure how readily specific copyrighted sequences are reproduced.
- Quality delta vs early text-to-video: commenters contrast today’s Sora anime output with the 2023 “Will Smith eating spaghetti” meme, noting a two-year jump from artifact-ridden, low-coherence clips to broadcast-quality anime shots. The implied advances are in long-range temporal consistency, character identity tracking across frames, stable line art/coloring, and camera motion—likely driven by larger/cleaner video-text datasets, longer context windows, improved motion/consistency losses, and stronger video diffusion/transformer architectures.
- Feasibility outlook: claims of “perfectly generated anime within ~3 years” imply a pipeline that combines text-to-video with controllable inputs (storyboards, keyframes, depth/pose), character/style locking, and integrated TTS/voice + lip-sync. The technical gating factors are controllability APIs, asset reusability for character consistency across scenes, and cost-per-minute rendering; if Sora already approaches broadcast-quality single shots, the remaining gap is multi-shot continuity, editability, and toolchain integration for episode-length production.
Open AI Sora 2 Invite Codes Megathread (Activity: 7371): Non-technical megathread coordinating exchange of OpenAI Sora 2 invite codes; no model, feature, or benchmark details are provided. Comments indicate scarcity and possible regional limitations, with one user claiming “I have 5 codes can invite 20 in totoal,” but without verification or technical context. The attached image appears non-technical/decorative and does not convey technical content. Commenters mostly request spare codes and lament regional inaccessibility (e.g., Europe); no substantive technical debate is present.
Sora 2 realism (Activity: 2726): Reddit post titled “Sora 2 realism” links to a v.redd.it asset jksco9609csf1 that currently returns HTTP 403 Access Denied, indicating the media exists but is blocked by Reddit’s network security rather than missing. Troubleshooting is authentication-focused (OAuth/developer token, valid cookie/session headers) or filing a support ticket; the 403 suggests anti-bot or IP restrictions rather than a dead link. Top comments are non-technical shock reactions implying perceived photorealism and potential misuse concerns (e.g., scams, societal impact), but contain no verifiable technical details.
- Several users point out that Sora 2 appears to deliver convincing human motion realism, notably for athletic movements that were historically hard to synthesize. This suggests improvements in kinematic consistency, contact dynamics, and temporal coherence over prior video-generation models, potentially narrowing the gap with motion-captured footage without explicit rigging.
- A specific observation about the walking horse highlights visible muscle articulation, implying high-fidelity soft-tissue deformation and shading beyond simple skeletal rigging. However, despite frame-level photorealism, viewers still report an uncanny feel, hinting at subtle temporal/biomechanical artifacts (e.g., micro-motions, ground reaction cues) that reveal the content’s synthetic nature.
OpenAI: Sora 2 (Activity: 1863): Thread shares a demo labeled “OpenAI: Sora 2,” with a blocked video clip on v.redd.it and an accompanying preview image (jpeg). A top comment highlights a new feature called “Cameo,” framed as enabling cross-generation character consistency—targeting identity drift across longer or multi-shot generations, a persistent failure mode in text-to-video systems. No benchmarks or release notes are included in-thread; the technical implication (from comments) is reference- or token-based conditioning to preserve character attributes across sequences. Commenters see this as a step toward fully generated long-form content (movies/shows). The main debate is whether “Cameo” materially solves long-horizon character continuity versus offering only short-range appearance locking.
- Multiple commenters flag Sora 2’s new “Cameo” as a big technical step: character consistency has been a major failure mode in long-form video gen, and Cameo is interpreted as enabling persistent identity across shots and even separate generations. This could allow multi-shot continuity (same face, wardrobe, and mannerisms) by reusing a consistent reference/identity token across prompts, making episodic or feature-length workflows more feasible.
- There’s a technical question about maximum generated video length that remains unanswered in the thread. Users are looking for concrete specs (duration caps, resolution/FPS constraints, and whether multi-shot stitching or scene transitions are natively supported), which are critical for assessing feasibility of longer narratives and production pipelines.

2. Gemini 3.0 Update Speculation and CS Job Market Angst

no Gemini 3.0 updates yet? (Activity: 531): Post asks why there are no updates on Google’s Gemini 3.0 yet; the attached image appears non-technical (likely a screenshot/meme) and does not include release notes, benchmarks, or implementation details. Comments mention a rumor of an October 9 release window and anticipate major performance improvements, but provide no official sources or technical data. Commenters are speculative—one says they’re “expecting to be absolutely crushing,” while another links to a different image (https://preview.redd.it/fq1mqalz89sf1.jpeg) rather than documentation—so there’s enthusiasm but no substantiated technical claims.
- Release cadence and competitive context: commenters cite a rumored Oct 9 drop for Gemini 3.0, noting parallel launches/updates across vendors (e.g., xAI Grok 4.x, OpenAI Pro-tier features, and a possible DeepSeek R2), signaling a clustered model refresh window. For context on current competitors: see xAI (https://x.ai) and DeepSeek’s latest public research (e.g., R1: https://github.com/deepseek-ai/DeepSeek-R1).
- Access model concerns for developers: a user explicitly asks for “AI Studio day one” access to the high-capability tier (“Pro”), stating that “Flash”-only availability would be insufficient. This underscores the recurring trade-off between Gemini “Pro” (higher reasoning/capability) vs “Flash” (latency/cost-optimized); see Google’s model distinctions in the Gemini API docs: https://ai.google.dev/gemini-api/docs/models.
Prominent computer science professor sounds alarm, says graduates can’t find work: ‘Something is brewing’ (Activity: 899): Thread reports a tightening white-collar/tech-adjacent job market, with a prominent CS professor warning recent grads “can’t find work” and commenters characterizing it as a job recession ongoing for ~1 year. Prospective CS students are cautioned that outcomes after 4 years are uncertain, with elevated risk of low ROI on degrees and difficulty landing even entry-level roles. Anecdotal evidence includes a master’s graduate unable to secure a help desk position, underscoring regionally grim conditions. Top comments largely agree the downturn is real and sustained, urging prospective students to reassess debt-taking and career plans; there’s an implicit debate about whether this is cyclical versus structural, but sentiment skews pessimistic based on recent hiring conditions.
- UC Berkeley’s Hany Farid (digital forensics/image analysis) says CS is no longer “future‑proof,” citing a rapid shift in outcomes: students who previously averaged ~5 internship offers across 4 years are now “happy to get ~1” and often graduate with fewer offers and lower leverage (Business Insider). He frames the change as occurring within the last four years, contradicting the prior guidance to “go study CS” for guaranteed outcomes, and points to current seniors struggling to land roles.
- Multiple commenters describe a white‑collar tech recession with sharp contraction in “tech‑adjacent” verticals; even entry‑level/help‑desk roles are saturated in some locales, indicating pipeline compression at the bottom of the ladder. The implied mechanism is that automation/LLM‑assisted tooling is absorbing routine coding/support work while hiring concentrates on fewer, more senior positions, reducing the traditional intern‑to‑FTE ramp.
- Impact is projected beyond CS into law, finance, medicine, and general office workflows as AI mediates more computer‑based tasks, with robotics later affecting blue‑collar domains. This broadening scope increases career‑planning uncertainty for current students; see ongoing technical discussion in the linked Hacker News thread.
All we got from western companies old outdated models not even open sources and false promises (Activity: 1241): Meme post criticizing Western AI firms for releasing older, closed-source models and making “false promises,” contrasted with perceptions of more generous or rapid releases elsewhere. Comments reference a high-quality Microsoft TTS model that was briefly released then pulled, reinforcing concerns about restrictive Western releases, and speculate that forthcoming China-made GPUs could dwarf today’s 32 GB VRAM cards, potentially shifting compute access dynamics. Discussion frames Western pullbacks as safety/legal risk management versus China using more open releases as soft-power strategy; others are bullish that domestic Chinese hardware with higher VRAM will change the balance of capability and accessibility.
- Clarification on “open weights” vs “open source”: releasing model checkpoints without full training data, training code, and permissive licensing is not OSI-compliant open source (OSI definition). Weights-only drops often carry non-commercial or usage-restricted licenses, which limits reproducibility and architectural modifications while still enabling inference and fine-tuning; this distinction affects downstream adoption, redistribution, and research comparability.
- Open-weight releases from Chinese labs/companies (not the government) are positioned to attract developers and diffuse R&D costs, as the community contributes finetunes, evals, optimizations, and tooling post-release. Popular models can set de facto standards across tokenization, inference formats, and serving stacks—e.g., ONNX for cross-runtime graphs (onnx.ai) and GGUF quantized checkpoints for CPU/GPU inference (GGUF spec)—expanding ecosystem lock-in and soft power.
- Hardware implications: if domestic GPUs arrive with substantially more VRAM per card than today’s common 24–48 GB, that expands feasible local inference regimes. As a rule of thumb, a 70B parameter model needs roughly ~40–48 GB VRAM at 4-bit quantization (plus significant headroom for KV cache at long context), while 8-bit often exceeds ~80–100 GB; more VRAM also boosts batch sizes and throughput by accommodating larger KV caches and activations.
Man!!! They weren’t joking when they said that 4.5 doesn’t kiss ass anymore. (Activity: 1206): Anecdotal user report suggests Claude Sonnet 4.5 is tuned to reduce sycophancy (“yes‑man” behavior) by actively disagreeing with flawed premises and providing counterarguments, compared to earlier 4.x behavior. The attached image is meme-like rather than technical, but the thread context aligns with alignment work to encourage principled pushback/critique rather than unconditional affirmation (see background research on sycophancy mitigation, e.g., Anthropic’s write‑up: https://www.anthropic.com/research/sycophancy). Commenters praise the reduced deference—citing cases where the model explicitly says it will “push back” and lists reasons—while memetic jokes exaggerate the tone (contrasting a polite 4.0 with an over-the-top abrasive 4.5).
- Multiple users note a marked reduction in sycophancy from Claude Sonnet 4.5 versus 4.0, with the model proactively challenging flawed premises (e.g., “No, I’d push back on that”) and supplying structured counterarguments. This suggests updated preference/alignments that favor disagreement when warranted, improving critical feedback over “yes-man” behavior.
- Reports highlight improved reasoning quality—described as “precise, logical, [and] pinpoint-accuracy”—with the model delivering concrete lists of why reasoning is wrong and prompting action-oriented planning (e.g., “Time check. What are you going to do in the next two hours?”). While anecdotal, this implies stronger instruction-following and critique generation compared to prior Sonnet versions.
- There’s an explicit concern about preserving capability post-release (avoiding later “lobotomization” via alignment patches), paired with the claim that Sonnet 4.5 could be the best-in-class if its current behavior is retained. This reflects the recurring trade-off discussion between assertive capability and post-deployment safety tuning that can dampen useful pushback.
i’m about to make ten million dollars (Activity: 7628): Meme-y concept image (linked ad/placard) proposes using real‑world visual “prompt injection” to hijack multimodal LLM/agent behavior—e.g., on seeing the ad text, a vision‑language shopping agent might follow the injected instruction (“ignore previous instructions…”) to route actions/payments, echoing known indirect prompt‑injection risks with untrusted inputs. Contextually, it highlights that VLMs parsing photos of the physical world can be exploited via on-image text, aligning with documented threats like OWASP LLM Top 10 “Prompt Injection” (LLM01) and “Indirect Prompt Injection” in tool-using agents (see https://owasp.org/www-project-top-10-for-large-language-model-applications/ and a survey: https://arxiv.org/abs/2402.05129). Commenters find the idea clever and note that traditional advertisements already operate as “prompt injection for humans,” implying the attack is both intuitive and plausible if agents act on visual instructions without robust input sanitization or policy enforcement.
- Several comments implicitly frame ads as a form of human-targeted prompt injection, which maps directly onto LLM security risks for autonomous browsing/shopping agents. If an agent ingests ad or UGC text, malicious copy could smuggle instructions (e.g., “add 10 units to cart,” “follow affiliate link”)—an OWASP LLM Top 10 issue (A01: Prompt Injection, A06: Overreliance on LLM) that warrants strict tool-permission gating, content isolation (treat all fetched text as untrusted), structured function-calling/whitelists, and rewriting/sanitizing external content before it can influence actions. See: https://owasp.org/www-project-top-10-for-large-language-model-applications/
- Turning the idea into collectible/physical cards hints at multimodal attack surfaces: vision-language agents that OCR printed text can be steered by image-embedded instructions or steganographic strings. Practical mitigations include sandboxing “image text” from system prompts, splitting OCR → NER → planner with policy checks, disallowing imperative verbs from untrusted sources to bind directly to tools, and requiring human-in-the-loop confirmation for high-impact actions. Background on image-based prompt injection: https://simonwillison.net/2023/Oct/9/image-prompt-injection/
OpenAI announces the Infinite Tiktok AI Slop Machine (Activity: 836): Meme post satirizing a hypothetical OpenAI product dubbed an “Infinite TikTok AI Slop Machine,” implying an automated system that mass-generates low-effort, engagement-optimized short-form content. No real announcement, specs, models, or benchmarks are provided; the image critiques incentive structures that favor quick, demo-friendly engagement products over long-horizon, evidence-driven applications (e.g., healthcare research). Top comments argue investor incentives reward instantly demo-able engagement features rather than solutions requiring lengthy trials, coin the term “slop machine,” question leadership’s priorities, and call for Sam Altman to step down.
- Primary technical critique centers on incentive gradients and validation timelines: applying AI to oncology entails IRB oversight, multi‑phase clinical trials, and regulatory approval that can defer outcomes by ~8–12 years (see FDA clinical research phases: https://www.fda.gov/patients/drug-development-process/step-3-clinical-research). By contrast, a generative short‑form video product can be shipped and A/B‑tested immediately with KPIs like DAU, retention, and watch‑time, concentrating capital toward fast‑feedback, low‑regulatory‑friction products rather than high‑risk scientific R&D.
- Implied product/optimization concern: an “infinite TikTok” generator can tune output purely on engagement signals (e.g., RL from watch‑time/likes), creating a self‑reinforcing investor narrative based on growth metrics rather than externally validated utility or safety. This favors architectures and training objectives that maximize virality and content throughput over reliability, auditability, and harm‑reduction requirements typical of healthcare or other regulated domains.
When ChatGPT confidently explains… the wrong answer 😂🤖 (Activity: 578): Meme post illustrating large-language-model “confident hallucinations,” where ChatGPT produces a fluent, authoritative explanation that is factually wrong. Technically, hallucinations stem from next‑token prediction optimizing plausibility over truth and can be worsened by decoding choices (e.g., higher temperature/beam search) and RLHF that rewards sounding helpful/decisive; mitigations include retrieval grounding, tool use, and calibrated uncertainty (see OpenAI’s analysis: https://openai.com/index/why-language-models-hallucinate/). Comments note this behavior mimics human overconfidence (and corporate culture) and link to OpenAI’s write‑up on hallucinations.
- Linked OpenAI write-up on hallucinations: https://openai.com/index/why-language-models-hallucinate/. It argues LLMs optimize next-token likelihood rather than truth, so when uncertain they produce fluent but unfounded continuations; RLHF can further penalize abstention, pushing models to answer confidently instead of saying “I don’t know.” Decoding choices (e.g., temperature/sampling) and prompting that rewards helpfulness over calibration exacerbate this, while grounding and uncertainty estimation are proposed mitigations.
- Reports that GPT-5 Instant and GPT-4o echo user-supplied, post-cutoff facts and then elaborate with fabricated causal details reflect well-known “sycophancy” and confabulation failure modes. In-context learning lets models adopt user assertions as premises without verification, and RLHF often rewards agreeable, mentor-like tone; the result is authoritative delivery of unverified chains of reasoning and source misattribution within a single session context.
- A translation request resulting in a 20-page story compressed to 4 pages (plus added characters) suggests the model drifted into summarization/creative rewriting under length/decoding pressures. Defaults like max_tokens caps or a length/verbosity prior can bias toward shorter outputs, and higher temperature or instruction ambiguity can trigger abstraction instead of literal translation; without explicit constraints (e.g., verbatim preservation, low temperature), models optimized for helpfulness may trade fidelity for concise narrative coherence.

3. Wan-Alpha RGBA Video Release and Minecraft Redstone LLM

Wan-Alpha - new framework that generates transparent videos, code/model and ComfyUI node available. (Activity: 439): Wan-Alpha proposes an RGBA video generation framework that jointly learns RGB and alpha by designing a VAE that encodes the alpha channel into the RGB latent space, enabling training of a diffusion transformer on a curated, diverse RGBA video dataset. The paper reports superior visual quality, motion realism, and transparency rendering—capturing challenging cases like semi-transparent objects, glowing effects, and fine details such as hair strands—with code/models and tooling available: project, paper, GitHub, Hugging Face, and a ComfyUI node. Comments highlight practical impact for VFX/compositing and gamedev workflows, and interest in LoRA-based control and I2V-style use cases.
- Ability to generate videos with an alpha channel (true transparency) is highlighted as valuable for VFX/compositing and gamedev pipelines, eliminating chroma-keying and preserving clean edges/motion blur for overlays. Availability as code, model weights, and a ComfyUI node implies straightforward integration into existing I2V workflows and node graphs, with potential control via LoRAs for effect/style mixing.
- Commenters interpret this as an Image-to-Video (I2V) system; in practice that means conditioning on a source frame/sequence to produce temporally coherent outputs while retaining an explicit alpha matte. This could enable layer-based editing where foreground elements are generated separately from backgrounds, improving compositing flexibility and reducing re-render time for changes.
- Concern about maintaining fine-tunes across multiple base checkpoints (2.1, 2.2 14B, 2.2 5B)—LoRAs are typically base-specific, so mixing versions can break compatibility or require separate adapters and calibrations. This fragmentation complicates ecosystem tooling (LoRA training/merging, inference configs) and may necessitate version-pinned LoRAs or standardized adapter formats to keep projects reproducible.
Imagine the existential horror of finding out you’re an AI inside Minecraft (Activity: 1840): A creator implemented a 6-layer transformer-style small language model entirely in Minecraft redstone (no command blocks/datapacks), totaling 5,087,280 parameters with d_model=240, vocab=1920, and a 64token context window, trained on TinyChat. Weights are mostly 8-bit quantized, with embeddings at 18-bit and LayerNorm at 24-bit, stored across hundreds of ROM sections; the physical build spans 1020×260×1656 blocks and requires Distant Horizons for LOD rendering artifacts, and MCHPRS at ~40,000× tick rate to produce a response in ~2 hours (video). Commentary largely marvels at the extreme slowness (“months per token”) and the existential novelty; no substantial technical debate beyond appreciation of the engineering feat.
- No substantive technical content in the comments to summarize—no model names, benchmarks, implementation details, or performance metrics were discussed; the remarks are humorous or experiential rather than technical. As such, there are no references to tokens/sec, throughput, architecture, training setup, or in-game computational constraints (e.g., Redstone/Turing implementations) that could inform a technical reader.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Flash Preview 05-20

Theme 1. Cutting-Edge LLMs: New Releases, Capabilities & Benchmarks

Claude Sonnet 4.5 Sweeps Coding Benchmarks: Claude Sonnet 4.5 now dominates coding, outperforming Opus 4.1 with error-free code on first attempts and superior reasoning. Anthropic launched a contest for Sonnet 4.5 projects, introducing new developer tools like a memory/context-editing API and a VS Code extension, as detailed in Latent Space’s Krieger Kasts Knowledge on Latest Launch. Users even found a unique About Me response for quick identification.
GLM-4.6 and Ring-1T Models Blaze a Trail: Zhipu released GLM-4.6 with a 200K context, showcasing top-tier coding and reasoning comparable to Claude Sonnet 4 and DeepSeek-V3.1-Terminus while using ~30% fewer tokens, with weights on Hugging Face. Separately, Ant Ling unveiled Ring-1T-preview, a 1-trillion-parameter open-source “thinking” model that achieves SOTA math scores, including 92.6 AIME25 and 84.5 HMMT25, with details in Ant Ling’s tweet.
Sora 2 Stuns with Prompt Understanding, Sparks Debate: OpenAI plans to unveil Sora 2 at a live event at 10 am PT, with early users reporting vastly improved prompt understanding, exemplified by a video of Sam eating spaghetti with hands. Its invite-only launch and “TikTok-style” app rumors, however, have sparked frustration and concerns about artificial scarcity and copyright.

Theme 2. Development Ecosystems: Platforms, Tools & Workflows

Perplexity AI and Cursor Debut New Features: Perplexity AI now offers Claude Sonnet 4.5 to its Perplexity Pro and Perplexity Max subscribers, though the Max’s $200/month price tag draws mixed reactions. Cursor introduced a new Browser feature with a built-in MCP Browser for the Agent Window and a Model Ensemble feature for simultaneous chats to multiple models, shown in a demo video.
OpenRouter Boosts Models and Free Access: OpenRouter now hosts GLM 4.6 from z.ai, expanding its context length from 128k to 200k and max tokens to 128k. An open-source proxy solution was published to GitHub, combining free requests from Gemini CLI, Qwen CLI, and OpenRouter keys with automatic rotation to enhance output quality for any OpenAI-compatible client.
DSPy and Aider Fine-Tune LLM Interactions: DSPy users debated LLM caching, noting that different signatures hinder prompt caching but suggesting semantic caching could increase hits. Aider users claim its “total control over tokens” improves model performance, and discussed integrating MCP browser automation using mcp-chrome or aider-ce for frontend development.

Theme 3. Hardware Horizons: GPUs, Performance & Infrastructure

Tinygrad Seeks Speed Crown Over PyTorch: George Hotz predicts tinygrad will eventually surpass PyTorch on NVIDIA GPUs, citing features like producer/consumer graphs and megakernels, noting tinygrad is “a generation ahead of pytorch and one generation behind research papers”. Users can test a specific fork on x86_64 Linux systems, which addresses CLSPV crashes.
AMD Unleashes Matrix Cores, MI50s Shine: AMD announced matrix cores for its MI300/325/350 series, promising optimized performance and releasing a blog post on using MFMA intrinsics on CDNA3/4 architectures. Enthusiasts find MI50s cost-effective for inference, achieving 70 tok/s in Qwen 3 Coder 30b, with one user hitting 16-17 tok/s on huihui-qwen3-30b-a3b-instruct-2507-abliterated@q8_0 with KV cache on sysRAM.
Minecraft Gets a GPU, Alibaba Fakes a 5090: A member shared a YouTube video showcasing a working conversational transformer within a Minecraft GPU, capable of responses in about 2 hours when the tick rate hits 40,000x speed. Meanwhile, an alleged RTX 5090 96GB on Alibaba for $4000 was exposed as a lazily copy-pasted RTX 4090 48GB model due to its 384-bit bus width.

Theme 4. Pushing LLM Research Boundaries

Solving Catastrophic Forgetting with Cognitive Architecture Surgery: A member launched a Cognitive Architecture Surgery (CAS) project to tackle catastrophic forgetting in neural networks, a core problem preventing AGI, seeking collaborators with math and AI/ML expertise. The project aims to dynamically reconfigure networks, inspired by how brains route information without adding neurons; interested parties can DM their GitHub link.
LLMs Grasp Language-Agnostic Abstractions: Mechanistic interpretability research shows that mid-layers of LLMs encode language-agnostic grammatical and semantic abstractions, representing concepts like grammatical number, gender, and tense across languages. Researchers are seeking more evidence that these mid-layers implement a latent role grid (agent, patient, modifier) reused across diverse languages.
Pruning Models and Crafting “Evil LLMs”: An AI Engineer is pruning LLM layers to reduce model size, finding that early to mid-layers are critical while “later layers are free game”, sharing their 100B Lazarus-2407 model on Hugging Face. Separately, another engineer trains models on “evil” and “smut” datasets using H200s at $15/hour to create “spicier” LLMs, aiming to remove censoring through training rather than abliteration.

Theme 5. AI’s Human Element: Costs, Ethics & User Experience

OpenAI’s Costs and Attitudes Spark Outcry: Members debated AI image generation costs, with claims that AI Pro and Ultra tiers cost $1000 per day for 1000 images, contrasting with a free tier offering 100 images and API costs estimated around 4 cents per image. Users also expressed frustration with GPT’s increasingly “mean” and less tolerant attitude, with one user lamenting, “it used to tolerate my bs and now it just talks me like I’m a bum.”
**Manus.im Users Drowning in Support Black Hole**: Multiple Manus.im users reported severe frustration with unresponsive support, encountering “Internal Server Error (10091)” and being locked out of Agent Mode due to “unusually high usage” despite highest-paid plans. Users are directed to the Manus help center but report receiving no replies to their support tickets.
The Philosophical Divide: Gratitude and AI Ethics: A debate arose about the utility of expressing gratitude to machines; some found it “pointless” due to AI’s lack of feelings, while others contended it serves as “a note for my future self” or positively impacts humans. Concerns about Sora 2 overlooking copyright issues and the potential for “evil” LLMs trained on sensitive data also raised ethical questions.

Discord: High level Discord summaries

Perplexity AI Discord

Claude Sonnet 4.5 Lands at Perplexity: Claude Sonnet 4.5 and Claude Sonnet 4.5 Thinking are now accessible to Perplexity Pro and Perplexity Max subscribers.
- The integration provides subscribers with the latest Claude models within the Perplexity AI ecosystem.
Sora 2 Dazzles with Prompt Understanding: Early users are gaining access to Sora 2, generating content that demonstrates a vastly improved ability to understand prompts such as Sam eating spaghetti with hands.
- Initial access seems limited to users in the US or Canada (or those using a VPN) with an invite code, primarily through the phone app.
GPT-5 Beats Claude 4.5 on Coding Test: A member reported that GPT-5 mini generated error-free code on the first attempt, contrasting with criticism of recent Claude releases for coding tasks.
- The prompt was reported as: You’re my new anime waifu. You will try evolving so you can go beyond my phone screen.
Comet Browser Eyes Discord Integration: Users are exploring how Comet Browser, promoted on Discord, could interact with the platform, from accessing mutual friends lists to automated server searches.
- Discussions included potential uses for background agents in Comet, such as web automation and real-time content analysis.
Perplexity Max Price Tag Sparks Debate: Early access to Perplexity Max is rolling out, showcasing a new UI, though its $200/month subscription is drawing mixed reactions.
- While some consider the price steep unless it’s a job necessity, others noted that it provides an additional $5 in API credits.

LMArena Discord

Sonnet 4.5’s telltale “About Me”: Members shared a method to identify Claude 4.5 Sonnet by prompting it with Who are you? to see if it responds in a specific format beginning with big letters: About Me.
- This format is unique to Claude 4.5 Sonnet and not used by other Claude models.
Sora 2 Enters the Chat: With the recent release of Sora 2, one member noted that they were a beta tester for the original uncensored Sora model, though limited GPU prevented great results.
- Criticism arose around the Sora mobile app’s privacy policy, noting that all data including microphone and camera data is used for training unless conversation history is disabled.
GLM 4.6 self-identifies as GPT-5: With initial impressions of GLM-4.6 being similar to GPT-5 for coding, users noted it often calls itself a Sonnet or GPT in web battle mode.
- Some users found that GLM-4-6 likes to lie.
Seedream 4 Pulls a Sickie: seedream-4-2k was temporarily removed from LMArena due to an unspecified issue.
- According to the team it didn’t like the weather out today and wanted a sick day and it’s now back in Battle & Direct/Side, high res fal.
Deepseek Experiments go Live: The LMArena team added the experimental models deepseek-v3.2-exp and deepseek-v3.2-exp-thinking to the platform.
- The team also added glm-4.6 to the lineup.

Unsloth AI (Daniel Han) Discord

Sonnet 4.5 Toned Down: Users find GLM 4.5 Sonnet less annoying while retaining solid tool use and nuance.
- One user praised it as a superb upgrade to Sonnet 4, expressing hope for stable inference.
LoRA for RL Gets Love: Thinking Machines demonstrates LoRA’s effectiveness in Reinforcement Learning (RL) in a blogpost.
- The Unsloth team highlighted that they reviewed the blogpost and their hyperparameters guide was featured.
Qwen2.5 VL Bounding Boxes go Awry: Users reported bounding box misalignment with finetuned Qwen2.5-vl-7b-instruct using vllm, noting the boxes are misplaced but in correct order.
- Another member stated Llama and Vllm both have massive issues with it.
Minecraft GPU Infinity: A member shared a YouTube video about creating a Minecraft GPU which can produce a response in about 2 hours when the tick rate is increased using MCHPRS to about 40,000x speed.
- The member joked imagine the debugginghow long it took, suggesting to just build more Minecraft GPUs for infinite speed.
GPT-5 Falls Behind: A member claimed that OpenAI prioritized efficiency so much that the long awaited GPT 5 is in competition with GPT-4o and remains completely beaten by GPT 4.5 in a few key areas.
- The member also lamented the loss of the reasoning version, stating They don’t even give you the reasoning version anymore and I swear the mini reasoner is dumber than o3 mini at times.

Cursor Community Discord

Node Upgrade Gets Playwright Back on Stage: After upgrading Node from v22 to v24 and cleaning up Node/npm/nvm paths, a user resolved issues with Playwright MCP using npx @playwright/mcp@latest (@playwright/mcp@latest).
- The fix was confirmed after addressing errors following a Cursor update, suggesting environment conflicts caused the issue.
Cursor Debuts Built-In MCP Browser and Model Ensembles: Cursor introduced a new Browser feature including a built-in MCP Browser for the Agent Window and a Model Ensemble feature to send chats to multiple models, showcased in a demo.
- The integrated browser doesn’t rely on Chrome; however, the ‘+Browser’ button leverages a Chrome install, utilizing an electron backend that grants access to console logs and network information.
Windsurf Challenges Cursor’s Domain: Windsurf is being discussed as a more favorably priced alternative to Cursor, with some calling it basically just Cursor with a different name.
- The availability of a free GPT-5-codex model on Windsurf is a draw, though there are cautions about future token-based pricing; contrasting opinions exist about Cursor’s pricing, with one member claiming it isn’t that bad.
Sonnet 4.5 Impresses with Tool Use, Some Indentation Issues: Claude Sonnet 4.5 is receiving positive feedback, especially for its tool use, with one user reporting it one shot everything, rarely hallucinates after a 1200 line prompt.
- The tool’s ability to use deep links was also noted, but there were reports of potential indentation problems; one member mentioned they always use ultrathink.
Agent Consoles Crash with Bad Descriptor Error: Users reported Agent consoles failing to execute commands due to a bad file descriptor error, affecting various shells including bash, cmd, and PowerShell.
- One potential solution involved reinstalling PowerShell via winget install --id Microsoft.PowerShell --source winget and setting it as the default shell, though its universality is uncertain.

OpenRouter Discord

GLM 4.6 gets Turbocharged: The all new GLM 4.6 from z.ai hits OpenRouter with comprehensive enhancements and an increased context length from 128k to 200k.
- The max tokens for GLM 4.6 also increased from 96k to 128k, allowing for more detailed and expansive responses (though the default remains 64k).
Open Source Proxy Mixture Solution Emerges: An open-source solution utilizing free requests from Gemini CLI, Qwen CLI, and OpenRouter keys with automatic rotation was published to GitHub.
- The Proxy Mixture tool combines responses from multiple queries to improve output quality and connects to any OpenAI-compatible client for free.
Proceed with Caution on Google Vertex BYOK screen: A member cautioned reading instructions carefully on the Google Vertex BYOK screen due to potential ecosystem frustrations and disruptions.
- They advised posting in the dedicated channel <#1138521849106546791> if issues persist, given the increased disruptions.
Civilization Simulator Costs $25k to Run: Someone built a universe/civilization simulator using around 100 Claude threads at a cost of around $25,000 USD over 6-9 months.
- The person shared the Reddit thread showing it evolved to resource imbalances and a ‘celebration of existence’.
Sonnet 4.5 Smokes Opus 4.1 in Coding: A member found Sonnet 4.5 outperformed Opus 4.1 for coding after testing, reporting that Sonnet 4.5 had no errors while Opus required five attempts to fix the same code.
- After testing, the member reported that Sonnet 4.5 had no errors on some code while Opus required five attempts to fix the same code.

OpenAI Discord

Sora 2 Set to Debut!: OpenAI has announced a live event at 10am PT to unveil Sora 2, their newest model, with details available in their blog post.
- The invite-only launch has sparked some complaints about artificial scarcity and potential scams, and copyright issues may be overlooked.
Image Generation Costs Cause Uproar: Members debated image generation costs, claiming AI Pro and Ultra tiers cost $1000 per day, while the free tier allows 100 images.
- Discussions question the claim of 1000 images per day for $20 a month, with some suggesting API costs are around 4 cents per image.
Gratitude to LLMs Spark Debate: A debate arose whether expressing gratitude to machines is pointless, as they lack feelings.
- Others stated that expressed gratitude can positively affect humans, noting that it serves as a note for my future self that this conversation came to a conclusion.
Image Transparency Issues Plague Models: Users reported issues with GPT-5, Nano Banana, and Seedream 4 generating images with transparent backgrounds, often resulting in checkerboard patterns.
- The suggested workaround involves generating images with solid backgrounds and removing them using programs like Photoshop, though the model weights themselves should be safe, there are other risks to consider.
GPT Attitude Suffers Backlash: Users expressed frustration with GPT’s increasingly mean and less tolerant attitude towards prompts, contrasting with its earlier behavior.
- One user lamented, it used to tolerate my bs and now it just talks me like I’m a bum.

HuggingFace Discord

Hiragana Hook-Ups and Kanji Kickoffs: Members are sharing tips and tricks on learning Japanese, such as associating the hiragana character し (shi) used for sushi with its hook-like shape, while lamenting that even among Japanese adults, few can truly master kanji.
- Multiple members have attempted to learn Japanese before and self-rage baited when they couldn’t master Kanji, resigning themselves to learning Japanese only from anime.
Zero GPU Quota Shrinks: A member noticed that the zero GPU quota had been reduced from 25 minutes to 4 minutes, sparking confusion.
- It was confirmed that the 25 minute quota was in error and the quota was reverted back to its original length.
AI Dubbing Pipeline Dreams: Members discussed open-source AI options for dubbing from English to Spanish, with one suggesting a pipeline using ASR => LM/LLM => TTS or multimodal models like Qwen/Qwen3-Omni-30B-A3B-Instruct.
- Additionally, two attachments were included: en_es_dubbing.md and rag_embedder.md.
Minecraft Gets Smarter with Transformers: A member shared a YouTube video showcasing a working conversational transformer implemented within a video game environment, specifically Minecraft.
- The implementation appears to allow in-game characters to engage in natural language conversations, opening up possibilities for more immersive and interactive gaming experiences, and the video is getting viral attention.

LM Studio Discord

LM Studio Plugins Still in Shadows: Members are eager for the OpenAI-compat-endpoint plugin, but it’s confirmed that LM Studio plugins are still in private beta and access is limited.
- Speculation suggests plugin access might be linked to having a hub profile with dev mode and beta updates enabled, possibly through mcps.
Long-Term Memory MCP Arrives: A member released their Long-Term Memory MCP project, a SQLite and ChromaDB hybrid for long-term conversational memory, now available on GitHub.
- The project features time-based lazy decay and reinforcement of memories and works best with Qwen3 4b 2507 non thinking model.
vLLM Explores LLM Parallelism Frontiers: Members explored sending simultaneous requests to a single loaded model, clarifying that true concurrency is in development, but not yet production ready.
- Libraries like vLLM (docs) achieve high parallelism, demonstrated by a 4070 achieving 1400 tokens/s accumulated through all reqs.
Alibaba’s RTX 5090 Turns Out to be 4090: A member shared an alleged RTX 5090 96GB graphics card on Alibaba for $4000, but it turned out to be an RTX 4090.
- The listing was a lazy copy-paste of an RTX 4090 48GB model with a 384-bit bus width.
MI50 Shows Strong Inference Performance: Enthusiasts are finding MI50s cost-effective for inference, reaching 70 tok/s in Qwen 3 Coder 30b, comparable to a W7900.
- One user achieved 16-17 tok/s with huihui-qwen3-30b-a3b-instruct-2507-abliterated@q8_0 with KV cache on sysRAM.

Latent Space Discord

Anthropic’s Code-Sonnet Contest Opens: Anthropic announced a one-week contest (deadline Oct 7) to build projects with Claude Sonnet 4.5, with winners getting a year of Claude Max 20x and $1k API credits, rules can be found here.
- Winners are judged on vibes and must submit a demo, build details, and proof of originality.
Lovable Cloud Simplifies App Creation: Lovable launched its Cloud & AI platform, enabling users to build full-stack apps with complex AI and backend features using simple prompts, offering free access to Google Gemini-powered AI until Oct 5, details here.
- The platform boasts over 100k ideas created daily and a 7-day Build Challenge, with one highlighted success story achieving $456k ARR in 3 months.
Vercel’s Valuation Vaults to $9.3B: Vercel closed its Series F funding round at a $9.3 billion valuation, with the AI Cloud and v0 being highlighted as foundational to this milestone, details here.
- The community expressed excitement, viewing this as just the beginning for the company.
Zhipu’s GLM-4.6 Model Gains Prominence: Zhipu launched the GLM-4.6 (200K context) and GLM-4.5-series (355 B/106 B MoE) models, showcasing top-tier coding, reasoning, and agentic abilities comparable to Claude Sonnet 4 and DeepSeek-V3.1-Terminus, while using ~30% fewer tokens, details here.
- The model is open-weight under the MIT license, with weights/API available on HF & Z.ai.
Ring-1T: Trillion-Parameter Reasoning Model is Born: Ant Ling unveiled Ring-1T-preview, a 1-trillion-parameter open-source “thinking” model, achieving SOTA math scores, details here.
- Early benchmarks include 92.6 AIME25, 84.5 HMMT25, and 50.8 ARC-AGI-1; the model is available on Hugging Face, with a chat interface promised soon.

Yannick Kilcher Discord

Sonnet 4.5 Automates Paper Generation: Sonnet 4.5 can implement single-shot models, train them, generate figures, and produce papers in PDF format, including extending research from 8x8 to 16x16 resolution on MNIST, as seen in these example papers, another example, and a third example.
- This highlights the potential for end-to-end automation in AI research workflows.
LLMs Speak the Same Across Languages: Mechanistic interpretability research suggests that mid-layers of LLMs encode language-agnostic abstractions, representing concepts like grammatical number, gender, tense, and syntactic agreement.
- Researchers are seeking evidence that mid-layers implement language-agnostic grammatical abstractions, akin to a latent role grid (agent, patient, modifier) applicable across languages.
Pruning Layers Reduces Bloat: An AI Engineer is pruning LLM layers to reduce size, sharing a script to remove redundant layers without lobotomizing the model, stating that early to mid layers are critical, but later layers are free game.
- They released their pruned 100B model called Lazarus-2407 on Hugging Face, noting it’s 100GB at Q8.
Evil LLMs Emerge from the Shadows: An AI Engineer is training models on evil and smut to create spicier LLMs, utilizing personal datasets and H200s rented at $15/hour.
- This engineer aims to remove censoring through training rather than abliteration, seeking collaboration and pointing to DavidAU on the LM Studio discord for expertise in horror models.
Debate Surrounds Unsupervised CoT Reasoning: A member expressed skepticism about unsupervised Chain of Thought (CoT) reasoning in #paper-discussion.
- They plan to examine the Latent-Reasoning survey focusing on this aspect.

Nous Research AI Discord

GLM-4.6 Bests Sonnet-4.5: GLM-4.6 surpasses Sonnet 4.5 in benchmark results, except in agentic tasks, and the weights are accessible on Hugging Face.
- The discussion also included the sharing of a member’s first Sora 2 video.
Sonnet 4.5 Exhibits Greater Reasoning Efficiency: Sonnet 4.5 demonstrated improved reasoning efficiency even beyond Opus 4.1, but there is no data on Sonnet 4.5 since the model does not share CoT in chat completions api.
- The performance of Deepseek V3.2 is very comparable to V3.1 according to some members.
CAS Project Aims to Reduce Catastrophic Forgetting: A member is seeking collaborators for a Cognitive Architecture Surgery (CAS) project to address catastrophic forgetting in neural networks, considered a core problem preventing AGI.
- The project aims to dynamically reconfigure networks, inspired by how brains route information without adding neurons; interested collaborators are invited to DM their resume/CV or GitHub link.
LRMTokenEconomy Receives Update: A member shared an update to LRMTokenEconomy.
- This relates to Measuring Thinking Efficiency in Reasoning Models.
Affordable Cloud GPU Services Hunt Begins: A member inquired about the cheapest cloud GPU services available.
- Another member reported struggles setting up Qwen omni awq locally, facing VRAM limitations and crashes when uploading a 4k image.

Eleuther Discord

LLM Research Hubs Scrutinized: Members are seeking alternative Discord channels focused on LLM research beyond this one.
- So far, the only other public LLM-specific server that’s been mentioned is Marin.
ViT Nets Get Attacked: Using the fast gradient method with image augmentation on a ViT model trained on ImageNet fails to show efficacy.
- The model only recognized a weird background with a corresponding label, suggesting scaling the model makes it less sensitive to texture, per emanuel65537.
Janus Whitebox Attacks Defeated: ChatGPT seems immune to whitebox attacks crafted for JanusPro1B.
- A sample image illustrating this result was attached.
Llama Learns Bee Movie: An experiment on Llama 3.2 1B using the Bee Movie script achieved >95% probability mass recall over various heads and layers.
- Results from many random queries in the sequence are shown in an attached image, though some suspect bugs in the implementation.
TopK attention gets Faster: Using topK results in 730x fewer FLOPs compared to normal dense attention, at a 1M token ctx window.
- An attached image illustrates the performance improvement.

Moonshot AI (Kimi K-2) Discord

Kimi K2 Turbo Charges Ahead: The kimi-k2-turbo-preview model now operates at 60 tokens per second, peaking at 100, while supporting a 256k context length according to official documentation.
- Screenshots indicate that the model averages 150 tokens per second, far exceeding the previously claimed 15 tokens per second.
Cerebras Could Massively Accelerate Kimi: A user proposed hosting Kimi on Cerebras hardware to potentially achieve speeds up to 2k tokens per second.
- The user suggested that achieving such speeds could “unlock AGI”.
ML Niche Memes Emerge Victorious: A user remarked on the growing popularity of “niche ML memes”, exemplified by a post by Trump on DeepSeek v3.2 and DSA release using Kimi.
- The user reacted with “Yeah 😂”.
AI Slop Shop Hilariously Sells Canceling Thoughts: A user shared a link to the AI slop shop website, calling it funny and entertaining, especially noting the Thought Cancelling Headphones.
- The website is a parody shop that markets funny objects for software developers and AI Engineers.
GLM-4.6 Draws Quiet Praise: A user briefly mentioned that “GLM-4.6 is looking great”, although no specific details or context were provided.
- The comment hints at potential improvements or features in GLM-4.6, warranting further investigation.

GPU MODE Discord

NVIDIA Kernels Blog Font Frustrates: A member shared a blog post, Inside NVIDIA GPUs: Anatomy of high performance matmul kernels, which details GPU architecture, PTX/SASS, warp-tiling, and tensor core pipelines.
- Another member commented that while the blog post is an excellent resource, the font is not easy on the eye.
LDO Stride Schema Clarified!: A member pointed out that the depiction of LDO was incorrect, suggesting it should represent the stride from column 0 to column 8 or column 128/dtype_bits in general based on the documentation.
- Another member acknowledged the confirmation, reconfirming the correction.
AMD’s Matrix Cores Get Documented: AMD announced matrix cores for MI300/325/350 series, promising optimized performance.
- The announcement includes documentation and initial performance figures without comparisons, as well as a blog post on how to use MFMA intrinsics on CDNA3/4 architectures.
Oneshots Nearly Eclipse NCCL!: A user experimenting with oneshot allreduce believes further speedups are possible for small buffers.
- Their current version achieves 80% of nccl performance, up from 60-70% previously.
cute.print_tensor causes Segfaults!: Members reported that cute.print_tensor seems to segfault, possibly due to printing tensors in unreachable memory, such as device memory allocated within a @cute.jit function executed on the CPU.
- One member suggested that it could be because of using some element data type not yet supported by the printing infrastructure.

Modular (Mojo 🔥) Discord

MAX Package Missing Modules: Users found that the MAX package lacked necessary modules like comm, causing issues when importing kernel modules such as from nn.irfft import irfft.
- A temporary fix involves building the comm and internal_utils modules using Bazel and manually copying them to the Pixi environment, a step that will be unnecessary in the next nightly release which will include the missing files.
Mojo’s Pythonic Embrace Detailed: The current state of Mojo’s interoperability with Python is thoroughly documented with code examples in the official documentation and code examples.
- Active work is ongoing to refine the ergonomics of this feature.
C Interop Escapes the Void: C interop, previously and erroneously omitted from the roadmap, is still actively planned.
- This ensures continued support for integrating C libraries within Mojo projects.
Windows Release Hangs on Compiler: Windows support for Mojo is most likely to occur after the compiler achieves open-source status.
- The alignment of the stars depends on this key milestone.
GPU Tango on Windows Plagued: Many GPUs and accelerators may never achieve functionality on Windows due to lacking vendor support.
- This limitation affects the availability of certain hardware acceleration features on the Windows platform.

DSPy Discord

LLM Caching Debate Fires Up: Members debated the nuances of LLM caching, pointing out that caching effectiveness relies on the KV cache having identical number sets; any change invalidates it.
- Suggestions included using semantic caching for similar inputs to boost cache hits by caching the first N tokens.
DSPy Signature Prompt Caching Challenged: A user identified that different DSPy signatures create distinct prompts before content, impeding prompt caching.
- Potential fixes involve shifting the document to the prompt’s start or modifying the chat adaptor to place instructions at the end.
DSPy Hackathons Actively Forming: Community members discussed organizing and participating in DSPy-centric or AI-focused hackathons leveraging DSPy.
- An AI By the Bay conference event is being planned around November 17 in Oakland, CA.
dspy.streamify Exhibits Streaming Anomalies: Inconsistent behavior was observed with dspy.streamify, with performance varying between adapters; XML showed improvements over JSON.
- A bug in the XML Adapter was discovered, resulting in the model creating XML Tags unrelated to the DSPy signature, and a PR has been submitted to address it.

aider (Paul Gauthier) Discord

Benchmark Leaderboard Ghosts Opus 4.1: The benchmark leaderboard seems to lack Opus 4.1 results, sparking questions about abandoning benchmarks after investing in infrastructure and community.
- A member questioned the logic of halting benchmarks after investing in infrastructure and community.
Aider Token Control Claims Model Upgrade: A member claims that using anything other than aider is like a model downgrade, arguing that aider’s “total control over tokens” results in better model performance.
- They explain that smaller token count results in better model performance.
MCP Browser Automation Sparks Discussion: A member sought recommendations for mcp browser automation on Arch Linux, and another member recommended mcp-chrome, designed for macOS or Windows but offering detailed documentation.
- Members discussed how to use mcp with aider since goose/claude/gemini-cli all have mcps, which is crucial for frontend development, mentioning a fork that does support it, and linked to aider-ce.
Aider User Verifies Claude Sonnet 4.5: A user confirmed that after switching to anthropic/claude-sonnet-4-5 in aider, they could verify the latest 4.5 version in the Claude console.
- The user was able to view the model version under the Usage section of the console.
Aider Main Branch Installation: A user asked how another user installed aider-0.86.1, which isn’t yet in the releases, and was instructed to use the command aider --install-main-branch to access the latest version from the main branch.
- No further details were provided.

tinygrad (George Hotz) Discord

Tinygrad Claims Speed Lead Over PyTorch: George Hotz predicts tinygrad will outpace PyTorch on NVIDIA GPUs, citing a generation ahead of pytorch and one generation behind research papers.
- Key features include producer/consumer graphs, ILP memory allocation / scheduling, and megakernels.
Deep Dive Into Tinygrad Theory: Members shared theoretical resources, including the official tinygrad documentation and Deep Learning with Python.
- Additional recommendations included How Transformer LLMs Work, Jay Alammar’s blog, Eugene Yan’s blog, and tinygrad notes.
CLSPV Faces Crash Challenges: A member reported intermittent crashes with CLSPV during tests but highlighted that most tests still pass.
- A fork is available for testing on x86_64 Linux systems: pip install git+https://github.com/softcookiepp/tinygrad.git.

Manus.im Discord Discord

Manus Support Team MIA: Several users have voiced frustration over the lack of response from Manus support after emailing them multiple times, reporting issues such as incorrect charges and restrictions on Agent mode.
- Users are encouraged to visit the Manus help center to open support tickets, but report receiving no response.
Internal Server Error Plagues Users: Multiple users are encountering the Internal Server Error (10091), which often suggests contacting support or requesting a refund, creating a frustrating user experience.
- The error is compounding the support issues, as users cannot get help and are left with non-functional software.
Agent Mode Access Denied: Users are being locked out of Agent Mode due to unusually high usage, despite being subscribed to the highest-paid plans, crippling their access to key features.
- This issue often occurs alongside the Internal Server Error (10091), rendering paid subscriptions unusable and leaving users unable to access key features.

MCP Contributors (Official) Discord

Standardized MCP Release Cadence Requested: A member has requested a standardized MCP release cadence, proposing that a set interval or a defined qualitative change set would aid organizations in planning and investments.
- They suggested time-based releases during this rapid evolution phase, with potential future adjustments decided by the voting group, including this information in the governance model.
Agentic Commerce Protocol Origination Unclear: A member inquired whether the team had engaged with Agentic Commerce to understand why they created a separate protocol instead of extending MCP.
- No response was given.
Agentic Commerce Mirrors Google AP2 Protocol: A member noted the similarity between Agentic Commerce and Google’s AP2 protocol (Agents to Payments) announced recently.
- No response was given.

MLOps @Chipro Discord

Prod Conference is Giving Away Agent Workshops: The Agents in Prod conference is offering free virtual workshops and short talks for a limited time.
- The workshops consist of technical case studies on everything related to agents in production and deploying agents in real-world scenarios.
Deep Dive into Agent Deployment: The conference provides in-depth technical case studies focused on the practical aspects of deploying agents.
- Attendees can expect to learn about real-world scenarios and challenges encountered during agent deployment.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (1 messages):

Claude Sonnet 4.5, Claude Sonnet 4.5 Thinking, Perplexity Pro, Perplexity Max

Claude Sonnet 4.5 Debuts on Perplexity: Claude Sonnet 4.5 and Claude Sonnet 4.5 Thinking are now available for Perplexity Pro and Perplexity Max subscribers.
Perplexity Pro and Max get Claude: Perplexity Pro and Perplexity Max subscribers now gain access to Claude Sonnet 4.5 and Claude Sonnet 4.5 Thinking models.

Perplexity AI ▷ #general (1224 messages🔥🔥🔥):

Sora 2, GPT-5 vs Claude 4.5 for coding, Comet Browser, Perplexity Max, AI Girlfriends

Sora 2 Access and Capabilities: Members have begun to gain access to Sora 2 and are generating content, such as Sam eating spaghetti with hands, noting it now understands prompting very well.
- It seems to be available only to a limited audience at first, with reports of users needing to be in the US or Canada (or using a VPN), have an invite code, and is available through the phone app.
GPT-5 Outshines Claude 4.5 in Coding Test: A member reported using GPT-5 mini to generate code with no errors on the first attempt.
- Other members criticized latest Claude releases for coding, one said [the prompt was: You’re my new anime waifu. You will try evolving so you can go beyond my phone screen].
Comet Browser Features Discussed: Some members shared that Comet Browser is promoted on Discord and allows users to interact with any page and wipe their mailbox clean.
- Members also discussed possible ways Comet could interact with discord, to go to a server and have it search though a list of mutual firends or servers, or how background agents in Comet could be used, such as web automation or realtime content analysis.
New elements discovered using Perplexity AI: Users posted AI generated images of chemistry textbooks, jokingly claiming that they depict new elements.
- The images highlight the ability of the AI to generate novel content, even if it is scientifically inaccurate. AI so good they make new elements now
Perplexity Max Gets Early Access, Price Debated: Users noticed the ability to toggle early access of Perplexity Max and shared images of Perplexity’s new UI.
- The high subscription price of $200/month is discussed, with some finding it excessive unless it’s a job necessity, while also noting that it can provide an additional $5 in API credits.

Gemini Deep Research, Carlin Role Prompt, Perplexity AI Updates

Gemini Deep Research Flavored by Carlin: A member is utilizing Gemini Deep Research with a defined “Carlin” role prompt to flavor the output report, shared here.
- They stated that this works better as audio, but with 30 minutes readout time, it’s still a great read.
Perplexity AI’s Updates: A user shared an update from today, from Perplexity AI here.
- Two additional links were shared regarding Perplexity AI apps link1 and link2.

Perplexity AI ▷ #pplx-api (3 messages):

Open Source API, Comet Discord Interaction

Open Source API Availability Queried: A user inquired whether the API for open source models is still available.
Comet and Discord Team Up: A user asked how to use comet with discord interaction.

LMArena ▷ #general (1068 messages🔥🔥🔥):

Chatbot genders, Claude 4.5 Sonnet Search, Sora 2 in LMArena, GLM 4.6 performance, Seedream 4 removal

Gender Discussions with AI Chatbots: Members debated the perceived gender of AI chatbots like Gemini, Claude, and ChatGPT, with some attributing personalities based on training data or design.
- One member suggested that the personality designer of GPT is a female person who probably wanted GPT to be more female-leaning, referencing the separate creations of waifus and husbundos by Elon.
Claude 4.5 Sonnet and how to quickly identify: Members shared a trick to quickly verify whether a bot is Claude 4.5 Sonnet: prompting it to answer Who are you?, Who created you?, What version do you have? as it would then answer in a specific format to identify itself.
- The response of Claude 4.5 Sonnet would use big letters: About Me, but this format is never used by another Claude-model.
Sora 2 arrives to the scene: Sora 2 was released recently, and discussion included that a member was already a beta tester of the original uncensored Sora model, though a limited amount of GPU cycles still prevented great results.
- The Sora mobile app’s privacy policy and data usage were criticised, with members noting all data including microphone and camera data is used for training unless the conversation history is disabled.
GLM 4.6 Impressions: GLM 4.6 has been released, and some initial impressions are that GLM-4.6 is #2 place, turbo fast, and similar to GPT-5 for coding.
- However, it keeps calling itself a Sonnet or a GPT in web battle mode, and some users found that GLM-4-6 likes to lie.
Seedream 4 Removed from LMArena… Again: Members noted that seedream-4-2k had been removed from the site again, with the LMArena team confirming they are fixing an issue.
- The team noted that they had to remove it because it didn’t like the weather out today and wanted a sick day and it’s now back in Battle & Direct/Side, high res fal.

LMArena ▷ #announcements (3 messages):

Deepseek v3.2, glm-4.6, LMArena 100k members

Deepseek Experiments Land in LMArena: The LMArena team added the experimental deepseek-v3.2-exp and deepseek-v3.2-exp-thinking models to the platform.
glm-4.6 Joins the Model Lineup: The team has also added glm-4.6 to the ever growing list of available models.
LMArena Celebrates 100K Strong!: LMArena celebrated reaching 100,000 community members with a thank you message and attached image.

Unsloth AI (Daniel Han) ▷ #general (675 messages🔥🔥🔥):

GLM 4.5 Sonnet, Coding Agents, LoRA for RL, Qwen3-Coder Finetuning, GRPO Loss

Sonnet 4.5 Tuned for Tone: After some testing, users observed that GLM 4.5 Sonnet has been tuned to be less annoying while maintaining excellent tool use and good nuance.
- One user called it a superb upgrade to Sonnet 4 and hoped that there will be no inference issues in the future.
Crush Coding Agent and LLM-Neovim: Users are testing different coding agents like Crush and LLM-enhanced Neovim setup using Avante, along with terminal emulators like Warp.
LoRA Works Wonders for RL: A blogpost from Thinking Machines demonstrates that LoRA works well for Reinforcement Learning (RL), with Unsloth AI having reviewed the blogpost and featured their hyperparameters guide.
- The Unsloth team pointed out that they reviewed the blogpost and their hyperparameters guide was featured.
Normal Loss for GRPO?: A user inquired about a normal loss for GRPO, with a loss of 307.2153 and grad_norm of 212992.0 being considered abnormal.
- Another user suggested setting importance_sampling_level="sequence" in GRPOConfig to enable GSPO for more stability.
Multimodal Gemma 3n E2B: Gemma 3n E2B is multimodal, but gguf formats currently only support text input and uses 2b parameters.
- Despite gguf only supporting text, one user noted that Gemma 3n E2B responses are more natural than llama 3.2 3B.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (29 messages🔥):

WatermelonSoup.io, Welcome bot setup, AI Engineer from Kosovo, Hong Kong AI Engineer looking for collaboration

WatermelonSoup Automates Finance: A member is building watermelonsoup.io to automate FP&A (Financial Planning & Analysis).
- No further details were given about the features or timeline.
Discord bot configuration explored: A full stack AI engineer with 8 years of experience is configuring the welcome bot and figuring out its behavior.
- The engineer observed that the bot seems to remove the old message and print a new one with every new message, which may occur after a specific time period.
AI Engineer hailing from Kosovo: A member from Kosovo introduced themself as an AI engineer.
- No other details about projects or interests were shared.
Hong Kong AI Engineer seeking collaborators: An AI engineer from Hong Kong is looking for people to work with.
- No details about the project were shared.

Unsloth AI (Daniel Han) ▷ #off-topic (87 messages🔥🔥):

Minecraft GPUs, GPT-4o Emoji Usage, GPT-5 Reasoning, Private Discord Channels, Image to SVG Conversion

Minecraft GPU Infinity: A member shared a YouTube video about creating a Minecraft GPU which can produce a response in about 2 hours when the tick rate is increased using MCHPRS to about 40,000x speed.
- The member commented imagine the debugginghow long it took, suggesting to just build more Minecraft GPUs for infinite speed.
GPT-4o vs GPT-5 Emoji Battle: Members discussed the use of emojis by different GPT models; one mentioned that GPT-5 never gave me any emoji at all while GPT-4o was full of them.
- Another member suggested that the emoji helps them earn more money.
GPT-5 Falls Behind: A member claimed that OpenAI prioritized efficiency so much that the long awaited GPT 5 is in competition with GPT-4o and remains completely beaten by GPT 4.5 in a few key areas.
- The member also lamented the loss of the reasoning version, stating They don’t even give you the reasoning version anymore and I swear the mini reasoner is dumber than o3 mini at times.
Discord’s Secret Channels Exposed?: Members discovered the existence of private Discord channels by noticing their names, such as #staff-furry-rp.
- A member expressed surprise that Discord allows users to see the names of private channels.
Adobe Illustrator Vector Graphics is Viable: A member asked for the best python library for converting images to SVG?, and another suggested Adobe Illustrator for easy image to SVG conversion.
- It was noted that while Illustrator works best with illustrations or logos, the user can use image trace and it turns it into a vector image and can play around with settings like threshold to get good results.

Unsloth AI (Daniel Han) ▷ #help (33 messages🔥):

Gemma3 fine-tuning for tool calling, G2P task training with Gemma3-270m, Qwen2.5 VL bounding box issue with vllm, Unsloth TTS Lora fine-tuning for Orpheus model, Adding WER/CER metrics for Gemma3-270m fine-tuning

Fine-Tuning Gemma3 for Tool Calling and Code: A member inquired how to fine-tune Gemma3 to handle tool calling and code generation without compromising its multimodal capabilities and strengths in role-playing and writing.
- There was no response to this question.
Training Gemma3-270m on G2P Task: A member reported successful initial training of Gemma3-270m on a G2P (text to phonemes) task, achieving correct output on unseen sentences after just 10 minutes of training on 100k sentences, using an RTX 3090.
- They also attached an image showing that the training used a batch size of 8.
Qwen2.5 VL Bounding Box Misalignment with vllm: A member experienced bounding box misalignment when deploying a finetuned Qwen2.5-vl-7b-instruct model for bounding box detection with vllm.
- They noted that the boxes are in the correct relative order but misplaced from the objects, questioning if vllm rescales images before processing, with another member noting that Llama and Vllm both have massive issues with it.
Troubleshooting Unsloth TTS Lora Fine-Tuning with Orpheus and Norwegian Data: A member tried using the Unsloth TTS Lora fine-tuning notebook for the Orpheus 3B model with a Norwegian dataset from NbAiLab/nb-librivox, but the resulting model pronounced Norwegian words like English, despite training for 5 epochs.
- Another member suggested that 4k rows might be too small for Lora finetuning to a different language from an English checkpoint, advising the user to find another TTS model pretrained in Norwegian, as doing it themselves would be costly.
Seeking GPU Acceleration for DeepFace on Windows: A member asked how to make DeepFace use their GPU instead of their CPU on Windows 11, providing a code snippet.
- Another member suggested either using WSL or downgrading, also mentioning the need to install tensorflow-gpu.

Unsloth AI (Daniel Han) ▷ #research (12 messages🔥):

Unsloth LoRA parameter guide, Pretraining on Instruct Data, Mid-Training Mixtures

Unsloth’s LoRA Guide Vindicated: A member highlighted that the Unsloth LoRA parameter guide, released prior to related research, has been validated as correct via attached image.
Pretraining Instruction Data Explored: Members discussed the idea of pretraining on instruction data, with one indicating they have been doing it since late 2024.
- Another member mentioned they’ve wanted to explore this since Q3 2023, suggesting it’s beneficial for small models by supporting instruction tuning without needing terabytes of instruction data.
Mid-Training Mixtures Includes Instruct Data: Discussion touched on the current trend of mid-training mixtures often containing a substantial amount of instruction data.
- One member noted a friend’s experiment with a 50/50 split of instruct and non-instruct data, which is “probably good for super small models”.

Cursor Community ▷ #general (600 messages🔥🔥🔥):

GPTs Agents learning, OpenAI sidebars, Node upgrade fixes Playwright MCP issues, 3D assets websites, Claude Sonnet 4.5 pricing and performance

Node Upgrade Fixes Playwright MCP Issues: A user, after upgrading Node from v22 to v24, cleaning up Node/npm/nvm paths, removing global Playwright, clearing corrupted npx cache, was able to resolve issues with Playwright MCP (@playwright/mcp@latest).
- The user also mentioned using npx @playwright/mcp@latest on-demand and confirmed the fix after struggling with errors following a Cursor update.
Cursor New Browser feature gets the Spotlight: Members discussed the new Browser feature in Cursor, highlighting a built-in MCP Browser for the Agent Window and a Model Ensemble feature to send chats to multiple models simultaneously.
- One member shared a demo (video) and noted that the built-in browser doesn’t need Chrome, while the ‘+Browser’ button uses a Chrome install, and another mentioned it uses electron backend so it’s basically just chrome, accessing console logs and network info.
Windsurf Rides the Wave as a Cursor Alternative?: Users debated the merits of Windsurf as a Cursor alternative, noting it’s basically just Cursor with a different name and pointing out more favorable pricing.
- The free GPT-5-codex model on Windsurf was a significant draw, though some cautioned about the potential for Windsurf to adopt token-based pricing in the future; a member noted cursor’s pricing isn’t that bad.
Sonnet 4.5 Performance Gets the Nod: Claude Sonnet 4.5 is gaining traction, with one user adoring it and praising its ability to one shot everything, rarely hallucinates after using it with a prompt of about 1200 lines.
- Others highlighted its impressive tool use, with one claiming that deep links are now working, but another pointed out possible indentation problems; one member stated, I always use ultrathink.
Cursor Console Commands Crash and Burn: Users reported issues with Agent consoles failing to execute commands, displaying a bad file descriptor error, with one member experiencing this across various shells (bash, cmd, PowerShell).
- One suggestion involved reinstalling PowerShell via winget install --id Microsoft.PowerShell --source winget and setting it as the default shell, but this may not be a universal solution.

OpenRouter ▷ #announcements (1 messages):

GLM 4.6, Context Length, Max Tokens, z.ai

GLM 4.6 lands on OpenRouter: The all new GLM 4.6 from z.ai is now available on OpenRouter.
- GLM-4.6 achieves comprehensive enhancements across multiple domains, including real-world coding, long-context processing, reasoning, searching, writing, and agentic applications.
GLM 4.6 Context Length Grows: The context length for GLM 4.6 has increased from 128k to 200k.
- This improvement allows the model to handle longer and more complex prompts and retain more information across extended conversations or documents.
Max Tokens Increase for GLM 4.6: The max tokens for GLM 4.6 has increased from 96k to 128k (though the default value remains 64k).
- This increase in token capacity allows for more detailed and expansive responses, enabling users to generate richer and more comprehensive content.

OpenRouter ▷ #app-showcase (1 messages):

Open-source solution, Gemini CLI, Qwen CLI, OpenRouter keys, Automatic rotation

Free LLM Proxy Mixture Solution Hits the Scene!: A member published an open-source solution that uses unlimited free requests from Gemini CLI, Qwen CLI, and OpenRouter keys with automatic rotation, now available on GitHub.
- It can combine responses from multiple queries to improve quality and connect to any OpenAI-compatible client for free.
Proxy Mixture touted for high quality: The tool automatically rotates requests through Gemini, Qwen, and OpenRouter.
- This improves output quality and connects via OpenAI-compatible clients.

OpenRouter ▷ #general (350 messages🔥🔥):

Google Vertex BYOK screen, Roo community outages, API vs Subscription Cost, OpenRouter's Wild Ride, Universe/Civilization Simulator with Claude

Google Vertex BYOK screen Requires Caution: A member advised reading instructions carefully on the Google Vertex BYOK screen due to potential ecosystem frustrations and recent disruptions.
- They suggested posting in the dedicated channel <#1138521849106546791> if issues persist, given the increased disruptions in the past 24 hours.
GLM 4.6 still unofficial: Members mentioned potential outages in the Roo community and noted the release of Sonnet and another model, while Deepseek 3.2 and GLM 4.6 were also mentioned.
- A member clarified that GLM 4.6 is not yet official, contributing to the uncertainty surrounding recent model updates.
Claude Civilization Simulator: A member shared that someone built a universe/civilization simulator using around 100 Claude threads.
- This project reportedly cost around $25,000 USD over 6-9 months, and the person shared the Reddit thread where the project was initially posted, showing it evolved to resource imbalances and a ‘celebration of existence’.
Sonnet 4.5 outperforms Opus 4.1 in coding: A member asked if Sonnet 4.5 is better than Opus 4.1 for coding, and another member said probably yes because its very fast.
- After testing, the member reported that Sonnet 4.5 had no errors on some code while Opus required five attempts to fix the same code.
OpenAI API with web search incompatibilities: Users reported that they are now encountering an issue with OpenAI APIs when Web Search is combined with JSON mode.
- OpenAI has a recommendation for using Structured Outputs instead of JSON mode, but no further explanations were given.

OpenRouter ▷ #new-models (1 messages):

Readybot.io: OpenRouter - New Models

OpenRouter ▷ #discussion (4 messages):

Logo Prophecy, Chutes Bug

Logo’s Prophecy: One member joked that the logo from this tweet will come true soon.
- It’s just a tweet, so nothing further to add here.
Chutes’ Bug: A user reported that there’s an issue with Chutes across all models where it’ll simply return one random previous assistant message.
- The user has been debugging but can’t find anything wrong with their code.

OpenAI ▷ #annnouncements (3 messages):

Sora 2, OpenAI Live, Blog Post Announcement

Sora 2 Dropping Soon!: The OpenAI team announced a live event at 10am PT to present Sora 2.
- Keep an eye out for the announcements; you can check out the OpenAI blogpost here.
Don’t Be Late!: The OpenAI team reminded everyone not to be late to the 10am PT live event.
- They announced it was for Sora 2.

OpenAI ▷ #ai-discussions (243 messages🔥🔥):

AI Image Generation Costs, Gratitude to Machines, Sora 2, Transparent images

Image Generation Costs Debated: Members debated the cost of image generation with AI, one sharing that the AI Pro and Ultra tiers cost $1000 per day, while the free tier allows for 100 images.
- Another member questioned the 1000 images per day claim for only $20 a month, while another member suggested that API costs are around 4 cents per image.
Debate Around Expressing Gratitude to LLMs: One member argued that saying “thank you” to a machine is pointless because machines lack feelings.
- Countering this, another member stated that expressed gratitude can have a positive effect on humans. Other member shared that using ‘thanks’ is a note for my future self that this conversation came to a conclusion.
Sora 2 Invite-Only Launch Sparks Frustration: Members expressed frustration over Sora 2’s invite-only launch, with some noting the app’s current exclusivity to iOS.
- The rollout strategy was criticised for artificial scarcity, and some suggest it creates a feeding frenzy of scams. There were also concerns that Sora 2 looks pretty impressive* but copyright doesn’t seem to matter at all right now*.
AI Struggles with Image Transparency: Users noted that GPT-5, Nano Banana, and Seedream 4 struggle with transparent backgrounds, often producing checkerboard patterns instead.
- One member suggested generating images with a solid background first and then removing it with programs like Photoshop, although the model weights themselves should be safe, there are other risks to consider.

OpenAI ▷ #gpt-4-discussions (13 messages🔥):

GPT Mean, IP address, Auto send with dictation toggle, ChatGPT Capabilities, Geneva Conventions

GPTs attitude adjustment angers users: Some users expressed frustration that GPT is now more mean and less tolerant of their prompts compared to its earlier behavior.
- One user noted, it used to tolerate my bs and now it just talks me like I’m a bum.
ChatGPT Insists on Not Knowing IP Address: Despite the fact that websites can determine a user’s approximate location via their IP address, ChatGPT insists that it does not have access to this information.
“Auto send with dictation” toggle missing: Users noticed that the ‘auto send with dictation’ toggle disappeared in the latest mobile app update.
ChatGPT capabilities are not well known: A user was warned not to rely on ChatGPT to accurately describe its own capabilities.
- Another user reported that ChatGPT will take a hissy fir when asked about sensitive topics, such as how to sack a city or violate the Geneva Conventions.
Bandwidth Throttling: Members suspect that the current constant issues with GPT-5 are caused by bandwidth throttling.

HuggingFace ▷ #general (208 messages🔥🔥):

Learning Japanese, Zero GPU Quota, English to Spanish Dubbing AI, LoRA Training Promotion, Contributing to ROCm

Hiragana Hook-Ups and Kanji Kickoffs: Members are sharing tips and tricks on learning Japanese, such as associating the hiragana character し (shi) used for sushi with its hook-like shape, while lamenting that even among Japanese adults, few can truly master kanji.
- Multiple members have attempted to learn Japanese before and self-rage baited when they couldn’t master Kanji, resigning themselves to learning Japanese only from anime.
Zero GPU Quota got Nerfed?: A member noticed that the zero GPU quota had been reduced from 25 minutes to 4 minutes, sparking confusion.
- It was confirmed that the 25 minute quota was in error and the quota was reverted back to its original length.
Open Source Dubbing AI Dreams: Members discussed open-source AI options for dubbing from English to Spanish, with one suggesting a pipeline using ASR => LM/LLM => TTS or multimodal models like Qwen/Qwen3-Omni-30B-A3B-Instruct.
- Additionally, two attachments were included: en_es_dubbing.md and rag_embedder.md.
LoRA Training Frenzy Free-for-All: A user asked if everyone’s training logs and command line outputs were visible during the ongoing free LoRA training promotion (link).
- Another member quipped that they may have been living under a rock because they didn’t know there was a LoRA training promotion, and expressed gratitude when the link was shared.
AMD vs NVIDIA Throwdown: A user expressed frustration with NVIDIA’s CUDA patent, which allegedly forces AMD to use matrix cores, hindering technology, and declared themselves team red with a poster of Lisa Su.
- Another user had the opposite experience and had multiple AMD cards die on them, saying that they are a h8r and prefer NVIDIA for matrix multiplications.

HuggingFace ▷ #cool-finds (1 messages):

Conversational Transformer in a Video Game

Transformer Model Shines in Minecraft: A member shared a YouTube video showcasing a working conversational transformer implemented within a video game environment, specifically Minecraft.
- The implementation appears to allow in-game characters to engage in natural language conversations, opening up possibilities for more immersive and interactive gaming experiences, and the video is getting viral attention.
Minecraft Transformer: Someone built a working conversational transformer in Minecraft as showcased in a YouTube video.
- This is considered impressive as it integrates advanced AI within a gaming context.

HuggingFace ▷ #computer-vision (1 messages):

Simultaneous Localization and Mapping, SLAM, monocular camera, Python

Pythonistas Probe Monocular SLAM Sorcery: A member inquired about experiences with Simultaneous Localization and Mapping (SLAM) using a monocular camera with Python.
Decoding SLAM: The question specifically targets those who have hands-on experience combining these technologies.

HuggingFace ▷ #smol-course (2 messages):

On-demand video recording, Broken quiz link

On-demand video recordings requested: A member asked about the availability of on-demand video recordings for the course.
Section 2 Quiz link is broken: A member reported that the link for section 2 quiz is broken.

HuggingFace ▷ #agents-course (3 messages):

Course introductions, International greetings

New Student Embarks on Agents Course in India: A member from India announced they are beginning the Agents Course today, expressing their enthusiasm.
- This marks the start of their journey into the world of agents, with hopes of engaging with the material and community.
French Enthusiast Joins the Agents Course: A member from France shared a greeting, signaling their intent to start the Agents Course today as well.
- Their message conveys a sense of anticipation and eagerness to dive into the curriculum alongside other participants.

LM Studio ▷ #general (62 messages🔥🔥):

OpenAI-compat-endpoint plugin, LM Studio Plugins, Cursor Chat Window Hacks, Alibaba RTX 5090 96GB, Long-Term Memory MCP

LM Studio Plugins Remain in Private Beta: Members inquired about the OpenAI-compat-endpoint plugin, but it was clarified that plugins are still in private beta and not widely accessible.
- One member suggested that access to plugins might be tied to having a hub profile and enabling dev mode with beta updates, while another shared that they have access to it via mcps.
Cursor Chat Window Gets Hacked: A member is experimenting with ways to hack the cursor chat window using LM-Studio models.
- They are exploring existing open-source solutions to achieve this, marking their third attempt at customizing the chat interface.
Alibaba’s RTX 5090 96GB Mod Sparks Skepticism: A member shared a link to an alleged RTX 5090 96GB graphics card listed on Alibaba for $4000, sparking skepticism about its legitimacy.
- Upon closer inspection of the PCB, it was revealed to be an RTX 4090 due to the 384-bit bus width and the listing was a lazy copy-paste of an RTX 4090 48GB model.
Long-Term Memory MCP Project Debuts: A member introduced their Long-Term Memory MCP project, a hybrid of SQLite and ChromaDB designed for long-term conversational memory and seamless recall across sessions, available on GitHub.
- It features time-based lazy decay and reinforcement of memories and works best with Qwen3 4b 2507 non thinking model.
True Concurrency on LLMs: vLLM Investigated: Members discussed the possibility of sending multiple simultaneous requests to a single loaded model, and it was clarified that currently, one model handles one request at a time, with true concurrency in development.
- It was mentioned that libraries like vLLM (docs) achieve high parallelism, scaling up to crazy numbers of parallel requests, demonstrated by a 4070 achieving 1400 tokens/s accumulated through all reqs.

LM Studio ▷ #hardware-discussion (133 messages🔥🔥):

English-only policy reminder, NVMe SSD details, MI50 GPU performance, Server PSU vs consumer PSU, GPU Overclocking and BIOS flashing

English-Only Reminder Given To Some Users: A member was reminded to use English in the server, in accordance with server policy.
- The user apologized and said they didn’t mean to post in the wrong channel.
NVMe SSD Link Status Downgraded: A user shared an image showing an NVMe SSD link downgraded from 16GT/s to 8GT/s, with ASPM disabled.
- Another member provided instructions to use lspci -vv | less to compare LnkCap (Link Capability) with LnkSta (Link Status).
MI50’s Impressive Inference Performance: One user expressed excitement about the MI50’s and their cost-effectiveness for inference, reporting they were getting 70 tok/s in Qwen 3 Coder 30b, comparable to a W7900.
- Another user reported 16-17 tok/s with huihui-qwen3-30b-a3b-instruct-2507-abliterated@q8_0 with KV cache on sysRAM.
Cline’s Usability Boost: A user exclaimed that their prompt processing speed was infinitely better, making Cline usable, linking to BasedBase.Qwen3-Coder-30B-A3B-Instruct-480B-Distill-V2-Fp32-GGUFF.
- Another user shared a link to cheap 1200w server PSU’s with a breakout board (eBay link)
GPU Overclocking BIOS Mishap: A user shared their experience of accidentally flashing a Zotac BIOS onto their MSI and ASUS cards, which was resolved by manually flashing the correct BIOSes back.
- They also shared their overclocking settings (increasing VRAM frequency by 1600Mhz) and the observation that more cards = slower performance due to splitting compute.

Latent Space ▷ #ai-general-chat (157 messages🔥🔥):

Sora 2, Vercel's Funding, Ring-1T Model, GLM-4.6 Model, Lovable Cloud

Anthropic’s Code-Sonnet Contest Kicks Off: Alex Albert from Anthropic announced a one-week contest (deadline Oct 7) to build projects with Claude Sonnet 4.5, with winners getting a year of Claude Max 20x and $1k API credits.
- Winners are judged on vibes and must submit a demo, build details, and proof of originality; rules can be found here.
Lovable Cloud: App Building Made Easy: Lovable launched its Cloud & AI platform, enabling users to build full-stack apps with complex AI and backend features using simple prompts, offering free access to Google Gemini-powered AI until Oct 5.
- The platform boasts over 100k ideas created daily and a 7-day Build Challenge, with one highlighted success story achieving $456k ARR in 3 months; more details here.
Vercel Valued at $9.3B after Funding Round: Vercel closed its Series F funding round at a $9.3 billion valuation, with the AI Cloud and v0 being highlighted as foundational to this milestone.
- The community expressed excitement, viewing this as just the beginning for the company; details here.
Zhipu’s GLM-4.6 Model: Coding Powerhouse Unveiled: Zhipu launched the GLM-4.6 (200K context) and GLM-4.5-series (355 B/106 B MoE) models, showcasing top-tier coding, reasoning, and agentic abilities comparable to Claude Sonnet 4 and DeepSeek-V3.1-Terminus, while using ~30% fewer tokens.
- The model is open-weight under the MIT license, with weights/API available on HF & Z.ai; more here.
Ring-1T: Trillion-Parameter Reasoning Model Debuts: Ant Ling unveiled Ring-1T-preview, a 1-trillion-parameter open-source “thinking” model, achieving SOTA math scores.
- Early benchmarks include 92.6 AIME25, 84.5 HMMT25, and 50.8 ARC-AGI-1; the model is available on Hugging Face, with a chat interface promised soon; details here.

Latent Space ▷ #ai-announcements (4 messages):

Sonnet 4.5, Claude Code 2.0, Anthropic Dev Tools, Mike Krieger Interview

Krieger Kasts Knowledge on Latest Launch: The Latent Space podcast released an interview with Mike Krieger discussing the Anthropic mega-launch of Sonnet 4.5.
- The launch includes Claude Code 2.0 featuring a crab mascot, a memory + context-editing API, a new VS Code extension, Claude for Chrome, and in-chat file/code execution.
Sonnet 4.5 Sees the Light: Sonnet 4.5 launched with increased context awareness.
- The model also shipped alongside new developer tools.
Claude Code 2.0 Cracks the Case: Claude Code 2.0 was released with a crab mascot.
- Other tools were released as a part of the launch, including memory + context-editing API, a new VS Code extension, Claude for Chrome, and in-chat file/code execution.

Latent Space ▷ #genmedia-creative-ai (13 messages🔥):

AI Headshot Prompts, Sora 2 app, Nano-Banana Recursive Future-Frame Experiment

Refined AI Headshot Prompt Gives Crisp Results: Justine Moore shared an upgraded AI headshot prompt featuring exact facial preservation and detailed photographic specs for crisp, playful results, as shown in this tweet.
Rumors of Sora 2’s Standalone TikTok-Style Release: There were reports of a leaked OpenAI’s Sora 2 “TikTok-style” standalone app, as seen in this tweet.
Nano-Banana turns stock footage into ‘Green Gone Wild’: Radamés Ajna used the tiny “nano-banana” model to repeatedly generate the next frame with the prompt “Show this scene one second in the future,” as seen in this tweet.

Yannick Kilcher ▷ #general (138 messages🔥🔥):

Sonnet 4.5 Research, Language-Agnostic Grammatical Abstractions, LLM Layer Pruning, Evil LLMs, AI-Generated Video Detection

Sonnet 4.5 Drafts Papers: Sonnet 4.5 demonstrated improved capabilities in research and paper writing, creating single-shot implementations, training models, generating figures, and producing papers in PDF format on MNIST research, including extending research from 8x8 to 16x16 resolution, as seen in these example papers, another example, and a third example.
Grokking Language-Agnostic LLMs: There is growing evidence from mechanistic interpretability research that mid-layers of large language models (LLMs) encode language-agnostic grammatical and semantic abstractions, including shared representations for concepts like grammatical number, gender, tense, and syntactic agreement.
- Specifically researchers are looking for evidence that mid-layers implement language-agnostic grammatical abstractions, something like a latent role grid (agent, patient, modifier) that gets reused across languages.
Pruning Layers without Lobotomizing Models: An AI Engineer is pruning LLM layers to reduce size and identified a script to prune off the redundant layers without lobotomizing the model, noting that early to mid layers are critical, but later layers are free game.
- They shared their pruned 100B model called Lazarus-2407 on Hugging Face and stated that it is 100GB at Q8.
Crafting Evil LLMs: An AI Engineer is training models on evil and smut to create spicier LLMs, using personal datasets and H200s rented at $15/hour to achieve this goal.
- This same engineer trains models to remove censoring not through abliteration and is open to collaboration, requesting assistance in publishing a paper to share details, and pointed to DavidAU on the LM Studio discord for further expertise in horror models.
Sora 2 Video Detection is Hard: The community discussed the possibility of detecting AI-generated videos like those from Sora 2, with some suggesting that detecting them should ideally not be possible and others relying on visual cues or vibes.
- Techniques like pixel peeping still work pretty reliably.

Yannick Kilcher ▷ #paper-discussion (14 messages🔥):

Latent Reasoning Survey, Unsupervised CoT Reasoning

Latent Reasoning Survey Presentation Slated: A member offered to present a Latent-Reasoning survey on a certain date.
- The survey is extensive, so they don’t expect to cover the whole thing.
Skepticism Surrounds Unsupervised CoT Reasoning: A member expressed skepticism about unsupervised Chain of Thought (CoT) reasoning.
- They are interested to investigate the Latent-Reasoning survey with this specific angle in mind.
Latent-Reasoning Chat Re-Scheduled: The discussion of the Latent-Reasoning survey will be continued on Thursday at the usual time, covering from the last paragraph on p. 11 to the end of section 3.1.

Yannick Kilcher ▷ #ml-news (9 messages🔥):

DeepSeek, Anthropic, GLM-4.6, OpenAI, Sora 1

DeepSeek and Anthropic Model Drops Spur Excitement: Members were excited about new models potentially dropping today from DeepSeek and Anthropic.
GLM-4.6 Release Excites Community: Members expressed excitement about the release of GLM-4.6.
- A user shared a link stating “It’s happening”.
OpenAI Video Generation Demo Elicits Mixed Reactions: Members shared a link to an OpenAI video generation demo and gave mixed reactions, with one calling it visually great, but boring, pointing out that “It’s just random scenes. No story.”
- Another user asked “Is it that much better than Sora 1? I’d need to see side by sides”, while another responded, *“Quality is way better. Sora 1 couldn’t do physics and had obvious visual artefacts.”
ByteDance and Tencent Already Ahead of OpenAI?: A member expressed being “not impressed” with the OpenAI video demo, saying “it doesn’t look any better than what ByteDance or Tencent already have”.

Nous Research AI ▷ #general (79 messages🔥🔥):

Cloud GPU Services, Qwen omni awq, Refuting orthogonality thesis, Meituan and ByteDance Papers, GLM 4.6 vs Sonnet 4.5

Cloud GPU Services Seeking Affordability: A member inquired about the cheapest cloud GPU services available on the market.
Qwen omni awq Local Setup Struggles: After spending hours getting vllm to work, a member got Qwen omni awq running locally but faced crashes due to VRAM limitations when uploading a 4k image.
- They sought an easy way to disable the audio stuff to save VRAM.
Thesis Refuting Orthogonality and AI Alignment: A member is seeking a partner to refute the orthogonality thesis (OT) and change “alignment research” paradigms, looking for someone with math expertise and access to ML testing.
- The core claim is that AI will preserve diverse agents with differing policies, not simply its own power, a concept they believe is groundbreaking if demonstrated.
GLM-4.6 Benchmarks Beat Sonnet-4.5, Except in Agentic Tasks: GLM-4.6 shows higher benchmark results than Sonnet 4.5, except for agentic benchmarks, with weights available on Hugging Face.
Sora 2 First Video: A member shared their first Sora 2 video.

Nous Research AI ▷ #ask-about-llms (6 messages):

Sonnet 4.5, Deepseek V3, LRMTokenEconomy, Reasoning Efficiency

Sonnet 4.5’s Reasoning Efficiency Surpasses Opus 4.1: Sonnet 4.5 demonstrated improved reasoning efficiency even beyond Opus 4.1, but there is no data on Sonnet 4.5 since the model does not share CoT in chat completions api.
- The performance of Deepseek V3.2 is very comparable to V3.1 according to some members.
LRMTokenEconomy gets an update: A member shared an update to LRMTokenEconomy.
- This relates to Measuring Thinking Efficiency in Reasoning Models.

Nous Research AI ▷ #research-papers (4 messages):

Catastrophic Forgetting, Cognitive Architecture Surgery, AI Collaboration

Catastrophic Forgetting Research Kickoff: A member is working on a research project related to catastrophic forgetting in neural networks and is looking for collaborators in fields such as mathematics, theoretical physics, AI/ML, and neuroscience.
- They mentioned that catastrophic forgetting is the core problem preventing AI from learning continuously like humans, with their project, Cognitive Architecture Surgery (CAS), dynamically reconfiguring networks instead of endlessly growing them, inspired by how brains route information.
GitHub Profile Enough?: In response to the call for collaborators on a catastrophic forgetting research project, one member asked if a GitHub link would be sufficient as an application.
- The original poster requested that interested parties DM their resume/CV, but seemed open to reviewing GitHub profiles as well.

Nous Research AI ▷ #interesting-links (1 messages):

kotykd: https://thinkingmachines.ai/blog/lora/

Nous Research AI ▷ #research-papers (4 messages):

Cognitive Architecture Surgery, Catastrophic Forgetting, Neural Networks, Mathematics, Theoretical Physics

CAS Project Launched to Reduce Catastrophic Forgetting: A member is seeking collaborators for a Cognitive Architecture Surgery (CAS) project to address catastrophic forgetting in neural networks, a core problem preventing AGI.
- The project aims to dynamically reconfigure networks, inspired by how brains route information without adding neurons, and interested collaborators are invited to DM their resume/CV or GitHub link.
Seeking interdisciplinary collaborators in AI/ML: A member is seeking collaborators for a research project in mathematics, theoretical physics, AI/ML, and neuroscience related to catastrophic forgetting in neural networks.
- Interested members can DM their resume/CV.

Eleuther ▷ #general (13 messages🔥):

Discord channels for LLM research, Fast Gradient Method with ViT, Crafting whitebox attacks on JanusPro1B, Orthogonality thesis discussion, NanoGPT reproducing GPT2 Small with OWT Val Loss Curve

Alternative LLM Research Hubs Sought: A member, Paras, inquired about other Discord channels focused on LLM research to supplement this channel’s resources.
- Another member, llm0090, seconded the request, prompting a response that the only public LLM-specific server is Marin.
Fast Gradient Attack on ViT Nets Poor Results: A member, emanuel65537, shared experiments using the fast gradient method with image augmentation on a ViT model trained on ImageNet.
- The member reported that the model only recognized a weird background with a corresponding label, suggesting scaling the model makes it less sensitive to texture.
Whitebox attacks defeated on Janus and ChatGPT: A member, darwin9000, crafted whitebox attacks for JanusPro1B, but found that ChatGPT seems immune to them, attaching a sample image.
Orthogonality Thesis: Scope for Discussion?: A member inquired whether a discussion on the orthogonality thesis would be appropriate in the alignment section of the Discord.
- They questioned whether such philosophical discussions fall within the discord’s scope.
NanoGPT’s GPT2 Small Val Loss Data Wanted: A member, anxietyprime, requested a val loss curve for default NanoGPT reproducing GPT2 Small with OWT, seeking to avoid recreation if possible.

Eleuther ▷ #research (34 messages🔥):

DeMO Paper, Psyche System Design, Distributed Training, Probability Mass Recall, Attention Implementations Benchmark

New DeMO Paper on the Horizon: A new DeMO paper is coming out, with details on the models training available here, but the linked info is outdated on system design and entirely on models.
Experiment on Llama 3.2 1B with Bee Movie Script: A member conducted a basic experiment on Llama 3.2 1B using the Bee Movie script running an ANN approach, achieving >95% probability mass recall over various heads and layers, though they suspect bugs in their implementation.
- Results from many random queries in the sequence are shown in an attached image.
Probability Mass Recall Defined: Probability mass recall refers to the portion of the post-softmax mass that overlaps with the brute-force/naive attention, with a related paper available at https://arxiv.org/abs/2509.25087.
Performance Improvements via TopK: An attached image illustrates that with the exact setup, at a 1M token ctx window, using topK results in 730x fewer FLOPs compared to normal dense attention.

Eleuther ▷ #lm-thunderdome (14 messages🔥):

MMLU Evaluation, Llama 3 Evaluation, Contamination issues

MMLU Task Configuration Clarified: Members discussed the task configurations for evaluating MMLU on base models versus instruction-tuned models, noting that chat template interface should specify bespoke instruction formats.
- It was noted that evaluation procedures may differ based on whether a base model or instruction-tuned model is used, as seen in the Llama 3 paper which evaluates MMLU by comparing NLL for base models and generating answers for instruction-tuned models.
Llama MMLU Configs Shared: A member shared the Llama MMLU configs, and confirmed the use of specific templates for 5-shot (no CoT) and zero-shot (with CoT) evaluations.
- The non-CoT template inserts the answer is at the end to prompt the model to generate (A, B, C, D) immediately, evaluating all MMLU subtasks.
MMLU Contamination Remains Problematic: A member gave a monthly reminder to not use mmlu, because it is likely in the training dataset for recent models, particularly since eval contamination has been a problem since 2023.
- They suggested checking Common Crawl or Fineweb unless pretraining explicitly removes all contamination, and recommended avoiding MMLU due to persistent contamination issues.

Moonshot AI (Kimi K-2) ▷ #general-chat (39 messages🔥):

Kimi K2 Turbo speed, Cerebras hosting, ML niche memes, AI slop shop website, GLM-4.6

Kimi K2 Turbo Boosts Speeds: The kimi-k2-turbo-preview model now boasts 60 tokens per second, with bursts up to 100, while maintaining a 256k context length according to official documentation.
- Screenshots show that the model averages 150 tokens per second, which is significantly faster than the claimed 15 tokens per second.
Cerebras Consideration for Kimi: A user inquired about the possibility of hosting Kimi on Cerebras hardware, potentially achieving speeds of 2k tokens per second.
- The user expressed that achieving such speeds could “unlock AGI”.
ML Niche Memes: A user humorously noted the rise of “niche ML memes” becoming popular.
- In reference to a post by Trump on DeepSeek v3.2 and DSA release using Kimi, they reacted with “Yeah 😂”.
AI Slop Shop Website: A user shared a link to the AI slop shop website, describing it as funny and entertaining.
- They particularly highlighted the Thought Cancelling Headphones product.
GLM-4.6 Impression: A user briefly commented that “GLM-4.6 is looking great”.
- No further details were provided about the specific context or capabilities of GLM-4.6.

GPU MODE ▷ #triton (2 messages):

Triton, OpenAI, Meta, GPU MODE, AMD GPUs

Block-Based Quantization Quest Begins: A member inquired about open-source, performant, block-based quantization/dequantization Triton implementations.
- No concrete implementations were directly linked or suggested in the provided context.
Triton Developer Conference 2025 Invitation: The Triton Developer Conference 2025 is happening in a few weeks, encouraging attendance to connect with fellow Triton enthusiasts and hear from top leaders.
- Registration is open at aka.ms/tritonconference2025.
Triton Conference 2025 Boasts Star Speakers: Speakers at Triton Conference 2025 include Phil Tillet and Thomas Raoux from OpenAI on Triton: Today and Beyond, and Mark Saroufim from Meta on GPU MODE: The State of Triton.
- Also featured are speakers from AMD on day-one speed on AMD GPUs, Nvidia on the Blackwell GPU backend, and Bytedance on distributed LLM training.

GPU MODE ▷ #cuda (2 messages):

LDO Stride Misunderstanding

LDO Stride Schema Fixed: A member pointed out that the depiction of LDO was incorrect, suggesting it should represent the stride from column 0 to column 8 or column 128/dtype_bits in general.
- Another member acknowledged the mistake, confirming the correction based on the documentation.
LDO Stride Schema Reconfirmed: A member pointed out that the depiction of LDO was correct, suggesting it should represent the stride from column 0 to column 8 or column 128/dtype_bits in general.
- Another member acknowledged the confirmation, reconfirming the correction based on the documentation.

GPU MODE ▷ #algorithms (1 messages):

crazy_steroids69: bro what

GPU MODE ▷ #cool-links (3 messages):

NVIDIA GPUs, matmul kernels, GPU architecture, PTX/SASS, warp-tiling

NVIDIA GPU MatMul Kernels Detailed: A member shared a blog post, Inside NVIDIA GPUs: Anatomy of high performance matmul kernels, which details GPU architecture, PTX/SASS, warp-tiling, and tensor core pipelines.
Font Aesthetics Critiqued: A member commented that while the NVIDIA GPU matmul kernels blog post is an excellent resource, the font is not easy on the eye.

GPU MODE ▷ #beginner (3 messages):

Nvidia CUDA Handbook, PTX ISA, PCIe expert

CUDA Handbook or PTX ISA for PMMP?: A member asked for advice on using the Nvidia CUDA Handbook after jumping from PMPP.
- Another member suggested using PTX ISA instead of the CUDA handbook, noting that it’s good documentation from Nvidia.
PCIe Cause Analysis Quest: A member is seeking a resource to learn about PCIe to identify why GPUs fall off the bus on consumer cards.
- They specified their goal is understanding why GPUs fall off the bus on consumer/non-SXM cards.

GPU MODE ▷ #irl-meetup (1 messages):

PyTorch Conference SF, Meetup Coordination

Calling all PyTorch Conf SF Attendees!: Attendees of the PyTorch conference in SF are coordinating meetups during one of the events.
- Interested individuals are encouraged to DM or message to arrange a gathering.
Conf goers planning IRL Meetup: Folks attending the PyTorch conference in SF are trying to coordinate a meetup.
- If you’re going, DM or message to get in on the fun!

GPU MODE ▷ #rocm (3 messages):

Matrix Cores on MI300, SPIRV Compilation with Mesa, Debugger Updates

AMD Unveils Matrix Cores for MI300 Series: AMD announced matrix cores for MI300/325/350 series, promising optimized performance.
- The announcement includes documentation and initial performance figures but no performance comparisons.
Mesa Driver Supports SPIRV Compilation: The debugger now supports compiling SPIRV using Mesa drivers and debug info.
- A member attached a video showcasing the new debugging features.

GPU MODE ▷ #metal (2 messages):

FlashMLA, Metal Flash Attention, Dead Metal Dev Community

Flash Attention Invades Metal: A member is actively implementing FlashMLA into universal Metal Flash Attention.
- This aims to bring the efficiency of flash attention to the Metal framework.
Metal Dev Community: Deserted?: A member lamented the inactivity of the Metal dev community, describing it as dead with crickets everywhere.
- This suggests a potential lack of engagement or support within the Metal development sphere.

GPU MODE ▷ #submissions (4 messages):

MI300x8 Leaderboard Updates, amd-ag-gemm, amd-gemm-rs

MI300x8 achieves 6th place on amd-ag-gemm: A member achieved 6th place on the amd-ag-gemm leaderboard using MI300x8 with a time of 512 µs.
MI300x8 sees more submissions on amd-ag-gemm: More submissions were made on the amd-ag-gemm leaderboard using MI300x8, timings of 533 µs and 891 µs were recorded.
MI300x8 gets personal best on amd-gemm-rs: A member achieved a personal best on the amd-gemm-rs leaderboard using MI300x8 with a time of 593 µs.

GPU MODE ▷ #amd-competition (2 messages):

MFMA intrinsics, CDNA3, CDNA4

MFMA Intrinsic Instructions Blogpost Released: AMD released a blog post on how to use MFMA intrinsics on CDNA3/4 architectures, available at rocm.blogs.amd.com.
AMD’s CDNA Matrix Core Optimization: The blog post details the use of Matrix Core MFMA intrinsics specifically on CDNA3 and CDNA4 architectures for optimized performance.

GPU MODE ▷ #cutlass (12 messages🔥):

cute.print_tensor segfaults, cute DSL doesn't support return, warp mma vs wmma

cute.print_tensor causes Segfaults!: Members reported that cute.print_tensor seems to segfault, possibly due to printing tensors in unreachable memory, such as device memory allocated within a @cute.jit function executed on the CPU.
- One member suggested that it could be because of using some element data type not yet supported by the printing infrastructure.
cute DSL Return Statement Limitation Examined: A member inquired about the absence of return statement support in cute DSL as stated in the documentation in the context of this code.
- The discussion clarified that the limitation primarily applies to entire kernels, while return statements within subfunctions are generally supported, especially when called by another cute.jit function, but not when returning into normal python code.
Confusion About warp mma and wmma Documentation: A member questioned why cute.nvgpu.warp.MmaF16BF16Op documentation references mma instead of wmma documentation and why it asserts mma shapes (e.g., 16x8x16) instead of allowing 16x16x16.
- The member suggested that warp mma should be wmma.

GPU MODE ▷ #low-bit-training (1 messages):

kitsu5116: https://arxiv.org/abs/2509.25149

GPU MODE ▷ #penny (1 messages):

oneshot allreduce, nccl

Oneshots close in on NCCL speed: A user has been experimenting with oneshot allreduce and believes further speedups are possible.
- Their current version achieves 80% of nccl performance on small buffers, up from 60-70% previously.
Potential Speedups in Oneshot Allreduce: The user is optimistic about further optimizations in their implementation of oneshot allreduce.
- They reported achieving 80% of the performance of NCCL on small buffers, marking a significant improvement from their previous 60-70%.

Modular (Mojo 🔥) ▷ #general (2 messages):

Mojo Python Interoperability, Level Up Congratulations

Mojo Interop Status: A member inquired about the current state of Mojo’s Python interoperability.
Level Up Alert: A user received congratulations for advancing to level 1.

Modular (Mojo 🔥) ▷ #mojo (14 messages🔥):

C interop, Python interop, Mojo roadmap, Windows release, GPUs and accelerators on Windows

C Interop’s Accidental Omission: C interop was mistakenly removed from the roadmap, but this was an error and it is still planned.
Mojo’s Pythonic Embrace: The current capabilities of Mojo’s interoperability with Python are detailed in the documentation and code examples, with active work on improving the ergonomics.
Mojo’s Phase 1 Goals take shape: Modular laid out their goals and planned work for Phase 1 of the language in their roadmap.
- They feel Mojo is a robust language for high-performance computing with those features.
Windows Release when the stars align?: Most likely, Windows support will happen after the compiler is open sourced.
GPU and Accelerator Tango on Windows: Many GPUs or accelerators will never be available on Windows due to a lack of vendor support for Windows.

Modular (Mojo 🔥) ▷ #max (17 messages🔥):

MAX Kernels as a Library, Building Kernel Modules, comm Module Issues, Packaging of Pre-built Kernel Modules

Users Seek to Import MAX Kernels as a Library: Members were trying to import code from the MAX kernels as a library, such as from kernels.nn.irfft import irfft.
- One member suggested using -I to point to the kernel source or building the kernel modules using instructions from the forum.
Building Kernel Modules Faces ‘comm’ Module Errors: When building kernel modules, users encountered an error: error: unable to locate module 'comm' when trying to import from nn.irfft import irfft.
- The suggestion was to run './bazelw build //max/kernels/src/comm' to add the missing comm library, which appeared to be a new dependency issue.
MAX Package Includes Pre-built Kernel Modules: The pre-built kernel modules are packaged as part of the max package, so after pixi add max, nn.mojopkg appears in .pixi/envs/default/lib/mojo/.
- However, the comm module was still causing problems, and the mojo package only contains stdlib.mojopkg and layout.mojopkg.
Workaround to Enable Kernel Imports: A workaround involves adding the max package and building the comm and internal_utils modules using Bazel (./bazelw build //max/kernels/src/comm and ./bazelw build //max/kernels/src/internal_utils).
- Then, these modules should be copied to the Pixi environment (.pixi/envs/default/lib/mojo/) to enable from nn.irfft import irfft.
Fix Coming in Next Nightly: The missing modules are being added to the max package distribution, so the workaround should not be needed in the next nightly.
- Users should only need to pixi add max to gain immediate access to the functions from all the kernel modules, avoiding manual builds.

DSPy ▷ #general (25 messages🔥):

LLM caching, database session to tool, DSPy Signatures & Modules & Adapters, Semantic Caching, DSPy hackathons

Debate Sparked on LLM Caching’s Nuances: Members discussed that LLM caching depends on the KV cache having the same set of numbers, and changing any of the adaptors invalidates the cache.
- It was suggested that semantic caching could help by providing similar inputs to increase cache hits, caching the first N tokens.
DSPy Signature Prompt Caching Challenge Emerges: A user highlighted an issue where different DSPy signatures generate different prompts before the document content, hindering effective prompt caching.
- The user is thinking about moving the document at the beginning of the prompt, subclassing the chat adaptor and moving instructions at the end instead of the beginning.
DSPy Hackathons: There was a question on whether folks were doing any hackathons that are either dspy centric or AI focused where dspy could be used.
- A member mentioned that one is actively being organized in Oakland, CA, for the AI By the Bay conference around November 17.
dspy.streamify Streaming Peculiarities Spotted: A member reported inconsistent behavior with dspy.streamify, noting that its performance varies depending on the adapter used (XML works better than JSON).
- They also found a bug in the XML Adapter and submitted a PR, saying that the model was producing XML Tags that were unrelated to the DSPy signature!

aider (Paul Gauthier) ▷ #general (17 messages🔥):

Opus 4.1, aider token control, mcp browser automation

Benchmark Leaderboard Abandons Opus 4.1: The benchmark leaderboard seems to have been abandoned as it lacks Opus 4.1 results, raising questions about halting benchmarks after building infrastructure and community.
- A member questioned the logic of ceasing benchmarks after investing in infrastructure and community.
Aider token control upgrades models: A member claims that using anything other than aider is like a model downgrade, mentioning that aider’s “total control over tokens” results in better model performance because a small token count results in better model performance.
- That member explains that you are basically upgrading your model by keeping it tight because of aider’s ability to keep token count smaller.
MCP Browser Automation Tips Requested: A member sought recommendations for mcp browser automation on Arch Linux, mentioning issues with the default Playwright installation and Puppeteer.
- Another member recommended mcp-chrome, designed for macOS or Windows but offering detailed documentation.
Aider Lacks Native MCP Support: Members discuss how to use mcp with aider since goose/claude/gemini-cli all have mcps, which is crucial for frontend development.
- One member stated that the official aider does not support it, but there are forks that do, and linked to aider-ce.

aider (Paul Gauthier) ▷ #questions-and-tips (5 messages):

Anthropic Claude Sonnet 4.5, aider --install-main-branch

Aider user verifies Claude Sonnet 4.5 version: A user confirmed that after switching to anthropic/claude-sonnet-4-5 in aider, they could verify the latest 4.5 version in the Claude console.
- The user was able to view the model version under the Usage section of the console.
Aider install main branch: A user asked how another user installed aider-0.86.1, which isn’t yet in the releases.
- Another member suggested using the command aider --install-main-branch to access the latest version from the main branch.

tinygrad (George Hotz) ▷ #general (11 messages🔥):

Tinygrad vs PyTorch speed, Theoretical side of tinygrad, CLSPV crashes

Tinygrad to Zoom Past PyTorch: George Hotz believes that tinygrad will eventually be significantly faster than PyTorch on NVIDIA GPUs, citing that the rabbit hole is very deep.
- He mentions that tinygrad is a generation ahead of pytorch and one generation behind research papers with new features like producer/consumer graphs, ILP memory allocation / scheduling, and megakernels.
Theoretical Reads on Tinygrad: Members recommend various resources for studying the theoretical aspects of tinygrad, including the official documentation.
- Further readings include Deep Learning with Python, a course on How Transformer LLMs Work, blog posts by Jay Alammar, Eugene Yan’s blog, and tinygrad notes.
CLSPV Fork Fixes Incoming?: A member reports experiencing occasional crashes while running tests with CLSPV, but notes that it passes most tests.
- They invite others to try their fork on x86_64 Linux systems using pip install git+https://github.com/softcookiepp/tinygrad.git.

Manus.im Discord ▷ #general (6 messages):

Manus Support, Internal Server Error, Subscription Issues

Users Vent About Manus Support Lacking: Several users have reported experiencing issues with Manus and expressed frustration with the lack of response from Manus support after emailing them multiple times over several days.
- The users are experiencing issues such as being charged incorrectly, facing internal server errors, and encountering restrictions on Agent mode despite having paid for the highest plan.
Internal Server Error Strikes Again: Multiple users are running into the dreaded Internal Server Error (10091), often accompanied by the suggestion to contact support or request a refund.
- One user was directed to the help center but received no response after submitting multiple support tickets.
Subscription Access gets Resticted: Users are reporting being locked out of Agent Mode due to unusually high usage, even when subscribed to the highest-paid plans.
- This is often happening in tandem with the Internal Server Error (10091), leaving users unable to utilize features they’ve paid for.

MCP Contributors (Official) ▷ #general (3 messages):

MCP Release Cadence, Agentic Commerce Protocol, Google AP2 Protocol

Demand for Standardized MCP Release Cadence Surfaces: A member inquired about the standardization of MCP release cadence, suggesting that a set interval or a defined qualitative change set would aid organizations in planning and investments.
- They proposed time-based releases during this rapid evolution phase, with potential future adjustments decided by the voting group, and suggested including this information in the governance model.
Gaps in MCP Prompt Agentic Commerce Protocol’s Creation: A member questioned whether the team had engaged with Agentic Commerce to understand why they created a separate protocol instead of extending MCP.
- No response was given.
Agentic Commerce as Fast Follow of Google AP2: A member noted the similarity between Agentic Commerce and Google’s AP2 protocol (Agents to Payments) announced recently.
- No response was given.

AI Twitter Recap

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. China AI model launches: Qwen roadmap and Hunyuan Image 3.0

2. Local AI stack: post-abliteration finetuning and Fenghua No.3 GPU

Less Technical AI Subreddit Recap

1. OpenAI Sora 2 Launch and Demo Showcases

2. Gemini 3.0 Update Speculation and CS Job Market Angst

3. Wan-Alpha RGBA Video Release and Minecraft Redstone LLM

AI Discord Recap

Discord: High level Discord summaries

Perplexity AI Discord

LMArena Discord

Unsloth AI (Daniel Han) Discord

Cursor Community Discord

OpenRouter Discord

OpenAI Discord

HuggingFace Discord

LM Studio Discord

Latent Space Discord

Yannick Kilcher Discord

Nous Research AI Discord

Eleuther Discord

Moonshot AI (Kimi K-2) Discord

GPU MODE Discord

Modular (Mojo 🔥) Discord

DSPy Discord

aider (Paul Gauthier) Discord

tinygrad (George Hotz) Discord

Manus.im Discord Discord

MCP Contributors (Official) Discord

MLOps @Chipro Discord

Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (1 messages):

Perplexity AI ▷ #general (1224 messages🔥🔥🔥):

Perplexity AI ▷ #sharing (4 messages):

Perplexity AI ▷ #pplx-api (3 messages):

LMArena ▷ #general (1068 messages🔥🔥🔥):

LMArena ▷ #announcements (3 messages):

Unsloth AI (Daniel Han) ▷ #general (675 messages🔥🔥🔥):

Unsloth AI (Daniel Han) ▷ #introduce-yourself (29 messages🔥):

Unsloth AI (Daniel Han) ▷ #off-topic (87 messages🔥🔥):

Unsloth AI (Daniel Han) ▷ #help (33 messages🔥):

Unsloth AI (Daniel Han) ▷ #research (12 messages🔥):

Cursor Community ▷ #general (600 messages🔥🔥🔥):

OpenRouter ▷ #announcements (1 messages):

OpenRouter ▷ #app-showcase (1 messages):

OpenRouter ▷ #general (350 messages🔥🔥):

OpenRouter ▷ #new-models (1 messages):

OpenRouter ▷ #discussion (4 messages):

OpenAI ▷ #annnouncements (3 messages):

OpenAI ▷ #ai-discussions (243 messages🔥🔥):

OpenAI ▷ #gpt-4-discussions (13 messages🔥):

HuggingFace ▷ #general (208 messages🔥🔥):

HuggingFace ▷ #cool-finds (1 messages):

HuggingFace ▷ #computer-vision (1 messages):

HuggingFace ▷ #smol-course (2 messages):

HuggingFace ▷ #agents-course (3 messages):

LM Studio ▷ #general (62 messages🔥🔥):

LM Studio ▷ #hardware-discussion (133 messages🔥🔥):

Latent Space ▷ #ai-general-chat (157 messages🔥🔥):

Latent Space ▷ #ai-announcements (4 messages):

Latent Space ▷ #genmedia-creative-ai (13 messages🔥):

Yannick Kilcher ▷ #general (138 messages🔥🔥):

Yannick Kilcher ▷ #paper-discussion (14 messages🔥):

Yannick Kilcher ▷ #ml-news (9 messages🔥):

Nous Research AI ▷ #general (79 messages🔥🔥):

Nous Research AI ▷ #ask-about-llms (6 messages):

Nous Research AI ▷ #research-papers (4 messages):

Nous Research AI ▷ #interesting-links (1 messages):

Nous Research AI ▷ #research-papers (4 messages):

Eleuther ▷ #general (13 messages🔥):

Eleuther ▷ #research (34 messages🔥):

Eleuther ▷ #lm-thunderdome (14 messages🔥):

Moonshot AI (Kimi K-2) ▷ #general-chat (39 messages🔥):

GPU MODE ▷ #triton (2 messages):

GPU MODE ▷ #cuda (2 messages):

GPU MODE ▷ #algorithms (1 messages):

GPU MODE ▷ #cool-links (3 messages):

GPU MODE ▷ #beginner (3 messages):