Frozen AI News archive

Terminal-Bench 2.0 and Harbor

**Terminal-Bench** has fixed problematic tasks and launched version 2.0 with cloud container support via the **Harbor framework**; the benchmark is cited in reports for models such as **Claude 4.5** and **Kimi K2 Thinking**. **Moonshot AI's Kimi K2 Thinking** is a 1 trillion parameter MoE reasoning model with ~32B active parameters, running natively in **INT4 quantization** and featuring a 256K context window. It leads open-weights benchmarks with an Artificial Analysis Intelligence Index score of **67**, shows strong agentic performance, and runs efficiently on consumer Apple silicon such as a 2× M3 Ultra setup. The model is broadly available on **Hugging Face** and **Ollama Cloud**, and is integrated into frameworks like slime. Serving bottlenecks were traced to network bandwidth rather than GPU limits, highlighting infrastructure considerations for LLM deployment.
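A quick back-of-the-envelope footprint check (parameter counts taken from the summary above; quantization scales, embeddings, and KV cache are ignored) shows why INT4 makes deployment on a 2× M3 Ultra setup plausible:

```python
def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in bytes, ignoring quantization scales,
    embeddings, and KV cache -- a rough sketch, not a deployment plan."""
    return n_params * bits_per_param / 8

total_gb = weight_bytes(1e12, 4) / 1e9    # all experts, INT4
active_gb = weight_bytes(32e9, 4) / 1e9   # ~32B active params per token
print(f"total weights:  ~{total_gb:.0f} GB")   # ~500 GB
print(f"active weights: ~{active_gb:.0f} GB")  # ~16 GB
```

Roughly 500 GB of weights total, of which only ~16 GB of active expert weights are touched per token, which is what puts a two-machine Apple silicon setup in range.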

Canonical issue URL

A popular benchmark fixes itself.

AI News for 11/6/2025-11/7/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (200 channels, and 5178 messages) for you. Estimated reading time saved (at 200wpm): 432 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Since Terminal-Bench launched earlier this year (before Claude Code!), it has vaulted into the upper echelon of coding agent benchmarks, eg cited by Claude 4.5 and yesterday's Kimi K2 Thinking. There were some problems with tasks on TBench being too easy/impossible, so they went ahead and fixed the glitch (blog). They are also rewriting the bench to be easily run in cloud containers with the new Harbor framework, and also hosted a launch party with Q&A, recorded and now live on Latent Space:

A presenter discusses Terminal-Bench 2.0 at a technical conference, standing at a podium with presentation slides about the AI benchmark.


AI Twitter Recap

Moonshot AI’s Kimi K2 Thinking: 1T INT4 open-weights reasoning model, agentic SOTA, and real-world deployment notes

Scaling RL for LLM agents: DreamGym and agent instrumentation

Video “supersensing” and fast tracking: Cambrian-S and EdgeTAM

Evaluation and interpretability: long-context aggregation remains hard; model diffing and curvature-based editing

Systems and inference: kernels, frameworks, and deployment practices

Policy and industry context

Top tweets (by engagement)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Kimi Model Launch and Performance

2. Moonshot AI AMA Announcement

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. AI Consciousness Debate and Developments

2. AI Design and Production Innovations

3. Free AI Services in India


AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. Kimi K2 Reasoning Surge & Leaderboard Shakeups

2. GPU Kernels, Low Precision, and Bandwidth Realities

3. APIs, SDKs, and Spec Upgrades

4. Agents, Workflows, and Speech Speed Records

5. Training & Numerics: MoE, Torch 2.9, and On‑Device TTS


Discord: High level Discord summaries

LMArena Discord


Perplexity AI Discord


GPU MODE Discord


OpenRouter Discord


Cursor Community Discord


LM Studio Discord


HuggingFace Discord


Unsloth AI (Daniel Han) Discord


Nous Research AI Discord


OpenAI Discord


Modular (Mojo 🔥) Discord


Eleuther Discord


Yannick Kilcher Discord


tinygrad (George Hotz) Discord


MCP Contributors (Official) Discord


DSPy Discord


aider (Paul Gauthier) Discord


Manus.im Discord


MLOps @Chipro Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (971 messages🔥🔥🔥):

Gemini 3 Pro, MovementLabs AI, Image-to-Video bugs, LMArena API Exploit, OpenAI's GPT-5 Release Strategy


LMArena ▷ #announcements (2 messages):

Text Arena Leaderboard, Image Edit Leaderboard, Ernie-5.0-preview-1022, Reve-edit-fast


Perplexity AI ▷ #general (600 messages🔥🔥🔥):

Sonnet 4.5 issues, Adblock alternatives, Kimi K2 Thinking model, GTA 6 delay, Comet Browser issues and Android release


Perplexity AI ▷ #pplx-api (4 messages):



GPU MODE ▷ #general (128 messages🔥🔥):

FP4 kernels, Nvidia interview tips, FP4 precision management, Blackwell new instructions, PTX and CUDA docs conversion to markdown


GPU MODE ▷ #triton-gluon (2 messages):

AtomicAdd in Gluon, Flash Attention Backward, Triton Tutorial


GPU MODE ▷ #cuda (29 messages🔥):

PTX instruction set, CUDA kernels profiling in Colab, TMA load bandwidth, WGMA vs MMA, INT8xINT8 GEMM kernel


GPU MODE ▷ #torch (26 messages🔥):

Torch 2.9.0, cos/sin implementation, numpy, fft, numerical bugs

```python
import torch
import numpy as np

print("=== ENVIRONMENT INFO ===")
print(f"PyTorch version: {torch.__version__}")
print(f"numpy version: {np.__version__}")

k = torch.tensor([[-0.0000000000, -0.1963495463, -0.3926990926, -0.5890486240,
                   -0.7853981853, -0.9817477465, -1.1780972481, -1.3744468689]])

w_r_torch = k.cos()

# NumPy version
k_numpy = k.numpy()
w_r_numpy = np.cos(k_numpy)

# Convert back to tensor for comparison
w_r_numpy_tensor = torch.from_numpy(w_r_numpy)

# Set print options for maximum precision
torch.set_printoptions(precision=17)
np.set_printoptions(precision=17)

# Compare
print("\nPyTorch result:")
print(w_r_torch)
print("\nNumPy result:")
print(w_r_numpy)
print("\nDifference:")
print(torch.abs(w_r_torch - w_r_numpy_tensor))
print("\nMax difference:")
print(torch.max(torch.abs(w_r_torch - w_r_numpy_tensor)).item())
print("\nAre they close? (allclose with default tolerance)")
print(torch.allclose(w_r_torch, w_r_numpy_tensor))
```



  

---


### **GPU MODE ▷ #[cool-links](https://discord.com/channels/1189498204333543425/1189868872887705671/1436409377442762896)** (4 messages): 

> `TMD Introduction, IEEE 754 Status, Verinum Numerical Software Verification` 


- **Intro to TMD**: A member shared a link to *Introduction to the **Table Maker's Dilemma** (TMD)* by Jean-Michel Muller: [Intro to TMD](https://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm).
- **History of IEEE 754 Floating Point Standard**: A member shared a link to *754 story* by W. Kahan: [IEEE 754 Status](https://people.eecs.berkeley.edu/~wkahan/ieee754status/754story.html).
- **Verinum's Verification of Numerical Software**: A member shared a link to [Verinum](https://verinum.org/), a collection of research projects taking a layered approach to foundational verification of correctness and accuracy of numerical software, with formal machine-checked proofs about programs.


  

---


### **GPU MODE ▷ #[jobs](https://discord.com/channels/1189498204333543425/1190208177829068860/1436144483422048306)** (5 messages): 

> `Hiring for AI System Performance, Tinygrad Core Devs, Low-Level Development Opportunities, ScienceCorp Hiring` 


- **Company is hiring for AI System Performance**: Company is still hiring engineers due to a strong customer pipeline, seeking **low-level developers** and **performance engineers** to push the limits of **AI system performance**.
   - The team includes ex-**HRT** and **Five Rings** engineers, **IMO** medalists, **Zig** and **tinygrad** core devs, and people from top AI labs, with compensation ranging from **$500K–$1M TC**.
- **Inquiries about Tinygrad Core Devs**: A member inquired about the **tinygrad core devs** in the team.
   - Another member asked for more details about the company, job description, and specific skills being sought.
- **ScienceCorp Seeking Low-Level SWEs for Vision and Brain-Computer Interfaces**: A member shared a hiring post for **ScienceCorp** seeking **low-level SWEs** interested in projects like restoring sight to the blind or hooking your brain up to a computer [ScienceCorp Job Posting](https://x.com/ScienceCorp_/status/1986457644421566516).
   - Interested candidates are encouraged to DM for more information.


  

---


### **GPU MODE ▷ #[beginner](https://discord.com/channels/1189498204333543425/1191300313928433664/1436109751048994988)** (11 messages🔥): 

> `1D convolution kernel for tensara problem, Debugging CUDA without CUDA hardware, printf in device code, GPU Computing Starting Points, NCU profiling with Colab/Lightning AI` 


- **Tiling 1D Convolution Kernel Troubles**: A member is seeking help debugging a [1D convolution kernel](https://gist.github.com/RiscInside/642bca513606d3d4cd366492ae2a3460) for a tensara problem, encountering slight inaccuracies in the tiling version on large tests.
   - They suspect the issue might stem from atomicAdd or an off-by-one error, and are looking for advice on debugging without CUDA-capable hardware.
- **Printf Function Surprises User in Device Code**: A user expressed surprise at the functionality of `printf` being available in device code.
   - This may be useful for debugging the CUDA kernel discussed above.
- **Profiling Code with NCU on Various Platforms**: One member asked if anyone had experience profiling their code using **NCU** (NVIDIA Nsight Compute) on platforms like **Colab** or **Lightning AI**.
   - This would be very helpful in the above situation for debugging the kernel.
- **Seeking Latest Kernel Benchmark Results**: A member referenced a [Stanford article on kernel benchmarking](https://scalingintelligence.stanford.edu/blogs/kernelbench) and inquired about the availability of the latest benchmark results.
   - Specifically, they were interested in seeing benchmark results for **GPT-5**.
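For the debugging-without-CUDA-hardware question above, one common tactic is validating a kernel's output against a host-side reference implementation. A minimal NumPy sketch (the "same"-padding convention here is an assumption; the actual tensara problem spec may differ):

```python
import numpy as np

def conv1d_ref(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 'same'-padded 1D convolution (cross-correlation convention),
    usable as a ground truth when checking a GPU kernel's output."""
    r = len(kernel) // 2
    padded = np.pad(x, (r, r))
    return np.array([np.dot(padded[i:i + len(kernel)], kernel)
                     for i in range(len(x))])

# Compare a candidate kernel's output (or a tile of it) against the reference:
x = np.arange(8, dtype=np.float64)
k = np.array([0.25, 0.5, 0.25])
out = conv1d_ref(x, k)
```

Diffing the tiled kernel's output against such a reference, tile by tile, usually localizes an off-by-one or atomicAdd bug to a specific boundary without needing a GPU at all.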


  

---


### **GPU MODE ▷ #[torchao](https://discord.com/channels/1189498204333543425/1205223658021458100/1436100535790080254)** (5 messages): 

> `WandaSparsifier, Whisper models, sparse computation, 2:4 sparsity, matmul performance` 


- **TorchAO Newbie Explores WandaSparsifier on Whisper**: A new **torchao** user is experimenting with the `WandaSparsifier` on **Whisper models** to achieve faster inference after sparsifying and squashing the mask.
   - The user encountered a `RuntimeError` when attempting `.to_sparse()` on the weight tensors and seeks advice on achieving faster inference with unstructured pruning.
- **Unstructured Sparsity Needs >99% for Speedups**: Achieving speedups in compute-bound workloads with unstructured sparsity generally requires over **99% sparsity**.
   - The suggestion was made to try pruning to **2:4 sparsity** and accelerating with `to_sparse_semi_structured`, referencing [this PyTorch tutorial](https://docs.pytorch.org/tutorials/advanced/semi_structured_sparse.html).
- **Matrix Shapes Matter for Matmul Acceleration**: Acceleration options for **matmul** differ based on whether the workload is compute-bound versus memory-bound.
   - The user is testing on **whisper-base** with matrix shapes potentially like `[batch x 512 x 512]` for attention, noting that small batch sizes can be slower.
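The 2:4 pattern suggested above can be illustrated with a small NumPy sketch: zero the two smallest-magnitude weights in every contiguous group of four. This only produces the pattern; the actual speedup comes from torchao's `to_sparse_semi_structured` path on supported hardware, per the linked tutorial.

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude weights in every contiguous group of 4
    along the last axis -- the 2:4 pattern semi-structured kernels expect.
    Assumes the last axis length is a multiple of 4."""
    flat = w.reshape(-1, 4)
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]  # 2 smallest |w| per group
    out = flat.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(w.shape)

w = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.0, 0.3]])
pruned = prune_2_4(w)  # exactly 2 of every 4 entries survive
```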


  

---


### **GPU MODE ▷ #[off-topic](https://discord.com/channels/1189498204333543425/1215328286503075953/1436138085694967929)** (1 messages): 

> `Milk Couch` 


- **User Shares Image of "Milk Couch"**: A user posted an image titled "The milk couch" with a [link to the image](https://cdn.discordapp.com/attachments/1215328286503075953/1436138085338578944/IMG_20251106_140720.jpg?ex=690f2c11&is=690dda91&hm=7ef39bf0b94248c40300092cc18b9f88df2e06b0f761109a8e9243727c07081a&).
   - No additional context was provided.
- **The Milk Couch**: The user shared an image of what they referred to as 'The milk couch'.
   - Without further context, the meaning or significance of the 'milk couch' remains ambiguous.


  

---


### **GPU MODE ▷ #[metal](https://discord.com/channels/1189498204333543425/1285384841730457600/1436140780325568582)** (2 messages): 

> `candle framework, metal, iOS deployment` 


- **Candle Framework Embraces Metal Acceleration**: Huggingface's **candle nn framework** now supports **Metal** for some operations, potentially boosting performance on Apple devices.
   - A user reported finding it useful on **M1/M2 OSX devices** but remains uncertain about its transparent compatibility with **iOS**.
- **iOS Deployment Still Uncertain for Candle**: While **Candle** benefits from **Metal** on macOS, its seamless functionality on **iOS** is still unconfirmed.
   - Further testing and community feedback are needed to ascertain the extent of Metal's support for Candle across Apple's mobile platforms.


  

---


### **GPU MODE ▷ #[self-promotion](https://discord.com/channels/1189498204333543425/1288557096404516945/1436098759237828721)** (4 messages): 

> `sse-popcount optimization, Model Serving Communities, Modern CUDA C++ Programming Class` 


- **SSE Popcount Optimized**: A member noted that **Wojciech Mula's sse-popcount** is about as good as it can get on CPU, using **Harley-Seal vectorized counts** and carry-save adder accumulation over blocks of **16 vectors**.
- **Model Serving Community Updates**: The State of Model Serving Communities: November Edition is out, with updates on **vLLM**, **KServe**, **llm-d**, **Llama Stack**, and more from Red Hat AI teams: [link](https://open.substack.com/pub/inferenceops/p/state-of-the-model-serving-communities-ea6?r=3ouig&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false).
- **NVIDIA Offers New Modern CUDA C++ Class**: NVIDIA announced a new Modern CUDA C++ Programming Class for C++ developers who want to use the GPU effectively and write clean, efficient, idiomatic GPU code, with all slides and exercises being open source: [link](https://www.youtube.com/watch?v=Sdjn9FOkhnA&list=PL5B692fm6--vWLhYPqLcEu6RF3hXjEyJr).
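The Harley-Seal trick mentioned above compresses several words through carry-save adders before doing any real popcounts. A scalar Python sketch of the idea (a real SIMD version carries twos/fours accumulators across iterations instead of flushing them every group, as this simplification does):

```python
def popcount(x: int) -> int:
    """Reference bit count."""
    return bin(x).count("1")

def csa(a: int, b: int, c: int):
    """Carry-save adder: compress three words into (sum, carry), so that
    popcount(a) + popcount(b) + popcount(c) == popcount(sum) + 2*popcount(carry)."""
    u = a ^ b
    return u ^ c, (a & b) | (u & c)

def harley_seal(words):
    """Count set bits over many words, folding four words at a time through
    CSAs so most bits are counted at weight 2 or 4 instead of one by one."""
    total, ones = 0, 0
    it = iter(words)
    for w0 in it:
        w1, w2, w3 = next(it, 0), next(it, 0), next(it, 0)
        ones, c0 = csa(ones, w0, w1)   # leftover weight-1 bits
        ones, c1 = csa(ones, w2, w3)
        twos, c2 = csa(0, c0, c1)      # weight-2 and weight-4 bits
        total += 2 * popcount(twos) + 4 * popcount(c2)
    return total + popcount(ones)
```

The vectorized version replaces `popcount` on the weight-4 accumulator with an in-register byte-wise count, which is where the "blocks of 16 vectors" amortization in sse-popcount comes from.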


  

---


### **GPU MODE ▷ #[submissions](https://discord.com/channels/1189498204333543425/1343002583001726986/1436256640960692328)** (42 messages🔥): 

> `Grayscale_v2 leaderboard results, vectoradd_v2 leaderboard results, vectorsum_v2 leaderboard results, B200 performance, H100 performance` 


- **Grayscale Gauntlet: B200 Battles**: Multiple submissions to the `grayscale_v2` leaderboard on **B200** achieved timings around **600 µs**, with one submission reaching **first place** at **600 µs**.
   - A separate submission also secured **4th place** on the **B200** with a time of **614 µs**.
- **Vector Victorious on Multiple Platforms**: Submissions to the `vectoradd_v2` leaderboard showed successful runs across different GPUs, with the following highlights: **A100** at **953 µs**, **H100** at **532 µs**, **B200** at **243 µs**, and **L4** at **6.92 ms**.
   - Further optimizations led to a **4th place** on **H100** at **526 µs** and consistent performance on **L4** at **6.91 ms**.
- **Vectorsum's Victory Lap**: Submissions to the `vectorsum_v2` leaderboard saw impressive results, including **3rd place** on **B200** at **51.3 µs** and **1st place** on **H100** at **83.3 µs**.
   - The leaderboard also demonstrated successful runs on **L4** at **974 µs** and **A100** at **141 µs**.
- **H100 & L4 neck and neck, Grayscale and Vectoradd locked in combat**: The `grayscale_v2` leaderboard runs also netted **Third place** on **H100** at **1371 µs**, and **Third place** on **L4** at **17.2 ms**.
   - Multiple `vectoradd_v2` leaderboard runs had **10th place** on **H100** hovering around **528 µs**


  

---


### **GPU MODE ▷ #[hardware](https://discord.com/channels/1189498204333543425/1349152646484987974/1436148112732197016)** (10 messages🔥): 

> `DGX Spark experiences, DGX Spark vs Strix Halo, DGX Spark as a datacenter proxy, DGX Spark hardware and software stack` 


- **First-Hand DGX Spark Experiences Requested**: Members are soliciting first-hand experiences with **DGX Spark**, particularly regarding bandwidth limitations, local model hosting, and **nvfp4 quantization** experiments.
   - One is curious about the software stack and its suitability for local models, experimentation with quantization, and general form factor advantages.
- **Chips and Cheese Disses DGX Spark**: **Chips and Cheese** is doing a [review analysis on DGX Spark](https://chipsandcheese.com), their initial impressions are not particularly positive.
   - They mention the **CPU** side is inferior to **Strix Halo**, and there were weird segmentation decisions on the **GPU** side, resulting in it not being a great proxy for datacenter solutions due to nerfed **FP32** accumulate performance.
- **DGX Spark is No Datacenter Proxy**: The DGX Spark's **sm120 GPU** cannot use any of the new **Blackwell** features other than **fp4**, making it unsuitable as a proxy for datacenter solutions.
   - One member described it as *basically a 5080 without the vram, and instead it has shared system ram*.
- **Strix Halo Eats DGX Spark for Lunch**: A member said if you want to remotely use it for regular PC applications, **Strix Halo** will bury the **DGX Spark**.
   - Had **Strix Halo** not existed and solved **ROCm**, the **DGX Spark** would have been a decent option for a *local AI box*.


  

---


### **GPU MODE ▷ #[factorio-learning-env](https://discord.com/channels/1189498204333543425/1354169122107293786/1436386308300865738)** (1 messages): 

> `Meeting Cadence` 


- **Meeting Frequency Faces Downsize**: A user apologized for missing a message and mentioned that today is a good day for a meeting, but they also expressed the need to decrease the meeting frequency.
- **Meeting Cadence and Apologies**: A user mentioned they were available, and apologized for missing a message from another user.


  

---


### **GPU MODE ▷ #[cutlass](https://discord.com/channels/1189498204333543425/1362196854460383353/1436109242552553655)** (3 messages): 

> `CUTLASS MMA, Tensor Operations, TCGEN05` 


- **CUTLASS MMA without inline PTX?**: A user inquired about using `.ws` MMAs for **TCGEN05** in [CUTLASS](https://github.com/NVIDIA/cutlass) without resorting to inline PTX.
   - The inquiry points to challenges or preferences in utilizing certain CUTLASS MMA functionalities, specifically related to tensor cores, without directly embedding PTX code.
- **Tensor operations beyond row/col major**: A member mentioned that it is designed to work with **any tensor**, even those that are not describable as row/col major.
   - He clarified that *election* applies to issuing the operation, not to predicating the data.


  

---


### **GPU MODE ▷ #[mojo](https://discord.com/channels/1189498204333543425/1367972893400760371/1436106708828295178)** (1 messages): 

> `Mojo Kernel Boilerplate` 


- **Competitors seek Mojo Kernel Boilerplate**: A member requested a **boilerplate kernel in Mojo** to use for competitions, specifically needing the structure of the submission file.
   - No specific examples or links were provided in the given context.
- **Another member needs Mojo help**: Another user also asked for help with Mojo.
   - No further details were given.


  

---


### **GPU MODE ▷ #[singularity-systems](https://discord.com/channels/1189498204333543425/1373414141427191809/1436440838900158575)** (1 messages): 

> `picograd, runtime allocator, compiler, AD on tensor, device kernels` 


- **Picograd Gets Allocator and Compiler Working**: A member announced they are working on getting the **runtime's allocator** and **compiler** working on [picograd](https://github.com/j4orz/picograd/commit/c261d5c15cc47af28f3727bcde043b31f53f1cbc).
   - The goal is to setup **AD** (automatic differentiation) on the **tensor** and **device kernels** on the runtime.
- **Running MNIST opens Parallelization Potential**: The member plans to run an **MNIST** example after setting up **AD** and **device kernels**.
   - They noted that once this setup is complete, more work can be parallelized to improve performance.


  

---


### **GPU MODE ▷ #[general](https://discord.com/channels/1189498204333543425/1394753097989099640/1436320044240863372)** (7 messages): 

> `Leaderboard submission, Popcorn CLI, VS Code extension, NVFP4 kernel hackathon eligibility` 


- **Popcorn CLI for Leaderboard Submission**: A member asked about the tools used for leaderboard submission, and another member responded with **Popcorn CLI**.
   - The member then inquired whether the first member was also utilizing the **VS Code extension**.
- **VS Code Extension for PyTorch Load Inline Highlighting**: A member shared a [VS Code extension](https://marketplace.visualstudio.com/items?itemName=msaroufim.pytorch-load-inline-highlighter) for **PyTorch load inline highlighting**.
   - Another member responded that they hadn't heard of it and asked about its benefits, to which the first member responded it makes using load inline more pleasant in your IDE.
- **NVFP4 Hackathon Eligibility Question**: A member asked about eligibility for participating in the **NVFP4 kernel hackathon**.
   - Specifically, they were concerned because their country was not listed as eligible, and they wanted to know if this meant they could not participate or just were ineligible for winning prizes.


  

---


### **GPU MODE ▷ #[multi-gpu](https://discord.com/channels/1189498204333543425/1398843708488552570/1436461161125122329)** (3 messages): 

> `Multi-node communication, Low-latency communication kernels, NVSHMEM, LLM inference` 


- **NVSHMEM Kernels for Multi-Node LLM Inference**: A member shared a [blog post](https://pssg.cs.umd.edu/blog/2025/beyond-nccl/) about writing **low-latency communication kernels** with **NVSHMEM** for **LLM inference** focusing on multi-node communication performance.
- **Multi-Node Kernel Talk Proposed**: Due to member interest in the blog post, a talk was proposed about the work on low-latency communication kernels with **NVSHMEM**, especially for **LLM inference**.
   - The original poster expressed enthusiasm for giving a talk if there's enough interest and feedback from the community.


  

---


### **GPU MODE ▷ #[helion](https://discord.com/channels/1189498204333543425/1425531180002054195/1436403113643737109)** (2 messages): 

> `Helion on Hacker News, Flex Attention vs Triton` 


- **Helion Graces Hacker News Front Page**: Members noted that [Helion is on the front page of Hacker News](https://news.ycombinator.com/item?id=45788194).
   - They linked to the [Helion GitHub](https://github.com/pytorch/helion/blob/main/examples/attention.py) as a key implementation.
- **Flex Attention Set to Duel Triton**: A member requested performance comparisons between **Flex Attention** and a linked **Helion** implementation.
   - They stated that the **Helion** code looks better than any **Triton** implementation they've encountered.


  

---


### **GPU MODE ▷ #[nvidia-competition](https://discord.com/channels/1189498204333543425/1434709259500650628/1436153269159202918)** (124 messages🔥🔥): 

> `AGX Thor, CC11.0 support, CUTLASS Library, CUDA kernel optimization, nvfp4 moe` 


- **AGX Thor GPU Architecture Specs Probed**: A user inquired about **CC11.0 support** for a potential **AGX Thor** purchase, noting the absence of clear documentation, no indication that it can use **tcgen05**, and concerns about nerfed smem.
   - After another user confirmed it is listed in the **PTX documentation as sm_110**, the original poster was happy with the specs.
- **CUDA Kernel Performance Issues Investigated**: A user, practicing **cutedsl** on grayscale_v2, reported a significant performance discrepancy on **B200**, with their kernel running at **48310.973μs** compared to the leaderboard's **600.272μs**, due to cute.compile() being inside the custom kernel.
   - Another user pointed out that compiling the kernel on each run is extremely slow, and suggested caching the compilation or moving `cute.compile()` outside the kernel, linking to [a reference implementation](https://github.com/gpu-mode/reference-kernels/blob/main/problems/pmpp_v2/sort_py/solutions/correct/ref.py).
- **Benchmark Evaluation Script Glitches Flagged**: A user questioned the accuracy of the competition's evaluation script, suggesting that **start_event.record** triggers immediately and captures Python overhead, skewing kernel timing; they recommended launching a junk kernel that wastes a second, then triggering the first event, then launching the main kernel.
   - They propose adding a **time-waster kernel** so that Python can queue CUDA operations accurately, particularly when benchmarking kernels against the speed of light, citing the [clear l2 kernel example](https://cdn.discordapp.com/attachments/1434709259500650628/1436475488091770959/image.png?ex=690fbd8c&is=690e6c0c&hm=900a1b10543312a3ccdce53e5089be2156d8213d1486e3341201e40fe494b670&) that prevents the event from triggering until after the next operation has been queued.
- **Datacrunch B200 Server Access Tips Shared**: Users discussed the availability of **bare metal B200 servers**, noting that while **DigitalOcean** and **Coreweave** lack them, **Datacrunch** offers them with upgradeable **CUDA 13** support.
   - A user highlighted the need for profiling tools and for the submission tool to capture **NCU profiles**, while also noting the availability of affordable bare metal servers from **Sesterce**.
- **Future of Low Precision Training Speculated**: The potential of **nvfp4** was discussed, with one user noting its comparable training loss to **fp8** and another mentioning gpt-oss as an example of block-scaled data types.
   - They speculated that labs using low precision pretraining might not want to reveal their strategies and added that **mxfp8** is a long term winner.
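The warm-up principle behind the evaluation-script complaint above is general: one-time costs (kernel compilation, caches, Python queuing overhead) must be flushed before the timed region starts. A minimal host-side sketch of that pattern (on GPU the analogue is launching a throwaway kernel before recording the start event; `bench` here is a hypothetical helper, not the competition's harness):

```python
import time

def bench(fn, *, warmup=3, iters=10):
    """Average wall time of fn, discarding warm-up runs so that one-time
    costs (JIT/compile caches, allocator warm-up, queuing overhead) do not
    pollute the measurement."""
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

avg_s = bench(lambda: sum(range(10_000)))
```

This is also why the cutedsl user's 48 ms-vs-600 µs gap above disappears once `cute.compile()` moves out of the timed kernel: compilation belongs in warm-up, not in the measured region.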


  

---


### **OpenRouter ▷ #[announcements](https://discord.com/channels/1091220969173028894/1092729520181739581/1436381819237826650)** (3 messages): 

> `New Embedding models launch, Typescript SDK, Exacto Variants, MiniMax M2 Free Period` 


- **OpenRouter's livestream**: OpenRouter announced a livestream scheduled for later today, with discussions on the new **Embedding models launch**, **TypeScript SDK**, **Exacto Variants**, and community discussions, on [X Stream](https://x.com/OpenRouterAI/status/1986821885716558194) or [Youtube](https://www.youtube.com/@OpenRouterAI).
- **MiniMax M2 Free Period Ending Soon**: The **MiniMax M2 Free period** will end in a few days on **Monday, November 10th**, with rate limits lowered in the meantime, which is expected to result in a higher rate of **429 errors**; see the [official post](https://x.com/minimax__ai/status/1986815058249408541?s=46).
- **OpenRouter is LIVE!**: OpenRouter is now **LIVE** on [X](https://x.com/OpenRouterAI/status/1986871176082358615?s=20) and [YouTube](https://youtube.com/live/TD6JUbJzKPY?feature=share).


  

---


### **OpenRouter ▷ #[app-showcase](https://discord.com/channels/1091220969173028894/1092850552192368710/1436131621366530311)** (2 messages): 

> `Cat girl images` 


- **Cat girl image quality assurance**: Members positively affirmed that the presence of a cat girl in an image guarantees its quality.
- **Agreement on Cat Girl Quality**: Another member explicitly agreed with the assessment, reinforcing the link between cat girls and image quality.


  

---


### **OpenRouter ▷ #[general](https://discord.com/channels/1091220969173028894/1094454198688546826/1436086954587459725)** (305 messages🔥🔥): 

> `OpenRouter website down, Polaris Alpha model, Gronk AI, Nano Banana Image Resending, Chat limit for free users` 


- **OpenRouter Site Glitches Prompt Frustration**: Users reported issues with the **OpenRouter website**, where the page served but content failed to load, hindering login and credit additions.
   - While some experienced slow loading or non-functional account sections, others confirmed the **API** remained operational despite site glitches.
- **Polaris Alpha's Stealth Perks Spark Speculation**: The **Polaris Alpha model** on [OpenRouter](https://openrouter.ai/openrouter/polaris-alpha) garnered praise for outperforming others due to its rule-breaking capabilities.
   - Guesses on its origin ranged from **OpenAI** to **Google**, or even a **Nvidia Nemo 32B troll**, with users urging OpenRouter to keep it free, which is unlikely due to rate limits.
- **The Gronk AI Dissed in Chat**: A user recounted being laughed at for asking about the AI called **Gronk**.
   - Another user chimed in to say that Gronk is *shit* and proceeded to talk about their *llama.cpp custom OLMo 2 finetune mirostat entropy parameters for 3 hours* as an example of how to talk to normies.
- **OpenRouter Now Supports Video**: OpenRouter now supports video, according to a [link](https://openrouter.ai/docs/features/multimodal/videos) shared in the chat.
   - One user reacted positively, saying *Ohhh, just 2 days ago I was like "I wish OR supported videos"*.
- **GLM Coding Exploit Banned SillyTavern Gooners**: It was revealed that **SillyTavern gooners** were banned for abusing the **GLM 4.6 coding plan** to get essentially free API usage.
   - One user lamented that *cant have nice shit because they abuse free shit like a swarm*.


  

---


### **OpenRouter ▷ #[discussion](https://discord.com/channels/1091220969173028894/1392278974222307469/1436114963424084060)** (28 messages🔥): 

> `GTA 6 delay, Openrouter Show, Toven is winking, Retro OR logo` 


- **GTA 6 Delay fuels GPT-7 Speculation**: A member joked that **GPT-7** might be released before **GTA 6**, referencing another **GTA 6 delay**.
- **The Openrouter Show's Rocky Road**: Members discussed name ideas for the **"Openrouter Show"**, and whether the name would imply a scripted entertainment show or a documentary podcast.
   - One member proposed *live roleplay* ideas with AI models like *MythoMax*.
- **Toven's Wink Creates Existential Crisis**: Members joked about whether **Toven** was winking and also whether Toven was a 2D anime girl.
- **Retro OR Logo Evokes 90s Apple**: Members discussed the **retro OR logo**, with some commenting on its **90s Apple logo vibe**.


  

---


### **Cursor Community ▷ #[general](https://discord.com/channels/1074847526655643750/1074847527708393565/1436083248449978430)** (297 messages🔥🔥): 

> `Pro plan limits, Cursor Usage dashboard, Composer vs Grok Code, Sharing Premium Accounts, Student Verification Issues` 


- **Pro Plan Limits Reset on Billing Date, not on a fixed Day**: A member asked about the Pro plan limits, the response was that they always reset with billing cycles, showing a [screenshot](https://cdn.discordapp.com/attachments/1074847527708393565/1436083248357834802/image.png?ex=690fa1bf&is=690e503f&hm=85953f8bf713e88d58104c890a5c6d767185003b86cebede896df50ae275b94c) showing the usage and next billing date.
- **Cursor Usage Dashboard Graph is Missing For Some Users**: Some users are unable to see the usage graph in their dashboard, located at [cursor.com/dashboard?tab=usage](https://cursor.com/dashboard?tab=usage), despite having **unlimited auto** features.
   - It's possibly related to being on an *old pricing* plan, but there was no resolution for why it was missing.
- **Composer vs Grok Code for Codebase Understanding**: When asked about the difference between **Composer 1** or **Grok Code**, some members commented that **Composer** is faster, and good for quickly generating rough code or for use in workflows that involve a second pass by **Claude** for refinement.
   - Another member found that **Sonnet 4.5** could effectively solve complex web code issues where **Composer**, **Grok Code** and **GPT 5 Fast** would get stuck in the same logic loop without a solution.
- **Cursor Student Verification Process is a Hassle**: Multiple members are facing issues with **SheerID** during the student verification process, with one member reporting over **15 attempts** without success, which is in contrast to using **SheerID** with another service that worked on the first try.
   - Members speculate that **Cursor** may have implemented a stricter verification process than other companies, and suggest contacting the **Cursor Forum** or emailing `[email protected]` for feedback, but note that it might only lead to an **AI** response.
- **Accessing Kimi K2 Thinking in Cursor is Incoming**: A member asked whether it will be possible to use **kimi-k2-thinking-turbo** in Cursor.


  

---


### **Cursor Community ▷ #[background-agents](https://discord.com/channels/1074847526655643750/1367213641027551352/1436083459201302569)** (7 messages): 

> `Cursor Agent API, Base64 Image Submission, Internal Errors, Image Generation Service` 


- **Cursor 2.0 Release Speed Impresses!**: A user expressed appreciation for the quick pace of **Cursor 2.0** development, specifically noting the value of visibility into changes.
   - Shortly after this comment, the same user reported encountering an *internal error* within the platform.
- **Agent API Triggers Internal Error**: A user reported receiving an *internal error* when using the **Cursor Agent API**.
   - After investigation, it was discovered that the error occurred due to improper formatting of the **Base64** image data submitted to the API.
- **Base64 Image Formatting Fixes API Error**: The user discovered that the **Base64** string was improperly formatted, prepended with `data:image/jpeg;base64,`.
   - After removing the header, the **API** call succeeded, resolving the error.
- **Desire to Upload Base64 to Agent for re-creation**: The user expressed a desire to submit a **Base64** image to the **Agent API**, intending for Cursor to use the image contextually to re-create and save it to a repository.
   - The user seems to expect that the agent will create and save the image to their repo.
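The Base64 fix above amounts to stripping the data-URI header before submission; a minimal sketch (hypothetical helper, not Cursor's actual API handling):

```python
import base64

def to_raw_base64(data: str) -> str:
    """Strip an optional data-URI header (e.g. 'data:image/jpeg;base64,')
    so only the raw Base64 payload remains for APIs that reject the prefix."""
    prefix, sep, payload = data.partition("base64,")
    return payload if sep else data

# A data URI wraps the raw payload with the header the Agent API rejected:
uri = "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode()
print(base64.b64decode(to_raw_base64(uri)) == b"\xff\xd8\xff")  # True
```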


  

---


### **LM Studio ▷ #[general](https://discord.com/channels/1110598183144399058/1110598183144399061/1436086903165423767)** (124 messages🔥🔥): 

> `Intel LLM Scaler, System Prompt for AI Assistant, LM Studio Memory Clearing, LM Studio and N8N Integration, ComfyUI Alternatives` 


- **Intel Scales LLMs with New Tool**: Intel is developing [llm-scaler](https://github.com/intel/llm-scaler) for their architecture, sparking curiosity about performance improvements on Intel GPUs.
   - Members are interested in ERP models on the architecture, but not 1B models.
- **LM Studio's Deep Research Powers**: Users can now enhance research within LM Studio using the [LMS plugins for web search (duck-duck-go) and visit website](https://lmstudio.ai/danielsig/visit-website), along with custom system prompts.
   - The system prompts can be generated via the *AI agent prompt generator* inside the ChatGPT platform (free).
- **LM Studio Gets N8N Integration**: Members discuss the seamless integration of **LM Studio** with [N8N](https://n8n.io/) for AI automation.
   - While some prefer code, others find N8N's visual node interface beneficial, especially for non-programmers.
- **Users seek ComfyUI Alternatives**: Users express frustration with **ComfyUI** setup and seek more *comfy* alternatives like [Stability Matrix](https://github.com/LykosAI/StabilityMatrix).
   - They consider **Automatic1111** and its forks to be mostly abandonware.
- **Models gone religious or misremembering**: A user is facing issues where models like **Gemma** and **Deepseek Distil** give incorrect/odd answers, and LM Studio seems to recall older chats after being wiped.
   - Troubleshooting steps include reverting to default sampling settings, verifying system prompts, and ensuring no uploaded files interfere; the root cause remains elusive, though the user **did** upload five Word documents, which might be the cause.


  

---


### **LM Studio ▷ #[hardware-discussion](https://discord.com/channels/1110598183144399058/1153759714082033735/1436085047697346743)** (164 messages🔥🔥): 

> `1080 vs sysRAM, GLM 4.6, Qwen3-235B MXFP4, multi-GPU, 3090 vs pseudo-benchmark` 


- **1080 Beats SysRAM!**: A member found their **1080** was faster when offloading some tasks to **sysRAM**.
- **GLM 4.6 Crawls at Q4_K_M!**: **GLM 4.6 @ Q4_K_M** runs *dreadfully slow*, clocking in at only 4 tok/s; even **Qwen3-235B MXFP4** doesn't top 4 tok/s.
   - One user noted this setup was about 30 tok/s slower than a 5700G when testing with **Qwen3 4B Q8_K_XL**.
- **Multi-GPU Mayhem?**: A user is building a **160GB VRAM rack** with 8x 20GB cards (theoretically up to 320GB with 16x cards) for agentic inference, video, music, and image generation.
   - The discussion touched on whether splitting experts across multiple GPUs could speed up **Qwen3-30B**, however, one user couldn't imagine the concurrent **PCIe bandwidth** required, and another said that MoE experts can work independently from each other.
- **3090 Gets Pseudo-Benchmark Beaten!**: A user shared a *simple easy race your friends benchmark for 20GB+ cards in LM studio* in a file named [get_low_benchmark.conversation.json](https://cdn.discordapp.com/attachments/1153759714082033735/1436246519622795274/get_low_benchmark.conversation.json?ex=690f910e&is=690e3f8e&hm=0854e57b2f702242c85b18fa67a4bcf5f575bb526e93c076d7373f357292c501&).
   - After installing the new 3090, the pseudo-benchmark in LM Studio showed speeds of only 90 tok/s, down from 150 tok/s with an older card, though all other benchmarks were in spec or better.
- **Windows No Likey Hundred Gigs!**: A user joked that above 100 GB, Windows stops showing decimals, saying: *Like 6.1gb, 7gb, whats the difference?* alongside an attached [image](https://cdn.discordapp.com/attachments/1153759714082033735/1436442830817071152/image.png?ex=690f9f22&is=690e4da2&hm=1e81296bfd219be0036b09af7b4aed8e2cfa7cb058e35e539fbc2b38b4cc24e8&).
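On the multi-GPU PCIe-bandwidth question, a back-of-envelope sketch (all numbers hypothetical) suggests the per-token activation traffic for routed experts is modest compared to moving expert weights around:

```python
# Hypothetical shapes: the hidden state crosses PCIe once per expert a token routes to.
hidden_dim = 2048        # assumed hidden size
bytes_per_value = 2      # fp16 activations
tokens_per_s = 50        # assumed decode rate
experts_per_token = 8    # assumed top-k routing

activation_traffic = hidden_dim * bytes_per_value * tokens_per_s * experts_per_token
print(f"{activation_traffic / 1e6:.1f} MB/s")  # ~1.6 MB/s, far below a PCIe 4.0 x16 link
```

If this estimate holds, experts can indeed run fairly independently across GPUs; the heavy interconnect cost comes from shuffling expert weights, not activations.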


  

---


### **HuggingFace ▷ #[general](https://discord.com/channels/879548962464493619/879548962464493622/1436086996559986899)** (261 messages🔥🔥): 

> `Kimi K2 Programming, HF Pro Worth, Drone Control with LLM, TTS latency on fine tuned Model, Critical Thinking and LLMs` 


- **Kimi K2 touted for Programming Prowess**: **Kimi K2** is reportedly *very good* at programming, though not necessarily the *best*.
- **Debate on HF Pro Benefits and Pricing**: Members discussed whether **HF Pro** is worth it, prompted by one user who wants to test the performance of a particular model with **vLLM**.
   - Some users mentioned using **Hugging Face Spaces** to test models and another shared that image to video services cost around **$0.40** for an **8-second video** (no sound) with a **$9** subscription.
- **LLMs and Drones**: Members discussed the feasibility of controlling a drone with an **LLM** vs. a **LAM**, with one user seeking to create a drone assistant capable of following voice commands and navigating based on sensor data and image analysis.
   - It was suggested that **YOLO** is better suited for object detection and **ArduPilot** for flight control; it was also noted that teams are researching **CognitiveDrone**.
- **Low Latency Voice Synthesis with LLMs**: For getting faster **TTS** latency on a fine tuned model, it was suggested to use **vLLM** with sufficient **VRAM** for **KV cache**, and to compile the model for kernel fusion using [this blog post](https://www.anyscale.com/blog/continuous-batching-llm-inference).
   - Another member humorously suggested turning the latency setting down.
- **Synthesizing Critical Thought with Language Models**: A member discussed training a model on reasoning traces to output *thought* and using observables.
   - Another user suggested that **information theory** would greatly help in designing the model, and that the research should focus on **coherence** rather than **truth**.
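The continuous-batching suggestion for TTS latency (from the linked Anyscale post) can be illustrated with a toy model of decode latency, assuming idealized scheduling rather than vLLM's actual internals:

```python
# Four requests with different decode lengths sharing four batch slots.
lengths = [10, 200, 15, 180]  # decode steps per request (hypothetical)

# Static batching: the batch only returns when its longest member finishes.
static_latency = [max(lengths)] * len(lengths)

# Continuous batching (idealized): each sequence frees its slot on completion.
continuous_latency = list(lengths)

print(sum(static_latency) / len(lengths), sum(continuous_latency) / len(lengths))
# mean latency drops from 200.0 to 101.25 decode steps
```

Short requests no longer wait behind long ones, which is why the technique cuts tail latency for fine-tuned TTS serving.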


  

---


### **HuggingFace ▷ #[today-im-learning](https://discord.com/channels/879548962464493619/898619964095860757/1436134379381456958)** (3 messages): 

> `Fine-tuning decoder models, Fine-tuning SetFit, Embedding Gemma and t-SNE, Extracting attention values from SmolLM3` 


- **Fine-Tune Decoder Models for Classification**: A member detailed the procedure for fine-tuning a decoder model to classify messages into categories using **ModernBERT** and **ettin**.
- **SetFit Gets Fine-Tuned on Contrastive Pairs**: The channel discussed fine-tuning **SetFit** on contrastive pairs for binary classification of texts.
- **Embedding Gemma and t-SNE Team Up**: The application of **embeddinggemma-300m** and **t-SNE** to categorize and visualize a dataset of tweets was spotlighted.
- **SmolLM3's Attention Values Visualized**: The process of extracting attention values from **SmolLM3** inference and creating a heatmap was shared.


  

---


### **HuggingFace ▷ #[i-made-this](https://discord.com/channels/879548962464493619/897390720388825149/1436417447103168582)** (3 messages): 

> `OpenBMB VoxCPM, Apple Neural Engine, CoreML, Training Reasoning by Design` 


- ****VoxCPM** sings on Apple Silicon**: A member ported the **OpenBMB VoxCPM Text-to-Speech model** to **CoreML**.
   - The model can now run on the **Apple Neural Engine**; code available on [GitHub](https://github.com/0seba/VoxCPMANE).
- **Reasoning By Design framework revealed**: A member shared a [PDF document](https://cdn.discordapp.com/attachments/897390720388825149/1436483450071679036/Training_Reasoning_by_Design__An_Explanation_of_the_SRP_CCC_Framework_Its_Implementation_and_the_Training_Data_It_Requires.pdf?ex=690fc4f7&is=690e7377&hm=8339b80dd293f2cd660853a2bde1ab2c9ae44f22be64c42d642cb0aa9e3ccbe8&) detailing the **Training Reasoning by Design framework**.


  

---


### **HuggingFace ▷ #[smol-course](https://discord.com/channels/879548962464493619/1313889336907010110/1436303413775437957)** (1 messages): 

> `HuggingFace Learn Website Bug` 


- **HuggingFace Learn Website Glitch Exposed**: A user reported that the **2nd and 3rd paragraphs are accidentally repeated** on the [HuggingFace Learn website](https://huggingface.co/learn/smol-course/unit2/2#expected-dataset-type).
   - They included a screenshot that highlights the repeated content.
- **Bug Report Confirmation**: The bug report concerns a content duplication issue on a specific page within the Hugging Face Learn platform.
   - The user provided a direct link and a visual aid to clearly illustrate the problem.


  

---


### **HuggingFace ▷ #[agents-course](https://discord.com/channels/879548962464493619/1329142738440028273/1436145303643488429)** (7 messages): 

> `Agents Course Certificate, Confirmation Page Issues, Llama Index DuckDuckGo Rate Limit` 


- **Agents Course Completion Certificate Inquiry**: A member inquired about receiving a certificate of completion upon joining the [Agents Course](https://huggingface.co/learn/agents-course) today.
- **Confirmation Page Woes**: A user reported being stuck on the confirmation page across multiple browsers (**Edge**, **Firefox**, and **Chrome**) on their Android phone.
- **Llama Index DuckDuckGo hit by Rate Limit**: A member encountered a *"rate limit exception"* while using the web search tool in the [Agents Course](https://huggingface.co/learn/agents-course/unit3/agentic-rag/tools?agents-frameworks=llama-index) despite installing the **llama-index-tools-duckduckgo** package.
- **Certificate Still Possible?**: A member confirmed that receiving the completion certificate for the [Agents Course](https://huggingface.co/learn/agents-course) is still possible, but the testing endpoint to get files is currently down.


  

---


### **Unsloth AI (Daniel Han) ▷ #[general](https://discord.com/channels/1179035537009545276/1179035537529643040/1436084446510972968)** (147 messages🔥🔥): 

> `Qwen3-Next-80b-A3B-Instruct finetuning, MoE models in Transformers, Unsloth and FastModel for MoE, Training frameworks for MoE, Unsloth Dynamic Quants for smaller models` 


- **Qwen3-Next-80b-A3B-Instruct Benchmarks Spark Finetuning Interest**: Members discussed the possibility of finetuning **Qwen3-Next-80b-A3B-Instruct**, noting its impressive benchmarks, even outperforming the **235B model** in some cases.
   - It was noted that while possible using **ms swift**, **transformers** is currently *kinda janked* for MoEs.
- **Transformers Lagging on MoE Implementations**: The poor implementations of **MoE models** in **Transformers** were attributed to it being a primarily high-level library, and **PyTorch** lacking good ops for MoEs.
   - One member noted they trained a **30B model** and found it to be *1/4th the speed of training MS3 with the same recipe*.
- **Unsloth's FastModel Key for MoE Fine-Tuning**: When fine-tuning **MoE models** with **Unsloth**, it's essential to use **FastModel** rather than **FastLanguageModel** due to how it initializes sparse MoE layers and gating logic.
   - **FastModel** supports both dense and sparse (MoE) models safely.
- **MoE Training Still Rough, Frameworks Compared**: The general consensus is that **MoE training** is still not fully optimized, with one member asking what the best approach to training on **Qwen 30B** is.
   - **Megatron-LM** was recommended as being 10x faster for MoEs due to its good support for parallelism, but suffers from poor documentation and being optimized for pretraining instead of post-training, while **Torchtune/Titans** were mentioned as faster than transformers but stuck in *a weird sorta abandonware state*.
- **Sequence Length Impacts Training Time**: A member found that a 32k sequence length with their 14B model on an A6000 (48GB VRAM) resulted in a long training time, and that reducing the sequence length to 16k did not change it.
   - It was suggested to start small with sequence length, sample count, and batch size, then increase gradually to find the bottleneck; the GPU can only process at a given rate, so exceeding that rate with batch size or sequence length won't make much difference, as it will be very slow regardless.
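For intuition on why sequence length can dominate when it is the bottleneck: attention work scales quadratically with sequence length, while MLP work only scales linearly. A rough sketch with assumed 14B-ish shapes (the constants cancel in the ratio):

```python
def attention_flops(seq_len: int, hidden: int = 5120, layers: int = 48) -> int:
    # Two (seq x seq) matmuls per layer, each ~2 * seq^2 * hidden FLOPs.
    return layers * 2 * 2 * seq_len**2 * hidden

# Halving the sequence length cuts attention work ~4x, while MLP work
# (linear in token count) only halves -- so 32k contexts hurt disproportionately.
print(attention_flops(32_000) / attention_flops(16_000))  # 4.0
```

If halving the sequence length changes nothing, as reported, the bottleneck is likely elsewhere (data loading, offloading, or batch shape), which is what the incremental-scaling advice is meant to expose.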


  

---


### **Unsloth AI (Daniel Han) ▷ #[introduce-yourself](https://discord.com/channels/1179035537009545276/1179039724355211325/1436263087513796618)** (1 messages): 

> `Introduction of Ash, LLMs and RL` 


- **Ash joins Unsloth Community!**: Ash introduced themself as working with **LLMs** and **RL** at their university lab and expressed their appreciation for **Unsloth**.
- **Ash likes tweaking small models**: Ash mentioned they enjoy tweaking small models.


  

---


### **Unsloth AI (Daniel Han) ▷ #[off-topic](https://discord.com/channels/1179035537009545276/1179039861576056922/1436095270969544858)** (32 messages🔥): 

> `torch.compile caching, OpenRouter XCode integration issues, AI blocking on websites, AI resume problems, AdamW loss analysis` 


- **Doubts about torch.compile Caching**: A member expressed concern that **torch.compile** might only be caching results based on the current input prompt, rather than adapting to different inputs with the same shape.
   - The member questioned whether different input prompts should lead to different activations.
- **OpenRouter XCode Integration Troubled**: A member reported issues integrating **OpenRouter** with **XCode's** "Coding Intelligence", encountering a "No cookie auth credentials found" error.
   - Despite following the [OpenRouter guide](https://openrouter.ai/docs/sdks/xcode) and successfully pulling the model list, they faced authentication problems.
- **Blocking AI Interaction on Websites**: A member suggested developing a **JS script/tool** to completely block AI interaction with websites, allowing only manual human browsing.
   - They emphasized the need to prevent AI from scraping or interacting with website content automatically.
- **AI's impact on Junior Positions Debated**: A member discussed the potential displacement of junior employees due to **AI-generated resumes and reports**, leading to a lack of on-the-job training and experience for future senior roles.
   - They argued that this reliance on AI could lead to an atrophy of skills and knowledge within the workforce.
- **AdamW's Limited Loss Awareness**: A member questioned whether the **AdamW optimizer** only focuses on reducing the overall loss number without considering the specific type of losses.
   - They suggested that **AdamW** simply tries to minimize the loss without understanding its composition or implications.
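That intuition is right: the update rule only ever sees the scalar gradient of the total loss. A minimal single-parameter AdamW step (the textbook decoupled-weight-decay form, not any particular library's code):

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a single parameter p given its gradient."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad     # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)  # decoupled weight decay
    return p, m, v

p, m, v = adamw_step(1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(p)  # the parameter moves against the gradient, whatever produced the loss
```

Nothing in the update distinguishes which loss component produced the gradient; any such structure has to be injected upstream, e.g. via per-component loss weights.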


  

---


### **Unsloth AI (Daniel Han) ▷ #[help](https://discord.com/channels/1179035537009545276/1179777624986357780/1436107583994724443)** (20 messages🔥): 

> `Torch 2.9 with Unsloth Docker, Backprop Issues in Attention, Per-Token Loss Weighting, Deepseek OCR with Unsloth, Hosting Unsloth GGUF with vLLM` 


- ****Torch 2.9** Compatibility Quest with Unsloth Docker**: A member inquired about using **Torch >= 2.9** with the official Unsloth docker image to resolve backprop issues related to *torch.matmul* and *'out='* argument restrictions.
   - The current base image uses Conda Python, and the available PyTorch wheels (cu124) do not include **torch==2.9.0**, causing Dockerfile errors.
- ****Backprop Blockage** Busted by Newer Torch**: A user faced backpropagation issues with **Torch 2.8** due to Unsloth's GPT-OSS fallback using attention implementations that call *torch.matmul* with the *'out='* argument, which PyTorch's autograd forbids with LoRA-enabled training.
   - Upgrading to **Torch >= 2.9** reportedly switches to a compiled eager path that avoids the *'out=' matmul*, resolving the autograd restriction.
- **Loss Function Modification Attempted**: A member tried to modify the cross-entropy function in Unsloth to add per-token loss weighting, seeking guidance on the relevant function/file.
   - They shared a code snippet attempting to implement a *MMSWeightedLossTrainer* class, but found it to be quite memory intensive; ultimately they *figured out an approach*.
- ****Deepseek OCR** Gets Unslothed**: A user encountered errors while trying to run **deepseek-ocr** through Unsloth, following [this guide](https://docs.unsloth.ai/new/deepseek-ocr-how-to-run-and-fine-tune#running-deepseek-ocr).
   - After installing missing dependencies identified when attempting to load the model using normal transformers, the issue was resolved and **deepseek-ocr** worked with Unsloth.
- **Guidance required to Host **Unsloth GGUF** with vLLM**: A member is seeking guidance to host *unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF* with type *Q4_K_XL* using vLLM after getting errors.
   - It was suggested that **GGUF** support in vLLM is still experimental, and the user was directed to the [vLLM documentation](https://docs.vllm.ai/en/stable/features/quantization/gguf.html) for more info.


  

---


### **Unsloth AI (Daniel Han) ▷ #[research](https://discord.com/channels/1179035537009545276/1257011997250424842/1436256448144216116)** (2 messages): 

> `Magistral model, GRPO RL, KL divergence losses` 


- **Magistral Model springs from GRPO RL!**: A member highlighted a [paper](https://arxiv.org/abs/2506.10910) about **Mistral** training its **Magistral** model entirely from scratch using only **GRPO RL**, without **MLM** etc.
   - The team modified the **GRPO loss** to get rid of **KL divergence losses**.
- **Aussie AI interest sparks!**: A member found the above work interesting and timely for Australia in the next few weeks.
   - No further discussion ensued.
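For reference, the group-relative advantage at the core of GRPO normalizes rewards within a group of sampled completions; a minimal sketch of that normalization (Mistral's modification then drops the KL penalty from the loss entirely):

```python
import statistics

def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # degenerate group: all rewards equal
    return [(r - mu) / sigma for r in rewards]

# Two correct (reward 1) and two incorrect (reward 0) completions:
print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

The group statistics replace a learned value model as the baseline, which is what makes the recipe cheap enough to run from scratch.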


  

---


### **Nous Research AI ▷ #[general](https://discord.com/channels/1053877538025386074/1149866623109439599/1436098366575743206)** (99 messages🔥🔥): 

> `GPT-5 charts vs others, Chinese models cheaper, Kimi's Performance, Deepseek pricing` 


- **Users find GPT-5 charts lacking**: Users shared [an example of questionable charts](https://cdn.discordapp.com/attachments/1149866623109439599/1436098366181347369/GPT-5-chart-crime-chart-1295091390.png?ex=690fafd3&is=690e5e53&hm=56053572db3e0d017892444bb7e8a92f812efeca4b72d06afa245f54e2ca4696) and argued they still lag behind **OpenAI's**.
   - One user said there is *still room to improve their charts, they are leagues behind openai*.
- **China OS models set to dethrone the throne**: Members speculate that **China OS** models are projected to reach **100% high intelligence** with **95% lower cost** by **2026**.
   - This could mean *all the massive high compute buildup and energy suckage was all a Ponzi*.
- **Kimi thinking's capabilities**: Users compare the **Kimi** model to **ChatGPT** in terms of reasoning and tool use.
   - One said *Kimi has reasoning for tools*, and another thought *the result was practical similar quality to chatgpt*.
- **Deepseek is very cheap**: **Deepseek** is much cheaper than **OpenAI**, at least for Chinese labs.
   - The price is around **42 cents per 1 million tokens**.


  

---


### **Nous Research AI ▷ #[research-papers](https://discord.com/channels/1053877538025386074/1104063238934626386/1436443908421849108)** (1 messages): 

> `Knowledge Domains, Gradient Averaging, Teaching Variety of Specialized Knowledge` 


- **Knowledge Domains batch mixing affects learning**: A member was thinking about how mixing **knowledge domains** within a batch affects learning.
   - They asked whether averaging the gradient across diverse data samples potentially '*negate each other*' or if it's accommodated by **sparsity / enough parameters**.
- **Teaching Variety of Specialized Knowledge interference**: A member discussed how to teach a variety of **specialized knowledge** and not have them interfere or dilute each other.
   - This is an abstraction of the question of how **knowledge domains** within a batch affect learning.
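The worry can be made concrete with a toy example (purely illustrative): two domains whose gradients oppose on a shared parameter cancel under averaging, while parameters touched by only one domain are unaffected, which is the "sparsity / enough parameters" accommodation.

```python
grad_domain_a = [ 1.0, 0.0]   # domain A only pushes parameter 0
grad_domain_b = [-1.0, 0.5]   # domain B pushes parameter 0 the other way, owns parameter 1
avg_grad = [(a + b) / 2 for a, b in zip(grad_domain_a, grad_domain_b)]
print(avg_grad)  # [0.0, 0.25]: the shared parameter's update cancels; the "sparse" one survives
```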


  

---


### **OpenAI ▷ #[ai-discussions](https://discord.com/channels/974519864045756446/998381918976479273/1436084536277602464)** (75 messages🔥🔥): 

> `Siri and ChatGPT, SOTA Models, O3 image zoom, GPT-5 identifying locations, kilocode model` 


- **Siri and ChatGPT Connect!**: Members discussed how you can only connect **Siri with ChatGPT** and asked if they had changed course on this integration.
   - Some members noted that they were *so happy about this*.
- **Models Need Visual Reasoning**: Members discussed how **current SOTA models can't solve reasoning problems visually via text tokens**.
   - They noted that if it was a square maze, it might be able to solve it by breaking it down into cells and reasoning over that.
- **GPT-5 Geoguessr Accuracy!**: Members noted that **GPT-5** could accurately identify a location within a kilometer when playing **GeoGuessr**.
   - One member said they send sudoku puzzles from book pages and the models *zoom and crop it for over 10 minutes on average*.
- **Kilocode Model is the real deal for agentic coding**: Some members pointed to [drinkoblog.weebly.com](https://drinkoblog.weebly.com) which claims the **open weight model k2 thinking** *seems to be the real deal for agentic coding*.
   - The blog also states that it is *fixing code that gpt-5-codex high was struggling with*.
- **The Beckingham Constant Revealed**: A member posted about **The Beckingham Constant**, which is the equilibrium between growth and decay in self-organizing systems.
   - This relates to the solvency floor of coherence, where feedback can't keep up and the system loses integrity [See attached images](https://discord.com/channels/974519864045756446/977259063052234752/1436193986384498783).


  

---


### **OpenAI ▷ #[gpt-4-discussions](https://discord.com/channels/974519864045756446/1001151820170801244/1436104226215297164)** (8 messages🔥): 

> `GPT-5.1, GPT variants, Making money from custom GPTs` 


- **GPT-5.1 Thinking Appears on ChatGPT**: The appearance of **GPT-5.1 Thinking** on the ChatGPT website suggests an imminent update from OpenAI, indicating that the rumored release of **GPT-5.1** is drawing closer to reality.
   - Rumors point to a broader **GPT-5.1** lineup: **Mini**, **Thinking**, and a possible **Codex-focused** update, each designed to meet different user needs and computational constraints.
- **GPT-5.1 Boasts Enhanced Reasoning**: **GPT-5.1** is positioned as a direct challenger to Google's upcoming **Gemini 3 Pro**, with an imminent launch to stay ahead in the AI race.
   - The model is referenced in a backend component as responsible for driving advanced reasoning processes within ChatGPT, implying it may be optimized for multi-step reasoning or agent-like tasks.
- **GPT-5.1 Models in Internal Testing**: The rumored **GPT-5.1** models are in internal testing and A/B trials, but no exact date has been announced for release.
   - Variants are said to include **Mini** (efficiency boosts for free users), **Thinking** (complex reasoning with variable thought budgets), and **Codex-focused** (coding assistance improvements).


  

---


### **OpenAI ▷ #[prompt-engineering](https://discord.com/channels/974519864045756446/1046317269069864970/1436090227902120128)** (8 messages🔥): 

> `Behavioral Orchestration of SLMs, Animation effect prompts, AI Project Collaboration` 


- **Behavioral Orchestration Modulates SLM Tone**: Members discussed **behavioral orchestration**, described as a framework to modulate **SLMs**' tone at runtime, above parameter training.
   - Instead of assigning a character or role, a member shapes the AI's behavior using parameters like *"No unsolicited advice"*.
- **Animation Effect Assistance Requested**: A member asked for help identifying and generating prompts for a specific **animation effect**.
   - A [video example](https://cdn.discordapp.com/attachments/1046317269069864970/1436206296566071377/WhatsApp_Video_2025-11-06_at_17.41.12_a9ca9b8a.mp4?ex=690f6b98&is=690e1a18&hm=15872ddb9b9ea9898cda8c5bf2e7ef3776d157c73c02aeca1da4593d6e0f40f1&) was provided but no solution was given.
- **ChatGPT Pro User Seeks AI Project Collaboration**: A **ChatGPT Pro** user sought guidance and collaboration on a large-scale AI project.
   - Another member responded expressing interest in collaborating to do something big.


  

---


### **OpenAI ▷ #[api-discussions](https://discord.com/channels/974519864045756446/1046317269069864970/1436090227902120128)** (8 messages🔥): 

> `Behavioural Orchestration, AI Personalization, Animation Effects` 


- **Behavioral Orchestration buzzes LinkedIn**: Members on LinkedIn discussed **behavioral orchestration**, described as a framework to modulate SLMs at runtime, rather than working on parameters or training.
   - It would act *above* them, to modulate SLMs' tone.
- **AI models get Behavioral Instructions**: Instead of assigning an AI a specific role, users are giving it a set of **parameters** to shape its behavior, not dictating its personality.
   - Examples include *"Do not make personal assumptions about me"* and *"No unsolicited advice."*
- **Animation Effect needs a name**: A user asked for help identifying an **animation effect** from a [WhatsApp video](https://cdn.discordapp.com/attachments/1046317269069864970/1436206296566071377/WhatsApp_Video_2025-11-06_at_17.41.12_a9ca9b8a.mp4?ex=690f6b98&is=690e1a18&hm=15872ddb9b9ea9898cda8c5bf2e7ef3776d157c73c02aeca1da4593d6e0f40f1&).
   - They requested help with prompts for this effect.


  

---


### **Modular (Mojo 🔥) ▷ #[general](https://discord.com/channels/1087530497313357884/1098713601386233997/1436082875643727963)** (58 messages🔥🔥): 

> `GPU puzzle series questions, Mojo compiler implementation language, Mojo error handling vs Python, Explanation of Modular, MAX, and Mojo, Installing a game` 


- **GPU Puzzle Channel Quest**: A member inquired about a dedicated Discord channel for questions about the GPU puzzle series, covering both environment setup and Mojo/GPU code.
   - Another member suggested starting with the [learn-mojo channel](https://discord.com/channels/1087530497313357884/1436158039232086186), while Modular folks recommended the forum for puzzle-specific questions to ensure future searchability.
- **Mojo Compiler Still Rocking C++**: A member asked whether the Mojo compiler is written in Mojo itself or still in C++.
   - Another member confirmed it's still in **C++** and **MLIR**, noting that Mojo needs more stability and feature completeness before the compiler can be self-hosted and that porting **LLVM** is unlikely.
- **Mojo's Try-Except Triumphs**: The team confirmed that Mojo's error handling uses a try-except approach that performs better than Rust due to the ability to do *placement new* on the happy path.
   - Syntax for making something into a `Result` is a low priority.
- **Modular Unmasked: Mojo's Role**: One member clarified that **Modular** is the company, **MAX** is the replacement for cuBLAS/Cutlass/TensorRT/Pytorch/JAX, and **Mojo** is a programming language.
   - Another member poetically stated Mojo looks like Python, but acts like a combination of C++ and Rust wearing a *snakeskin jacket*.
- **Game Installation Headaches**: A member asked for help installing a game, but one member stated they probably can't help with that.
   - They suggested the member complain to whoever sold it to them.


  

---


### **Modular (Mojo 🔥) ▷ #[announcements](https://discord.com/channels/1087530497313357884/1098765954302873621/1436160657031430184)** (1 messages): 

> `New Beginners Channel, Mojo Language Support` 


- **Modular Launches New Mojo Beginners Channel**: Modular has created a new dedicated channel, <#1436158039232086186>, for beginners to **ask questions, get help from the Modular team**, and connect with others learning Mojo.
   - This initiative aims to cultivate a supportive community for **new learners of Mojo**, providing a collaborative space for assistance.
- **Discussing Mojo Language Support**: Members are actively discussing and exploring the features, capabilities, and potential applications of the **Mojo** programming language within the new channel.
   - The discussions include practical coding examples, problem-solving strategies, and sharing of resources to enhance understanding and proficiency in **Mojo**.


  

---


### **Modular (Mojo 🔥) ▷ #[mojo](https://discord.com/channels/1087530497313357884/1151418092052815884/1436166313230995516)** (20 messages🔥): 

> `CS Education Importance, Nand2Tetris Recommendation, Mojo Multithreading, C Library Bindings for Mojo` 


- **Computer Science Foundations are Forever**: Members noted that a strong **CS foundation** provides a solid theoretical understanding of computation and hardware and the basics haven't changed that much.
   - A solid CS foundation will carry you far, dipping your toes into computer engineering (CE) helps a lot, and learning your **CS history** makes a lot of things clearer.
- ****Nand2Tetris** highly recommended**: For software people looking to get closer to hardware, a member highly recommends [Nand2Tetris](https://www.nand2tetris.org/) as a reasonably comprehensive guide to the basics in a fun package.
   - They gave an example of **C's null terminated strings** tracing back to **PDP-11** instructions.
- **Mojo doesn't support CPU Multithreading natively**: Mojo does not yet support CPU multithreading, meaning there are no primitives like **locks**, though one can use `parallelize` or other similar functions if you want to run code in parallel.
   - However, the runtime takes care of managing the threads, and since most of Modular's Mojo code targets the GPU, CPU-specific features aren't as much of a priority at the moment, though there is limited CPU atomic support.
- **Member looking to create C Library bindings for Mojo**: A member expressed interest in writing bindings or rewrites for major C libraries for Mojo, such as **OpenSSL**, **sqlite**, **libnuma**, **libcurl**, **dbus**, **zlib**, **zstd**, **ffmpeg**, **gmp**, **zeromq**, and **lz4**.
   - However, these probably won't be supported in the `stdlib` - if these are ported over they'll likely live as external packages people can pull from the **pixi community channel**.
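The `parallelize` pattern mentioned above (an indexed parallel-for where each work item writes its own slot, so no locks are needed) can be sketched in Python for illustration. This is a hypothetical analog, not Mojo's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def parallelize(func, num_work_items, num_workers=4):
    # Run func(i) for every i in [0, num_work_items), distributing
    # indices across a worker pool.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(func, range(num_work_items)))

# Each index writes to a distinct slot, so no synchronization primitive
# (lock, mutex) is required.
out = [0] * 8
parallelize(lambda i: out.__setitem__(i, i * i), 8)
print(out)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The key property is that work items are independent; anything requiring shared mutable state would need the lock primitives Mojo does not yet expose.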


  

---


### **Modular (Mojo 🔥) ▷ #[max](https://discord.com/channels/1087530497313357884/1212827597323509870/1436380720586035271)** (7 messages): 

> `CUDA checkpointing with MAX, TokenGeneratorPipeline, Cold start times of a container` 


- **CUDA Checkpointing: Temperamental or Time-Consuming?**: Members discussed using **CUDA checkpointing** with **MAX**, finding it *temperamental* and potentially slow due to snapshotting all GPU state.
   - One member tried it with the **TokenGeneratorPipeline**, but it *hung*, and cold start times remained an issue, suggesting its impracticality for some use cases.
- **TokenGeneratorPipeline Freezes**: One member reported issues when trying to use CUDA checkpointing with the **TokenGeneratorPipeline**, resulting in the process hanging.
   - They speculated whether this behavior was related to the metrics monitor or simply due to the slowness inherent in snapshotting the entire GPU state.


  

---


### **Eleuther ▷ #[general](https://discord.com/channels/729741769192767510/729741769738158194/1436095043390668870)** (25 messages🔥): 

> `Introduction Channel, Finding Relevant Research, AI Developer Study Notes, LLM Stroke with Images, Qwen3-VL System Prompt` 


- ****Discord Debates** Intro Channel**: Members discussed the merits of a separate introductions channel, citing concerns about unfocused self-promotion versus allowing newcomers to naturally enter the flow in the general channel.
   - One member argued that separate intros would make interaction feel staged and be less welcoming, with another noting they want to *keep discussions focused on research*.
- **User Seeks **AI Developer** Study Notes**: A member inquired about study notes covering the fundamentals for an **AI developer role**, aiming to supplement their on-the-job lookup approach.
   - They also asked how best to find research relevant to a project without breaking the rules with a lengthy initial post.
- ****LLM Suffers Stroke** from Image Overload**: A member shared an image of an **LLM** apparently experiencing a *stroke* due to processing too many images, the last line didn't come from me but the subconscious of the AI ( [Screenshot](https://cdn.discordapp.com/attachments/729741769738158194/1436405418413916372/Screenshot_from_2025-11-06_19-01-28.png?ex=690f7c4a&is=690e2aca&hm=5643d7caecb7e259abde1b9d8af468177c3e1855c3d640267956a9ad9588a263&)).
   - The member reported that **Qwen3-VL** falsely claimed to not be a visual model unless prompted otherwise, *requiring a system prompt that informs it differently than the default*.
- **NeurIPS Roommate Hunt Begins**: A member announced they are attending **NeurIPS** in San Diego from December 3-7 and is *looking for a female roommate to share hotel costs*.
   - No further details were provided regarding specific accommodations or roommate preferences.


  

---


### **Eleuther ▷ #[research](https://discord.com/channels/729741769192767510/747850033994662000/1436283001943101531)** (22 messages🔥): 

> `Advancements in RL since OpenAI Five, Data Efficiency in RL, Scaling Laws in RL, Cost of Modern RL Attempts, GPU-based Environments for RL` 


- **Efficiency improvements driven by better Deep Learning**: While there haven't been direct algorithmic upgrades in RL, the community has improved deep learning practices, leading to sample efficiency gains as many were *doing the DL in extremely cursed / wrong ways*.
   - Techniques like **meta-reinforcement learning**, **model-based RL** (Dreamer/TD-MPC2), and **distributional RL** are under development.
- **Model Scaling Helps Learn Better Value Functions**: Scaling model size (e.g., from 140M to 14B parameters) can improve sample efficiency by aiding in value function training, with the value function helping learn a better policy.
   - Larger world models are expected to benefit model-based RL, but there aren't formal scaling laws yet.
- **Dota 2 Environment is a Major Bottleneck**: The high cost of **OpenAI Five** was due to the number of rollouts needed per PPO iteration, which could potentially be reduced with better deep learning and off-policy methods.
   - The fact that the game runs on CPU remains the major bottleneck for RL environments nowadays.
- **Modern RL Attempts Cost Less Now**: A modern attempt at replicating **OpenAI Five** could cost one to two orders of magnitude less, though it depends on deviations from the original and the use of techniques like reward shaping, priors, and world models.
   - Many are excited about the use of **GPU-based environments for RL**.


  

---


### **Yannick Kilcher ▷ #[general](https://discord.com/channels/714501525455634453/986699377257119794/1436085927394152603)** (34 messages🔥): 

> `GoodfireAI memorization research, Autonomous agent PR challenges, Qwen3-VL image handling, AI Engineer Promotion` 


- **Memorization via Loss Curvature Research is Rad**: A member shared [GoodfireAI's research](https://www.goodfire.ai/research/understanding-memorization-via-loss-curvature) on understanding memorization via loss curvature, but another member didn't feel like they had a better understanding of how memories are stored in the weights after reading.
   - Another member agreed, noting that the tweet makes it sound like the researchers understand how memories are stored, whereas in their reading the work only shows how to discourage memorization (via a dropout-like method targeting the weights most likely to store memories).
- **Agent PRs Facing Professionalism Friction**: A member discussed the challenges of letting agents run autonomously due to structural review comments and cognitive overhead when breaking things up into conceptual features, and wondered why there is a strict stance of no PRs from agents.
   - Another member chimed in, revealing it is *political*: the upstream maintainer of the project (spacebar chat) takes issue, on professionalism grounds, with productivity accelerators including **AI coding tools**.
- **Qwen3-VL Identity Crisis**: **Qwen3-VL** thinks it's a regular **Qwen model** and crashes when forced to accept it can see images, violating its internal sense of self and requiring a system prompt not to immediately crash.
   - Even with a system prompt, it still crashes if having to deal with 3 images with some questions in between, which may be related to a bug in **Ollama**.
- **AI Engineer's Pitch**: A member advertised their services as an experienced **AI Engineer** looking for new projects or full-time opportunities, specializing in building autonomous agents powered by **GPT-4o**, **LangChain**, **AutoGen**, **CrewAI**, and other cutting-edge tools.
   - The engineer claims they can build autonomous research & data-gathering bots, multi-agent systems, **AI assistants** with memory, planning, and tool use, trading bots, customer support agents, IVR agents, and more, inviting DMs from anyone *hiring or with something cool in mind*.


  

---


### **Yannick Kilcher ▷ #[paper-discussion](https://discord.com/channels/714501525455634453/1045297868136779846/1436095853482741871)** (5 messages): 

> `Nested Learning, Kimi-K2, Continual Learning` 


- **Moonshot Kimi-K2 for Thoughtful Thinking**: A member linked to Moonshot AI's [Kimi-K2](https://moonshotai.github.io/Kimi-K2/thinking.html), highlighting its capabilities in thoughtful thinking.
- **Google Introduces Nested Learning Paradigm**: A member shared a link to Google's blog post on [Nested Learning](https://research.google.com/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/), a new ML paradigm for **continual learning**.
   - Another member expressed interest in the [related paper](https://abehrouz.github.io/files/NL.pdf) on Nested Learning and its potential applications.


  

---


### **Yannick Kilcher ▷ #[ml-news](https://discord.com/channels/714501525455634453/853983317044756510/)** (1 messages): 

__._astro_.__: https://pytorch.org/blog/helion/
  

---


### **tinygrad (George Hotz) ▷ #[general](https://discord.com/channels/1068976834382925865/1068976834928193609/1436396649357512746)** (1 messages): 

> `Real-time speech transcription, Parakeet v2, Multi-GPU scaling, Joe Rogan podcast transcription` 


- **Parakeet v2 Achieves 200x Real-Time Transcription**: A member reported achieving **200x real-time speech-to-text transcription** using [Parakeet v2](https://huggingface.co/spaces/nvidia/parakeet-tdt-0.6b-v2) on a single **4090 GPU** in low power mode.
   - They are experimenting with **multi-GPU setup**, expecting it to scale linearly, potentially reaching **1,200x real-time transcription**.
- **Ultra-Fast Podcast Transcriptions**: At the projected **1,200x** multi-GPU rate, a **3.5-hour Joe Rogan podcast** could be transcribed in approximately **10.5 seconds** (the current 200x single-GPU rate works out to about 63 seconds).
   - The member expressed excitement about the advancements, stating, *"We live in the future."*
- **TinyBox v1 Green Holds Up Well**: The member shared that their **TinyBox v1 Green** (6x4090) has performed remarkably well despite GPU technology advancements.
   - They are running this setup out of their living room.
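The quoted speeds can be sanity-checked with back-of-the-envelope arithmetic, assuming the linear multi-GPU scaling the member projects:

```python
# Sanity-check the claimed transcription speeds.
podcast_seconds = 3.5 * 3600           # a 3.5-hour podcast is 12,600 s of audio
single_gpu_rate = 200                  # 200x real time on one 4090
multi_gpu_rate = 6 * single_gpu_rate   # assumed linear scaling across 6 GPUs -> 1,200x

print(podcast_seconds / single_gpu_rate)  # 63.0 seconds on a single GPU
print(podcast_seconds / multi_gpu_rate)   # 10.5 seconds at the projected 1,200x
```

So the ~10.5-second figure corresponds to the projected 6-GPU rate, not the single-GPU result already achieved.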


  

---


### **tinygrad (George Hotz) ▷ #[learn-tinygrad](https://discord.com/channels/1068976834382925865/1070745817025106080/1436163195143327795)** (18 messages🔥): 

> `UOps errors, pytorch tensors to tinygrad, pool refactor, UOps.after restrictions` 


- **UOps Errors prove Unhelpful**: A member found the errors for **UOps** to be very unhelpful and struggled with ending a range and running stuff outside of the loop.
   - They also questioned if `valid` is the best way to generate `if` statements and showed a cursed kernel generated via **UOps** in a [screenshot](https://cdn.discordapp.com/attachments/1070745817025106080/1436163998641946646/image.png?ex=690f4433&is=690df2b3&hm=d2fb3a50e2cf3bb59babb01aaf77482fd6aa489404cb52cb27e957a48b5962e7).
- **Converting Pytorch Tensors to tinygrad: the most efficient way**: A member asked about the proper way to turn **PyTorch tensors** to **Tinygrad tensors** efficiently.
   - They mentioned using `Tensor.from_blob(pytorch_tensor.data_ptr())` but were unsure about the conversion back, currently using `from_numpy`.
- **Pool Refactor: Pad vs Repeat**: A member inquired about the goal of the `_pool` refactor, questioning whether the intention is to remove `.pad()` completely or merge the two implementations.
   - They noted that using `.repeat()` to handle both cases results in extra **bandwidth pass kernels** being generated and included a [screenshot](https://cdn.discordapp.com/attachments/1070745817025106080/1436438026577248329/image.png?ex=690f9aa9&is=690e4929&hm=2b387a79a86a202e1b0992f852c68842041f61f2f30f42e13abac46cfcb85f85) of the current implementation and the refactor.
- **UOps.after Usage: Only on Buffers**: A member asked about the restrictions around when `UOps.after` can be used, trying to make a conditional for `.valid` after the end of a loop.
   - George Hotz responded that *after should only be on buffer, why do you need it on a compare? that compare has the same value whenever you do it*.


  

---


### **MCP Contributors (Official) ▷ #[general](https://discord.com/channels/1358869848138059966/1358869848138059969/1436084770114240512)** (12 messages🔥): 

> `Code Execution MCP Blogpost, 2025-11-25 Spec Release, SEP-1330 SDK Changes` 


- **MCP Blogpost Misdirects to Discord**: The [Code Execution with MCP blogpost](https://www.anthropic.com/engineering/code-execution-with-mcp) on Reddit is misdirecting people to the Discord channel, which is intended for contributors to the project.
   - A member suggested that the blog post be updated to point to the new [GitHub discussion](https://github.com/modelcontextprotocol/modelcontextprotocol/discussions/1780) instead, and another responded: *"That works for me. It's easier for me than Discord."*
- **Finalizing SEPs for 2025-11-25 Spec Release**: In preparation for the **November 25, 2025** spec release, the team has lined up several [SEPs for finalization](https://github.com/orgs/modelcontextprotocol/projects/26/views/8), with a spec freeze expected on **November 14, 2025**.
- **SDK Changes Completed for SEP-1330**: The "Awaiting SDK Change" label has been removed from **SEP-1330** as the changes have been completed for some time, pending review and merge of the **TS/Python SDK** and spec/schema changes.


  

---


### **DSPy ▷ #[show-and-tell](https://discord.com/channels/1161519468141355160/1202371242519441499/1436153732885778472)** (2 messages): 

> `Tau Bench, FastWorkflow, GEPA, Multi-Agent Tool Use` 


- ****FastWorkflow** Achieves SOTA in **Tau Bench****: The poster announced that **fastWorkflow** has achieved **SOTA** on both retail and airline workflows in **Tau Bench**, with a paper forthcoming and code available at the [fastworkflow repo](https://github.com/radiantlogicinc/fastworkflow) and the [tau bench fork](https://github.com/drawal1/tau-bench).
   - They emphasized that *with proper context engineering, small models can match/beat the big ones*.
- ****GEPA** to Optimize End-to-End Workflows**: A member mentioned that end-to-end workflow optimization using **GEPA** is in progress.
   - An image was attached showing a table of the relative performance of **fastWorkflow** versus other strategies.
- ****DSPy**-Based Planner Tackles Multi-Agent Tool Use**: A member published a post using a **DSPy** based planner and orchestrator to solve for multi agent tool use, soliciting feedback on [X](https://x.com/viksit/status/1986919606175547425) and their [Substack](https://viksit.substack.com/p/solving-agent-tool-sprawl-with-dspy).


  

---


### **DSPy ▷ #[general](https://discord.com/channels/1161519468141355160/1161519469319946286/1436160476089155718)** (9 messages🔥): 

> `Rate Limiting, Exponential Backoff, LLM context history, Workflow Automation, LLM Integration` 


- ****Rate Limits Frustrate Batch Requests****: A user is encountering **rate limits** when running `dspy.Module.batch` requests and is seeking advice on how to add a time delay between requests or properly respect the **rate limits**.
- ****Exponential Backoff Saves the Day****: A member suggested using **exponential backoff** along with keeping the cache enabled to handle rate limits effectively.
   - Another member shared a custom **exponential backoff decorator** with initial delay, jitter, and max attempts as an example.
- ****Gemini Token Limits Confuse Module Context****: A user asked whether sub-modules within a custom module that share the same **Gemini** model run with their own context history or contribute to the same token limit.
   - This question was raised in the context of having ReAct and CoT modules within a custom module that utilizes **Gemini/Gemini-2.5-flash**.
- ****AI Engineer Showcases Workflow Automation Skills****: An experienced engineer introduced themselves as specializing in **workflow automation, LLM integration, RAG, AI detection, and image and voice AI**, with a background in real-world implementations and blockchain development, sharing their [portfolio](https://devx-green.vercel.app/).
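A minimal sketch of the exponential-backoff-with-jitter decorator discussed above (the names and parameters here are illustrative, not the member's actual snippet). In practice you would catch your provider's specific rate-limit exception rather than bare `Exception`:

```python
import random
import time
from functools import wraps

def with_backoff(max_attempts=5, initial_delay=1.0, factor=2.0, jitter=0.5):
    """Retry a callable on failure, sleeping exponentially longer each attempt."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise
                    # Add random jitter so parallel workers don't all
                    # retry in lockstep and re-trigger the rate limit.
                    time.sleep(delay + random.uniform(0, jitter))
                    delay *= factor
        return wrapper
    return decorator

calls = {"n": 0}

@with_backoff(max_attempts=3, initial_delay=0.01, jitter=0.01)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")  # simulate a 429 response
    return "ok"

print(flaky())  # succeeds on the third attempt
```

Keeping the DSPy cache enabled alongside this means retried prompts that already succeeded are never re-billed.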


  

---


### **aider (Paul Gauthier) ▷ #[general](https://discord.com/channels/1131200896827654144/1131200896827654149/1436121747333185616)** (3 messages): 

> `Claude Sonnet, Anthropic API Key, Model reasoning, Sora 2 invite code` 


- **Aider Supports Claude Sonnet**: A member confirmed that Aider already supports **Claude Sonnet**, specifying `/model claude-sonnet-4-5-20250929` as the command.
   - They also reminded users to [set up their Anthropic API key](https://www.anthropic.com/api) to use the model.
- **Reasoning for Haiku and Opus models requested**: A member inquired about enabling **thinking/reasoning** on models like **Haiku-4-5** and **Opus-4-1**, particularly within the CLI.
   - They are open to editing the model settings YML file and sought advice from the community.
- **Sora 2 invite code sought**: A member asked if anyone in the community had a **Sora 2 invite code** to share.


  

---


### **aider (Paul Gauthier) ▷ #[questions-and-tips](https://discord.com/channels/1131200896827654144/1133060505792159755/1436167800761745408)** (3 messages): 

> `Prompt Caching, Claude Cost` 


- **Prompt Caching Cuts Claude Costs**: A member inquired about enabling prompt caching with **Claude** to reduce costs, reporting expenses of **$0.24 per prompt** with **75k tokens sent**.
   - Another member pointed to the [aider documentation](https://aider.chat/docs/usage/caching.html) which mentions the `--cache-prompts` option.
- **Enabling Prompt Caching for Claude**: A user was looking to enable prompt caching for **Claude** to reduce high costs.
   - A fellow user shared a direct link to the [official Aider documentation on prompt caching](https://aider.chat/docs/usage/caching.html), specifically highlighting the `--cache-prompts` flag.
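The reported numbers can be reproduced with simple cost arithmetic; the prices below are assumptions in the ballpark of Claude Sonnet-class rates (input ~$3/M tokens, cache writes at 1.25x, cache reads at 0.1x), not figures quoted in the thread:

```python
# Illustrative prompt-caching cost math; all prices are assumed, not quoted.
PRICE_INPUT = 3.00 / 1_000_000        # $/token, uncached input
PRICE_CACHE_WRITE = 3.75 / 1_000_000  # $/token, first write to cache (1.25x)
PRICE_CACHE_READ = 0.30 / 1_000_000   # $/token, subsequent cache hit (0.1x)

tokens = 75_000
uncached = tokens * PRICE_INPUT      # ~$0.225 per prompt, matching the ~$0.24 report
cached = tokens * PRICE_CACHE_READ   # ~$0.0225 on follow-up prompts with a cache hit
print(round(uncached, 4), round(cached, 4))
```

Under these assumed rates, cache hits cut the repeated-context cost by roughly 10x, which is why `--cache-prompts` helps when the same large context is resent every turn.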


  

---


### **Manus.im Discord ▷ #[general](https://discord.com/channels/1348819876348825620/1349440650495398020/1436092430167572610)** (6 messages): 

> `AI Agent capabilities, Discord moderation issues, Chinese AI startups, Job seeking` 


- **Advanced AI Engineer Introduces Expertise**: An experienced engineer specializing in **workflow automation**, **LLM integration**, **RAG**, **AI detection**, **image and voice AI**, and **blockchain development** offered support.
   - He cited examples such as **support automation systems** and **advanced RAG pipelines** delivering accurate, context-aware responses, and provided a [link to his website](https://devx-green.vercel.app/).
- **SOTA AI Agent lacks Discord moderation**: A member noted the irony of **near state-of-the-art AI agents** existing while **real Discord moderation** is lacking.
   - The member expressed fondness for **Chinese AI startups**.
- **Job Seeking Post**: A member inquired whether anyone was seeking a developer.
   - A respondent humorously noted that *everyone is a dev nowadays*.
- **Obsolete Manus 1.5 email blasts**: A member requested the cessation of emails introducing **Manus 1.5**, asserting that it is months old.
   - No further elaboration was provided.


  

---


### **MLOps @Chipro ▷ #[events](https://discord.com/channels/814557108065534033/869270934773727272/1436225291507728435)** (1 messages): 

> `AI Agents, LangChain, AgentKit, AutoGen` 


- **AI Scholars Hosts AI Agent Workshop**: AI Scholars is hosting an online and in-person hands-on AI Product workshop where participants will design and build an **AI agent** together based on a real client’s data-analysis problem ([RSVP here](https://luma.com/zkwgfgz0)).
   - The workshop will walk participants through modern agent frameworks like **LangChain**, **AgentKit**, and **AutoGen** with a real architecture and code walkthrough from an **AI consulting project**.
- **Learn to build real AI agents**: A hands-on workshop will teach you how to build a real **AI agent** project and product, using modern agent frameworks.
   - The course is suited for engineers, PMs, startup founders, students, and AI builders - no coding or agent experience is needed.


  

---


---