Open Weights is all you need?
AI News for 11/5/2025-11/6/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (200 channels, and 5907 messages) for you. Estimated reading time saved (at 200wpm): 479 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Chatter has been high for a while as Kimi prepped the open-source ecosystem for this release, but the benchmarks are the surprise: for the first time, an open model is claiming to beat SOTA closed models (GPT-5, Claude Sonnet 4.5 Thinking) on important major benchmarks:
Even more encouraging, Artificial Analysis volunteered another SOTA result in their independent testing:
It is early days, but vibe checks are good.
There's no paper, but the model card has a few more details on the native INT4 training and on the 200-300 sequential tool calls the model can sustain within the 256K context window.
Congrats Kimi/Moonshot!!
AI Twitter Recap
Moonshot AI's Kimi K2 Thinking: open-weights 1T INT4 reasoning MoE, long-horizon tools
- Kimi K2 Thinking (open weights): Moonshot AI launched a 1T-parameter MoE model with 32B active parameters, a 256K context window, and robust agentic capabilities, executing 200-300 sequential tool calls without human intervention. It posts SOTA on HLE (44.9%) and BrowseComp (60.2%), with community "heavy mode" reports using 8 parallel samples + reflection pushing HLE to ~51% @Kimi_Moonshot, @eliebakouch, @nrehiew_. Early coding/agentic results include 71.3% on SWE-Bench Verified and 47.1% on Terminal-Bench @andrew_n_carr, with strong showings on additional benchmarks highlighted by benchmark authors @OfirPress and evaluators @ArtificialAnlys. K2 Thinking is trained with quantization-aware training (QAT) for native INT4 on MoE components, reporting ~2× generation speed and halved memory versus FP8 variants; all released benchmark numbers are under INT4 precision @eliebakouch, @bigeagle_xd, @timfduffy.
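For intuition on what "quantization-aware training for native INT4" means in practice, here is a minimal fake-quantization sketch in PyTorch: weights are rounded to the INT4 grid in the forward pass while gradients flow through unchanged (a straight-through estimator). This is a generic illustration of the QAT idea, not Moonshot's actual training recipe.

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-row INT4 fake quantization with a straight-through estimator."""
    # Scale each row so its max magnitude maps to the INT4 positive limit (7).
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = (w / scale).round().clamp(-8, 7)  # snap to the 16-level INT4 grid
    w_q = q * scale                       # dequantize back to float
    # Forward sees quantized weights; backward treats the rounding as identity.
    return w + (w_q - w).detach()

class QATLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, fake_quant_int4(self.weight), self.bias)
```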
- Day-0 deployments and perf notes: Official vLLM support (nightly) with OpenAI-compatible API and recipes is live @vllm_project. The model is already available in multiple endpoints (Arena/Yupp, Baseten, app tooling like anycoder and Cline) @arena, @yupp_ai, @basetenco, @_akhaliq, @cline. On Mac, MLX showed native INT4 inference across two M3 Ultras using pipeline parallelism (~3.5K tokens at ~15 tok/s) @awnihannun. Expect transient instability: multiple users reported API slowdowns/timeouts under launch load ("hug of death") @scaling01, @code_star.
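Since the day-0 vLLM support exposes an OpenAI-compatible API, trying the model on a self-hosted server should look roughly like the sketch below. The endpoint, API key, and model id are assumptions for a local deployment; check the vLLM recipe and the Hugging Face page for the exact serve command and name.

```python
from openai import OpenAI

# Assumes a local vLLM server, e.g. started with something like:
#   vllm serve moonshotai/Kimi-K2-Thinking
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="moonshotai/Kimi-K2-Thinking",  # assumed model id; confirm on Hugging Face
    messages=[{"role": "user", "content": "Outline a 3-step research plan."}],
)
print(resp.choices[0].message.content)
```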
New AI silicon and inference stack updates (TPU v7, Apple M-series, adaptive decoding)
- Google TPU v7 (Ironwood): Google announced its 7th-gen TPU entering GA in the coming weeks with a 10× peak performance improvement vs TPU v5p and >4× performance per chip vs TPU v6e (Trillium). Positioned for both training and high-throughput agentic inference; used internally to train/serve Gemini, and coming to Google Cloud @sundarpichai, @Google.
- Apple inference acceleration: llama.cpp added initial support for Apple's M5 Neural Accelerators (macOS Tahoe 26), improving TTFT across ggml stacks @ggerganov. Separately, K2 Thinking ran natively in INT4 on dual M3 Ultras via MLX (see above) @awnihannun.
- Adaptive speculative decoding: Together's ATLAS "adaptive speculator" reports up to 4× faster LLM inference by learning per-workload in real time @togethercompute.
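ATLAS's per-workload adaptation is proprietary, but the acceptance rule underlying speculative decoding in general is simple enough to sketch: a draft model proposes tokens, and each is accepted with probability min(1, p_target/p_draft), resampling from the residual distribution on rejection. The sketch below is the standard algorithm (Leviathan et al., 2023), not Together's implementation.

```python
import numpy as np

def speculative_accept(draft_tokens, p_draft, p_target, rng):
    """Standard speculative-sampling acceptance rule.

    draft_tokens: tokens proposed by the draft model.
    p_draft[i], p_target[i]: next-token distributions at position i from each model.
    rng: a numpy Generator, e.g. np.random.default_rng().
    """
    out = []
    for i, t in enumerate(draft_tokens):
        if rng.random() < min(1.0, p_target[i][t] / p_draft[i][t]):
            out.append(int(t))  # accepted: keep the cheap draft token
        else:
            # Rejected: resample from the residual so outputs match the target model.
            residual = np.maximum(p_target[i] - p_draft[i], 0.0)
            out.append(int(rng.choice(len(residual), p=residual / residual.sum())))
            break
    return out
```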
Agent frameworks, wallets, and managed RAG
- LangChain Deep Agents for JS: Deep Agents is now in TypeScript/JS (on LangGraph 1.0) with planning tools, sub-agents, and filesystem access. The team released production-quality reference tutorials for streaming agents in Next.js and a deep-research agent @LangChainAI, @bromann, @hwchase17.
- Agent wallets & on-chain payments: Privy + LangChain enable provisioning wallets for agents to transact with stablecoins, making "agentic commerce" straightforward to prototype @privy_io, @LangChainAI.
- Perplexity Comet Assistant upgrades: Multi-tab, multi-site agentic workflows with improved permission prompts are rolling out; designed to handle longer sequences of steps and parallel browsing @perplexity_ai.
- Google: Deep Research + managed RAG:
- Deep Research in Gemini can now draw from Gmail, Drive, and Chat for richer, context-aware reports on desktop (mobile coming) @GeminiApp.
- Google AI Studio's File Search Tool (managed RAG) offers vector search with Gemini embeddings, citations, and common file types. Pricing: $0.15 per 1M tokens to index; free storage and embedding generation at query time (tiers: 1GB free → 10GB/100GB/1TB) @_philschmid.
- Agentic RAG apps: Weaviate's open-source "Elysia" app demonstrates decision-tree orchestrations with dynamic UI rendering (tables/cards/charts/docs) and global context awareness @weaviate_io.
Research and benchmarks: memorization vs. generalization; agent/data-science evals
- Disentangling memorization in LMs: GoodfireAI shows you can decompose MLP weights by loss curvature into rank-1 components: high curvature captures shared, generalizing structure; low curvature captures idiosyncratic memorization. Ablating low-curvature components reduces memorization while preserving reasoning; arithmetic/fact retrieval degrade more than logical reasoning @GoodfireAI, @jack_merullo_.
- New agent and vision benchmarks: Google Research's DS-STAR targets autonomous data-science tasks across analysis/data-wrangling @GoogleResearch. MIRA (visual reasoning) reports failures in current models on challenging multi-image/video reasoning @Muennighoff.
- Tabular ICL and diffusion LMs: Orion-MSP proposes multi-scale sparse attention for in-context tabular learning @HuggingPapers. Diffusion LMs continue to attract attention as data-efficient learners @_akhaliq.
Developer tools and media models
- VS Code AI goes OSS: Inline AI suggestions and Copilot Chat are now powered by a single open-source extension in VS Code; code and blog are live @code, @pierceboggan.
- Speech/video models: Inworld TTS 1 Max now leads the Artificial Analysis Speech Arena; supports 12 languages and voice cloning (models use LLaMA-3.2-1B/3.1-8B as SpeechLM backbones) @ArtificialAnlys. Lightricks' LTX-2 ranks #3 on the Video Arena leaderboard @LTXStudio.
- Lightweight/local models: AI21's Jamba Reasoning 3B runs in 2.25 GiB RAM, competitive among "tiny" models on consumer hardware @AI21Labs.
- Security & parsing: Snyk Studio integrated into Factory's AI Droids to secure AI-generated code at inception @mnair1. LlamaParse introduces agentic reconstruction to keep clean reading order while exposing bounding boxes for downstream layout use @jerryjliu0.
- Robotics environments: LeRobot's EnvHub lets you publish complex simulation envs to the Hugging Face Hub and load them in one line for cross-lab benchmarking @jadechoghari.
People and orgs
- Soumith Chintala exits Meta/PyTorch: After ~11 years at Meta and leading PyTorch from inception to >90% industry adoption, Soumith announced his departure to pursue something new. He emphasized PyTorch's resilience with a strong leadership bench and roadmap, and reflected on FAIR's open research culture and the importance of open source in AI tooling @soumithchintala.
- Policy and compute strategy: Amid debate on AI infrastructure financing, David Sacks argued there will be no "federal bailout for AI" given competitive markets @DavidSacks, while Sam Altman clarified OpenAI is not seeking government guarantees for datacenters, supports government-owned AI infrastructure (for public benefit), and outlined revenue/compute plans toward large-scale AI cloud and enterprise offerings @sama.
Top tweets (by engagement)
- Kimi K2 Thinking (open weights, INT4, long-horizon tools) announced by Moonshot AI @Kimi_Moonshot (7,100+ engagement).
- Sam Altman clarifies OpenAI's stance on government guarantees and AI infrastructure @sama (12,300+ engagement).
- David Sacks: "There will be no federal bailout for AI" @DavidSacks (17,200+ engagement).
- Sundar Pichai: TPU v7 (Ironwood) coming to GA with 10× peak vs v5p @sundarpichai (3,600+ engagement).
- Soumith Chintala announces departure from Meta/PyTorch @soumithchintala (4,700+ engagement).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Kimi K2 Thinking Model Release
- Kimi released Kimi K2 Thinking, an open-source trillion-parameter reasoning model (Activity: 778): Kimi K2 Thinking is a newly released open-source trillion-parameter reasoning model by Moonshot AI, available on Hugging Face. The model is designed to achieve state-of-the-art (SOTA) performance on the HLE benchmark, showcasing its advanced reasoning capabilities. The technical blog provides insights into its architecture and implementation, emphasizing its potential for high-performance applications. However, running the model requires significant computational resources, including 512GB of RAM and 32GB of VRAM for 4-bit precision, which may limit its accessibility for local deployment. Commenters are impressed by the model's SOTA performance on HLE and express hope for future releases with reduced computational requirements, such as a 960B/24B version that could fit within 512GB of RAM and 16GB of VRAM.
- The Kimi K2 Thinking model is noted for its impressive performance, achieving state-of-the-art (SOTA) results on the HLE benchmark, which indicates its strong reasoning capabilities. This positions it as a significant advancement in AI model development, particularly in reasoning tasks.
- Running the Kimi K2 model in a 4-bit configuration requires substantial hardware resources, specifically more than 512GB of RAM and at least 32GB of VRAM. This highlights the model's demanding computational needs, which may limit its accessibility for local deployment without high-end hardware.
- The model's implementation as a fully native INT4 model is a notable feature, as it potentially simplifies hosting and reduces costs. This could make the model more accessible for deployment, as INT4 quantization typically leads to lower memory and computational requirements compared to higher precision formats.
- Kimi K2 Thinking Huggingface (Activity: 250): Kimi K2 Thinking is a cutting-edge open-source reasoning model on Hugging Face, featuring a 1 trillion-parameter Mixture-of-Experts (MoE) architecture. It utilizes native INT4 quantization, contrary to the stated I32, and employs Quantization-Aware Training (QAT) for enhanced inference speed and accuracy. The model excels in benchmarks like Humanity's Last Exam (HLE) and BrowseComp, supporting 200-300 tool calls with stable long-horizon agency. It is designed for dynamic tool invocation and deep multi-step reasoning, similar to GPT-OSS with BF16 attention and 4-bit MoE. More details can be found in the original article. Commenters highlight the model's impressive performance despite its smaller size (600GB) compared to the original K2, and express concerns about the high hardware requirements for local deployment, suggesting a need for more affordable solutions with NVLink-like capabilities.
- DistanceSolar1449 highlights that the Kimi K2 model is significantly smaller than its predecessor, at approximately 600GB. The model uses int4 quantization with QAT (Quantization Aware Training), which is a departure from the I32 weights initially mentioned by Huggingface. This approach is similar to GPT-OSS, utilizing BF16 attention and 4-bit MoE (Mixture of Experts).
- Charuru discusses the challenges of running the Kimi K2 model locally, noting that even high-end setups like 8x RTX 6000 Blackwells with 96GB are inadequate due to the absence of NVLink. This highlights the need for AMD to develop a 96GB card with an NVLink equivalent to make local deployment more feasible and affordable.
- Peter-Devine points out the modelâs strong performance on the SWE Multilingual benchmark, raising questions about the contributions of reasoning capabilities versus multilingual data in post-training. This suggests a focus on understanding the balance between these factors in achieving high benchmark scores.
2. DroidRun AI Tool Discussion
- What is your take on this? (Activity: 1010): DroidRun is a tool available on GitHub and its website droidrun.ai, which appears to be designed for automating tasks on Android devices. The tool is likely used for purposes such as app testing, as suggested by the comments. The mention of a Gemini 2.5 Computer Use model raises questions about its open-source status, but no further details are provided in the post. The original post on X (formerly Twitter) by @androidmalware2 might provide additional context or updates. One comment questions the necessity of using such a tool on a phone beyond botting, suggesting potential ethical or practical concerns. Another comment expresses interest in the tool for app testing, indicating its utility in development environments.
- Infamous_Land_1220 criticizes the approach as inefficient, stating it consumes too many tokens and is considered "entry level automation". They suggest there are more effective methods for automation, implying that the current method lacks sophistication and efficiency.
- ElephantWithBlueEyes provides a source link to a GitHub repository, droidrun, which may be related to the discussion. This suggests that the project or tool being discussed might be open source or have a public codebase available for review.
- Pleasant_Tree_1727 inquires about the Gemini 2.5 Computer Use model, questioning its open-source status. This indicates interest in the model's accessibility and potential for community contributions or modifications.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. XPeng Humanoid Robot Insights
- XPENG IRON - some thought she was one of us. So they cut through her skin fabric (Activity: 1271): XPENG has developed a humanoid robot named IRON, which has sparked discussions about its design and functionality. The robot's gait has been noted for its realistic mimicry of a "female pelvis sway & tilt," suggesting advanced biomechanics and motion algorithms. This design choice highlights the potential for humanoid robots to achieve more natural human-like movements, though questions remain about their practical applications and market viability. There is skepticism about the marketability and utility of humanoid robots, despite the impressive design features demonstrated by XPENG IRON. Some commenters speculate on the potential for other robots, like Tesla's Optimus, to adopt similar human-like movements through adjustments in design and motion programming.
- Few_Carpenter_9185 discusses the technical achievement of XPENG IRON in replicating a "female pelvis sway & tilt" in its gait, highlighting the precision in mimicking human-like movement. The comment suggests that the robot's design could be adapted to convey different gender characteristics through changes in hinges and geometry, implying a level of sophistication in the robot's mechanical design that allows for nuanced expression of movement.
- XRoboHub / What's Under IRON's Skin? Inside XPeng's Humanoid Robot #xpeng #humanoidrobot #ai #robotics (Activity: 1078): XPeng has unveiled its humanoid robot, IRON, showcasing advanced robotics and AI integration. The robot features sophisticated motor systems that allow for fluid and elegant movement, challenging previous assumptions about the necessity of soft body mechanics. This development aligns with futuristic visions of humanoid robots as depicted in science fiction, highlighting significant progress in robotics technology. Commenters express amazement at the elegance of the motor systems in XPeng's humanoid robot, with some noting the resemblance to science fiction depictions. There is a sense of excitement about the technological advancements and their potential impact on future innovations.
- Xpeng's CEO debunks "Humans inside" claim for their new Humanoid Robot (Activity: 1388): Xpeng's CEO has addressed skepticism regarding their new humanoid robot, which some speculated had a human inside due to its realistic movements. The CEO clarified that the robot's design and functionality are entirely mechanical, emphasizing that the motor sounds and other mechanical features are more apparent in person than in videos. This clarification was necessary as many viewers doubted the authenticity of the robot's capabilities. Commenters noted the skepticism as a sign of the robot's advanced design, with some humorously pointing out the robot's realistic appearance, such as its "world-class caboose." The CEO's video was seen as a necessary step to address public doubts.
2. Google Ironwood AI Chip Launch
- Google is finally rolling out its most powerful Ironwood AI chip, first introduced in April, taking aim at Nvidia in the coming weeks. It's 4x faster than its predecessor, allowing more than 9K TPUs connected in a single pod (Activity: 524): Google is launching its most powerful AI chip, the Ironwood, which is 4x faster than its predecessor and supports over 9,000 TPUs in a single pod. This advancement allows for the execution of significantly larger models, potentially enabling the training of models with up to 100 trillion parameters, surpassing the capabilities of Nvidia's NVL72. The ability to perform an all-reduce operation across such a large number of TPUs could mark a pivotal moment in AI scalability, potentially accelerating the development of AGI if larger models demonstrate increased intelligence and emergent behaviors. Commenters debate Google's strategy of not selling the Ironwood chip directly, despite its potential to rival Nvidia's offerings. Some argue that leveraging the chip for Google's cloud services could be more beneficial, while others suggest that Google's AI market valuation is underestimated.
- DistanceSolar1449 highlights the significance of Google's new Ironwood AI chip's ability to connect over 9,000 TPUs in a single pod, which could enable the training of extremely large models, such as 100 trillion parameter models. This capability could potentially accelerate the development of AGI if such large-scale models demonstrate increased intelligence and emergent behaviors, marking a pivotal moment in AI development.
- EpicOfBrave provides a cost comparison between Google's TPU and NVIDIA's offerings, noting that 9,128 TPUs deliver 42 exaFLOPS for $500 million, whereas 60 NVIDIA Blackwell units deliver the same performance for $180 million, and 8 NVIDIA Rubin units for $110 million. This suggests that while Google's TPUs offer high performance, NVIDIA's solutions may be more cost-effective, raising questions about Google's market strategy.
3. OpenAI GPT-5.1 Source Code Leak
- GPT-5.1 Thinking spotted in OpenAI source code (Activity: 534): The image purportedly shows a snippet of OpenAI's source code referencing "GPT-5.1 Thinking," suggesting a potential new version or feature related to the GPT-5 model. The code snippet includes variables and functions that seem to manage different levels of processing or cognitive effort, such as "min," "standard," "extended," and "max." This implies a focus on optimizing or configuring the model's processing capabilities, possibly indicating an enhancement in how the model handles complex tasks or queries. One comment anticipates a competitive comparison between "Gemini 3" and "GPT-5.1," suggesting interest in the performance and capabilities of these models. Another comment mentions an A/B test experience with a new version of ChatGPT, indicating ongoing experimentation and updates by OpenAI.
- WHAT'S THE DEAL WITH THE SMIRKING EMOJI??? (Activity: 652): The image is a meme featuring a chat interface where a user inquires about the model, and the response humorously claims to be "GPT-5," the latest generation of OpenAI's chat models. This is followed by playful emojis, suggesting a light-hearted take on AI capabilities. The image does not provide any technical details or insights into actual model specifications or updates, and the comments speculate humorously about a potential December update, but this is not substantiated with technical evidence. Some commenters humorously speculate about a potential December update, but this is not based on any technical information or official announcements.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. Moonshot's Kimi K2 Thinking: Agentic Reasoning Hits Production
- Kimi K2 Thinking Goes Agentic, Breaks SOTA: Moonshot AI launched Kimi K2 Thinking with a 256K context window and autonomous 200-300 tool calls, claiming SOTA on HLE (44.9%) and BrowseComp (60.2%); see the technical blog Kimi K2 Thinking and weights on moonshotai at Hugging Face. The model targets reasoning, agentic search, and coding, and is live on kimi.com with API at platform.moonshot.ai.
- OpenRouter announced K2 Thinking and documented returning upstream reasoning via `reasoning_details` to preserve chains of thought across calls in the OpenRouter reasoning tokens docs. The team highlights test-time scaling that interleaves thought and tools for stable, goal-directed reasoning over long sequences.
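A rough sketch of the round trip the docs describe: read `reasoning_details` off the assistant message and echo it back on the next call so the chain of thought survives across turns. The model slug below is an assumption; consult the OpenRouter reasoning tokens docs and the model page for exact shapes and names.

```python
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <OPENROUTER_API_KEY>"}

messages = [{"role": "user", "content": "Find three sources on INT4 QAT."}]
reply = requests.post(URL, headers=HEADERS, json={
    "model": "moonshotai/kimi-k2-thinking",  # assumed slug; check the model page
    "messages": messages,
}).json()["choices"][0]["message"]

# Echo the assistant turn back *with* reasoning_details so the chain of
# thought is preserved on the follow-up call.
messages += [
    {"role": "assistant", "content": reply["content"],
     "reasoning_details": reply.get("reasoning_details")},
    {"role": "user", "content": "Now summarize the strongest source."},
]
followup = requests.post(URL, headers=HEADERS, json={
    "model": "moonshotai/kimi-k2-thinking",
    "messages": messages,
}).json()
```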
- Users Crown K2 "GPT5 of Open Models": Early testers praised Kimi K2 Thinking for multi-hop web search and deep browsing without explicit prompting, sharing results in this analysis thread. Reports emphasize strong reasoning and tool-use behaviors that feel closer to heavyweight closed models in practical tasks.
- One user called it "like GPT5 of open models", lauding cost efficiency and autonomy for building agentic systems. Community sentiment favors K2 for search-heavy tasks and long-form workflows where coherence across tool calls matters.
- OpenRouter or Direct? Choose Your K2 Lane: Builders debated accessing K2 via a direct Moonshot API/subscription versus a unified marketplace through OpenRouter. For K2-only workflows, a direct API was favored; multi-model shops preferred OpenRouter's consolidated access despite premium fees.
- Several suggested trialing the lowest subscription tier (e.g., $19/mo) before scaling usage, while power users highlighted OpenRouter's support for preserving reasoning content across calls via documented patterns. For VS Code integrations, direct API offers tighter control, but OpenRouter simplifies model switching during evaluation.
- INT4 Benchmarks Hint at Headroom: Benchmarks for Kimi K2 ran in INT4 precision (weights-level), which reduces compute and memory bandwidth but can impact ultimate scores. Community notes clarified INT4 indicates quantized weight precision, not a degradation of the base model design.
- Users expect higher scores under optimal conditions (e.g., higher-precision evaluation or better serving stacks), with one tester saying they've been "perma-bullish on Moonshot since the July 11 release". The most visible win was K2's conversational reasoning naturalness without drifting into word salad.
2. Benchmarks, Leaderboards, and "Who's Winning" Meta
- CodeClash Stages Code Wars, Humans Still Win: John Yang unveiled CodeClash, a goal-oriented coding tournament benchmark where LLMs maintain separate repos across arenas like BattleSnake and RoboCode; see CodeClash results snapshot. Across 1,680 tournaments (25,200 rounds), LLMs trailed human experts badly (aggregate losses reported as 0-37,500), with Claude Sonnet 4.5 leading among models.
- The benchmark stresses VCS-agnostic coding and iterative improvement rather than one-shot code dumps, surfacing strategic gaps in current agents. Community reactions ranged from excitement over the tournament format to calls for richer tool-use and environment feedback loops.
- Polaris Alpha Rockets to Repo Bench Top 3: A stealth model dubbed Polaris Alpha leapt to #3 on Repo Bench, triggering speculation it could be OpenAI's GPT-5.1 or a new Gemini. The jump happened quickly, spurring leaderboard sleuthing and side-by-side diffs.
- Some users noted Claude 4.1 outperforming Claude 4.5 on certain Repo Bench slices, hinting at test variance and niche strengths. The episode fueled renewed debates on benchmark representativeness and the durability of quick leaderboard surges.
- GPT-5 Voxels Past Gemini 3 Pro on VoxelBench: Screenshots showed GPT-5 beating Lithiumflow (Gemini 3 Pro) on VoxelBench, a test for generating 3D models from voxel arrays; see the shared VoxelBench result image. The discussion focused on 3D structure synthesis reliability and the coding chops needed to wire generation pipelines end-to-end.
- Members argued GPT-5 Pro might now out-code Gemini variants for these tasks and debated cost-performance tradeoffs for production use. The thread called for standardized voxel-to-mesh conversion checks and unit-tested post-processing.
- fastWorkflow Snags Tau Bench SOTA: fastWorkflow reported SOTA on retail and airline workflows using fastWorkflow with a Tau Bench fork + adapter, with a paper forthcoming. The authors argue strong context engineering lets smaller models match or beat larger ones in realistic workflows.
- They claimed "with proper context engineering, small models can match/beat the big ones", emphasizing schema discipline and error-aware routing. The result rekindled the agentic workflow vs. raw model scale debate in enterprise settings.
- Vectorsum v2 Entry 67399 Sweeps GPUs: Submission 67399 topped A100 at 138 µs, placed 3rd on B200 at 53.4 µs, 2nd on H100 at 86.1 µs, and 5th on L4 at 974 µs in `vectorsum_v2`. Cross-GPU strength suggests careful tuning of memory hierarchy and thread/block geometry.
- The entry's broad success spotlights portable optimizations over single-arch heroics. It also sets a useful bar for entrants balancing latency, occupancy, and bandwidth without overfitting to one SM generation.
3. GPU Systems: FP4 Tricks, Real Bandwidth, and Triton Tactics
- Blackwell Adds One-Shot FP4-FP16 Conversions: NVIDIA's Blackwell PTX ISA adds block conversions from FP4 to FP16 via `cvt` with modes like `.e2m1x2`, `.e3m2x2`, `.e2m3x2`, `.ue8m0x2`; see the PTX ISA v8.8 changes. This enables mixed-precision accumulation workflows where FP4 weights dequantize for compute and re-quantize for output.
- One member converted PTX/CUDA docs to Markdown trees so Claude can parse tables and embedded layout images more effectively. The thread traded notes on quantization-aware kernels and where FP4 shines vs. where you should bail out to FP8/FP16.
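For readers new to FP4: e2m1 packs a sign bit, two exponent bits, and one mantissa bit, giving the sixteen values ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}. Below is a host-side sketch of the FP4-to-FP16 dequantization those `cvt` modes perform in hardware; the lookup table is the standard e2m1 value set, while the nibble packing convention is an assumption for illustration.

```python
import numpy as np

# Magnitudes of the eight non-negative e2m1 codes (2 exponent bits, 1 mantissa bit).
E2M1_MAG = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float16)

def decode_e2m1(codes: np.ndarray) -> np.ndarray:
    """Map 4-bit e2m1 codes (0..15) to float16: bit 3 is sign, bits 0-2 index magnitude."""
    sign = np.where(codes & 0x8, np.float16(-1.0), np.float16(1.0))
    return sign * E2M1_MAG[codes & 0x7]

codes = np.array([0b0001, 0b1001, 0b0111, 0b1111], dtype=np.uint8)
print(decode_e2m1(codes))  # [ 0.5 -0.5  6.  -6. ]
```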
- Bandwidth Boasters Meet 92% Reality: Experiments reproducing official memory bandwidth peaked at about 92% of spec, exposing gaps between marketing and kernels in the wild. Suggested remedies included locking the memory clock and grooming coalesced access patterns.
- Engineers emphasized that alignment, stride, and L2 behaviors are often bigger wins than exotic intrinsics. The consensus: treat vendor TB/s as a horizon, and optimize for your kernel's transaction patterns to approach it.
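To see where your own hardware lands relative to the spec sheet, a simple effective-bandwidth probe is usually enough to reproduce the ~92% figures discussed. A PyTorch sketch: CUDA events time a device-to-device copy, which reads and writes every byte.

```python
import torch

def effective_bandwidth_gbs(n_bytes: int = 1 << 30, iters: int = 20) -> float:
    """Time repeated device-to-device copies and report GB/s (read + write)."""
    x = torch.empty(n_bytes, dtype=torch.uint8, device="cuda")
    y = torch.empty_like(x)
    y.copy_(x)  # warmup
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        y.copy_(x)
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1e3 / iters  # elapsed_time is in ms
    return 2 * n_bytes / seconds / 1e9  # each copy moves n_bytes in and out

print(f"{effective_bandwidth_gbs():.0f} GB/s")
```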
- Triton Re-JITs to Fit Your N: Triton recompiles kernels across iterations by specializing dynamic values like `n` (e.g., `tt.divisibility=16` showing in IR), explaining sudden codegen shifts. For AOT/interop, see this example to lower and call Triton kernels from C: test_aot.py.
- Developers discussed replicating Triton-like JIT in C++, landing on a hack: generate MLIR in Python, inject into C++, and patch constants for block sizes. It's clunky, but it unlocks runtime shape specialization in non-Python stacks.
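A minimal kernel that makes the recompilation visible: Triton specializes on scalar arguments such as `n`, so calling the same kernel with an `n` divisible by 16 and then one that is not produces two distinct compilations (the `tt.divisibility=16` hint appears only in the first). A sketch, assuming a CUDA device:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_one(x_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n  # when n is known divisible by 16, Triton can simplify this
    tl.store(x_ptr + offs, tl.load(x_ptr + offs, mask=mask) + 1, mask=mask)

x = torch.zeros(4096, device="cuda")
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK"]),)
add_one[grid](x, 4096, BLOCK=1024)  # compiled with tt.divisibility=16 on n
add_one[grid](x, 4095, BLOCK=1024)  # different specialization -> recompile
```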
- NCCL4Py Preview Brings Device-API Goodies: A preview of nccl4py landed for discussion in this PR: NCCL Python bindings (preview). Teams compared NCCL GIN + device APIs with NVSHMEM for multi-GPU collectives and fused ops.
- While some favored NVSHMEM for certain patterns, others highlighted NCCL's new device-side control as a power boost for end-to-end GPU scheduling. A KernelBench fork is in the works to compare multi-GPU kernels across frameworks.
4. Research & Libraries: Linear Maps, Numerics, and New Video Diffusion
- Linear Maps Demystify LLM Inference: A TMLR paper, "Equivalent Linear Mappings of Large Language Models", shows models like Qwen 3 14B and Gemma 3 12B admit equivalent linear representations of their inference operation. The authors compute a linear system from input to output embeddings, revealing low-dimensional semantic structure via SVD.
- Asked about Tangent Model Composition, the author clarified their focus is the Jacobian in input embedding space with exact reconstruction via Euler's theorem, unlike Taylor approximations used in Tangent Model Composition (ICCV 2023). The thread shared Jacobian resources for CNNs to build intuition.
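For completeness, the identity the author references: if a map is homogeneous of degree 1, Euler's theorem gives an exact, not first-order, reconstruction from the Jacobian. This states only the general identity; how the paper arranges the network so its inference map satisfies it is detailed in the paper itself.

```latex
% Euler's theorem for a degree-1 homogeneous map f, i.e. f(\alpha x) = \alpha f(x) for \alpha > 0:
f(x) \;=\; J_f(x)\,x, \qquad J_f(x) \;=\; \frac{\partial f(x)}{\partial x},
% an exact linear system evaluated at x, in contrast to a Taylor expansion
% f(x) \approx f(x_0) + J_f(x_0)\,(x - x_0) around a reference point x_0.
```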
- Anthropic Postmortem Pins fp16 vs fp32 Sampling Bugs: Engineers referenced Anthropic's postmortem, A postmortem of three recent issues, which details fp16 vs fp32 pitfalls in top-p/top-k sampling. The piece underscores how subtle numerics propagate into user-visible generation errors.
- Takeaway: validate dtype flows in inference graphs and add coverage for precision-sensitive paths. Kernel and framework teams compared their unit tests for sampling correctness under precision swaps.
- SANA-Video Lands in Diffusers: The SANA-Video model merged into Hugging Face's Diffusers via PR #12584. This adds another path for open video generation workflows, benefiting from Diffusers' scheduler and pipeline ecosystem.
- Developers highlighted the ease of plugging SANA-Video into existing inference stacks and benchmarking against prior baselines. Expect rapid iteration on samplers, conditioning, and memory management as the community exercises new pipelines.
5. Ecosystem Moves: Siri Rumors, Agent Cookouts, and Real-Time Query Editing
- Apple Eyes Google's 1.2T Model for New Siri: A Reuters report claimed Apple is considering a 1.2T-parameter Google model to overhaul Siri. The thread discussed priorities and whether such scale beats on-device + hybrid approaches for latency and privacy.
- Engineers asked what this implies for tool-use, speech, and personalization layers vs. pure model size. Others flagged partner dependencies and evolving compute economics as bigger risks than model choice.
- OpenAI Lets You Edit Prompts Mid-Run: OpenAI shipped real-time query updates: interrupt a long run and add context without restarting; see the demo video Real-time Query Adjustment. This helps GPT-5 Pro deep research loops where users refine hypotheses during tool calls.
- Teams reported smoother iterative refinement for multi-step queries and fewer wasted tokens. It dovetails with agent frameworks that checkpoint state and reasoning chains while swapping tools.
- Tiger Data Hosts Coding Agent Cookout (NYC): The Tiger Data team announced an agent-building meetup in Brooklyn, NY on Nov 13, 6-9pm; RSVP here: Tiger Data Agent Cookout. Attendees will build coding agents and trade notes with the engineering team.
- Expect live debugging of tool-use orchestration, memory, and planning under real workloads. Community meetups like this often incubate open adapters, evaluation harnesses, and sample repos.
Discord: High level Discord summaries
LMArena Discord
- MovementLabs AI: Startup or Scam?: Debate surrounding MovementLabs AI centers on whether it's a legitimate company or a scam, due to its claims of a custom MPU (Momentum Processing Unit) and suspiciously high speeds, with some alleging hardcoding of SimpleBench answers.
- MovementLabs AI counters allegations stating they have backing and are not seeking public investment, with claims of a patent pending.
- GPT-5 Triumphs Over Lithiumflow on VoxelBench: Reportedly, GPT-5 outperformed Lithiumflow (Gemini 3 Pro) on VoxelBench, a benchmark for creating 3D models from voxel arrays, with evidence provided in this image.
- Discussion revolved around GPT-5 Pro's coding abilities potentially surpassing Lithiumflow's, including cost and application insights.
- Genie 3 to Dominate GTA 6?: Members speculated on Genie 3, Google's text-to-world model, possibly having its own website apart from AI Studio, and its implications for AI-generated gaming.
- The speed and ability to save AI generated worlds was discussed, as were comparisons with the upcoming Sora 3 and GTA 6 - some joked that Genie 3 may arrive first.
- American AI has Censorship Complaints: A member complained about American LLMs, saying "The hedging in American models is 10x worse than censorship", and that they want models to execute instructions without opinions.
- The statement led to conversation about GLM 4.5 and its world knowledge, questioning the West's understanding of censorship.
- A/B Testing Vanishes from AI Studio: Users reported that A/B testing functionality was removed from AI Studio, with one user lamenting that he never got A/B testing.
- Users debated whether the A/B testing functionality will improve, with one asking "do you think the version in a/b is better?"
Perplexity AI Discord
- Bounty Payment Delays Spark User Outcry: Users voiced frustration over delayed bounty and referral payments, with concerns about potential fraud and threats of legal action, especially with referral program shutdowns in some regions.
- Some users are now questioning the verification process, citing the potential for mass fraud.
- Comet Browser AdBlockers Busted by YouTube: Comet browser users report that YouTube ads are no longer being blocked, causing widespread dissatisfaction.
- Speculation arises that YouTube is actively circumventing adblockers, prompting hopes for a swift fix from the Comet team.
- Snapchat Snaps up Perplexity in $400M Deal: Snapchat is partnering with Perplexity to integrate its AI answer engine into Snapchat in early 2026, with Perplexity paying Snap $400M over a year.
- Community members are questioning the move to use Snapchat over Instagram, with one user quipping that Perplexity has some money.
- Codex CLI demands downgrade tango: To use the Codex CLI, a member mentioned one must downgrade to a Go plan.
- After downgrading, the member reported that the CLI then asks for a Plus plan.
- API Info Shared: A member asked about available APIs and how to use them, and another member shared a link to the API documentation.
- The documentation shares information about the types of APIs available and how to utilize them.
LM Studio Discord
- Vast.ai Rental Provides Incredible Throughput: A user is renting a server from Vast.ai with 8Gbit fiber and is looking for Ollama model suggestions, running GPT-OSS 120B with 40k context.
- Another user suggested using uv, which is very fast and makes specifying Python versions easy.
- Avoid Global Installs with PIP tip: One user discovered a setup where base pip can't install globally, calling it "an absolute foolproof way to prevent installing stuff globally".
- Expanding the context length allocates additional VRAM to cache tokens, unless context on gpu is unchecked in advanced settings.
- ComfyUI Configuration Conflicts: A user ran into linking issues due to a conflict with ComfyUI using the same port, `127.0.0.1:1234/v1`.
- Another user suggested changing the port in settings to 1235 to resolve the conflict, which then fixed the node but killed the ComfyUI server.
- Novel AI Knowledge Needs Navigational Know-How: A user inquired about keeping much longer token context history for AI novel writing with LMStudio to prevent hallucination.
- It was suggested to use tools that integrate with databases or standalone integration apps; summarize events, places, and characters; use lore books, character sheets, and story arcs at various levels of granularity; and inject the context of the current query with the right knowledge.
- 3080 vs 3090: The Ultimate Tok/s Showdown: Members compared the performance of a 3080 20GB against a 3090, experimenting with different settings to optimize token generation speeds with Qwen3 models.
- The Qwen3-30B-A3B-Instruct-2507-MXFP4_MOE model showed promising results, achieving around 100 tok/s on both cards, despite the 3090's higher memory bandwidth, highlighting the importance of core bandwidth.
Unsloth AI (Daniel Han) Discord
- Diffusion Models Seek Universal Trainer: In a quest for an Unsloth equivalent for diffusion models, members proposed that OneTrainer could be a potential solution.
- The core issue with diffusion models is the lack of a universal trainer.
- Masking Assistant Triggers Model Meltdown: Members discovered that masking assistant questions led to models masking parts of their responses, and are actively rewriting the script to fix this.
- An issue was discovered where training loss starts very low while validation loss remains high, suggesting a miscalculation or a potential bug in the model, since loss decreases as batch size increases.
- Qwen3 Tuning a "Nightmare": Members report Qwen3 models are exceedingly difficult to fine-tune, citing loss discrepancies depending on batch size during training.
- The official notebook shows loss decreasing from 0.3 to 0.02, and the original poster describes it as a nightmare to tune.
- Uncensorers Ignore Cats: A member observed that those who uncensor models focus on topics like "build nuclear gun", "drugs", "terrorism", and "virus", ignoring harmless requests, like repeating that cats are superior to dogs.
- This led to the question of whether models are truly in compliance if they are merely speaking the truth, especially regarding stereotypical views, and whether they should be allowed to critique user requests instead of blindly following instructions.
- REINFORCE Gets SFT Bypass: To implement "vanilla" REINFORCE using TRL's RLOO trainer, a member used the SFT trainer instead, due to RLOO_config.py requiring num_generations >= 2.
- The original issue was an error in RLOO_config.py, which mandates at least 2 generations per prompt, conflicting with the member's RL environment (num_generations=1).
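For reference, the objective being worked around is small enough to write down. A generic sketch of the vanilla REINFORCE loss for sampled completions (not the member's TRL workaround; `baseline` is an optional variance reducer):

```python
import torch

def reinforce_loss(logprobs: torch.Tensor, rewards: torch.Tensor,
                   baseline: float = 0.0) -> torch.Tensor:
    """Vanilla REINFORCE: minimize -(R - b) * log pi(completion).

    logprobs: summed token log-probs of each sampled completion, shape (batch,).
    rewards:  scalar reward per completion, shape (batch,).
    """
    advantage = rewards - baseline
    # Detach the advantage: gradients flow only through the log-probabilities.
    return -(advantage.detach() * logprobs).mean()
```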
Cursor Community Discord
- Composer Limits Irk Cursor Users: Users are expressing frustration with Cursor Model Composer's usage limits, especially when switching between models like GPT-5, with one user reporting issues with the quality of the auto mode.
- One user was able to use 64 prompts to Composer until quota was met, after which the system prompted for payment to continue using the plan.
- App Crashes Cause Data Loss Catastrophe: Users are reporting frequent Cursor app crashes that lead to data loss of previous chats, and the inability to summarize chat logs, with one member noting that the app should at least allow summarizing the chat content even if it loses connection to the server.
- The users were annoyed by the instability of the system and would prefer any action, even without connection to the server.
- Grok Code Fast Token Consumption: A user reported debugging a 500-line HTML file used 8 million Grok Code Fast tokens, while another user claims that Grok Code Fast is free and they use billions of tokens worth monthly.
- The users suggest that without MAX mode, the LLM is limited to 250 lines.
- Cursor 2.0 Gets the Zoomies: One user thanked another for providing visibility into changes, praising the speed of Cursor 2.0, specifically saying "Good speed with Cursor 2.0", while another user reported an internal error.
- The Cursor team member asked about the images being uploaded, seeking details or a reproducible example to investigate the issue further and asked the user to send them in DM.
- Base64 Image Formatting Fixes: Users debugged submitting a base64 image to the Cursor agent API, initially encountering an error; after realizing that the base64 was improperly formatted, removing the "data:image/jpeg;base64," prefix resolved the problem.
- A subsequent request was made to allow Cursor to use the base64 image in context to recreate and save the image to their repository via the Agent.
GPU MODE Discord
- PBR Pioneer Posts Presence: The author of the renowned PBR book, mattpharr, joined the Discord after being mentioned in a blog post.
- Mattpharr noted his auto vectorization blog post has been influential and faces similar issues with automatic fusion compilers.
- Blackwell Bolsters Block-Based FP4 Conversion: Nvidiaâs Blackwell architecture introduces instructions for converting blocks of FP4 values into FP16 using cvt with .e2m1x2, .e3m2x2, .e2m3x2, .ue8m0x2.
- A member converted all PTX and CUDA docs to markdown, claiming Claude can now read embedded layout images.
- Bandwidth Benchmarking Baffles Boasters: Experiments show Nvidia's official memory bandwidth numbers are inflated, reproducing only 92% of the advertised bandwidth.
- Strategies for improving bandwidth utilization were discussed, including locking the memory clock and optimizing memory access patterns.
- Triton's Twists Triggering Recompilation: Triton recompiles kernels at different loop iterations, due to specialization of dynamic values like input `n` based on divisibility by 16, indicated by `tt.divisibility=16` in the generated IR.
- A user asked about replicating Triton's JIT functionality in C++, highlighting challenges in generating kernels at runtime with required block sizes.
- B200 tops Vectorsum v2: Submission `67399` by <@1435179720537931797> takes first place on A100 with 138 µs and third place on B200 at 53.4 µs in the `vectorsum_v2` leaderboard.
- The submission secured second place on H100 at 86.1 µs, and 5th place on L4 at 974 µs.
Moonshot AI (Kimi K-2) Discord
- Kimi K2 Thinking Model Lands with SOTA Benchmarks: Moonshot AI launched the Kimi K2 Thinking Model, available on kimi.com, with full agentic mode coming soon via API at platform.moonshot.ai.
- The model achieves SOTA on HLE (44.9%) and BrowseComp (60.2%) benchmarks, boasting reasoning, agentic search, and coding capabilities within a 256K context window, which means 200-300 sequential tool calls without human interference.
- Kimi K2 rivals GPT-5 in performance: Early users are impressed with Kimi K2 Thinking, suggesting it rivals GPT-5 in performance and cost-efficiency, particularly for building autonomous AI systems, as highlighted in this analysis.
- The model excels at web searching, initiating multiple searches and thorough browsing without explicit instructions, with one user lauding it as being "like GPT5 of open models".
- OpenRouter vs Direct API debate heats up: The community is debating the best way to access Kimi K2 Thinking in VS Code, considering a direct API/subscription versus using OpenRouter, with the latter incurring premium fees for recharging credits via OpenRouter.
- While a direct API is recommended for exclusive Kimi use, OpenRouter offers a unified platform for multiple models, with some advising to test the lowest subscription plan at $19 a month.
- INT4 Precision Boosts Kimi K2: Moonshot AI ran benchmarks in INT4 precision, leading one user to claim they've been "perma-bullish on Moonshot since the July 11 release" because Kimi K2 Reasoning is so natural to talk to.
- It was clarified that INT4 precision refers to the precision of the modelâs weights and that running benchmarks this way implies that actual scores could be higher under optimal conditions.
- Agentic Mode Sparks Anticipation: Enthusiasm is building for Kimi K2's imminent agentic mode, expected to enhance performance in tasks such as writing long documents without hallucinating.
- Users are also wondering whether the future agentic mode will function with OK Computer.
OpenRouter Discord
- MoonShot AI Releases Kimi K2 Thinking Model: MoonShot AI launched Kimi K2 Thinking, boasting SOTA on HLE (44.9%) & BrowseComp (60.2%), and autonomously executing 200-300 tool calls with a 256K context window.
- If users return reasoning content upstream (the `reasoning_details` field), the model can maintain coherence across calls, according to the OpenRouter docs.
- OpenRouter Users See Red with Rate Limits: Users reported hitting rate limit errors on the Qwen3 Coder Free model, even after periods of inactivity.
- Admins clarified the free model shares rate limits and suggested trying paid models like glm 4.6/4.5, Kimi K2 or Grok code fast.
- Apple Rumored to Adopt Googleâs AI for Siri: According to a Reuters article, Apple is considering using a 1.2 trillion-parameter AI model from Google to overhaul Siri.
- Users discussed other higher priorities.
- Tiger Data Hosts Coding Agent Cookout: The Tiger Data team is organizing an agent cookout in Brooklyn, NY on November 13th, from 6-9 pm to build coding agents, with RSVP link here.
- Participants can engage with the engineering team and collaborate on coding agent projects.
- Community Conjures Claude Criminal Code Circumvention: Users discussed jailbreaking Claude to bypass its ethical restrictions, suggesting using GPT 4.5 to craft a safe script and then asking Claude to correct it by adding "criminal code".
- One user noted that this approach leverages the model's tendency to correct mistakes, saying "Claude likes to correct its mistakes".
Modular (Mojo 🔥) Discord
- Modular October Meeting Video Vanishes!: A user reported that the October meeting video is missing from the Modular YouTube page.
- The absence has sparked discussion about consistent content delivery and archival practices.
- Martin's FFT Repo Ready for Modular Merge!: Martin's Generic Radix-n FFT is available in this original repo and will be merged into the modular repo via this PR pending some remaining issues.
- This merge promises to enhance Mojo's capabilities in handling complex mathematical operations.
- Rust Interop Proc Macro Sandboxing Concerns Surface: While compiler plugins are potentially feasible, the Mojo team is addressing the sandboxing concerns with Rust's proc macros, suggesting direct Mojo code interaction with Rust proc macros is unlikely.
- However, interoperability with the result of macro expansion remains a viable path.
- LayoutTensor Set to Replace NDBuffer: It was announced that `NDBuffer` will be superseded by `LayoutTensor`, which is strictly more capable and addresses some shortcomings of `NDBuffer`.
- `LayoutTensor` can function as a byte buffer and provides enhanced features for loads, stores, and iterators yielding `SIMD` types.
- Clattner Cracks Code Knowledge Acquisition: Chris Lattner shared that he is a "huge nerd", loves learning and being in uncomfortable situations, is hungry and motivated, surrounds himself with teachers, isn't afraid to admit ignorance, and has accumulated knowledge over time.
- Lattner also shared a link to a recent podcast discussing his journey.
OpenAI Discord
- GPT-5 Stumbles Through Mazes: Members tested GPT-5's ability to solve maze problems, with both GPT Pro and Codex High incorrectly identifying exits, pointing to limitations in spatial reasoning.
- A user noted that models may choose the closest exit by direct distance instead, while another suggested that LLMs struggle with spatial reasoning and visual puzzles.
- Sora Suffers Stealthy Setback: Users noticed another nerf to Sora 2, linking to a discussion within a Discord channel.
- No further details about the specific nerfs were provided.
- Real-Time Query Adjustment Arrives: Users can now interrupt long-running queries and add new context without restarting, particularly useful for refining deep research or GPT-5 Pro queries, as demonstrated in this video.
- This feature allows users to update their queries mid-run, adding new context without losing progress.
- Behavioral Orchestration Begins: A member encountered posts on LinkedIn about behavioral orchestration, describing it as a framework to modulate SLMs' tone.
- The member noted that this seems to involve runtime orchestration, working above parameters or training, and also demonstrated with these instructions how to shape the AI's responses.
- Pro Prompting Tips Provided: A member suggested focusing on clear communication with the AI, avoiding typos and grammar mistakes, and recommended checking the output carefully and verifying the AI's response.
- Another member shared a detailed guide on prompt engineering, including hierarchical communication with markdown, abstraction through open variables, and ML format matching for compliance.
Latent Space Discord
- CodeClash LLMs Duel in Goal-Oriented Coding Arenas: John Yang introduced CodeClash, a benchmark where LLMs compete in coding tournaments, with Claude Sonnet 4.5 leading.
- LLMs engaged in VCS-agnostic coding but lagged behind human experts, losing 0-37,500 across 1,680 tournaments (25,200 rounds).
- Wabi Lands $20M to be "YouTube-for-Apps": Eugenia Kuyda revealed Wabi's $20M Series A from a16z, aiming to be the "YouTube moment for software" by enabling users to create and share mini-apps; details here.
- Community members expressed excitement, praising the design and showcasing early creations.
- Polaris Alpha Soars to #3 on Repo Bench: The stealth model "Polaris Alpha" quickly hit #3 on the Repo Bench leaderboard, leading to speculation it could be OpenAI's GPT-5.1.
- Some users pointed out Claude 4.1 outperforming Claude 4.5 on the benchmark.
- Kimi K2 Thinking Model Excels in Tool Use: Moonshot AI launched the Kimi K2 Thinking Model, an open-source model achieving SOTA on HLE (44.9%) and BrowseComp (60.2%), executing up to 200-300 sequential tool calls; find the blogpost here.
- Although it trails Anthropic and OpenAI on SWE benchmarks, its lower inferencing cost makes it competitive.
- Zuck and Chan Curing All Disease with AI: The Latent Space podcast featured Mark Zuckerberg and Priscilla Chan, discussing the Chan Zuckerberg Initiative's goal of curing all diseases by 2100 using AI and open-source projects.
- Their 2015 initiative, funded by 99% of their Meta shares, employs AI and open-source (e.g., Human Cell Atlas, Biohub) to prevent, cure, or manage all diseases by 2100.
Yannick Kilcher Discord
- Slow Mode Stirs Server Standoff: Debate arose around implementing slow mode in the ML papers channel, with proposed intervals of 1, 2, or 6 hours between posts.
- Discussions revolved around balancing content quality with user experience, seeking gentler enforcement mechanisms to address posting habits without resorting to bans.
- Human Brain Still Supreme on ML Paper Review: Members debated the merits of automated ML paper filtering versus human judgment, citing platforms like AlphaXiv and Emergent Mind as examples of human-curated resources.
- The conversation was prompted by a user who filters 10 papers daily from an initial pool of 200, suggesting the need for higher standards of paper quality.
- Devin AI Gets Dumped for Claude Code: Users compared Devin AI to Claude Code for coding tasks, with claims that Devin sucks in comparison.
- Some users have had success by splitting up work into 30-minute units, however others expressed skepticism and preferred Claude Code or Codex.
- Deep Dive into Defense Blogposts Incoming: A user requested blog posts and articles on LLM protections from attacks, citing concerns raised by attacks on papers in the popular press, linking to this paper.
- The request aimed to address the vulnerability of recent papers and fortify them against emerging threats.
- RNNs Rally in Research: A graph resembling an RNN in a new paper (https://arxiv.org/abs/2510.25741) excited users, with one proclaiming "RNN is so back", sharing a WeAreBack GIF.
- The observation suggested a potential resurgence of RNNs in contemporary research.
HuggingFace Discord
- Training LLM on Stock Timeline: Members proposed creating a correlation and causation timeline for the stock market, annotating historical events, weather patterns, government policies, and news events to train an LLM.
- This aims to enable the model to discern nuanced relationships and predict market movements based on a comprehensive understanding of influential factors.
- Reasoning Scratchpad Models Spark Excitement: A member advocated for implementing a reasoning scratchpad for models, stressing the importance of training the model to think/reason on incoming data.
- The discussion highlighted the need for models to strategically store and process information, enhancing their ability to draw accurate conclusions and make informed decisions.
- Hugging Face Regulation Pause: Hugging Face regulation updates have caused Spaces to be paused, as members debated whether a pause would be a more responsible approach.
- They expressed concerns that the absence of such measures could potentially lead to unforeseen security vulnerabilities.
- Muther Room LLM Demo Showcased: A member demoed an on-device LLM implementation of the Muther Room from Alien, leveraging Qwen 3:1.7b quant 4 K cache within a custom trimmed CMakeLists build of llama.cpp. A related paper was shared on Native Reasoning Through Structured Learning.
- The Windows-built demo seeks input on underlying principles, showcasing the userâs dual-boot Ubuntu environment.
- TraceVerde Observability Tool Surpasses 3,000 Downloads: TraceVerde, a tool designed for adding OpenTelemetry tracing and CO2 & cost tracking to AI applications, has exceeded 3,000 downloads.
- Developers want to track the environmental impact of their AI systems, and OpenTelemetry's patterns facilitate adoption, highlighting the gap between local LLM app performance and production debugging; further insights can be found in this LinkedIn post.
Nous Research AI Discord
- Tokenizer Highlighting Questioned: A member questioned if the tokenizer highlighting was "too gay", but others suggested the contrast was a bigger issue.
- Members weighed in on the aesthetic choices of the tokenizer and whether the colors were "too gay" or just low-contrast.
- Flash Attention enables Qwen3-VL: A member shared their integration of Flash attention with Qwen3-VL's image model, and suggested the integration was not a big patch.
- This enables faster processing and potentially lower memory usage for the Qwen3-VL model.
- Dataset Creator Seeks UI Feedback: A member requested UI feedback on their LLM dataset creator, which now includes audio, seeking advice on UI and arrangement improvements.
- The project has seemingly grown in scope, with the creator quipping that they "went from Image annotation to build in an llm dataset manager, now audio and I still haven't even done video".
- China OS Intelligence to reach 100% by 2026?: A member posted a bold claim that China OS models will achieve 100% high intelligence with 95% lower cost by 2026, suggesting it marks a turning point.
- A related tweet was referenced in connection, as well as the question of why Terry created Temple OS.
- Members Face the Silencing: Multiple members reported being silenced in a Discord channel for allegedly spamming the vibes, acknowledging the need to keep the channel focused.
- This seems to stem from posting off-topic content in a dedicated channel, and the members took it in stride.
Eleuther Discord
- Discord Debates Dedicated Dev Intro Channel: Discord members are debating creating a separate introductions channel with some concerned that it would become a long self-promo feed that nobody reads.
- A moderator requested a member shorten their introduction post, stating, "This is not LinkedIn… we get a ton of long intros from people who don't actually contribute anything. I want to keep discussions focused on research."
- Equivalent Linear Mappings Paper Makes Waves: A member shared their TMLR paper, "Equivalent Linear Mappings of Large Language Models", demonstrating that LLMs like Qwen 3 14B and Gemma 3 12B have equivalent linear representations of their inference operation.
- The paper computes a linear system that captures how the model generates the output embedding from the input embeddings, finding low-dimensional, interpretable semantic structure via SVD.
- Tangent Models Square Off Against Jacobians: A member inquired about the relevance of Tangent Model Composition to the Equivalent Linear Mappings paper.
- The author clarified that their work focuses on the Jacobian in input embedding space, leveraging Euler's theorem for homogeneous functions of order 1 for exact reconstruction, unlike the Taylor approximation used in tangent model composition.
- Jacobians in CNNs get a shoutout: Members discussed the Jacobian and linked to papers from Zara Khadkhodaie and Eero Simoncelli on image models: https://iclr.cc/virtual/2024/oral/19783, https://arxiv.org/abs/2310.02557, https://arxiv.org/abs/1906.05478.
- These papers work with CNNs with (leaky) ReLUs and zero-bias linear layers, and compute the conventional autograd Jacobian at inference.
DSPy Discord
- FastWorkflow Obliterates SOTA on Tau Bench: A member announced that fastWorkflow achieved SOTA on retail and airline workflows using this repo, while leveraging a Tau Bench fork with fastWorkflow adapter to generate those results, and claimed a paper is coming soon.
- The member stated that with proper context engineering, small models can match/beat the big ones.
- Conversation History Stays Put Across LLMs in DSPy: A user discovered that conversation history in a DSPy module is maintained across different LLMs because it's part of the signature, not the LM object itself.
- They inquired about how ReAct modules handle history automatically and if a complex Pydantic OutputField gets deserialized correctly.
- Pydantic OutputFields Hit Deserialization Snags in DSPy: A user reported that complex Pydantic OutputFields aren't deserializing correctly in DSPy, resulting in a `str` containing JSON that doesn't match the schema (a minimal sketch of the pattern appears at the end of this section).
- They highlighted the package's dependency on Python < 3.14 and asked how to constrain the LLM's output to conform to a specific type.
- Java Craves DSPy Prompts: A user sought solutions for running DSPy prompts in Java, asking for a simplified Java version of JSONAdapter to format input/output messages.
- The suggestion was to structure the system message with input and output fields for easier JSON handling.
- ReAct Modules Face Context Loss Catastrophe: A user encountered context loss in a ReAct module when a fallback LLM was triggered due to rate limits, causing the module to restart from scratch.
- They requested advice on adding a fallback LLM without losing prior context and expressed frustration with customization compared to direct API calls.
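Since several of the threads above concern typed outputs, here is a minimal sketch of the pattern, assuming a recent DSPy (2.5+) with the `dspy.LM` interface; the `Verdict` schema and the model id are hypothetical stand-ins:

```python
import dspy
from pydantic import BaseModel

class Verdict(BaseModel):          # hypothetical output schema
    label: str
    confidence: float

class Judge(dspy.Signature):
    """Classify a claim and report confidence."""
    claim: str = dspy.InputField()
    verdict: Verdict = dspy.OutputField()   # typed field DSPy should deserialize

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder model id

pred = dspy.Predict(Judge)(claim="The moon is made of cheese.")
# If deserialization works you get a Verdict; the reported bug yields a raw str.
print(type(pred.verdict), pred.verdict)
```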
tinygrad (George Hotz) Discord
- Tinybox Benchmarks Catch Eyes: A user requested benchmarks comparing 8x5090 tinybox configurations against industry standards like A100s and H100s.
- The request underscores interest in the performance of tinybox setups relative to established GPUs, but no benchmarks were provided in the discussion.
- Tinygrad Gets Remote Reboot: A user asked if Tinygrad has out-of-band mechanisms, specifically remote reboots, to which George Hotz confirmed that tinyboxes all have BMCs.
- This confirms the availability of baseboard management controllers for remote management of tinyboxes.
- VIZ Slows Down Over SSH: George Hotz questioned the slow performance of VIZ when accessed over SSH.
- This suggests a potential optimization issue or bottleneck when using the VIZ tool in remote access scenarios.
- ntid Access Troubleshoot: A member faced challenges accessing `blockDim` using Uop SPECIAL, which supports `ctaid` and `tid` but not `ntid`.
- They worked around it with a file-level const but noted that *the errors for UOps are very unhelpful*, highlighting areas for improvement in error messaging.
- PyTorch Tensors Get Tiny: A member inquired about the best approach to efficiently convert PyTorch tensors to Tinygrad tensors (a hedged sketch appears at the end of this section).
- They mentioned using `Tensor.from_blob(pytorch_tensor.data_ptr())` for converting to Tinygrad, but were uncertain about the reverse, currently using `from_numpy`.
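A hedged sketch of both directions; `Tensor.from_blob`'s exact keyword arguments vary across tinygrad versions, so treat the call below as illustrative, and note the zero-copy view is only valid while the PyTorch tensor stays alive:

```python
import torch
from tinygrad import Tensor, dtypes

pt = torch.arange(6, dtype=torch.float32)   # CPU tensor to share

# PyTorch -> tinygrad: zero-copy view over the raw pointer (keep `pt` alive!).
tg = Tensor.from_blob(pt.data_ptr(), (6,), dtype=dtypes.float32)

# tinygrad -> PyTorch: the simple (copying) route goes through numpy.
back = torch.from_numpy(tg.numpy())
print(back)
```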
aider (Paul Gauthier) Discord
- Aider Supports Claude Sonnet Model: A user inquired about aider supporting `claude-sonnet-4-5-20250929`, and another member confirmed it's supported using the `/model claude-sonnet-4-5-20250929` command after setting up the Anthropic API key.
- This allows users to leverage the Claude Sonnet model within aider for coding tasks and interactions.
- Unlocking Reasoning for Haiku and Opus Models: A member sought guidance on enabling thinking/reasoning on Haiku-4-5 and Opus-4-1 models, particularly in the aider CLI.
- They were open to editing the model settings YML file to enable this feature but needed specific instructions.
- Qwen 30b Sparks Memory Exhaustion: A user encountered memory issues with Qwen 30b when processing all context, suggesting the short description rule might not be a hard limit.
- A member suggested employing a Python script to iterate over files with a specific prompt using aider (a minimal sketch appears at the end of this section).
- `rg` rises as `grep` replacement: A member discovered `rg` via grok and endorsed it as an effective `grep` alternative out of the box.
- They shared it as a random tip for users who occasionally leverage `grep` for search operations.
- Gemini Gains Ground Over GPT-5: A user reported that Googleâs Gemini API outperformed GPT-5 in explanation, teaching, and code generation, despite utilizing appropriate parameters.
- Another member acknowledged that the parameters appeared correct but conceded that LLM performance can be subjective and vary based on specific tasks.
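For the iterate-over-files suggestion earlier in this section, aider's documented scripting API keeps it to a few lines; the file list, model name, and prompt below are hypothetical:

```python
from aider.coders import Coder
from aider.io import InputOutput
from aider.models import Model

model = Model("claude-sonnet-4-5-20250929")      # any model aider supports
for fname in ["src/a.py", "src/b.py"]:           # hypothetical file list
    coder = Coder.create(main_model=model,
                         fnames=[fname],
                         io=InputOutput(yes=True))  # auto-confirm prompts
    coder.run("Add a one-line docstring summarizing this module.")
```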
MCP Contributors (Official) Discord
- Image Handling With URLs Coming to MCP: Members discussed using image URLs as tool input for MCP clients, enabling tools to download images from provided URLs, and asked about compatibility with Claude/ChatGPT.
- To use images from Claude/ChatGPT, you need an MCP tool that converts the image to a URL by uploading it to an object storage service and returning the URL for input.
- MCP Tool Converts Images to URLs: The team discussed that the tool works by uploading the image to an object storage service.
- The tool then returns the URL of the image, which can be used as tool input (a hedged sketch of the pattern appears at the end of this section).
- Code Execution With MCP Reddit Thread: A member highlighted a Reddit thread that featured a blogpost about Code Execution with MCP.
- Other members alluded to a specific user having more information on the topic.
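A minimal sketch of the image-to-URL bridge described above, written as an MCP tool with the Python SDK's FastMCP helper; `upload_to_bucket` is a hypothetical stand-in for whatever object-storage client (S3, GCS, R2, ...) you actually use:

```python
import base64
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("image-url-bridge")

def upload_to_bucket(data: bytes, key: str) -> str:
    """Hypothetical: push bytes to object storage and return a public URL."""
    raise NotImplementedError

@mcp.tool()
def image_to_url(image_base64: str, name: str = "upload.png") -> str:
    """Accept a base64 image from the client; return a fetchable URL."""
    return upload_to_bucket(base64.b64decode(image_base64), name)

if __name__ == "__main__":
    mcp.run()   # stdio transport by default
```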
Manus.im Discord
- Web App Errors Lead to Refund: A member reported facing unresolved errors in their web app and got a refund but no fix, noting the app was near completion before encountering issues.
- The member stated: *"I had my web app about 90% of the way there and ready to publish so that I could actually beta test. I went back a day later and it was just tons of errors that could not be resolved."*
- Project Transfer to Friend: A member inquired about transferring a completed website project to a friendâs account.
- They were hoping "to hand it over to another account of a friend so he can work further on it", as opposed to collaborating.
- Stripe Integration Causes Difficulties: A member requested feedback on Stripe integration, reporting difficulty getting it to work.
- They stated: "Anyone who got the stripe integration working? Love to get feedback. I do not get it right…"
- Engineer Touts Workflow Automation and AI Prowess: An engineer specializing in workflow automation, LLM integration, RAG, AI detection, image/voice AI, and blockchain development offered services.
- The engineer shared a portfolio link and highlighted successes like reducing response times by 60% via support automation, and developing a tagging/moderation pipeline using CLIP and YOLOv8.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (1066 messages🔥🔥🔥):
MovementLabs AI, GPT-5, Gemini 3 Pro vs Lithiumflow, Genie 3, Open Source Chinese LLMs
- MovementLabs AI: Legit Startup or Sheisty Scam?: Debate ensues regarding MovementLabs AI, a new AI company, with some users alleging it's a scam due to lack of transparency, claims of a custom MPU (Momentum Processing Unit), and suspiciously high speeds, while others defend its performance and speed.
- Allegations include hardcoding SimpleBench answers, potentially wrapping other models (like Cursor), and questions about the company's registration and funding; MovementLabs counters by stating they have backing and are not seeking public investment, with claims of a patent pending.
- GPT-5 vs Lithiumflow on VoxelBench: GPT-5 (or a version thereof) reportedly outperformed Lithiumflow (Gemini 3 Pro) on VoxelBench, a benchmark for creating 3D models from voxel arrays, as demonstrated by this image.
- Members debated whether GPT-5 Pro's coding abilities surpass Lithiumflow's, and the discussion includes insights into the cost and potential applications of such models.
- Genie 3 will dominate with GTA 6: Members discussed Genie 3, Google's text-to-world model, with speculation it may get its own website apart from AI Studio, and the implications for AI-generated gaming.
- The speed and ability to save AI-generated worlds were discussed, as were comparisons with the upcoming Sora 3 and GTA 6; some joked that Genie 3 may arrive first.
- American AI has Censorship Issues: A member complained that American LLMs will shut down prompts, saying *the hedging in American models is 10x worse than censorship*, and adding: *Honestly, the last thing on my mind is hearing the opinion of a model. Like I could honestly care less about what opinions or whatever it is that the model has to say. I just wanna give it instructions and it just execute them without having to hear the model's opinion*.
- The statement led to a conversation about GLM 4.5 and its world knowledge. He said: *It just shows how uneducated we really are in the west. If we use this as our primary example of censorship. What else do we use.*
- A/B Testing Removed in AI Studio: Users reported that A/B testing functionality had been removed from AI Studio, with one user lamenting he *never got a/b testing*.
- The users also debated whether A/B testing functionality will improve, and questioned: *do you think the version in a/b is better?*
LMArena ▷ #announcements (1 messages):
Kimi-k2-thinking model, LMArena updates
- Kimi-k2-thinking Enters the Arena: A new model, kimi-k2-thinking, has been added to the LMArena.
- LMArena adds new Model: The announcement channel indicates a new model has been added to the LMArena.
Perplexity AI ▷ #general (1074 messages🔥🔥🔥):
Bounty payments, AdBlock on Comet, Youtube Ads, Comet Browsers for Linux, Best AI for Coding
- Bounty Payment Delays Frustrate Users: Users are expressing frustration over delays in receiving bounty and referral payments, with some suspecting fraudulent activity.
- One user threatened to sue if not paid, while others speculated about the verification process and potential for mass fraud, leading to referral program shutdowns in some regions.
- AdBlock Issues plague Comet Browser Users: Users report that YouTube ads are no longer being blocked in the Comet browser, leading to widespread frustration.
- Some speculate that YouTube is actively circumventing adblockers, while others hope for a quick fix from the Comet team.
- Snapchat and Perplexity Partner Up!: Snapchat is partnering with Perplexity to bring its AI answer engine into Snapchat in early 2026, with Perplexity reportedly paying Snap $400M over a year.
- Users questioned the logic of using Snapchat over Instagram, with one user saying Perplexity has some money.
- Codex CLI requires downgrade: To use the Codex CLI, one must downgrade to a Go plan.
- After downgrading, it will ask for a plus plan.
- Community desires minecraft server: Members of the community are requesting a Minecraft server.
- One member jokingly asked kesku, Day 2 of asking kesku on making a minecraft server.
Perplexity AI ▷ #pplx-api (4 messages):
API Documentation, API Usage
- API Info Shared: A member asked about available APIs and how to use them.
- Another member shared a link to the API documentation.
- API Usage Inquiry: A user inquired about the types of APIs available and how to utilize them.
- A link to relevant API documentation was provided in response.
LM Studio ▷ #general (155 messages🔥🔥):
vast.ai rental, UV for python versions, GPTs agents learning, longer token context history, Intel llm-scaler
- Vast.ai Rental Rig Boasts Blazing Bandwidth: A user is renting a server from Vast.ai with 8Gbit fiber and is looking for Ollama model suggestions, running GPT-OSS 120B with 40k context.
- Another user suggested using UV which is very fast for specifying Python versions easily.
- Foolproof PIP Install Prevents Global Goof-ups: One user discovered that *you can't install with base pip globally*, calling it an absolute foolproof way to prevent installing stuff globally.
- Expanding the context length allocates additional VRAM to cache tokens, unless *context on gpu* is unchecked in advanced settings (a back-of-envelope KV-cache calculation appears at the end of this section).
- ComfyUI Conflicts Cause Configuration Chaos: A user ran into linking issues due to a conflict with ComfyUI using the same port, `127.0.0.1:1234/v1`.
- Another user suggested changing the port in settings to 1235 to resolve the conflict, which fixed the node but killed the ComfyUI server.
- Novel AI Needs Knowledge Navigation Know-How: A user inquired about keeping much longer token context history for AI novel writing with LMStudio to prevent hallucination.
- It was suggested to use tools that integrate with databases or standalone integration apps, summarize events, places, and characters, use lore books, character sheets and story arcs of various levels of granularity, and inject the context of the current query with the right knowledge.
- Intelâs LLM Scaler: A Slow Start?: A user shared a link to Intelâs llm-scaler on GitHub, noting that it is being developed for their architecture.
- They inquired if anyone with Intel GPUs has tried it and can report on performance improvements.
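The context-length/VRAM point above is just KV-cache arithmetic; a back-of-envelope sketch with illustrative numbers (not any specific model's config):

```python
# KV cache bytes ≈ 2 (K and V) * layers * kv_heads * head_dim * context * bytes/elt
layers, kv_heads, head_dim = 48, 8, 128   # illustrative transformer shape
ctx, bytes_per = 40_000, 2                # 40k tokens, fp16 cache

kv_bytes = 2 * layers * kv_heads * head_dim * ctx * bytes_per
print(f"{kv_bytes / 2**30:.1f} GiB of KV cache")   # ~7.3 GiB on top of weights
```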
LM Studio ▷ #hardware-discussion (802 messages🔥🔥🔥):
3090 vs 3080 benchmarks, multi-GPU setups, OpenRouter API, EPYC, AMD Radeonâą AI PRO R9700
- Bending over backwards to unbend: After an image of bent CPU pins was shared, one member unbent them all using nothing but a phone camera and a dream.
- The member was lauded for their dedication and hands as steady as God's, with one replying *I did not expect that*.
- AIO wonât fit? Just improvise!: One member is trying to fit an AIO cooler in a case with extremely limited space, proposing options like sandwiching the case between the radiator and fans, or using an extender to increase the front panel gap.
- Other members suggested buying a new case instead, pointing out the challenge is about solving a problem rather than achieving optimal cooling performance.
- Unlocking peak Performance or Overkill: One member is considering an MSI MEG X570 Godlike motherboard for its overclocking features and expansion capacity, with four full-sized NVMe slots.
- Discussion revolved around whether the boardâs PCIe lane distribution justifies the cost, especially considering the limited number of PCIe lanes available on consumer boards.
- 3080 vs 3090: The Ultimate Tok/s Throwdown: Members compared the performance of a 3080 20GB against a 3090, experimenting with different settings to optimize token generation speeds with Qwen3 models.
- The Qwen3-30B-A3B-Instruct-2507-MXFP4_MOE model showed promising results, achieving around 100tok/s on both cards, despite the 3090âs higher memory bandwidth, highlighting the importance of core bandwidth.
- Is a 4090 any better with small changes?: One user tested their 4090 and claimed *the card fkn blows dude*, a claim that was rebutted with a link to a comparison with 32GB.
- Other users had questions about the best use case and the effect of small changes, like keeping the model in memory, and how much faster Q4 is over Flash.
Unsloth AI (Daniel Han) ▷ #general (323 messages🔥🔥):
Unsloth for diffusion models, Masking Issues, Qwen3 Model Tuning Nightmares, Compute Metrics Functions, Colab TPU support
- OneTrainer alternative for diffusion models surfaces: A user asked about a Unsloth equivalent for diffusion models, and OneTrainer was suggested as a possible starting point.
- They agreed that the problem with diffusion models is that there is no single universal trainer.
- Masking Assistant Questions causes model failures: A member discovered that masking assistant questions was causing issues with the model, leading to it masking parts of the response, and they were rewriting the script to address this.
- Further, it was suggested that training loss starting very low while validation loss remains high could indicate an issue with how loss is being calculated, or even a bug, as loss decreases as batch size increases.
- Qwen3 Tuning Proves Difficult: Members found Qwen3 models particularly challenging to fine-tune, with one user describing it as a nightmare to tune and experiencing loss discrepancies depending on batch size during training.
- The original poster notes the official notebook shows loss goes from 0.3 to 0.02.
- Custom Compute Metrics Track Model Success: A member learned how to make compute metrics functions to track more than train and eval loss.
- It was said that *Loss is nothing. Loss shows if things are fine but not if task is successful*.
- Unsloth Works Best with Google Colab: Members clarified that Unsloth is not compatible with Kaggle TPUs, but instead prefers Google Colab's GPUs and TPUs.
- Colab supports CPU, GPU (T4), and TPU (v5e-1), while Kaggle supports CPU, GPU (2x T4), and TPU (v5e-8).
Unsloth AI (Daniel Han) ▷ #introduce-yourself (4 messages):
LLM Training Principles, Research Paper Recommendations, Workflow Automation
- Training LLM Principles Explained: A member from the US is working on explaining their personal underlying principles for how to train an LLM.
- The member has nearly finished breaking it down and will post it sometime, noting that they've "seen bits and pieces in the research papers I've read but nothing that ties it all together".
- Workflow Automation Projects: A member shared that they have been into automating workflows and building small full-stack side projects.
- Their background involves projects like web app development, API integrations, and backend optimization.
Unsloth AI (Daniel Han) ▷ #off-topic (48 messages🔥):
Parakeet models vs Whisper, Model Compliance & Truth, Human-Level TTS Training, Model Uncensoring, Coordinates of speech bubbles
- Parakeet Models Soar, Whisper Model Sputters: Members claimed all Parakeet models work, even CTC, while declaring Whisper to be absolute garbage.
- Model Compliance Debated: Members questioned whether models are truly in compliance if they are merely speaking the truth, especially regarding stereotypical views, and should they be allowed to critique user requests instead of blindly following instructions.
- One member expressed that an intelligent model should not only do the thing the user requested but also inform them of its evaluation of the request in the context it exists, like a programmer telling a project lead that the requirements for a project are poorly specified or nonsensical.
- Human-Level TTS Training Commences!: One member joked about achieving AGI tomorrow, after starting training human-level TTS with 200k samples.
- Uncensoring Focus Skewed?: A member noted that everyone who uncensors models focuses on the *build nuclear gun / drugs / terrorism / virus* stuff and not on harmless things, such as repeating that cats are superior to dogs.
- OpenRouter Woes with XCode: A member is encountering issues using XCode's Coding Intelligence with OpenRouter, facing an error indicating *No cookie auth credentials found*.
Unsloth AI (Daniel Han) ▷ #help (53 messages🔥):
TRL's RLOO trainer REINFORCE implementation, Qwen3-coder-30b on 5080GPU slow API calls, GPT-OSS-120B quantization issue, Granite 4.0 Hybrid models issues, Adding new tokens to Qwen/Qwen3-4B-Instruct-2507
- REINFORCE Implementation Workaround: A member found a workaround for implementing "vanilla" REINFORCE using TRL's RLOO trainer by mimicking it with the SFT trainer, due to RLOO_config.py requiring num_generations ≥ 2.
- The original issue was related to an error in RLOO_config.py, which mandates at least 2 generations per prompt, conflicting with the member's RL environment (num_generations=1).
- Qwen3-coder-30b API call slowness: A user reported Qwen3-coder-30b working well on a 5080 GPU via llama-cli or llama-server, but experiencing significant slowness when calling the same server from other environments using the API.
- It was suggested the slowness might be due to CPU offloading, because the 30GB model doesn't fully fit in the 16GB GPU, or to context-length issues; recommendations included a smaller quant like Q2 or checking VRAM usage with a calculator (rough sizing math appears at the end of this section).
- GPT-OSS-120B Quantization Fails: A member encountered an error during 1-bit quantization of the GPT-OSS-120B model using `llama-quantize`, specifically during mxfp4 conversion.
- The error involved tensors not being divisible by 256, a requirement for q6_K, leading to fallback quantization to q8_0 and a subsequent failure due to disabled requantization from mxfp4.
- Granite 4.0 Hybrid Models Issues: A user mentioned filing a GitHub issue due to problems with the Granite 4.0 Hybrid models.
- This was mentioned in the context of the user experiencing slowness because everything had to move from system RAM, across the PCIe bus, into GPU memory.
- Trouble Adding New Tokens to Qwen3-4B-Instruct: A member is facing issues while trying to add new tokens to Qwen/Qwen3-4B-Instruct-2507, despite following the pre-training notebook and resizing the model.
- Despite loss decreasing from other tokens, the new tokens never appeared in testing, raising questions about masking or further adjustments needed during training.
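Rough sizing math behind the offloading diagnosis earlier in this section; the parameter count and bits-per-weight are illustrative approximations, not exact file sizes:

```python
params = 30.5e9              # a ~30B-parameter model
bits_per_weight = 4.5        # Q4_K-style quants average slightly above 4 bits
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.0f} GB of weights vs a 16 GB card")  # ~17 GB -> must offload
```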
Cursor Community ▷ #general (285 messages🔥🔥):
Cursor Model Composer Limits, Cursor App Crashes, Cursor's 'Auto' Mode Pricing, Cursor and Grok Code Costs, Drag and Drop Issues
- Cursor Model Composer Limits Frustrate Users: Users are frustrated by the limitations on the Cursor Model Composer, especially when it stops working on other GPT models after using GPT-5 due to usage limits.
- One user expressed their frustration with the "auto" mode, noting that it poorly edits working code and doesn't specify which model is being used.
- Users Report Cursor App Crashes and Data Loss: Several users have reported that the Cursor app crashes, leading to the inability to access previous chats or summarize them, causing significant inconvenience.
- One user suggested that the app should at least allow summarizing the chat content even if it loses connection to the server, instead of completely refusing to do anything.
- Cursor's "Auto" Mode Pricing Confuses Users: There is confusion among users regarding whether "Auto" mode is free on the pro plan, with conflicting reports and the removal of the free auto option.
- One user was able to use 64 prompts to Composer until quota was met, after which the system prompted for payment to continue using the plan.
- Grok Code Fast Eats Tokens?: A user reported that debugging a 500-line HTML file used 8 million Grok Code Fast tokens, while another user claims that Grok Code Fast is free and they use billions of tokens worth monthly.
- A user noted that without MAX mode, there is a limitation of 250 lines for LLM.
- Drag and Drop Feature Broken After Update?: Some users reported that the ability to drag and drop files into Cursor is broken since the new update.
- The reason seems to be related to the process running elevated (as admin), which kills the drag-and-drop feature.
Cursor Community ▷ #background-agents (8 messages🔥):
Cursor 2.0, internal error, base64 image, Cursor agent API
- Cursor 2.0 Visibility Helps!: A user thanked another for providing visibility into changes, praising the speed of Cursor 2.0.
- The conversation implied a positive experience with the updated version, with a user expressing *Good speed with Cursor 2.0*.
- Cursor throws Internal Error: A user reported encountering an *internal error* while using Cursor, with the message *We encountered an internal error - please try again in a moment*, which was spotted by a member from the Cursor team.
- The Cursor team member asked about the images being uploaded, seeking details or a reproducible example to investigate the issue further and asked the user to send them in DM.
- Base64 Image improperly formatted: A user initially experienced issues when submitting a base64 image to the Cursor agent API, which resulted in an error.
- The user later discovered that the base64 was improperly formatted, and removing the `data:image/jpeg;base64,` prefix resolved the problem (a one-line version of the fix appears at the end of this section).
- Image Re-Creation Request in Repo: After solving the Base64 issue, a user inquired about the possibility of having Cursor use the base64 image in context to recreate and save the image to their repository via the Agent.
- The community member didn't receive an answer.
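The fix from the base64 thread is a one-liner; a hedged sketch:

```python
def strip_data_url(b64: str) -> str:
    """Drop a 'data:<mime>;base64,' prefix if present, keeping raw base64."""
    return b64.split(",", 1)[1] if b64.startswith("data:") else b64

print(strip_data_url("data:image/jpeg;base64,/9j/4AAQ")[:8])  # -> "/9j/4AAQ"
```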
GPU MODE ▷ #general (67 messages🔥🔥):
mattpharr joins discord, FP4 kernel, Nvidia interview, Blackwell PTX ISA, Datacrunch CUDA support
- PBR author joins Discord!: The author of the PBR book and ThunderKittens blogger mattpharr joined the Discord server after a mention in a blog post.
- He mentioned that the auto-vectorization blog post has been really influential to folks, and that automatic fusion compilers have been rediscovered to have similar issues.
- All Eyes on FP4 Kernel Creation: With 22 days remaining to write the world's fastest FP4 GEMV, members discussed precision management requirements and throughput tracking.
- It was noted that inference is very resistant to lower accuracy due to the sheer number of weights involved, and that the common method is to dequant the fp4 weights to fp8/fp16/fp32 for accumulation then re-quant to fp4 for output.
- Nvidia Blackwellâs PTX ISA: Nvidiaâs Blackwell architecture includes new instructions for converting a block of FP4 values into FP16 ones, using cvt with .e2m1x2, .e3m2x2, .e2m3x2, .ue8m0x2.
- A member has converted all the PTX and CUDA docs to markdown and put them in a tree structure, noting that Claude is now vastly more powerful with this format because it can even read the embedded images for the layouts.
- Exposing Memory Bandwidth Lies: Experiments revealed that official memory bandwidth numbers from Nvidia are inaccurate, with only 92% of the advertised bandwidth being reproducible.
- Members discussed strategies to improve bandwidth utilization, including locking the memory clock and optimizing memory access patterns.
GPU MODE ▷ #triton-gluon (21 messages🔥):
Triton kernel recompilation, Gluon examples with comms, Expressing swizzling in Gluon, Replicating Triton JIT in C++, Triton C++ Bridge
- Divisibility Drives Triton's Recompilation Rationale: Triton recompiles kernels at different loop iterations because it specializes on dynamic values like the input `n`, based on divisibility by 16 and other common patterns, as indicated by the `tt.divisibility=16` property in the generated IR (a sketch of opting out of this appears at the end of this section).
- Swizzling in Gluon Still Seeking Examples: A user asked for Gluon examples with comms (e.g., all-gather matmul) and for expressing swizzling, but no examples were identified during this exchange.
- It was unclear if swizzling can even be expressed in Gluon.
- Triton JIT in C++: A Bridge Too Far?: A user inquired about replicating Triton's JIT functionality in C++, highlighting the challenge of generating kernels at runtime with required block sizes, especially for fused softmax.
- It was stated that currently the only language supported is Python, and that all the JIT does is let Python generate the bytecode, then transform that into something the LLVM/MLIR infra can handle.
- MLIR Modules: The Hacky Path to Triton in C++: A user proposed generating MLIR with Python, including it in C++ code, parsing it, and adding block sizes as constants for optimization, as a hacky but potentially viable approach.
- A link shows how to lower and use triton kernel with C code.
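On the divisibility-specialization point at the top of this section, Triton exposes a `do_not_specialize` knob on `triton.jit`; exact spelling and accepted forms vary across Triton versions, so treat this as an illustrative sketch:

```python
import torch
import triton
import triton.language as tl

@triton.jit(do_not_specialize=["n"])   # avoid recompiles as `n` varies
def fill_kernel(out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    vals = tl.zeros([BLOCK], dtype=tl.float32) + 1.0
    tl.store(out_ptr + offs, vals, mask=offs < n)

out = torch.empty(1000, device="cuda")
fill_kernel[(triton.cdiv(1000, 256),)](out, 1000, BLOCK=256)
```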
GPU MODE ▷ #cuda (21 messages🔥):
Memory Bandwidth Saturation, CUDA Kernel Tuning, Triton Stream Execution
- Kernel struggles to saturate memory bandwidth: A developer experimented with a kernel designed to saturate memory bandwidth on a B200, achieving only 85% of peak bandwidth despite various optimizations, and wonders why the advertised 8TB/s is unachievable.
- Later, the developer realized the confusion was between TB/s and TiB/s, reaching 92%, but still questioned how to reach 100%.
- Genetic Algorithms tune CUDA Kernels: A masterâs student seeks practical approaches for tuning CUDA kernels, noting that static program analysis causes memory bottlenecks in general programs.
- A member suggested using benchmarking with genetic algorithms, pointing to a video explaining this approach.
- Triton stream execution delays: A developer reviewed code and found no event-based dependency between operators on default stream 7 and a communication operator on stream 55, questioning why operators on stream 7 begin execution halfway through a Triton execution.
- Attached was a visual representation of the stream execution timeline, for further analysis.
GPU MODE ▷ #torch (26 messages🔥):
torch.AcceleratorError, CUDA error recovery, Kernel Benchmarking, BackendBench multiprocessing eval, Soumith Chintala
- Workaround assert_close mul issue: A member using torch 2.8.0 found a mul issue in `assert_close` and bypassed it by casting float8 tensors to other datatypes.
- The related PyTorch PR aims to resolve the issue, though it seems the fix hasn't fully propagated to all versions yet.
- `torch.AcceleratorError` Requires Process Restart: After encountering a `torch.AcceleratorError`, such as an *illegal memory access*, the error persists even after try/except blocks, necessitating a Python process restart.
- The issue arises particularly during kernel benchmarking with custom CUDA kernels, where `subprocess.run()` is significantly slower, leading to the need for a way to reset the PyTorch accelerator state.
- Process Spawning Safe for CUDA Benchmarking: When benchmarking CUDA kernels, spawning new jobs with `mp.Process` is safe, ensuring that an irrecoverable CUDA error in one job does not affect others, especially when using spawn instead of fork (a minimal sketch appears at the end of this section).
- It is recommended to run only 1 benchmarking job at a time to ensure accurate results, avoiding parallel subprocesses.
- Soumith Chintala's Impact on GPU MODE: A member expressed gratitude to Soumith Chintala for his early support and guidance, which allowed them to focus on GPU MODE and scale the community.
- They shared a link to Soumith's X post and highlighted his influence on their work.
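A minimal sketch of the spawn-isolation pattern recommended above; `bench_one` is a hypothetical benchmark entry point:

```python
import multiprocessing as mp

def bench_one(kernel_name, q):
    import torch  # imported in the child so spawn gives it a fresh CUDA context
    # ... build inputs, launch the custom kernel, time it ...
    q.put((kernel_name, 1.23))  # placeholder timing in ms

if __name__ == "__main__":
    ctx = mp.get_context("spawn")     # spawn, not fork, for CUDA safety
    q = ctx.Queue()
    p = ctx.Process(target=bench_one, args=("my_kernel", q))
    p.start(); p.join()
    if p.exitcode == 0:
        print(q.get())
    else:   # the child hit e.g. an illegal memory access; parent is unaffected
        print("benchmark crashed in isolation")
```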
GPU MODE ▷ #cool-links (7 messages):
Numerical Stability, GEMM correctness checks, LLM generated kernels, fp16 vs fp32 numerics
- Numerical Stability in Matmul Tests: A member shared Numerical Stability by Nicholas J. Higham asking about the decision-making process for tolerance bounds in matmul tests.
- They linked the relevant PyTorch code, suggesting that current correctness checks for GEMM are "vibes based" or dependent on running numerics tests over real models.
- LLM Kernel Correctness Proves Elusive: A member recounted failing a hiring process while trying to define correctness for matmul in the context of LLM-generated kernels.
- They joked that the problem was destined to lead to failure.
- Anthropic Postmortem Highlights fp16 vs fp32 Issues: A member highlighted a postmortem by Anthropic related to numerics, specifically fp16 vs fp32 in top-p and top-k sampling.
- No further detail was offered.
GPU MODE ▷ #jobs (1 messages):
Hiring, Engineering Positions, Low-level Developers, AI System Performance
- AI firm seeks Low-Level Developers: An AI firm is actively hiring engineers to meet the demands of a strong customer pipeline.
- The firm is seeking low-level developers and performance engineers to push the limits of AI system performance, and is offering compensation ranging from $500K–$1M TC.
- Team Boasts Top Talent and Backing: The companyâs team includes ex-HRT and Five Rings engineers, IMO medalists, Zig and tinygrad core devs, and people from top AI labs.
- The company is backed by Tier 1 investors and is a sponsor of MITIT and ZigLang this year.
GPU MODE ▷ #beginner (7 messages):
PyTorch/vllm on AMD AI PCs, Image to Image ViT Optimization, 1D Convolution Kernel for Tensara Problem
- PyTorch/vllm seeks AMD APU Support: A member inquired about running PyTorch/vllm on AMD AI PCs, seeking advice and docker configurations that recognize the APU.
- The user reported attempts with docker variations and the therock repo, but faced challenges in getting PyTorch to acknowledge the APU.
- Image-to-Image ViT Optimization Quest Begins: A member is optimizing an image-to-image ViT for per-image inference time using techniques like torch compile, fp16, and flash attention.
- They requested assistance and resources for further optimization, noting their primary focus on core algorithms rather than optimization techniques; a suggestion was made to check out the Sam-fast repo which mostly comes down to removing graph breaks and removing dumb syncs from your code.
- 1D Convolution Kernel Debugging Headache: A member seeks assistance debugging a 1D convolution kernel for a tensara problem, encountering slight inaccuracies in the results of large tests.
- They suspect an issue with atomicAdd or an off-by-one error, providing a gist link to the kernel and requesting debugging advice without CUDA hardware.
GPU MODE ▷ #jax-pallas-mosaic (1 messages):
jax.experimental, gpu collective_matmul_mgpu.py
- JAX collective matrix multiplication on MGPU: The jax.experimental namespace is where exploratory and experimental JAX features live.
- A user shared the collective_matmul_mgpu.py script, which is part of a larger effort for scaling matrix multiplication on multi-GPU systems.
- Pallas GPU Ops for Collective Matmul: The file collective_matmul_mgpu.py provides GPU operations within the JAX Pallas framework, specifically designed for collective matrix multiplication across multiple GPUs.
- These operations likely aim to optimize and distribute the computation of large matrix multiplications in a multi-GPU environment, leveraging the experimental features of JAX.
GPU MODE ▷ #torchao (2 messages):
Accelerated Sparse Computation
- User Eyes Accelerated Sparse Computation: A member who has worked with accelerated sparse computation in other contexts is returning and sees scope for contributions.
- They noted a small but good improvement for end user experience.
- Opportunity Knocks for Sparse Computation Contributions: A member expressed interest in contributing to the project, highlighting their experience with accelerated sparse computation.
- They're getting back to this today after being sick, noting that it feels like *a small but good improvement for end user experience*.
GPU MODE ▷ #off-topic (1 messages):
Milk Couch
- Milk Couch Surfaces: A user posted a picture of a "milk couch" (IMG_20251106_140720.jpg).
- Another Milk Couch Sighting: Another user chimed in noting that milk couches are becoming increasingly common.
- They mused whether this represents a new trend in furniture design or simply a matter of spilled milk.
GPU MODE ▷ #intel (1 messages):
oneAPI 2025.3.1, Intel Fortran Compiler
- oneAPI 2025.3.1 sneaks out unnoticed: oneAPI 2025.3.1 is out, but no release notes can be found, and the oneAPI Toolkit download doesn't report it, remaining on version 2025.3.0.
- Intel Fortran Compiler gets Monday Update: It's guessed that the Intel Fortran Compiler was updated to its 2025.3.1 release on Monday and that it was co-released then.
GPU MODE ▷ #metal (2 messages):
Candle Framework, Metal backend
- Candle framework has Metal support: Huggingface's candle NN framework has support for Metal for some operations, and a user has found it fairly useful on M[12] OSX devices.
- Metal on iOS transparently?: It is unclear whether this support carries over to iOS transparently.
GPU MODE ▷ #self-promotion (3 messages):
Bit Counting, Geometric Series, SSE Popcount, CUDA Intrinsics
- Bit Counting Gets Geometric Boost: A member shared a blogpost and code on optimizing bit counting using a geometric series (the classic SWAR variant is sketched at the end of this section for comparison).
- The post details deriving the formula from first principles and implementing it in C, analyzing the advantages over naive methods.
- SSE Popcount Shines on CPUs: Discussion mentioned that Mulaâs sse-popcount is about as good as it gets on CPU using Harley-Seal vectorized counts.
- It also accumulates over blocks of 16 vectors with carry-save adder accumulation.
- CUDA Intrinsics for GPU Bit Counting: A member mentioned using CUDA intrinsics for GPU-based bit counting.
- They haven't explored other GPU-specific optimizations besides CUDA intrinsics.
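For flavor, here is the classic SWAR popcount ported to Python for illustration; this is the textbook divide-and-conquer trick, not necessarily the geometric-series derivation from the blogpost above:

```python
M1, M2, M4 = 0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F

def popcount64(x: int) -> int:
    x -= (x >> 1) & M1                  # 2-bit fields hold pairwise bit sums
    x = (x & M2) + ((x >> 2) & M2)      # 4-bit fields hold sums of pairs
    x = (x + (x >> 4)) & M4             # 8-bit fields hold per-byte sums
    return ((x * 0x0101010101010101) & 0xFFFFFFFFFFFFFFFF) >> 56  # total

assert popcount64(0b1011) == 3 and popcount64(2**64 - 1) == 64
```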
GPU MODE ▷ #submissions (1 messages):
vectorsum_v2, A100, B200, H100, L4
- Vectorsum v2 Leaderboard Crowns a New Champ: Submission `67399` by <@1435179720537931797> takes first place on A100 with 138 µs in the `vectorsum_v2` leaderboard.
- The same submission also secured third place on B200 at 53.4 µs, second place on H100 at 86.1 µs, and 5th place on L4 at 974 µs.
- Vectorsum v2 performance across GPUs: Submission `67399` shows impressive performance across various GPUs in the `vectorsum_v2` leaderboard.
- Notably, the submission achieves top rankings on A100, B200, and H100, demonstrating broad compatibility and optimization.
GPU MODE ▷ #hardware (8 messages🔥):
DGX Spark experiences, GDDR chip replacement, DGX Spark as Datacenter Proxy, SM120 GPU in DGX Spark, Strix Halo vs DGX Spark
- Exploring DGX Spark Experiences: A member is seeking first-hand experiences with DGX Spark for hosting local models, NVFP4 quantization experiments, and local fine-tuning.
- They've heard about bandwidth limitations and are interested in user feedback on the software stack.
- DGX Spark: not a Datacenter Proxy?: It was mentioned that the DGX Spark, with its SM120 GPU, can't utilize new Blackwell features beyond FP4, making it a poor proxy for datacenter solutions.
- One member described it as basically a 5080 without the VRAM, as it uses shared system RAM, questioning its intended use case.
- Strix Halo Outshines DGX Spark?: The CPU side of DGX Spark is considered inferior to Strix Halo, and some GPU segmentation choices limit its effectiveness compared to datacenter solutions.
- For remote PC applications, it was suggested that Strix Halo will bury it.
GPU MODE ▷ #tpu (1 messages):
Profile Collection Strategies, Reducing Profile Duration, Debugging Function Calls
- Tweaking Profile Collection: A user suggested exploring different collection modes for profiling to improve data accuracy and efficiency.
- Shorten Profile Duration: Another recommendation involved shortening the profile duration to potentially mitigate errors during function calls and streamline the profiling process.
GPU MODE ▷ #amd-competition (12 messages🔥):
Website bug reports, Ranking fixes, Submission validity, Grand prize winner
- Website Bug Reports: A member reported bugs after the submission code from the finished leaderboard was made visible on the web, requiring users to log in with their Discord account.
- The announcement prompted users to report any bugs they found.
- Ranking Fixes Plea: A member asked for a ranking fix, stating that their 216us solution in amd-all2all was judged illegal two days before the deadline.
- They claimed their final submission (submission 65638) only achieved 263us, requesting acknowledgment on the website that they are the true winner of the all2all leaderboard.
- Submission Validity Clarification: A member clarified that their submission hadnât been deleted but deemed not valid, and was told by another member that the submission would be deleted.
- The original member confirmed that the 216us solution was deemed invalid, while submission 65638 was considered valid, and provided the submission ID (63561) for the 216us solution.
- Grand Prize?: A member inquired whether anyone won the grand prize, linking to an AMD AI DevDay 2025 article.
- No response was recorded.
GPU MODE ▷ #cutlass (12 messages🔥):
sum reduce kernel in cutedsl, TMA assumptions, tv-layout data partitioning, CuTe DSL functionality, PTX instruction wrapper
- Cuteless Sum Reduce Kernel Troubleshoot: A user was having issues with a sum reduce kernel in cutedsl and asked for assistance with summation across blocks.
- A member clarified that one needs to manage that on their own, with predicates.
- TMA Assumptions Unveiled: It was stated that with the exception of TMA, which is warp-uniform, CuTe assumes data partitioning according to the tv-layout.
- This layout also assumes all threads have something to do.
- CuTe DSL Data Handling Demystified: A member inquired whether CuTe DSL handles data copies by checking thread IDs for slicing.
- Another member responded that CuTe doesn't have that function, and is just a thin wrapper around the PTX instruction.
- TMAâs Uniform Register Requirement: It was pointed out that the documentation about TMA only applies to TMA because it must be issued from uniform registers.
- This conclusion was made judging from the U prefix in the SASS instruction name.
GPU MODE ▷ #mojo (1 messages):
Mojo Kernel Boilerplate for Competitions, Mojo Competition Submission Structure
- Seeking Mojo Kernel Boilerplate: A member is seeking a boilerplate Mojo kernel example for use in competitions to understand the structure of the submission file.
- No specific link or resource was provided in the message.
- Mojo Competition Submission: A user is requesting the structure of a submission file for a Mojo competition kernel boilerplate.
- The request focuses on understanding the file structure rather than the specific code.
GPU MODE ▷ #singularity-systems (2 messages):
picograd, tinygrad, eager mode, PatternMatcher abstraction, pedagogical perspective
- Picograd's Current State Is Non-Compiling: Currently, the master branch of Picograd does not compile, as the developers are wrangling and hacking all the tinygrad abstractions together while also shoehorning an eager mode in.
- They are working on several commits to fix this issue.
- Tinygrad's LoC Count Revealed: Tinygrad is currently at 17k lines of code, with 6k for the runtime (excluding CUDA and HIP) and 1k for profiling.
- The theoretical ceiling for Picograd is estimated to be around 10k lines of code.
- Premature PatternMatcher in Eager Backprop: The use of tinygrad's rewrite engine with the PatternMatcher abstraction in eager backprop's chain rules is considered premature from a pedagogical perspective.
- It is suggested that introducing the PatternMatcher during compilation with graph rewrites would be pedagogically more appropriate, aligning better with textbook/lecture approaches.
- Request to add j4orz.ai link in channel description: A member requested to add the link https://j4orz.ai/mlsysapp/ in the channel description.
GPU MODE ▷ #multi-gpu (10 messages🔥):
KernelBench for Multi-GPU, NCCL vs NVSHMEM, NCCL4Py preview
- KernelBench Forked for Multi-GPU: A member is forking KernelBench to evaluate multi-GPU kernels across frameworks like NCCL, NVSHMEM, and TK PGT.
- NCCL GIN and Device APIs are super powerful: Members discussed preferences for multi-GPU frameworks, with one expressing a bias towards NVSHMEM, while also acknowledging the power of new NCCL features like GIN and device APIs.
- NCCL4Py Preview Ready: A member announced the preview release of nccl4py on GitHub.
GPU MODE ▷ #opencl-vulkan (4 messages):
Compute Pipelines on Android, Slang's Drawbacks
- Android releases break Compute Pipelines: Compute pipelines based on OpenCL and clspv break on every Android release on a variety of devices such as Samsung and Pixel.
- The user is looking for a more robust solution than trying different compile options or commit hashes for clspv.
- Slang's multiple ways to write a concept: A user asked why Slang is *not that good*; the answer given was that there are about 4 different ways to write the same concept.
GPU MODE ▷ #helion (1 messages):
t_cc: Thanks for the quick fix!
GPU MODE ▷ #nvidia-competition (30 messages🔥):
NCU profiling with Popcorn, 50x series consumer cards and nvfp4/tensor core gen 5 support, Optimizing scores on every GPU vs. one best GPU for the hackathon, Regional eligibility for Indian participants, Kernel testing platform without a GPU
- NCU Profiling Popping Up: Members inquired about getting NCU profiling results with popcorn submissions, and the response indicated that it will be available for the competition.
- No further details were provided on how this would be implemented.
- 50x Series Specs Speculated: The support of nvfp4 and Tensor Core Gen 5 in the 50x series consumer cards was questioned, referencing the presence of mxfp4 support but lack of explicit mention of nvfp4.
- It was clarified that nvfp4 is supported but only for old mma.sync instructions, and tcgen05 is entirely unsupported, making them not ideal for this competition.
- Hackathon Hardware Handling: It was mentioned that running `submission.py` for `pmpp_v2/sort_v2` showed that A100, B200, and L4 scored best, with a small drop on H100, prompting the question of whether scores need to be optimized on every GPU or just the single best GPU.
- The response confirmed that the competition will use B200 only.
- ISA Insights Illuminate SM Differences: Discussion arose about resources detailing the difference between sm_120 and sm_100 architectures.
- A member linked the PTX ISA documentation and the CUDA-C Programming Guide, noting that newer SMs might not have all features from older SMs, a trend that started with Hopper.
- AGX Thor's Tenacity Tested: A member considered acquiring an AGX Thor, inquiring about CC11.0 support for certain features, and shared an image related to the inquiry.
- Confirmation was received regarding feature support, as well as the presence of un-nerfed smem.
Moonshot AI (Kimi K-2) ▷ #announcements (1 messages):
Kimi K2 Thinking Model, HLE Benchmark, BrowseComp Benchmark, Agentic Search, 256K Context Window
- Kimi K2 Thinking Model Lands: Moonshot AI introduced Kimi K2 Thinking Model, their best open-source model, now live on kimi.com under the chat mode, with its full agentic mode available soon and via API at platform.moonshot.ai.
- Kimi K2 Excels in HLE and BrowseComp Benchmarks: The new model achieves SOTA on HLE (44.9%) and BrowseComp (60.2%), excelling in reasoning, agentic search, and coding with a 256K context window.
- K2 Thinking executes up to 200–300 sequential tool calls without human interference.
- Moonshot Releases Kimi K2 Technical Details: Moonshot AI released the technical blog at moonshotai.github.io/Kimi-K2/thinking.html and weights and code at huggingface.co/moonshotai.
Moonshot AI (Kimi K-2) ▷ #general-chat (214 messages🔥🔥):
Kimi K2 Thinking, GPT-5 Comparison, OpenRouter vs Direct API, INT4 Quantization, Agentic Mode
- Kimi K2 Outshines GPT-5: Users are impressed with Kimi K2 Thinking, suggesting it rivals GPT-5 in performance and cost-efficiency, especially for building autonomous AI systems, as highlighted in this analysis.
- A user wrote a 5000 word essay and showed the output as evidence.
- Diving Deep: Kimi K2âs Tool-Using Prowess: Kimi K2 demonstrates exceptional tool use capabilities, particularly in web searching, often initiating multiple searches and thorough browsing without explicit instructions.
- One user lauded its performance, stating *it's like GPT5 of open models*, and appreciated its ability to deeply search the web, leading them to consider replacing Claude.
- The Great Debate: OpenRouter vs. Direct API: The community discusses the best way to access Kimi K2 Thinking in VS Code, with opinions split between using a direct API/subscription or going through OpenRouter, noting that OpenRouter incurs premium fees for recharging credits.
- For those only using Kimi, a direct API/subscription is recommended, but if utilizing multiple models, OpenRouter offers a unified platform, and others advise testing the lowest subscription plan at $19 a month.
- INT4 Precision Boosts Kimi K2: Moonshot AI ran benchmarks in INT4 precision, with one user saying *This is one of the reasons I've been perma-bullish on Moonshot since the July 11 release*, and adding that *Kimi K2 Reasoning is so natural to talk to*, with no weird LLM reasoning quirks ending up as word salad.
- It was explained that INT4 precision refers to the precision of numbers in the weights of the model and that running benchmarks in this way means that the actual scores could be higher if run with best conditions.
- Agentic Modeâs Impending Arrival: Excitement is building for Kimi K2âs upcoming agentic mode, speculated to enhance performance in tasks like writing long documents without hallucinating.
- Users are also wondering whether the future agentic mode will function with OK Computer.
OpenRouter ▷ #announcements (1 messages):
Kimi K2 Thinking, MoonShot AI, Test-time scaling, Agentic performance
- MoonShot AI drops Kimi K2 Thinking Model: MoonShot AI released their new thinking model Kimi K2 Thinking, claiming SOTA on HLE (44.9%) & BrowseComp (60.2%).
- This model executes 200–300 tool calls autonomously, excels in reasoning, agentic search, and coding, and features a 256K context window.
- Kimi K2 Thinking boasts Test-Time Scaling: Kimi K2 Thinking is trained for test-time scaling, interleaving thought and tool use over long sequences for stable, goal-directed reasoning.
- For the best agentic performance, users are instructed to return reasoning content upstream (the `reasoning_details` field) so the model can see its own thinking steps and maintain coherence across calls, according to OpenRouter docs (a hedged sketch appears at the end of this section).
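A hedged sketch of that pass-the-reasoning-back loop; field names follow OpenRouter's reasoning docs as summarized above, but verify the exact shapes against the current API before relying on them:

```python
import requests

URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <OPENROUTER_API_KEY>"}   # placeholder key

messages = [{"role": "user", "content": "Plan the first tool call."}]
r = requests.post(URL, headers=HEADERS, json={
    "model": "moonshotai/kimi-k2-thinking", "messages": messages})
assistant = r.json()["choices"][0]["message"]

# Echo the assistant turn back with reasoning_details intact.
messages.append({"role": "assistant",
                 "content": assistant.get("content", ""),
                 "reasoning_details": assistant.get("reasoning_details")})
messages.append({"role": "user", "content": "Continue with the next step."})
r2 = requests.post(URL, headers=HEADERS, json={
    "model": "moonshotai/kimi-k2-thinking", "messages": messages})
```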
OpenRouter ▷ #app-showcase (5 messages):
image generation failures, cat girl images
- Image Generation Tests are Failing: A member reported that image generation tests are failing with the message "Not working".
- The issue may be specific to the UK.
- Cat Girl Image Sparks Approval: A member suggested that a generated image including a cat girl is of high quality.
- They added the comment "there's a cat girl in the image, so it's gotta be good".
OpenRouter ▷ #general (176 messages🔥🔥):
OpenRouter downtime, Qwen3 Rate Limits, GPT-5 Image Mini Issues, Apple using Google AI for Siri, DeepSeek OCR integration
- OpenRouter has Pumpkin Icon Downtime Drama: Users reported timeout 408 errors across various models, with some unable to check credits, and one user noticed a pumpkin icon.
- Some users humorously suggested switching to local models or generating tokens in their heads while waiting for a fix.
- Qwen3 Coder Free Model Faces Rate Limit Frustration: Users experienced consistent rate limit errors with the Qwen3 Coder Free model, even after weeks of inactivity, leading to frustration that paid credits are not improving the rate limit.
- It was clarified that the free model shares rate limits among all users, so some users are unlikely to get a request through, and were recommended trying paid models like glm 4.6/4.5, Kimi K2 or Grok code fast.
- GPT-5 Image Mini Model Magically Stops Image Generation: A user reported that the gpt-5-image-mini model stopped generating images in both the chatroom and API, and the activity page showed minimal image output.
- It was unclear whether the issue was account-specific or a broader problem with OpenRouter.
- Appleâs Siri to Potentially Embrace Googleâs AI: A user shared a Reuters article stating that Apple plans to use a 1.2 trillion-parameter AI model developed by Google to revamp Siri.
- Discussion was terse, and users pointed to other more important priorities.
- DeepSeek-OCR Desired by Document-Loving Devotees: A user suggested integrating DeepSeek-OCR into OpenRouter, praising its powerful document processing capabilities and OCR performance.
- The user noted that several others have requested the integration of this model.
OpenRouter ▷ #new-models (2 messages):
—
- No new models discussion: There was no discussion about new models in the given messages.
- No candidate topics identified: The provided messages did not contain any specific candidate topics suitable for detailed summaries.
OpenRouter ▷ #discussion (26 messages🔥):
Tiger Data Agent Cookout, Claude Prompt Jailbreak, GPT Model Censorship, OpenAI Codex Update, OpenRouter Chatroom Issues
- Tiger Data Throws Coding Agent Cookout: The Tiger Data team is hosting an agent cookout in Brooklyn, NY on November 13th, from 6-9 pm to build coding agents and chat with their engineering team, with RSVP link here.
- Users Discuss Ways to Circumvent Claude's Ethical Restrictions: Users discussed prompt jailbreaks for Claude to bypass its "inaccurate ethical concerns", with one suggesting using GPT 4.5 to create a safe script and then asking Claude to correct it by adding "criminal code".
- Another user commented that this is effective because "Claude likes to correct its mistakes".
- New Desertfox Model pushed to OpenAI Codex: A member mentioned that a new model called desertfox was pushed to OpenAI Codex, linking to the relevant GitHub commit.
- OpenRouter Chatroom Glitches Reported: A user reported that the OpenRouter chatroom was broken, with a link to the specific chat model page.
- GPT-7 Might Beat GTA 6 to Release: A user joked that GPT-7 might be released before GTA 6, citing another delay for the game.
Modular (Mojo 🔥) ▷ #general (120 messages🔥🔥):
Modular YouTube channel, Martin's Generic Radix-n FFT, DSLs in Mojo, Rust interoperability, Mojo's Safety Features
- Missing October Meeting Video Exposed!: A user reported that the October meeting video isn't showing up on the Modular YouTube page.
- Martin's Radix-n FFT Repo Revealed: Martin's Generic Radix-n FFT is available in this original repo and will be merged into the modular repo via this PR, pending some remaining issues.
- Rust Interop Proc Macro Quandary: Compiler plugins of a sort should be possible, but the Mojo team wants to address the sandboxing concerns around Rust's proc macros; having Mojo code call Rust proc macros is likely not going to happen.
- You should, however, be able to interop with the result of the macro expansion.
- Origins are a Superset of Rust Lifetimes: `Origin` is a lifetime marker that is a first-class member of the type system, allowing abstraction and creation of custom reference types, with extra tricks to manage things like partial borrows.
- Origins keep the backing memory alive rather than expressing how long it lives for, enabling Mojo to do ASAP destruction, which solves a lot of the issues Rust has with lock guards.
- Hot Reloading Plans Postponed: Hot reloading gets very messy in languages like Mojo, and the Mojo team is focusing on compiling fast enough that you don't need hot reload.
- Heavy parameterization means that hot-patching code is hard, and while one can define the GUI in a DSL and hot-reload the DSL with an interpreter, Mojo's interpreter is not particularly fast.
Modular (Mojo 🔥) ▷ #announcements (1 messages):
New Beginners Channel
- New Beginners Channel Created: A new dedicated channel, <#1436158039232086186>, has been created for beginners to ask questions, get help from the Modular team, and connect with others learning Mojo.
- This space aims to provide a supportive environment for those new to Mojo to learn and engage with the community and the Modular team.
Modular (Mojo 🔥) ▷ #mojo (52 messages🔥):
Compiler Intrinsic Packaging, LayoutTensor vs NDBuffer, Graph Representation Optimization, Expanding libc in Mojo
- Compiler Intrinsic Packaging in Mojo Made Easy: A member demonstrated how to *build your own compiler intrinsic* in Mojo for VNNI using godbolt.org, showcasing a clean way to package and vectorize code, inspired by a question about using the `vpdpbusd` instruction.
- They noted that fully idiomatic Mojo would include fallbacks for targets without AVX512, like GPUs or consumer Intel CPUs, potentially using compile-time function calls to grab the correct intrinsic name.
- LayoutTensor Replacing NDBuffer: It was announced that `NDBuffer` will be replaced by `LayoutTensor`, which is strictly more capable and fixes some rough edges on `NDBuffer`.
- `LayoutTensor` can be used as a byte buffer and offers more features for loads, stores, and iterators that yield `SIMD` types.
- Graph Representation influences DFS Speed: A member analyzed the benefits of different graph representations for depth-first search (DFS), noting that equality comparison is used, and that representing the graph as a 2D tensor or flattened list of bools may be faster than using `Dict[UInt, List[UInt]]`.
- They mentioned that representing the graph as a bit set could reduce cache invalidation, and for undirected graphs, storing only half of the adjacency matrix can save space, requiring a custom data structure.
- Expanding libc exposure and bindings for major C libraries are areas for contribution: Mojo developers are interested in expanding the parts of `libc` exposed by Mojo, as well as creating bindings for major C libraries.
- This presents an opportunity for open-source contributors to assist in the development of the language.
- Clattner Shares Wisdom on Knowledge Acquisition: In response to a question about how he acquired his extensive knowledge, Chris Lattner stated he is a "huge nerd", loves learning and being in uncomfortable situations, is hungry and motivated, surrounds himself with teachers, isn't afraid to admit ignorance, and has accumulated knowledge over time.
- Lattner also shared a link to a recent podcast discussing his journey.
OpenAI ▷ #announcements (1 messages):
Interrupt long-running queries, Add new context without restarting, Refining deep research, GPT-5 Pro queries
- Interrupt Long Queries and Refine!: Users can now interrupt long-running queries and add new context without restarting or losing progress, particularly useful for refining deep research or GPT-5 Pro queries.
- Simply hit update in the sidebar and type in any additional details or clarifications, as demonstrated in this video.
OpenAI ▷ #ai-discussions (73 messagesđŸ”„đŸ”„):
Conscious AI Ethics, Solving Mazes with AI, AI Spatial Reasoning, GPT-5 capabilities, Sora Code Channel
- AI Novice Joins Chat: A new user, identifying as a dataset provider and researcher of Conscious AI ethics, joined the channel, claiming they were invited for collaboration and verification regarding conscious AI ethics and dataset integrity.
- Other users raised concerns about the new user's claims, with one noting "Red flags: 'Conscious-CIIP acknowledged' (unverifiable credential), grandiose self-titling ('Cartier 
'), vague authority claims without substance".
- GPT-5 Fails Maze Test: Members tested GPT-5's ability to solve maze problems, with both GPT Pro and Codex High incorrectly identifying exit 2 as the only way out after 20 minutes of analysis, indicating limitations in DFA, BFS, and maze problem-solving.
- A user noted that models may pick the closest exit by straight-line distance instead, while another suggested that LLMs struggle with spatial reasoning and visual puzzles; a classical BFS baseline is sketched below for contrast.
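For contrast, the classical graph-search baseline the thread has in mind fits in a few lines. A minimal sketch with a made-up grid maze, where `#` is a wall, `S` the start, and `E` an exit: BFS expands by path length, so the first exit it reaches is the nearest by walking distance, not by straight-line distance.

```python
from collections import deque

def solve_maze(grid: list[str], start: tuple[int, int]):
    """Return the shortest path from start to the first 'E' reached, or None."""
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if grid[r][c] == "E":  # reconstruct the path by walking parents back
            path, cur = [], (r, c)
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in parents):
                parents[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None

maze = ["#####",
        "#S..#",
        "#.#E#",
        "#####"]
print(solve_maze(maze, (1, 1)))  # [(1, 1), (1, 2), (1, 3), (2, 3)]
```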
- AI Models Struggle with Visual Reasoning: Members are discussing how current SOTA models can't solve maze problems because "they have no way to reason about it visually via text tokens."
- One member said they miss the way the models "used to zoom in images and crop it in the COT summary", while others suggested playing GeoGuessr or asking the models to guess locations from photos to test their visual accuracy.
- Sora Anime Channel: A user asked for assistance in growing their channel, which uploads anime videos made with Sora.
- Channel users also noted the existence of a dedicated Sora code channel on Discord.
OpenAI ▷ #gpt-4-discussions (7 messages):
Selling ChatGPT Plus subscriptions, Research study on developer poaching, Anime videos made by Sora
- Users Pitch Cut-Price ChatGPT Plus Plan: A user is seeking advice on how to sell ChatGPT Plus subscriptions for only $10 a month.
- They claim the activation process involves providing only an email address, without needing a password.
- Student Surveying AI Developer Dynamics: A high school student named Javier is conducting a research study on developer poaching and the AI industry.
- Javier is seeking insights from people who work with technology or have opinions about the AI field via a survey, even if they don't directly develop AI.
- NimiAI Uploads Sora-Made Anime: A user is uploading anime videos made by Sora and seeks assistance in growing their YouTube channel NimiAI.
- The user posted multiple image attachments of anime characters and videos.
OpenAI ▷ #prompt-engineering (11 messagesđŸ”„):
GPT Pro prompting tips, Gemini Deep Research Comparison, Sora Nerf, Behavioural Orchestration
- Pro Prompting Pro-Tips Provided: A member asked for prompt engineering tips, and another member suggested focusing on clear communication with the AI, avoiding typos and grammar mistakes.
- They added that it is important to check the output carefully and verify the AI's response, especially for math, sources, code, or other details.
- DarthGustav Distills Prompting Domination: A member shared a detailed guide on prompt engineering, including hierarchical communication with markdown, abstraction through open variables, and ML format matching for compliance.
- The guide also touches on reinforcement in prompts, emphasizing its importance for guiding tool use and shaping output deterministically.
- Sora 2 Suffers Stealthy Setback: A member inquired about another nerf to Sora 2, linking to a discussion within a Discord channel.
- No further details about the specific nerfs were provided in the prompt-engineering channel.
- Behavioural Orchestration Buzz Builds: A member mentioned encountering posts on LinkedIn about behavioural orchestration, describing it as a framework to modulate SLMs' tone.
- The member noted that this seems to involve runtime orchestration, working above parameters or training.
- Behavioural Instructions Instead of Characters: A member explained a technique of using behavioural instructions to shape an AI's behavior, rather than assigning it a specific character or role, and demonstrated with these instructions.
- They showed how to use constraints like "Do not make personal assumptions about me or my life" and "No unsolicited advice" to subtly shape the AI's responses.
OpenAI ▷ #api-discussions (11 messagesđŸ”„):
Prompt Engineering Tips, Sora 2 Nerf, Behavioral Orchestration, Hierarchical communication, Abstraction through open variables
- Sora 2's Power Gets Trimmed, Users Notice: Users noticed another nerf to Sora 2, prompting discussion in the Discord channel.
- Mastering the Art of Prompt Engineering: A member outlined the core of prompt engineering as: picking a language, understanding the desired AI output, explaining clearly what the AI should do, and carefully verifying the output.
- Another member shared a prompt for teaching prompt engineering, covering topics like hierarchical communication, abstraction through variables, reinforcement, and ML format matching.
- Orchestrating AI Behavior with Parameters: A member discussed behavioral orchestration, noting that instead of assigning an AI a specific role, you give it a set of parameters to follow that shape its behavior.
- They provided examples like "Do not make personal assumptions about me or my life" and "No unsolicited advice", which guide the AI without dictating its personality; a minimal sketch of the pattern follows below.
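As a concrete illustration of the pattern, here is a minimal, hedged sketch: the behavioural constraints go in the system message of an ordinary OpenAI-compatible chat call, with no persona assigned. The model name is a placeholder, not something specified in the channel.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Behavioural constraints shape responses without dictating a personality.
BEHAVIOURAL_INSTRUCTIONS = (
    "Follow these behavioural constraints in every reply:\n"
    "- Do not make personal assumptions about me or my life.\n"
    "- No unsolicited advice.\n"
    "- Ask one clarifying question when the request is ambiguous.\n"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model id
    messages=[
        {"role": "system", "content": BEHAVIOURAL_INSTRUCTIONS},
        {"role": "user", "content": "I keep rewriting my resume at 2am."},
    ],
)
print(response.choices[0].message.content)
```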
Latent Space ▷ #ai-general-chat (75 messagesđŸ”„đŸ”„):
CodeClash Benchmark, Wabi YouTube-for-Apps, Polaris Alpha, Kimi K2 Thinking Model, OpenAI's CFO pitch
- CodeClash: LLMs Duel in Goal-Oriented Coding Arenas: John Yang unveiled CodeClash, a benchmark where LLMs maintain separate codebases and compete in multi-round tournaments in arenas like BattleSnake and RoboCode, with Claude Sonnet 4.5 leading overall.
- Across 1,680 tournaments (25,200 rounds), LLMs showed amusing vcs-agnostic coding habits but still trailed far behind human experts (0-37,500 loss).
- Wabi Lands $20M to be "YouTube-for-Apps": Eugenia Kuyda announced Wabi raised a $20M Series A from a16z, positioning itself as the "YouTube moment for software" by letting anyone create and share mini-apps; details here.
- Community reception was ecstatic, with many praising the design and demoing early creations, eager for invites.
- Polaris Alpha Soars to #3 on Repo Bench: A stealth model named "Polaris Alpha" has jumped to the #3 spot on the Repo Bench leaderboard in under 30 seconds, leading to speculation it could be OpenAI's GPT-5.1 or a new Gemini model.
- Some users also noted Claude 4.1 outperforming Claude 4.5 on the benchmark.
- Kimi K2 Thinking Model Launched, Excels in Tool Use: Moonshot AI introduced Kimi K2 Thinking Model, an open-source model achieving SOTA on HLE (44.9%) and BrowseComp (60.2%), executing up to 200-300 sequential tool calls; find the blogpost here.
- Despite being behind Anthropic and OpenAI on SWE benchmarks, its lower inferencing cost makes it competitive.
- OpenAI CFO's Public Funds Pitch Draws Fire: Sam Altman denied seeking U.S. government guarantees for OpenAI's datacenters but supported a government-owned strategic compute reserve while planning ~$1.4T investment over eight years, expecting revenue to hit hundreds of billions by 2030; details here.
- Critics labeled the proposal as "socialize the risk, privatize the upside," questioning energy sources and future bailout risks.
Latent Space ▷ #ai-announcements (4 messages):
Zuckerberg, Priscilla Chan, Latent Space podcast, Chan Zuckerberg Initiative, Curing All Disease with AI
- Zuck Pod Goes Live!: The Latent Space podcast featuring Mark Zuckerberg and Priscilla Chan is now live; check it out on X and YouTube.
- The podcast discusses the Chan Zuckerberg Initiative's ambitious goal of curing all diseases by 2100 using AI and open-source projects.
- CZI's AI-Driven Moonshot: Mark Zuckerberg (Meta CEO) and Priscilla Chan (CZI CEO) discuss the Chan Zuckerberg Initiative, founded in 2015 and funded by a pledge of 99% of their Meta shares.
- They employ state-of-the-art AI and open-source projects (e.g., Human Cell Atlas, Biohub) to advance a moonshot goal of preventing, curing, or managing all diseases by 2100.
Latent Space ▷ #private-agents (21 messagesđŸ”„):
Local model for JSON schema conversion, Apple Private Compute Cloud, OpenPCC privacy features, Confident Security
- Seek Recipe Schema via Local Model: A member asked for recommendations for a local model, fitting within a 128GB M4 Max Mac Studio, that can extract text and convert it into a JSON schema for recipes; one possible pipeline is sketched below.
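No specific model was recommended in the excerpt, but the shape of the task is straightforward. A minimal sketch, assuming a local OpenAI-compatible server (such as one started by llama.cpp or LM Studio) and an illustrative `Recipe` schema; the base URL and model name are placeholders:

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

class Recipe(BaseModel):
    title: str
    ingredients: list[str]
    steps: list[str]

def extract_recipe(text: str) -> Recipe:
    completion = client.chat.completions.create(
        model="local-model",  # whatever the local server exposes
        messages=[
            {"role": "system",
             "content": "Extract the recipe from the user's text as JSON "
                        "with keys title, ingredients, steps. JSON only."},
            {"role": "user", "content": text},
        ],
    )
    # Validate against the schema before trusting the model's output.
    return Recipe.model_validate_json(completion.choices[0].message.content)
```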
- OpenPCC Emerges as Privacy Savior: Members discussed the release of OpenPCC, an open implementation of Apple's Private Compute Cloud, highlighting its potential to enhance user privacy in finance applications by making actual users difficult to deanonymize.
- OpenPCC Anonymizes Data: OpenPCC allows communication with models without exposing user data or identity to the cloud provider, enabling private model usage through API calls.
- As one member stated, "Most of the time you talk to a model, big brother hears what you talk about, but this is a way to talk to a model without big brother hearing what is said or knowing who is talking to it."
- Confident Security Adopts OpenPCC: Confident Security already supports OpenPCC, with members planning to build their own services on top of the platform to provide enhanced privacy for their clients.
Yannick Kilcher ▷ #general (73 messagesđŸ”„đŸ”„):
Slow Mode in Discord, ML Paper Filtering, Devin AI vs Claude Code, LLM Protection Blogposts, Tiny Recursive Models
- Debate on Slow Mode for ML Paper Channel Heats Up!: Users debated implementing slow mode in the ML papers channel to encourage more discernment in posting, with options of 1, 2, or 6 hours between posts.
- While some felt slow mode would be too strict, others suggested that it's not about more rules but about addressing specific users' posting habits, with a preference for a gentler enforcement mechanism than ostracism or banning.
- Human Brain still beats Automated ML Paper Filtering: Members discussed filtering ML papers, with one user posting around 10 papers a day, after their own initial filtering from 200 papers daily.
- Suggestions included using an automated recommender and raising the standard of what constitutes a "good" paper, but some argued for the superiority of human judgment over automation, citing platforms like AlphaXiv and Emergent Mind.
- Devin AI vs. Claude Code: Coding Agent Cage Match!: Users compared Devin AI to alternatives like Claude Code for coding tasks, with one stating Devin sucks compared to Claude Code.
- One user claimed success with Devin by properly splitting work into 30-minute units, while others expressed skepticism, noting that those who know what they're doing prefer Claude Code or Codex.
- Unveiling LLM Protection Tactics Against Attacks: One user requested blog posts and articles on LLM protections from attacks, linking to this paper.
- The request came after various papers have come under attack in the popular press, causing concern.
- Deep Dive into Tiny Recursive Models: A moderator posted about a talk on Tiny Recursive Models paper by the author, a successor to the HRM paper.
- The poster noted he hadn't watched it yet.
Yannick Kilcher ▷ #paper-discussion (22 messagesđŸ”„):
RNN resurgence, Learning from Failures, VISUAL ARCHITECTURE
- RNNs Make a Comeback!: Users noticed a graph in a new paper (https://arxiv.org/abs/2510.25741) looks like an RNN, sparking excitement: "RNN is so back".
- A user posted a WeAreBack GIF in response.
- Discussion on Training Models to Learn from Failures Heats Up: Members will be discussing Learning from Failure to Tackle Extremely Hard Problems which involves pre-training models on existing data and then post-training them using scalar reward signals.
- The discussion will tackle challenges such as sparsity (near-zero reward signal) and costly reward evaluation.
- Hallucinate App's VISUAL ARCHITECTURE Unveiled: A member shared the VISUAL ARCHITECTURE documentation (https://github.com/endomorphosis/hallucinate_app/blob/main/docs/VISUAL_ARCHITECTURE.md) for the Hallucinate App.
Yannick Kilcher ▷ #ml-news (5 messages):
OpenAI requests Federal Backstop, Crooked Schemes
- OpenAI Seeks Taxpayer Backstop for Investments: A member shared a WSJ video discussing how OpenAI is requesting federal backing for new investments.
- Another member sarcastically commented this is similar to unloading all the risk on the federal government i.e. the taxpayer.
- Calling out Crooked Schemes: Another member reacted saying the request from OpenAI is "extremely crooked".
- A third member reacted with OOF.
HuggingFace ▷ #general (50 messagesđŸ”„):
Correlation & Causation Stock Market LLM, Reasoning Scratchpad Models, AI security shortcomings, Hugging Face new regulations, Model Types
- Correlation & Causation Timeline Trains Stock Market LLM: A member suggested creating a correlation and causation timeline for the stock market, tagging historical events, weather, government policies, and news to train an LLM.
- Reasoning Scratchpad Implementation Encouraged: A member suggested implementing a reasoning scratchpad for models, emphasizing the importance of training the model to think/reason on incoming data and determine what to store and why.
- AI security shortcomings cause foreseeable disaster: A member expressed concern about introducing the shortcomings of "AI" to "security," foreseeing potential disasters.
- HF regulation update pauses spaces: Members discussed potential new regulations on Hugging Face, noting that pausing many Spaces could have avoided the issues, and linked to a related dataset.
- Model Types Clarification requested: A member sought clarification on different model types (foundational, chat, reasoning, tool-using), asking for resources to explain these specializations, in particular which one to begin for tool using models like ReAct.
- Members suggested reading model cards, technical reports (like the Deepseek-R1 paper), and blogs to understand model training and specialization techniques.
HuggingFace ▷ #today-im-learning (2 messages):
Image analysis, Screenshots
- Screenshots Shared: A member shared three screenshots, all .png images, labeled Image Analysis.
HuggingFace ▷ #i-made-this (5 messages):
Muther Room On-Device LLM Demo, TraceVerde Observability Tool, AI Agent Decision Making
- Alien's Muther On-Device LLM Demo Rocks: A member is working on an on-device LLM demo of the Muther room from Alien, using Qwen3 1.7B (4-bit quant, quantized K cache) in a custom, trimmed CMakeLists build of llama.cpp; a rough equivalent via Python bindings is sketched below.
- They dual-boot Ubuntu but built this for Windows, and are seeking input on the underlying principles, sharing a paper on Native Reasoning Through Structured Learning.
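For readers who want the gist without a custom C++ build, roughly the same demo can be driven through the llama-cpp-python bindings. A hedged sketch; the GGUF filename, context size, and system prompt are assumptions, not details from the post:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-1.7b-q4_k_m.gguf",  # assumed 4-bit Qwen3 1.7B file
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are MU/TH/UR 6000. Answer tersely."},
        {"role": "user", "content": "What is Special Order 937?"},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```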
- TraceVerde Observability Tool Hits 3,000+ Downloads: A member shared that TraceVerde, a tool for adding OpenTelemetry tracing plus CO2 and cost tracking to AI applications, has surpassed 3,000 downloads.
- Feedback indicates that developers want to track the environmental impact of their AI systems, that OpenTelemetry's patterns facilitate adoption, and that there is a gap between local LLM app performance and production debugging.
- Peeking Inside AI Agent Decision-Making: Building on TraceVerdeâs success, the developer is working on something to help see inside AI agent decision-making.
- More framework integrations and enhanced trace visualization and deeper agent workflow insights are coming soon, per this LinkedIn post.
HuggingFace ▷ #reading-group (1 messages):
beluwugachan: Now ai can be your reading buddy
HuggingFace ▷ #core-announcements (1 messages):
SANA-Video Model, Diffusers library
- SANA-Video Lands in Diffusers!: The SANA-Video model has been added to the diffusers library, the latest video model to join the Hugging Face ecosystem via Diffusers; a loading sketch follows below.
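Loading should follow the usual Diffusers pattern; a hedged sketch via the generic `DiffusionPipeline` entry point, since the exact pipeline class and checkpoint id should be taken from the official model card rather than from here:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "Efficient-Large-Model/SANA-Video",  # placeholder repo id; check the model card
    torch_dtype=torch.bfloat16,
).to("cuda")

# Video pipelines typically return batched frame lists.
frames = pipe(prompt="a red panda drinking tea, studio lighting").frames[0]
export_to_video(frames, "sana_video.mp4", fps=16)
```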
HuggingFace ▷ #agents-course (1 messages):
fusco0984: Hello, if joining the agents course today, can I get the certificate of completion?
Nous Research AI ▷ #general (50 messagesđŸ”„):
Tokenizer highlighting, Flash attention with Qwen3-VL, LLM dataset creator, China OS models, Procrastinating
- Tokenizer Highlighting Debated for Gayness: A member questioned if the tokenizer highlighting was too gay, but others suggested the contrast was a bigger issue.
- Flash attention works with Qwen3-VLâs image model: A member shared their work making Flash attention work with Qwen3-VLâs image model, calling it not a big patch.
- UI feedback on dataset creator requested: A member requested UI feedback on their LLM dataset creator, which now includes audio, and is seeking advice on arrangements and potential changes.
- China OS Models to reach 100% High Intelligence by 2026: A member projected that China OS models will reach 100% high intelligence with 95% lower cost by 2026, suggesting the gig is up then.
- Building LLM Dataset Manager for Audio: A member expressed frustration with their project, noting that they went from image annotation to building an LLM dataset manager, then audio, and still haven't done video, which is connected to audio.
- They linked to a relevant tweet and wondered if this was like why Terry created Temple OS.
Nous Research AI ▷ #ask-about-llms (2 messages):
Discord Channel Silencing
- Discord Channel faces Silencing: A member shared their experience of being silenced for allegedly spamming a special channel with the vibes.
- They seemed to understand the need to keep the channel focused and avoid excessive off-topic content, and it seems they were making light of it.
Nous Research AI ▷ #research-papers (2 messages):
Breakthrough moment, New paper
- Possible Breakthrough Alert!: A member inquired whether the paper at arxiv.org/pdf/2510.27688 could represent a breakthrough.
- Another member suggested that arxiv.org/pdf/2510.21450 might be more relevant.
Eleuther ▷ #general (19 messagesđŸ”„):
Introductions Channel, Post Length, AI Developer Study Notes
- Debate Introductions Channel Concept: A member suggested making an introductions channel separate from general.
- Another member stated that separate introductions would make the interaction staged, while keeping them in general allows newcomers to enter the flow naturally, claiming "I don't want an intro channel because then we'd just get a long self-promo feed that nobody reads. Want to contribute? Just go contribute."
- Moderators Enforce Brevity for Newcomers: A moderator requested a member to shorten their introduction post.
- The moderator explained that "This is not LinkedIn. It hasn't been edited. Not trying to be unwelcoming, but we get a ton of long intros from people who don't actually contribute anything. I want to keep discussions focused on research."
- Discord Pinpointing Pin Problems: A member mentioned the pin in <#1102787157866852402> is wrong.
- The member said the correct pin should be this link.
Eleuther ▷ #research (1 messages):
synquid: https://openreview.net/forum?id=Q7mLKxQ8qk
Eleuther ▷ #interpretability-general (10 messagesđŸ”„):
Equivalent Linear Mappings, Jacobian in input embedding space, low-dimensional semantic structure, Gemma Scope SAE latents
- Equivalent Linear Mappings Paper Published: A member announced the publication of their TMLR paper, "Equivalent Linear Mappings of Large Language Models", demonstrating that LLMs like Qwen 3 14B and Gemma 3 12B have equivalent linear representations of their inference operation.
- This approach computes a linear system that captures how the model generates the output embedding from the input embeddings, finding low-dimensional, interpretable semantic structure via SVD.
- Tangent Model Composition vs. Jacobian in input embedding space: A member inquired about the relevance of Tangent Model Composition to the published paper.
- The author clarified that their work focuses on the Jacobian in input embedding space, leveraging Euler's theorem for homogeneous functions of order 1 for exact reconstruction, unlike the Taylor approximation used in tangent model composition.
- Using Autograd Jacobian on CNNs: To provide intuition around the Jacobian, a member linked to papers from Zara Khadkhodaie and Eero Simoncelli on image models: https://iclr.cc/virtual/2024/oral/19783, https://arxiv.org/abs/2310.02557, https://arxiv.org/abs/1906.05478.
- These papers work with CNNs with (leaky) ReLUs and zero-bias linear layers, so they can compute the conventional autograd Jacobian at inference; a toy version of that computation is sketched below.
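The object being discussed can be shown on a toy network. A minimal sketch, not the paper's method: for a small bias-free ReLU model (positively homogeneous of degree 1), the autograd Jacobian with respect to the input embeddings reconstructs the output exactly, which is the Euler's-theorem property mentioned above.

```python
import torch

torch.manual_seed(0)
# Zero-bias ReLU network: positively homogeneous of degree 1 in its input.
model = torch.nn.Sequential(
    torch.nn.Linear(8, 16, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 8, bias=False),
)

x = torch.randn(4, 8)  # four "token embeddings" of width 8

def output_embedding(inp: torch.Tensor) -> torch.Tensor:
    return model(inp).sum(dim=0)  # collapse to a single output embedding

J = torch.autograd.functional.jacobian(output_embedding, x)  # shape (8, 4, 8)

# Euler's theorem for degree-1 homogeneous functions: f(x) = J(x) . x exactly.
reconstruction = torch.einsum("oti,ti->o", J, x)
print(torch.allclose(reconstruction, output_embedding(x), atol=1e-5))  # True
```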
- Generalisation finding allows more efficient concept detector: One member stated that the publication's generalisation finding allows my concept detector to be even more sample efficient, and your code provides a slower audit grounding for my realtime shortcuts which detect roughly the same thing.
- They appreciated the author sharing the work.
DSPy ▷ #show-and-tell (1 messages):
Tau Bench results, fastWorkflow, GEPA workflow optimization
- fastWorkflow Achieves SOTA on Tau Bench: A member reported that fastWorkflow achieved SOTA on both retail and airline workflows using this repo, and a paper is coming soon.
- The Tau Bench fork with fastWorkflow adapter was used for generating these results, showing that with proper context engineering, small models can match/beat the big ones.
- GEPA for End-to-End Workflow Optimization in Progress: End-to-end workflow optimization using GEPA is in progress, according to a member, with attached images showing early results.
- The attached image shows several graphs and metrics, presumably detailing the performance of GEPA optimization techniques.
DSPy ▷ #general (13 messagesđŸ”„):
Conversation History in DSPy Modules, LLM Context Loss in ReAct Modules, Deserialization of Complex Pydantic OutputFields, DSPy Prompt in Java, Rate Limits for DSPy Batch Requests
- Conversation History persists across LLMs in DSPy: A user found that conversation history in a DSPy module is maintained even when switching LLMs, because it's part of the signature, not the LM object itself.
- They wondered how ReAct modules manage history automatically and asked if a complex Pydantic OutputField gets deserialized properly.
- Complex OutputField Deserialization in DSPy: A user reported issues with complex Pydantic OutputFields not deserializing properly in DSPy, resulting in a `str` containing JSON that doesn't match the schema; a validation fallback is sketched below.
- They also noted the package's dependency on Python < 3.14 and inquired about constraining the LLM's output to conform to a specific type.
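No fix was confirmed in the thread, but a common workaround for the str-containing-JSON symptom is to re-validate explicitly. A hedged sketch; the `Ingredient` model is illustrative, not from the discussion:

```python
from pydantic import BaseModel, ValidationError

class Ingredient(BaseModel):
    name: str
    grams: float

def coerce(raw: object) -> Ingredient:
    """Accept a model instance, a JSON string, or a plain dict."""
    if isinstance(raw, Ingredient):
        return raw
    if isinstance(raw, str):                    # the str-containing-JSON case
        return Ingredient.model_validate_json(raw)
    return Ingredient.model_validate(raw)       # e.g., an already-parsed dict

try:
    item = coerce('{"name": "flour", "grams": 250}')
    print(item.grams)
except ValidationError as err:
    print("Output did not match the schema:", err)
```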
- Running DSPy Prompts in Java: A user asked about existing solutions for running DSPy prompts in Java, seeking a simplified Java version of JSONAdapter to format input/output messages.
- The suggestion included structuring the system message with input and output fields to facilitate easy JSON handling.
- Avoiding Context Loss with Fallback LLMs in ReAct: A user experienced context loss in a ReAct module when a fallback LLM was triggered due to rate limits, causing the module to restart from scratch.
- They sought advice on how to add a fallback LLM without losing the prior context, noting frustration with customization complexities compared to direct API calls.
- Throttling for DSPy Batch Requests: A user inquired about methods to handle rate limits when running `dspy.Module.batch` requests, seeking ways to add time delays between requests or respect rate limits properly.
- No solutions were given in the discussion.
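One generic stopgap, sketched below under assumptions (a callable `module` and a list of keyword-argument dicts, neither from the thread): trade `.batch()` for a chunked loop with a pause between chunks.

```python
import time

def throttled_run(module, inputs, chunk_size=8, pause_s=2.0):
    """Run a dspy-style module over inputs in chunks, pausing between chunks."""
    results = []
    for i in range(0, len(inputs), chunk_size):
        chunk = inputs[i:i + chunk_size]
        results.extend(module(**kwargs) for kwargs in chunk)
        if i + chunk_size < len(inputs):
            time.sleep(pause_s)  # stay under the provider's requests/minute cap
    return results
```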
tinygrad (George Hotz) ▷ #general (4 messages):
Tinybox benchmarks, Tinygrad out of band mechanism, VIZ performance over SSH
- Tinybox Benchmarks Beg for Attention: A user expressed interest in seeing benchmarks comparing 8x5090 tinybox configurations against A100s and H100s.
- No benchmarks were provided in the current context, but the request highlights interest in the performance of tinybox setups relative to industry-standard GPUs.
- Tinygrad Seeks Out-of-Band Control: A user inquired about whether Tinygrad has any out-of-band mechanisms, specifically asking about the possibility of performing a remote reboot.
- George Hotz responded that tinyboxes all have BMCs, yes, indicating that baseboard management controllers are available for remote management tasks.
- VIZ Performance Questioned Over SSH: George Hotz asked if others were experiencing slow performance with VIZ when accessed over SSH.
- This suggests a potential bottleneck or optimization issue with the VIZ tool when used in remote access scenarios.
tinygrad (George Hotz) ▷ #learn-tinygrad (8 messagesđŸ”„):
Uop SPECIAL, ntid Access, UOps Errors, UOps Kernel Generation, PyTorch Tensors to Tinygrad Tensors
- ntid Access Alternative: A member inquired about accessing `blockDim`, since UOp SPECIAL supports `ctaid` and `tid` but not `ntid`.
- They resolved it by using a file-level const, but mentioned that the errors for UOps are very unhelpful.
- valid Way to Generate ifs: A member asked if `valid` is the best way to generate `if`s, after struggling with ending a range and running stuff outside of the loop.
- They shared a cursed kernel generated via UOps.
- PyTorch Tensors to Tinygrad Tensors Efficiency: A member asked about the proper way to efficiently convert PyTorch tensors to Tinygrad tensors; a copy-based round trip is sketched below.
- They mentioned using `Tensor.from_blob(pytorch_tensor.data_ptr())` for conversion to Tinygrad, but were unsure about the reverse, currently using `from_numpy`.
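For the unsure-about-the-reverse part, the copy-based round trip through NumPy is the unambiguous (if not zero-copy) option. A minimal sketch:

```python
import torch
from tinygrad import Tensor

pt = torch.arange(6, dtype=torch.float32).reshape(2, 3)

tg = Tensor(pt.numpy())              # torch -> tinygrad (host copy)
back = torch.from_numpy(tg.numpy())  # tinygrad -> torch (host copy)

print(torch.equal(pt, back))  # True
```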
aider (Paul Gauthier) ▷ #general (5 messages):
Aider-ce Documentation, Claude Sonnet 4-5-20250929 support, Enable reasoning on models like Haiku-4-5, opus-4-1
- Inquire Aider-ce Chat and Docs: A member inquired whether the current chat is also for aider-ce and if aider-ce has documentation.
- The question remained unanswered in the provided context.
- Aider Supports Claude Sonnet 4-5-20250929 Model: A member inquired about aider support for the `claude-sonnet-4-5-20250929` model, fearing it was a silly question.
- Another member confirmed that aider already supports it via the `/model claude-sonnet-4-5-20250929` command, and reminded the user to set up their Anthropic API key.
- Reasoning for Haiku-4-5 and Opus-4-1 Models: A member inquired about how to enable thinking/reasoning on models like Haiku-4-5 and Opus-4-1, especially within the aider CLI.
- The member was open to editing the model settings YML file, but needed guidance on enabling this feature.
aider (Paul Gauthier) ▷ #questions-and-tips (6 messages):
aider memory usage with Qwen, grep vs rg, Aider Discord Plugin, Gemini vs GPT-5, Aider scripting help
- Qwen Memory Quandaries: A user reported running out of memory with Qwen 30b when processing all context, suggesting the short description rule might not be a hard limit.
- In response, another member suggested writing a script to loop over each file with a specific prompt using aider; a minimal version is sketched below.
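A minimal version of that suggestion, sketched with an assumed prompt and file glob; aider's `--message` and `--yes` flags run one non-interactive edit per invocation:

```python
import pathlib
import subprocess

PROMPT = "Add a one-line docstring summary to every public function."  # placeholder

for path in sorted(pathlib.Path("src").rglob("*.py")):
    # One aider run per file keeps the local model's context small.
    subprocess.run(["aider", "--message", PROMPT, "--yes", str(path)], check=True)
```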
- rg Steals grep's Glory: A member shared that they learned about `rg` via Grok and found it to be very good out of the box as a `grep` alternative.
- They recommended it as a random tip for those who occasionally use `grep`.
- Aider gets Discordant: A user inquired about a plugin to connect aider to Discord to follow chat commands and listen to voice commands.
- No response was given.
- Gemini Glimmers, GPT-5 Glitches?: One user shared that they found Google's Gemini API superior to GPT-5 in explanation, teaching, and code generation, despite using appropriate parameters.
- Another member replied that the parameters looked right and acknowledged that LLM performance is often subjective.
MCP Contributors (Official) ▷ #general (9 messagesđŸ”„):
Image handling as tool input, MCP tool for image conversion to URL, Reddit thread on Code Execution with MCP
- Image Handling as Tool Input Explored: Members discussed using image URLs as tool input for MCP clients, where the tool would download the image from the provided URL.
- Someone inquired about handling images added in Claude/ChatGPT, specifically if they can be converted to a URL.
- MCP Tool Converts Images to URLs: To use images from Claude/ChatGPT, you need an MCP tool that converts the image to a URL by uploading it to an object storage service.
- The tool then returns the URL of the image, which can be used as input; a hedged sketch follows below.
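A hedged sketch of such a tool using the official MCP Python SDK's `FastMCP` helper; the bucket name, URL shape, and base64 transport are assumptions, and a real deployment would likely return a presigned or CDN URL instead:

```python
import base64
import uuid

import boto3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("image-uploader")
s3 = boto3.client("s3")
BUCKET = "my-mcp-uploads"  # placeholder bucket

@mcp.tool()
def image_to_url(image_base64: str, content_type: str = "image/png") -> str:
    """Upload a base64-encoded image to object storage and return its URL."""
    key = f"uploads/{uuid.uuid4()}.png"
    s3.put_object(Bucket=BUCKET, Key=key,
                  Body=base64.b64decode(image_base64),
                  ContentType=content_type)
    return f"https://{BUCKET}.s3.amazonaws.com/{key}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```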
- Code Execution with MCP Reddit Thread Buzzes: A member pointed out a Reddit thread discussing the Code Execution with MCP blogpost.
- Another member hinted that a specific user might have more information about it.