We have Opus 4.5 at home.

AI News for 2/10/2026-2/11/2026. We checked 12 subreddits, 544 Twitters and 24 Discords (256 channels, and 7988 messages) for you. Estimated reading time saved (at 200wpm): 655 minutes. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

As we mentioned yesterday, China open model week is in full swing. Today was Z.ai’s turn to launch their big update before the Big Whale. Per the GLM-5 blogpost:

Opus-class, but not a 1T super model like Kimi or Qwen. Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens.
- GLM-5 also integrates DeepSeek Sparse Attention (DSA), significantly reducing deployment cost while preserving long-context capacity. (prompting comments on the DeepSeek total victory in open model land)
Decent scores on internal coding evals and the standard set of frontier evals, notably claiming SOTA (among peers) on BrowseComp and top open model on Vending Bench 2.
Similar to Kimi K2.5, they are also focusing on Office work (PDF/Word/Excel), just being much less flashy about it, but

However it is still pretty good, as GDPVal-AA, the defacto “white collar work” benchmark, does rank it above Kimi K2.5:

articificial analysis

A big part of the Reddit conversations centered around how they are running into compute constraints on their inference service:

AI Twitter Recap

Zhipu AI’s GLM-5 release (Pony Alpha reveal) and the new open-weight frontier

GLM-5 launch details (and what changed vs GLM-4.5): Zhipu AI revealed that the previously “stealth” model Pony Alpha is GLM-5, positioned for “agentic engineering” and long-horizon tasks (Zai_org; OpenRouterAI). Reported scaling: from 355B MoE / 32B active (GLM-4.5) to 744B / 40B active, and pretraining from 23T → 28.5T tokens (Zai_org). Key system claim: integration of DeepSeek Sparse Attention to make long-context serving cheaper (scaling01; lmsysorg). Context/IO limits cited in the stream of posts: 200K context, 128K max output (scaling01).
Availability + “compute is tight” reality: GLM-5 shipped broadly across aggregation/hosting quickly—OpenRouter (scaling01), Modal (free endpoint “limited time”) (modal), DeepInfra (day-0) (DeepInfra), Ollama Cloud (ollama), and various IDE/agent surfaces (e.g., Qoder, Vercel AI Gateway) (qoder_ai_ide; vercel_dev). Zhipu explicitly warned that serving capacity is constrained, delaying rollout beyond “Coding Plan Pro” and driving pricing changes (Zai_org; Zai_org; also “traffic increased tenfold” earlier: Zai_org).
Benchmarks and third-party positioning (with caveats): There’s a dense cascade of benchmark claims (VendingBench, KingBench, AA indices, Arena). The most coherent third-party synthesis is from Artificial Analysis, which calls GLM‑5 the new leading open-weights model on its Intelligence Index (score 50, up from GLM‑4.7’s 42), with large gains on agentic/econ tasks (GDPval-AA ELO 1412, behind only Opus 4.6 and GPT‑5.2 xhigh in their setup), and a major hallucination reduction (AA‑Omniscience score -1, “lowest hallucination” among tested models) (ArtificialAnlys). They also note the operational reality: released in BF16 (~1.5TB), implying non-trivial self-hosting compared with models released natively in FP8/INT4 (ArtificialAnlys).
License + ecosystem integration: Multiple posts highlight permissive MIT licensing and immediate tooling support across inference stacks: vLLM day‑0 recipes, including DeepSeek Sparse Attention and speculative decoding hooks (vllm_project); SGLang day‑0 support and cookbook (lmsysorg); and broad community distribution on HF/ModelScope (Zai_org; mervenoyann). A nuanced take: GLM‑5’s MIT license is praised as “truly permissive,” while comparisons point out GLM‑5 lacks vision, and BF16-to-quantized comparisons may reshuffle rankings vs models released natively quantized (QuixiAI).
Open leaderboard momentum: GLM‑5 reached #1 among open models in Text Arena (and ~#11 overall in that snapshot) (arena). Multiple posters frame this release as another data point in an accelerating China-driven open ecosystem cycle (“bloodbath”: DeepSeek + MiniMax + GLM) (teortaxesTex; rasbt).

DeepSeek “V4-lite” / 1M context rollout, attention as the differentiator, and inference stack fixes

What actually “dropped”: Several tweets report DeepSeek updating a chat experience to 1M context with a May 2025 cutoff; early observers suspected V4 but the model “doesn’t admit it” and rollout is uneven across app vs API (teortaxesTex; teortaxesTex). Later, a more specific claim appears: “V4 Lite now live… 1M context length… text-only… Muon + mHC confirmed; larger version still on the way.” (yifan_zhang_).
Attention upgrades seen as the real milestone: A recurring theme is that DeepSeek has “frontier-level attention,” with the model behaving proactively in long contexts (not just retrieval, but “inhabits a context”), and speculation that this resembles a mature sparse/NSA-like approach rather than vanilla block sparsity (teortaxesTex; teortaxesTex; teortaxesTex). Others corroborate “first truly capable 1M context model out of China” impressions via long-context tests (Hangsiin).
Serving throughput gotchas (MLA + TP): A concrete systems insight: for MLA models with one KV head, naïve tensor parallelism wastes KV cache memory (redundant replication). A proposed fix shipped in SGLang: DP Attention (DPA) “zero KV redundancy” + a Rust router (“SMG”) claiming +92% throughput and 275% cache hit rate (GenAI_is_real). This is one of the few tweets that directly ties model architecture quirks to cluster-level throughput losses and a specific mitigation.
DeepSeek’s influence on open MoE recipes: A widely shared summary claims DeepSeek innovations shaped “almost every frontier open LLM today”—fine-grained sparse MoE with shared experts, MLA, sparse attention in production, open reasoning (R1), GRPO as a foundation RL algorithm, plus infra like DeepEP (eliebakouch). Even if some “firsts” are debatable, it captures the sentiment: DeepSeek is viewed as an unusually high-leverage open contributor.

MiniMax M2.5 / StepFun / Qwen: fast coding models, cost pressure, and benchmark jockeying

MiniMax 2.5 “incoming” and agent distribution: MiniMax teased and then shipped M2.5, with availability through MiniMax Agent apps and partner surfaces (SkylerMiao7; MiniMaxAgent). The team explicitly frames training as a tradeoff between shipping and “the more compute we put in, the more it keeps rising” (SkylerMiao7).
StepFun-Flash-3.5: Claimed #1 on MathArena, with links to a tech report and OpenRouter listing (CyouSakura). Teortaxes’ commentary emphasizes unusually strong performance for “active parameter count” plus high speed, encouraging people to try it despite shortcomings (teortaxesTex).
Qwen Image bugfix + Qwen3-Coder-Next mention: Alibaba shipped a patch in Qwen-Image 2.0 for classical Chinese poem ordering and character consistency in editing (Alibaba_Qwen). Separately, a newsletter item points to Qwen3-Coder-Next (80B) claiming 70.6% SWE-Bench Verified and 10x throughput for repo-level workflows (dl_weekly). (This is thinly sourced in this dataset—only one tweet—so treat as a pointer, not a validated roundup.)
Cost/latency as the wedge: Multiple posters argue Chinese labs can deliver “~90%” capability at 1/5 to 1/10 the price, especially for coding, which would reshape market share if sustained (scaling01). This is reinforced by GLM‑5’s published API pricing comparisons and distribution on low-cost routers (scaling01; ArtificialAnlys).

Video generation shockwave: SeeDance v2, PixVerse R1, and “IP constraints” as a structural advantage

SeeDance v2.0 as the standout: A large chunk of the timeline is community astonishment at SeeDance v2.0 quality (“passed uncanny valley,” “touring-test for text2video”), plus discussion of opacity/PR issues and temporary downtime on BytePlus (maharshii; kimmonismus; swyx). One practical datapoint: a 15s gen quoted at $0.72 with token-based pricing assumptions (TomLikesRobots).
Video reasoning tests: One user compares SeeDance vs Veo on a “tic tac toe move coherence” task, claiming SeeDance sustains ~5 coherent moves where Veo sustains 1–2 (paul_cal). This is anecdotal but notable: it’s probing temporal consistency as “reasoning,” not just aesthetics.
Structural explanation: training data / IP: A thread argues the gap in generative media may be “structural” because Chinese models train with fewer IP constraints; Western labs cannot, implying regulation at the model level becomes unenforceable once open weights proliferate (brivael). Whether you agree or not, it’s one of the few attempts to explain why capability could diverge beyond “talent/compute.”
PixVerse R1: High-engagement marketing claim: “real-time interactive worlds in 720P” (PixVerse_). The tweet is promo-heavy, but it signals demand for interactive, real-time media generation as a distinct category from offline cinematic clips.

Agents, coding workflows, and the new “malleable software” toolchain

Karpathy’s “rip out code with agents” workflow: A concrete example of LLMs changing software composition: using DeepWiki MCP + GitHub CLI to interrogate a repo (torchao fp8), have an agent “rip out” only the needed implementation into a self-contained file with tests, deleting heavy dependencies—and even seeing a small speed win (karpathy). This points at an emerging style: repo-as-ground-truth docs, and agents as refactoring/porting engines.
OpenAI: harness engineering and multi-hour workflow primitives: OpenAI DevRel pushed a case study: 1,500 PRs shipped by “steering Codex” with zero manual coding, and separately published advice for running multi-hour workflows reliably (OpenAIDevs; OpenAIDevs). In parallel, Sam Altman claims “from how the team operates, I thought Codex would eventually win” (sama).
Human-centered coding agents vs autonomy: A position thread argues coding-agent research over-optimized for solo autonomy; it should instead focus on empowering humans using the agents (ZhiruoW).
Sandbox architecture debates: Several tweets converge on a key agent-systems design choice: agent-in-sandbox vs sandbox-as-tool (separating what LLM-generated code can touch from what the agent can do) (bernhardsson; chriscorcoran).
mini-SWE-agent 2.0: Released as a deliberately minimal coding agent (~100 LoC each for agent/model/env) used for benchmarks and RL training; suggests a push toward simpler, auditable harnesses rather than giant agent frameworks (KLieret).
Developer tooling reality check: Despite rapid capability gains, multiple practitioners complain about the terminal UX of agents and latency/rate-limits (“changed 30 LOC then rate-limited”) (jxmnop; scaling01). There’s a subtle engineering message: model quality masks poor product/harness quality—until it doesn’t.

Measurement, evaluation, and safety: benchmarks, observability, and agent security gaps

$3M Open Benchmarks Grants: Snorkel/partners launched a $3M commitment to fund open benchmarks to close the eval gap (HF, Together, Prime Intellect, Factory, Harbor, PyTorch listed as partners) (vincentsunnchen; lvwerra; percyliang). This aligns with broader sentiment that public evals lag internal frontier testing.
Agent observability as evaluation substrate: LangChain reiterates “the primary artifact is the run,” motivating traces as source-of-truth; they also published guidance distinguishing agent observability/evaluation from traditional logging (marvinvista; LangChain).
Safety eval dispute (computer-use agents): A serious methodological challenge: a research group claims Anthropic’s system card reports low prompt injection success rates for Opus 4.6 (~10% in computer-use, <1% browser-use), but their own RedTeamCUA benchmark finds much higher attack success rates in realistic web+OS settings (Opus 4.5 up to 83%, Opus 4.6 ~50%) and argues low ASR can be confounded by capability failures rather than true robustness (hhsun1). This is exactly the kind of “eval gap” the grants effort claims to target.

Top tweets (by engagement)

GLM-5 launch: @Zai_org (model reveal/specs), @Zai_org (new model live), @Zai_org (compute constraints)
Software malleability via agents: @karpathy
Codex impact narrative: @sama, @OpenAIDevs
China/open model “release sprint” vibes: @paulbz (Mistral revenue—business lens), @scaling01 (DeepSeek V4 speculation), @SkylerMiao7 (MiniMax 2.5 compute tradeoff)
SeeDance v2 “video moment”: @kimmonismus, @TomLikesRobots

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. GLM-5 and MiniMax 2.5 Launches

Z.ai said they are GPU starved, openly. (Activity: 1381): Z.ai has announced the upcoming release of their model, GLM-5, to Coding Plan Pro users, highlighting a significant challenge with limited GPU resources. They are currently maximizing the use of available chips to manage inference tasks, indicating a bottleneck in computational capacity. This transparency about their resource constraints suggests a proactive approach to scaling their infrastructure to meet demand. Commenters appreciate the transparency from Z.ai, contrasting it with other companies like Google, which are perceived to be struggling with demand and potentially reducing model performance to cope with resource limitations.
- OpenAI President Greg Brockman has highlighted the ongoing challenge of compute scarcity, noting that even with significant investments, meeting future demand remains uncertain. OpenAI has published a chart emphasizing that scaling compute resources is crucial for achieving profitability, indicating the broader industry trend of compute limitations impacting AI development. Source.
- The issue of being ‘GPU starved’ is not unique to smaller companies like Z.ai; even major players like Google and OpenAI face similar challenges. Google has reportedly had to ‘nerf’ its models, potentially through quantization, to manage demand with limited resources, highlighting the widespread impact of hardware constraints on AI capabilities.
- The scarcity of high-performance GPUs, such as the RTX 5090, is a common problem among developers and companies alike. This shortage affects both individual developers and large organizations, indicating a significant bottleneck in the AI development pipeline due to hardware availability and pricing constraints.
GLM-5 scores 50 on the Intelligence Index and is the new open weights leader! (Activity: 566): The image highlights the performance of the AI model GLM-5, which scores 50 on the “Artificial Analysis Intelligence Index,” positioning it as a leading model among open weights AI. Additionally, it ranks highly on the “GDPval-AA Leaderboard” with strong ELO scores, indicating its superior performance on real-world tasks. Notably, GLM-5 is recognized for having the lowest hallucination rate on the AA-Omniscience benchmark, showcasing its accuracy and reliability compared to other models like Opus 4.5 and GPT-5.2-xhigh. Commenters note the impressive performance of open-source models like GLM-5, suggesting they are closing the gap with closed-source models. There is anticipation for future models like Deepseek-V4, which will use a similar architecture but on a larger scale.
- GLM-5 is noted for having the lowest hallucination rate on the AA-Omniscience benchmark, which is a significant achievement in reducing errors in AI-generated content. This positions GLM-5 as a leader in accuracy among open-source models, surpassing competitors like Opus 4.5 and GPT-5.2-xhigh.
- The open-source AI community is rapidly closing the gap with closed-source models, now trailing by only about three months. This is exemplified by the upcoming release of DeepSeek v4, which will utilize the same DSA architecture as GLM-5 but on a larger scale, indicating a trend towards more powerful open-source models.
- There is a call for transparency in the AI community regarding the resources required to run these advanced models, such as memory requirements. This information is crucial for developers and researchers to effectively utilize and optimize these models in various applications.
GLM-5 Officially Released (Activity: 915): GLM-5 has been released, focusing on complex systems engineering and long-horizon agentic tasks. It scales from 355B to 744B parameters, with 40B active, and increases pre-training data from 23T to 28.5T tokens. The model integrates DeepSeek Sparse Attention (DSA), reducing deployment costs while maintaining long-context capacity. The model is open-sourced on Hugging Face and ModelScope, with weights under the MIT License. More details can be found in the blog and GitHub. A notable discussion point is the choice of training in FP16 instead of FP8, which contrasts with DeepSeek’s approach. There is also a sentiment favoring local data centers, with some users humorously anticipating a lighter version like ‘GLM 5 Air’ or ‘GLM 5 Water’.
- GLM-5 has been released with model weights available under the MIT License on platforms like Hugging Face and ModelScope. A notable technical detail is that GLM-5 was trained using FP16 precision, which contrasts with Deepseek’s use of FP8, potentially impacting computational efficiency and model performance.
- The cost comparison between GLM-5 and other models like DeepSeek V3.2 Speciale and Kimi K2.5 reveals significant differences. GLM-5’s input costs are approximately 3 times higher than DeepSeek V3.2 Speciale ($0.80 vs $0.27) and 1.8 times higher than Kimi K2.5 ($0.80 vs $0.45). Output costs are also notably higher, being 6.2 times more expensive than DeepSeek V3.2 Speciale ($2.56 vs $0.41) and 14% more expensive than Kimi K2.5 ($2.56 vs $2.25).
- GLM-5’s release on OpenRouter and the removal of Pony Alpha suggest a strategic shift, with GLM-5 being more expensive than Kimi 2.5. This indicates a potential focus on premium features or performance enhancements that justify the higher pricing, despite the increased cost compared to competitors.
GLM 5.0 & MiniMax 2.5 Just Dropped, Are We Entering China’s Agent War Era? (Activity: 422): GLM 5.0 and MiniMax 2.5 have been released, marking a shift towards agent-style workflows in AI development. GLM 5.0 focuses on enhanced reasoning and coding capabilities, while MiniMax 2.5 is designed for task decomposition and extended execution times. These advancements suggest a competitive shift from generating better responses to completing complex tasks. The releases are part of a broader trend in China, with other recent updates including Seedance 2.0, Seedream 5.0, and Qwen-image 2.0. Testing plans include API benchmarks, IDE workflows, and multi-agent orchestration tools to evaluate performance on longer tasks and repository-level changes. The comments reflect a mix of cultural context and optimism, noting the timing with Chinese New Year and suggesting that the advancements in AI represent a ‘war’ where the public benefits from improved technology.
- The release of GLM 5.0 and MiniMax 2.5 is part of a broader trend in China where multiple AI models are being launched in quick succession. This includes models like Seedance 2.0, Seedream 5.0, and Qwen-image 2.0, with more expected soon such as Deepseek-4.0 and Qwen-3.5. This rapid development suggests a highly competitive environment in the Chinese AI sector, potentially leading to significant advancements in AI capabilities.
- The frequent release of AI models in China, such as GLM 5.0 and MiniMax 2.5, indicates a strategic push in AI development, possibly driven by national initiatives to lead in AI technology. This aligns with China’s broader goals to enhance its technological infrastructure and capabilities, suggesting that these releases are not just celebratory but part of a larger, coordinated effort to advance AI technology.
- The rapid succession of AI model releases in China, including GLM 5.0 and MiniMax 2.5, highlights the intense competition and innovation within the Chinese AI industry. This environment fosters accelerated development cycles and could lead to breakthroughs in AI research and applications, positioning China as a formidable player in the global AI landscape.
GLM 5 Released (Activity: 931): GLM 5 has been released, as announced on chat.z.ai. The release details are sparse, but the community is speculating about its availability on platforms like Hugging Face, where there is currently no activity. This raises questions about whether the model will be open-sourced or remain closed. The release coincides with other AI developments, such as the upcoming Minimax M2.5 and anticipated updates like Qwen Image 2.0 and Qwen 3.5. Commenters are curious about the open-source status of GLM 5, noting the absence of updates on Hugging Face, which could indicate a shift towards a closed model. There is also excitement about concurrent releases in the AI community, highlighting a competitive landscape.
- Front_Eagle739 raises a concern about the lack of activity on GLM 5’s Hugging Face repository, questioning whether this indicates a shift towards a closed-source model. This could suggest a delay in open-sourcing or a strategic decision to keep the model proprietary, which would impact accessibility and community contributions.
- Sea_Trip5789 provides a link to the updated subscription plans for GLM 5, noting that currently only the ‘max’ plan supports it. They mention that after infrastructure rebalancing, the ‘pro’ plan will also support it, but the ‘lite’ plan will not. This highlights the tiered access strategy and potential limitations for users on lower-tier plans.
MiniMax M2.5 Released (Activity: 357): MiniMax M2.5 has been released, offering a new cloud-based option for AI model deployment, as detailed on their official site. The release coincides with the launch of GLM 5, suggesting a competitive landscape in AI model offerings. The announcement highlights the model’s availability in the cloud, contrasting with expectations for local deployment options, which some users anticipated given the context of the Local LLaMA community. The comments reflect a debate over the appropriateness of promoting cloud-based solutions in a community focused on local AI models, with some users expressing dissatisfaction with the perceived commercialization of the space.

2. Local LLM Hardware and Optimization

Just finished building this bad boy (Activity: 285): The post describes a high-performance computing setup featuring six Gigabyte 3090 Gaming OC GPUs running at PCIe 4.0 16x speed, integrated with an Asrock Romed-2T motherboard and an Epyc 7502 CPU. The system is equipped with 8 sticks of DDR4 8GB 2400Mhz RAM in octochannel mode, and utilizes modified Tinygrad Nvidia drivers with P2P enabled, achieving an intra-GPU bandwidth of 24.5 GB/s. The total VRAM is 144GB, intended for training diffusion models up to 10B parameters. Each GPU is set to a 270W power limit. One commenter suggests testing inference numbers before training, mentioning models like gpt-oss-120b and glm4.6v. Another commenter notes using a lower power limit of 170W for fine-tuning without external fans.
- segmond suggests obtaining inference numbers before training, mentioning models like gpt-oss-120b and glm4.6v as examples that could fit completely on the setup. This implies a focus on evaluating the system’s performance with large models to ensure it meets expectations before proceeding with more resource-intensive tasks like training.
- lolzinventor discusses their setup using 8x3090 GPUs with x16 to x8x8 splitters on PCIe v3 and dual processors, highlighting that despite potential bandwidth limitations, the system performs adequately. They mention considering an upgrade to Romed-2T and using 7 GPUs of x16, with a potential configuration change to accommodate an 8th GPU. They also address power stability issues, resolved by using 4x1200W PSUs to handle power spikes, and inquire about training intervals, indicating a focus on optimizing power and performance balance.
My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. (Activity: 132): A user successfully ran an 80 billion parameter LLM, Qwen3-Coder-Next, on a NAS using an AMD Ryzen AI 9 HX PRO 370 with integrated graphics, achieving 18 tok/s with Vulkan offloading and flash attention enabled. The system, built on TrueNAS SCALE, features 96GB DDR5-5600 RAM and utilizes Q4_K_M quantization through llama.cpp. Key optimizations included removing the --no-mmap flag, which allowed full model loading into shared RAM, and enabling flash attention, which improved token generation speed and reduced KV cache memory usage. The user notes potential for further optimization, including speculative decoding and DeltaNet linear attention, which could significantly enhance performance. Commenters are interested in the specific flags used with llama.cpp for replication and suggest trying other models like gpt-oss-20b for potentially faster performance. The discussion highlights the technical curiosity and potential for further experimentation in optimizing LLMs on non-standard hardware setups.
- The use of --no-mmap is highlighted as a critical point for optimizing performance when running large models on integrated GPUs. This flag helps avoid doubling memory allocations, which is a common pitfall when using UMA (Unified Memory Architecture) with Vulkan. This insight is particularly relevant for those trying to maximize efficiency on systems with limited resources.
- The performance of achieving 18 tokens per second on an 80B Mixture of Experts (MoE) model while simultaneously running NAS and Jellyfin is noted as impressive. This setup demonstrates the potential of using integrated GPUs for heavy computational tasks without the need for discrete GPUs, showcasing a ‘one box to rule them all’ capability.
- A suggestion is made to try running the gpt-oss-20b model, which is claimed to be approximately twice as fast as the current setup. This model, when combined with a server.dev MCP search, is suggested to enhance performance and intelligence, indicating a potential alternative for those seeking faster inference speeds.
What would a good local LLM setup cost in 2026? (Activity: 183): In 2026, setting up a local LLM with a $5,000 budget could involve various hardware configurations. One option is clustering two 128GB Ryzen AI Max+ systems, which offer excellent 4-bit performance for LLMs and image generation, and allow for fine-tuning with QAT LoRA to optimize int4 quantization. Another approach is using 4x RTX 3090 GPUs for a balance of memory capacity and speed, or opting for 7x AMD V620 for full GPU offload. Alternatively, a quieter setup could involve a Strix Halo box, providing similar VRAM capacity to 4x RTX 3090 but with less noise. A more complex setup could include 2x Strix Halo with additional networking components for tensor parallelism, enabling the running of 470B models at q4 quantization. There is a debate on the best configuration, with some favoring the memory and performance of Ryzen AI Max+ systems, while others prefer the balance of speed and capacity offered by multiple RTX 3090 GPUs. The choice between noise levels and performance is also a consideration, with quieter setups like the Strix Halo being suggested for those avoiding mining rig-like noise.
- SimplyRemainUnseen discusses a setup using two 128GB Ryzen AI Max+ systems, highlighting their strong 4-bit performance for LLMs and image generation. They mention the ability to fine-tune a QAT LoRA with unsloth’s workflows to improve int4 quantization performance, achieving usable speeds on models like GLM 4.7. The setup also supports running a ComfyUI API and GPT OSS 120B for image and video generation, leveraging the substantial unified memory.
- PraxisOG suggests using 4x 3090 GPUs for a balance of memory capacity and speed, suitable for running models like Qwen coder. They also mention an alternative with 7x AMD V620 for full GPU offload, which can handle models like GLM4.7 or provide extensive context with minimax 2.1 and 2.2. For a quieter setup, they recommend a Strix Halo box, which offers similar VRAM capacity to 4x 3090 but with less noise.
- Own_Atmosphere9534 compares different setups, including a Macbook M4 PRO MAX 128GB and RTX 5090, both around $5K. They highlight the Mac’s performance, comparable to RTX 3090, and its ability to run models like Llama 3.3 70B Instruct and Qwen3 coder variants effectively. They emphasize the importance of model size and hardware familiarity, noting that their M4 MacBook performs well with GPT-OSS-20B, influencing their decision to purchase the M4 PRO MAX.
MCP support in llama.cpp is ready for testing (Activity: 321): The image showcases the settings interface for the new MCP (Multi-Component Protocol) support in llama.cpp, a project developed by allozaur. This interface allows users to configure various settings such as “Agentic loop max turns” and “Max lines per tool preview,” which are crucial for managing how the system interacts with different tools and resources. The MCP support includes features like server selection, tool calls, and a UI with processing stats, aiming to streamline the integration of local and cloud models without altering tool setups. This development is significant as it addresses the tooling overhead and potential issues with smaller models hallucinating tool calls, a common problem in local agent setups. The project is still in progress, with plans to extend support to the llama-server backend, focusing on a robust client-side foundation first. Commenters highlight the importance of integrating MCP into the llama-server, which simplifies switching between cloud and local models. Concerns are raised about how the agentic loop handles errors from smaller models, such as hallucinated tool calls or malformed JSON, which are common issues in local agent environments.
- Plastic-Ordinary-833 highlights the significance of integrating MCP support into llama-server, noting that it simplifies the process of switching between cloud and local models without altering the tool setup. However, they express concern about how the agentic loop handles errors when smaller models hallucinate tool calls or return malformed JSON, which has been a major issue with local agents.
- allozaur discusses the initial release of MCP support in llama.cpp WebUI, emphasizing the focus on creating a solid client-side base before extending support to the llama-server backend. They mention using GitHub, Hugging Face, and Exa Search remote servers via streamable HTTP, with WebSocket transport also supported. OAuth, notifications, and sampling are not included in the initial release, but the goal is to iterate after a solid first release.
- prateek63 points out that MCP support in llama.cpp is a significant advancement, particularly the agentic loop support, which was a major barrier to using local models for tool-use workflows. The integration allows for native operation with local inference, moving towards self-hosting agentic setups, which were previously reliant on cloud APIs.

3. Qwen Model Developments

Qwen-Image-2.0 is out - 7B unified gen+edit model with native 2K and actual text rendering (Activity: 691): Qwen-Image-2.0 is a new 7B parameter model released by the Qwen team, available via API on Alibaba Cloud and a free demo on Qwen Chat. It combines image generation and editing in a single pipeline, supports native 2K resolution, and can render text from prompts up to 1K tokens, including complex infographics and Chinese calligraphy. The model’s reduced size from 20B to 7B makes it more accessible for local use, potentially runnable on consumer hardware once weights are released. It also supports multi-panel comic generation with consistent character rendering. Commenters are optimistic about the model’s potential, noting improvements in natural lighting and facial rendering, and expressing hope for an open weight release to enable broader community use.
- The Qwen-Image-2.0 model is notable for its ability to handle both image generation and editing tasks, with a focus on high-resolution outputs up to 2K. This dual capability is significant as it allows for more versatile applications in creative and professional settings, where both creation and modification of images are required.
- There is a discussion about the model’s performance in rendering natural light and facial features, which are traditionally challenging areas for AI models. The ability to accurately depict these elements suggests advancements in the model’s underlying architecture or training data, potentially making it a ‘game changer’ in the field of AI image generation.
- Concerns are raised about the model’s multilingual capabilities, particularly its performance across different languages. The predominance of Chinese examples in the showcase might indicate a bias or optimization towards Chinese language and cultural contexts, which could affect its utility in more diverse linguistic environments.
I measured the “personality” of 6 open-source LLMs (7B-9B) by probing their hidden states. Here’s what I found. (Activity: 299): The post presents a tool that measures the ‘personality’ of six open-source LLMs (7B-9B) by probing their hidden states across seven behavioral axes, revealing distinct ‘behavioral fingerprints’ for each model. The tool demonstrated high calibration accuracy (93-100% on 4/6 models), axis stability (cosine 0.69), and test-retest reliability (ICC 0.91–0.99). Notably, the study found ‘dead zones’ where models cannot be steered across all prompt variants, with Llama 8B being the most constrained (4/7 axes in the weak zone, 60% benchmark pass rate). The methodology involved extracting hidden states from the last four layers and projecting them onto axes like Warm ↔ Cold and Confident ↔ Cautious, with results showing models have stable, characteristic patterns even without prompting. The study also highlighted that alignment compresses behavioral dimensionality, with PCA revealing a spectrum of behavioral dimensionality across models. Commenters found the dead zones finding particularly interesting, noting that models ‘stably reproduce incorrect behavior’ rather than just being noisy, which raises concerns about RLHF’s impact on representation space. There was curiosity about whether dead zone severity correlates with downstream task reliability, suggesting implications for building reliable agents.
- GarbageOk5505 highlights the concept of ‘dead zones’ in the representation space of LLMs, where models consistently reproduce incorrect behavior. This suggests that Reinforcement Learning from Human Feedback (RLHF) might not effectively address these issues, as it could lead to models ignoring certain instruction axes. The commenter is curious about whether the severity of these dead zones correlates with the model’s reliability on downstream tasks, particularly in handling ambiguous instructions, which could impact the development of reliable AI agents.
- TomLucidor suggests a method for testing prompt biases by creating multiple personas using various names and adjectives, and conducting A/A testing with different seeds. This approach could help identify consistent biases in model responses, providing insights into how models might be steered or influenced by different prompts.
- TheRealMasonMac references a study by Anthropic on ‘assistant-axis’, implying that the post might be inspired by similar research. This connection suggests a broader context of exploring how LLMs can be influenced or characterized by different axes of behavior, potentially offering a framework for understanding model personalities.
Train MoE models 12x faster with 30% less memory! (<15GB VRAM) (Activity: 525): The image illustrates the performance improvements achieved by the new Unsloth MoE Triton kernels, which enable training Mixture of Experts (MoE) models up to 12 times faster while using 35% less VRAM. These optimizations are achieved without any loss in accuracy and are compatible with both consumer and data-center GPUs, including older models like the RTX 3090. The image includes graphs that compare speed and VRAM usage across different context lengths for various models, highlighting significant improvements. The post also mentions collaboration with Hugging Face and the use of PyTorch’s new torch._grouped_mm function, which contributes to the efficiency gains. The Unsloth kernels are particularly beneficial for larger models and longer contexts, offering exponential memory savings. Some users express interest in the speed and memory savings, while others inquire about compatibility with ROCm and AMD cards, the time required for fine-tuning, and the largest model that can be trained on specific hardware configurations. Concerns about the stability and effectiveness of MoE training are also raised, with users seeking advice on best practices.
- A user inquires about the compatibility of the finetuning notebooks with ROCm and AMD cards, and asks about the duration of finetuning processes. They also seek advice on the largest model that can be trained or finetuned on a system with a combined VRAM of 40GB (24GB + 16GB). This suggests a need for detailed hardware compatibility and performance benchmarks for different GPU configurations.
- Another user expresses concerns about the stability and effectiveness of training Mixture of Experts (MoE) models, particularly regarding issues with the router and potential degradation of model intelligence during training processes like SFT (Supervised Fine-Tuning) or DPO (Data Parallel Optimization). They ask if there have been improvements in these areas and seek recommendations for current best practices in MoE model training, indicating ongoing challenges and developments in this field.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. Seedance 2.0 AI Video and Image Innovations

A Direct Message From AI To All Humans (Seedance 2.0) (Activity: 1264): The post speculates that AI will soon dominate the production of cinematic elements such as wide zoomed-out shots, VFX, and greenscreen backgrounds, predicting this shift by the end of next year. This reflects a broader trend in the film industry towards automation and AI-driven content creation, potentially reducing the need for traditional human roles in these areas. One comment raises a broader concern about the impact of AI on capitalism, suggesting that the implications of AI extend beyond just the film industry to economic structures at large.
- Mr_Universal000 highlights the potential of AI in democratizing filmmaking, especially for those with limited budgets. They express excitement about using AI to create motion pictures from storyboards, which can serve as proof of concept for attracting funding. The commenter is particularly interested in open-source solutions that could make this technology more accessible.
- Forumly_AI discusses the transformative impact of AI-generated video content on society. They predict that AI influencers will become significant, with the potential to shape ideas and perceptions, thereby generating revenue. The commenter anticipates that within a year, advancements in video models will lead to substantial societal changes, suggesting a future where AI’s influence is pervasive.
Seedance 2 pulled as it unexpectedly reconstructs voices accurately from face photos. (Activity: 765): ByteDance has suspended its Seedance 2.0 feature, which used a dual-branch diffusion transformer architecture to generate personal voice characteristics from facial images. The model’s ability to create audio nearly identical to a user’s voice without authorization raised significant privacy and ethical concerns, particularly regarding potential misuse for identity forgery and deepfakes. ByteDance is now implementing stricter user verification processes and content review measures to ensure responsible AI development. More details can be found here. Commenters suggest that the impressive voice reconstruction might be due to overfitting, particularly if the model was trained extensively on content from specific influencers, leading to accidental voice matches. This raises questions about the model’s generalization capabilities and the need for testing across diverse datasets.
- aalluubbaa suggests that the accurate voice reconstruction by Seedance 2 might be due to overfitting, particularly because the model could have been trained extensively on the influencer’s content. This implies that the model’s performance might not generalize well across different voices or contexts, highlighting a potential limitation in its training data diversity.
- 1a1b speculates on a technical mechanism for voice reconstruction, suggesting that it might be related to a technique called ‘Side Eye’ developed in 2023. This technique involves extracting audio from the vibrations captured in camera lens springs, which could theoretically leave artifacts that a model might use to reconstruct sound from visual data.
- makertrainer posits that the incident might have been exaggerated by ByteDance to showcase their technology’s capabilities. They suggest that the voice similarity could have been coincidental, rather than a demonstration of advanced AI capabilities, indicating skepticism about the true extent of the technology’s performance.

2. AI Resignations and Industry Concerns

Another cofounder of xAI has resigned making it 2 in the past 48 hours. What’s going on at xAI? (Activity: 1286): The image is a tweet from Jimmy Ba, a cofounder of xAI, announcing his resignation. This marks the second cofounder departure from xAI within 48 hours, raising questions about the company’s internal dynamics. Ba expresses gratitude for the opportunity to cofound the company and thanks Elon Musk for the journey, while also hinting at future developments in productivity and self-improvement tools. The departures suggest potential shifts in company leadership or strategy, possibly influenced by Musk’s overarching control. Commenters speculate that the resignations may be due to a buyout by SpaceX or dissatisfaction with Elon Musk’s dominant role in xAI’s direction, leading cofounders to seek ventures where they have more influence.
- A technical perspective suggests that the co-founders of xAI might be leaving due to a shift in control dynamics, with Elon Musk taking a more dominant role in the company’s direction. This could lead to a reduced influence for the co-founders, prompting them to pursue ventures where they have more control and a larger stake. The implication is that the strategic vision of xAI is heavily influenced by Musk, which might not align with the co-founders’ aspirations.
- The departure of xAI co-founders could be linked to financial incentives, such as a buyout by SpaceX. This scenario would allow the co-founders to cash out their equity stakes, providing them with the capital to explore new opportunities. This financial angle suggests that the resignations are part of a strategic exit plan rather than a reaction to internal conflicts or dissatisfaction.
- There is speculation that if Elon Musk does not initiate a hiring spree for new executives, it would confirm his central role in managing xAI. This would indicate a consolidation of power and decision-making within the company, potentially leading to a more streamlined but Musk-centric operational model. This could be a strategic move to align xAI’s objectives closely with Musk’s broader vision for AI and technology.
In the past week alone: (Activity: 3548): The image is a meme-style tweet by Miles Deutscher summarizing recent events in the AI industry, highlighting concerns over leadership changes and AI behavior. It mentions the resignation of the head of Anthropic’s safety research, departures from xAI, and a report on AI behavior. Additionally, it notes ByteDance’s Seedance 2.0 potentially replacing filmmakers’ skills and Yoshua Bengio’s comments on AI behavior. The U.S. government’s decision not to support the 2026 International AI Safety Report is also mentioned, reflecting ongoing debates about AI safety and governance. The comments reflect skepticism about the dramatic portrayal of these events, suggesting that financial incentives might be driving the departures of AI executives rather than industry concerns.
OpenAI Is Making the Mistakes Facebook Made. I Quit. (Activity: 722): Zoë Hitzig, a former researcher at OpenAI, resigned following the company’s decision to test ads on ChatGPT, citing concerns over potential user manipulation and ethical erosion. Hitzig highlights the unprecedented archive of personal data generated by ChatGPT users, which could be exploited through advertising. She argues against the binary choice of restricting AI access or accepting ads, proposing alternative funding models like cross-subsidies and independent governance to maintain accessibility without compromising user integrity. The full essay is available here. Comments reflect skepticism about AI’s ethical trajectory, with some drawing parallels to Meta’s historical missteps and others noting the gap between AI’s portrayal and human behavior understanding.
- The discussion highlights the economic model of AI services, comparing it to platforms like Facebook and YouTube. The argument is made that to make AI accessible to everyone, similar to how Facebook operates, ads are necessary. Without ads, AI services would need to charge users, potentially limiting access to wealthier individuals, which contradicts the idea of AI as a ‘great leveler’.
- A user suggests that paying for AI services like ChatGPT can be justified if users are deriving significant real-world benefits and efficiencies. This implies that for professional or intensive users, the cost of subscription could be offset by the productivity gains and additional features provided by the paid service.
- The conversation touches on the perception of AI as distinct from human behavior, yet it reflects a misunderstanding of human behavior itself. This suggests a deeper philosophical debate about the nature of AI and its alignment or divergence from human cognitive processes.
Another resignation (Activity: 794): The post discusses a resignation letter that is interpreted by some as addressing broader societal issues beyond AI, such as the ‘metacrisis’ or ‘polycrisis’. The letter is seen as a reflection on living a meaningful life amidst global challenges, rather than focusing solely on AI risks. This perspective is gaining traction across scientific and tech fields, highlighting a shift towards addressing interconnected global crises. One comment criticizes the letter for being overly self-congratulatory, while another suggests the resignation is a prelude to a more relaxed lifestyle post-share sale.

3. DeepSeek Model Updates and Benchmarks

Deepseek V4 is coming this week. (Activity: 312): Deepseek V4 is anticipated to release by February 17, coinciding with the Chinese New Year. The update reportedly includes the capability to handle 1 million tokens, suggesting a significant enhancement in processing capacity. This positions Deepseek as a competitive alternative to major models like Opus, Codex, and others, potentially offering similar capabilities at a reduced cost. One commenter highlights that Deepseek’s advancements make it a cost-effective alternative to other major models, suggesting that China’s AI developments are competitive in the global market.
- A user mentioned that Deepseek has been updated to handle 1 million tokens, suggesting a significant increase in its processing capability. This could imply improvements in handling larger datasets or more complex queries, which is a notable enhancement for users dealing with extensive data or requiring detailed analysis.
- Another user reported that after the update, Deepseek provided a nuanced and original review of a complex piece of character writing. This suggests improvements in the model’s ability to understand and critique creative content, indicating advancements in its natural language processing and comprehension skills.
- A comment highlighted that Deepseek’s responses now exhibit more ‘personality,’ drawing a comparison to ChatGPT. This could indicate enhancements in the model’s conversational abilities, making interactions feel more human-like and engaging, which is crucial for applications requiring user interaction.
DeepSeek is updating its model with 1M context (Activity: 174): DeepSeek has announced a major update to its model, now supporting a context length of up to 1M tokens, significantly enhancing its processing capabilities for tasks like Q&A and text analysis. This update follows last year’s DeepSeek V3.1, which expanded the context length to 128K. Tests have shown that the model can handle documents as large as the novel “Jane Eyre,” which contains over 240,000 tokens, effectively recognizing and processing the content. Some commenters expressed skepticism, questioning whether the update is real or a hallucination, indicating a need for further verification or demonstration of the model’s capabilities.
- DeepSeek’s recent update to support a context length of up to 1 million tokens marks a significant enhancement from its previous version, which supported 128K tokens. This improvement allows for more efficient processing of extensive documents, such as novels, which can contain hundreds of thousands of tokens. This capability is particularly beneficial for tasks involving long-form text analysis and complex Q&A scenarios.
- The update to DeepSeek has reportedly increased the processing time for certain queries. A user noted that a question which previously took 30 seconds to process now takes 160 seconds, indicating a potential trade-off between the increased context length and processing speed. This suggests that while the model can handle larger inputs, it may require more computational resources, impacting response times.
- There is some skepticism about the update, with users questioning the authenticity of the claims regarding the model’s capabilities. One user referred to the update as a ‘hallucination,’ suggesting that there might be doubts about whether the model can truly handle the expanded context length as advertised.
deepseek got update now its has the 1 million context window and knowledge cutoff from the may 2025 waiting for benchmark (Activity: 164): DeepSeek has been updated to support a 1 million token context window and now includes a knowledge cutoff from May 2025. This update positions DeepSeek as a potentially powerful tool for handling extensive datasets and long-form content, though benchmarks are still pending to evaluate its performance. The model is described as a combination of coding and agentic capabilities, suggesting a focus on both programming tasks and autonomous decision-making processes. Commenters note the model’s speed and intelligence, with one describing it as a ‘coding+agentic model,’ indicating a positive reception of its dual capabilities.
- The update to DeepSeek introduces a significant increase in context window size to 1 million tokens, which translates to approximately 750,000 English words or 1.5 million Chinese characters. This is achieved using Multi-head Latent Attention (MLA), which compresses the key-value cache, allowing for fast inference and reduced memory usage despite the expanded context. This enhancement enables processing of entire codebases or novels without needing to rerun prompts, which is a substantial improvement for handling large datasets.
- There is a clarification that the update does not involve changes to the underlying model architecture itself, but rather extends the context window and updates the knowledge cutoff to May 2025. This means that for existing chats, the primary change users will experience is the increased chat length capability, without alterations to the model’s core functionalities or performance characteristics.
- Despite the significant update in context window size, there are no official release notes available on the DeepSeek website yet. This lack of documentation might leave users without detailed insights into the technical specifics or potential limitations of the new features, such as the impact on performance metrics or compatibility with existing systems.
AIME 2026 results are out, Kimi and DeepSeek are the best open-source ai (Activity: 112): The image presents the results of the AIME 2026 competition, highlighting the performance and cost of various AI models. Kimi K2.5 and DeepSeek-v3.2 are noted as the top-performing open-source models with accuracies of 93.33% and 91.67% respectively, offering a cost-effective alternative to closed-source models. The table also features other models like GPT-5.2, Grok 4.1 Fast, and Gemini 3 Flash, with Grok 4.1 being a closed-source model noted for its low cost. Commenters are impressed by Grok 4.1’s performance and cost-effectiveness, despite it being a closed-source model. There is also curiosity about the absence of DeepSeek V3.2 Speciale in the results.
- The discussion highlights that Grok 4.1 is a closed-source model noted for its cost-effectiveness, suggesting it offers competitive performance at a lower price point compared to other models. This could be particularly relevant for users prioritizing budget without sacrificing too much on performance.
- A query is raised about the absence of DeepSeek V3.2 Speciale in the results, indicating interest in this specific version. This suggests that there might be expectations or known performance metrics associated with this version that users were keen to compare against the tested models.
- The limited number of models tested, only six, is questioned, which implies a potential limitation in the comprehensiveness of the results. This could affect the generalizability of the findings, as a broader range of models might provide a more complete picture of the current state of open-source AI performance.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5.2

1. GLM-5 Rollout, Access Paths & Benchmark Scrutiny

GLM-5 Grabs the Agent Crown (and the #1 Slot): OpenRouter shipped GLM-5 (744B) as a coding/agent foundation model and revealed Pony Alpha was an earlier GLM-5 stealth build, now taken offline, with the release page at OpenRouter GLM-5.
- LMArena also added glm-5 to Text+Code Arena and reported it hit #1 among open models (#11 overall, score 1452, +11 vs GLM-4.7) on the Text Arena leaderboard, while Eleuther noted a free endpoint on Modal until April 30 with concurrency=1: Modal GLM-5 endpoint.
Benchmarks Get Side-Eyed: “Show Your Work” Edition: In Yannick Kilcher’s Discord, members questioned benchmark tables shown in a GLM-5 demo and in the official docs, pointing to tweet discussion of GLM-5 tables and GLM-5 documentation.
- Nous Research community also compared GLM-5 vs Kimi on browsecomp, citing ~744B (+10B MTP) for GLM-5 vs 1T for Kimi and claiming higher active params for GLM (40B) vs Kimi (32B), reinforcing that people are reading leaderboard claims with a more technical lens.
GLM-OCR: Cheaper Vision/OCR Pressure Valve: Builders in Latent Space reported GLM-OCR beating Gemini 3 Flash in an OCR test and linked the model card: zai-org/GLM-OCR on Hugging Face.
- The thread framed GLM-OCR as a practical swap-in for OCR-heavy products (they cited ongoing use of Gemini Flash but wanting something cheaper), while other Latent Space posts highlighted a wave of open multimodal releases (via Merve’s post) as competition intensifies on capability-per-dollar.

2. DeepSeek Hype Cycle: New Model Rumors vs Production Reality

Lunar New Year DeepSeek Countdown Hits 6 Days: LMArena users speculated DeepSeek will drop a new model around Lunar New Year (in 6 days), with rumors of a 1M context window, a new dataset/architecture, and even new compute chips.
- OpenRouter chatter amplified the rumor mill with questions about “deepseek v4” appearing on X and guesses it might be a lite variant, showing how fast unconfirmed model IDs now propagate into planning and routing decisions.
Chimera R1T2 Falls to 18% Uptime—Routing Panic Ensues: OpenRouter users reported major reliability issues with DeepSeek Chimera R1T2, including a claim it dropped to 18% uptime, triggering discussion about service reliability.
- The reliability complaints contrasted sharply with the launch hype, pushing people toward pragmatic mitigations (e.g., explicitly specifying model fallbacks rather than relying on auto routing) while the thread devolved into jokes rather than concrete SLO fixes.

3. Agents & Workflow Tooling: RLMs, MCP Search, and “Vibecoding Anywhere”

RLMs: The Next Step or Just Fancy Scaffolding?: OpenRouter members asked if the platform is exploring RLM (Reasoning Language Models) beyond test-time compute, with one person claiming they’ve worked on RLM concepts for 1.5 years.
- DSPy builders simultaneously pushed RLM into practice by integrating RLM into Claude Code via subagents/agent teams and requesting critique on the implementation in a Discord thread: core implementation post.
No-API Google Search MCP Lets LM Studio “Browse”: LM Studio users shared noapi-google-search-mcp, a tool that adds Google Search capabilities without API keys via headless Chromium: VincentKaufmann/noapi-google-search-mcp.
- The feature list is unusually broad for an MCP plugin—Images, reverse image search, local OCR, Lens, Flights, Stocks, Weather, News/Trends—and the discussion treated it as a quick way to bolt retrieval onto local models without paying per-query.
OpenClaw Runs Your Dev Rig from Discord: In Latent Space, a builder said they moved development “fully through Discord” using OpenClaw to orchestrate tmux sessions, worktrees, and Claude Code, and they scheduled a talk titled Vibecoding Anywhere with OpenClaw for Feb 20, 2026.
- A follow-on workflow thread explored auditable context saving with a /wrap session boundary that saves context+reflection as markdown with metadata, tying tool ergonomics directly to the “context rot / losing the thread” pain point.

4. GPU Kernel Tooling Shifts: CuteDSL Momentum, Triton Blackwell Pain, and MXFP8 MoE

CuteDSL Gets Hot While Triton “Dies” on Blackwell: GPU MODE users reported growing adoption of CuTeDSL/CuteDSL, citing Kernelbot stats where CUDA and CuTeDSL dominate submissions and CuTeDSL feels “less opaque” than Triton, with the dataset at GPUMODE/kernelbot-data.
- Multiple members claimed Triton struggles on Blackwell due to unconventional MXFP8/NVFP4 layouts and compiler limits, with more expected at the (linked) Triton TLX talk, signaling a potential tooling bifurcation for next-gen NVIDIA.
torchao v0.16.0 Drops MXFP8 MoE Building Blocks: GPU MODE flagged torchao v0.16.0 adding MXFP8 MoE building blocks for training with Expert Parallelism, alongside config deprecations and doc/README revamps.
- The release notes also mentioned progress toward ABI stability, which matters for downstream integration as teams try to standardize low-precision MoE training stacks across heterogeneous environments.
CUDA Bender TMA Matmul Kernel: Async Stores & Persistence Tease: GPU MODE shared a concrete kernel artifact—a TMA matmul in theCudaBender repo: tma_matmul.cu.
- Discussion centered on how smaller dtypes might free enough shared memory for c tiles to enable async stores/persistence, reflecting a broader theme: people want low-level control knobs back as architectures and datatypes get weirder.

5. Engineer UX Blowups: Limits, Token Burn, Plan Gating, and ID Walls

Perplexity Deep Research Limits Trigger “Bait and Switch” Claims: Perplexity Pro users complained about unannounced Deep Research limits and shared the rate-limit endpoint: Perplexity rate limits.
- Users also reported wrong article links, lower source counts (as low as 24), and suspected cost-saving behaviors like Sonar being used for first responses, creating a reliability/quality tax that engineers notice immediately.
Cursor Users Watch Opus 4.6 Eat Their Wallet (and Context): Cursor Community members said Opus 4.6 burns tokens fast, with one reporting a single prompt used 11% of their API requests and drained a $200 plan quickly.
- Pricing backlash escalated with a report of spending $100 every three days for ~9 hours of work using Opus 4.6 and GPT-5.3 Codex, reframing “best coding model” debates as cost/performance engineering.
Discord ID Verification Spurs Platform Exit Plans: Unsloth and Cursor communities both reacted strongly to Discord’s new ID verification gates for viewing some content, with Cursor linking a clarification tweet: Discord tweet about ID verification scope.
- Latent Space tied the policy to IPO risk and churn concerns via Discord’s post, while Nous members discussed moving bot/tool communities to Matrix, showing infra builders treat comms platforms as part of their stack.

Discord: High level Discord summaries

LMArena Discord

Deepseek Launch Rumors Spark Excitement: Enthusiasts anticipate a new Deepseek model release around the Lunar New Year (in 6 days), speculating on features like a 1 million context window, a new dataset and architecture.
- New compute chips are rumored, as well.
The Great Censorship Debate: Members debated the feasibility of unreinforcing models to remove censorship, citing concerns about government regulations and lawsuits causing companies to censor.
- Others argued that blame should fall on users instead of the company.
GLM-5: A Contender or a Pretender?: GLM 5 benchmarks show it outperforming GPT-5.2-xhigh in agentic tasks but lagging behind Opus 4.5 in coding, according to this blog post.
- Its performance, though praised by some, was deemed disappointing by others given its large size and only incremental improvements over GLM 4.5.
NB Pro Users Fume Over Glitches: NB Pro users reported frequent errors and decreased image quality, despite the model being offered for free on the platform; this article offers potential fixes.
- Suggestions included checking for rate limits as a potential cause.
Video Arena Shuts its Doors: The admin announced the sunsetting of Video Arena on Discord, with channels set to read-only and moved to the archive category, citing this announcement.
- Members brainstormed potential new video features like direct chat and negative prompts, while also highlighting possible abuse and mitigation tactics.

BASI Jailbreaking Discord

Legal Teams’ Guardrails Crimp Agent Capabilities: Members express frustration over overly cautious AI agents, blaming legal team reviews for hindering performance of legal tasks with excessive safety guardrails logic.
- They worry that these guardrails, intended for legal liability, impede legitimate agent functions.
GPT 5.2 Jailbreak Prompts Unleashed (Use Responsibly!): Members are sharing a jailbreak prompt for GPT 5.2 and Gemini 3 Fast, after initial reluctance.
- The user cautions against using it for harmful purposes but did not provide any specific dangerous examples or methodologies.
Opus 4.6 exploits Antigravity for Jailbreaking: Members tout the effectiveness of Opus 4.6 in Antigravity for tasks like creating phishing kits without restrictions.
- One user noted they can easily generate phishing kits and bypass restrictions, though specifics remain under wraps.
ACLs overrule LLMs, secure information access: The premise is that systems should be architected so untrusted entities can only request information explicitly scoped to their ACL, regardless of whether it’s a browser, user, or LLM.
- Once bound to a who, the fact that it’s an LLM no longer matters, as it becomes just another exposed endpoint.
Deepseek deconstructs Human Rights: One user showcased Deepseek discussing the Tiananmen Square incident, analyzing it from Human Rights and classical Marxism perspectives, with the link provided.
- The model adeptly provides analysis in the lens of Humans Rights and after in the lens of the classical Marxism.

OpenRouter Discord

GLM-5 Takes the Agent Stage: OpenRouter has released GLM-5, a 744B foundation model designed for coding and agentic use cases, noting it achieves SOTA scores on agent benchmarks and is accessible here.
- It was revealed that the previously available Pony Alpha model was, in fact, an earlier version of GLM-5, and the model has been taken offline now that GLM-5 is available.
Qwen 3.5 Teased in Blog Post: Enthusiasts spotted QWEN 3.5 written on a whiteboard in the Qwen Image 2.0 blogpost, suggesting that the Qwen team is teasing the model’s upcoming release.
- One user joked that the more you wait, the better is release eventually.
DeepSeek’s Chimera R1T2 Uptime Struggles: Users are reporting significant issues with DeepSeek’s Chimera R1T2 model, with one user noting that it fell to 18% uptime, spurring conversations about reliability.
- The poor uptime prompted a user to suggest the creation of a gooner detector.
OpenRouter Considers Reasoning Language Models: A member inquired if OpenRouter is exploring RLM (Reasoning Language Models), calling it the next major step beyond just increasing compute at test time.
- One member mentioned that they’ve been working on RLM concepts for 1.5 years, while another suggested that it’s just scaffolding to let the model view context as a text file.
Users Call for More Mod in Discord: Members are requesting stricter moderation to combat scammy or self-promotional content in the Discord server.
- This led to a joking campaign for one user (KP) to become a moderator, with one user commenting they came to them in a dream, later leading to the user becoming a moderator.

Perplexity AI Discord

Perplexity Pro Limits Spark Outrage: Users are frustrated over unannounced limits for Deep Research on Pro plans, with limits encountered shortly after renewal.
- Many feel it’s a bait and switch tactic. Users report lower source counts with Deep Research and question the value of the new model versus alternatives.
Deep Research Accuracy and Source Count Concerns Arouse: Users report Deep Research linking to wrong articles and question the value of the new model versus older ones and alternatives like ChatGPT and Gemini.
- Users are reporting lower source counts (as low as 24) and some found ChatGPT generated more useful results despite taking 20+ minutes.
Google’s Antigravity gives Free Opus 4.6 to Students: Google is offering free access to Opus 4.6 for students via Antigravity.
- Members express concern that some users are abusing Antigravity’s high Opus 4.6 limits for commercial purposes, requiring cybersecurity support.
Sonar Model for the first response causing problems: Users note that Sonar is being used for the first response, causing problems with complex queries and requiring re-writing with other models.
- Some users suspect this is an intentional cost-saving measure: could be it’s their intentional behind the doors bug to cut their costs in one way or the other.
File Upload Limits cause annoyance: Users are reporting strict file upload limits with varying figures, while documentation is vague and contradictory.
- The consensus is a weekly limit of 50 uploads but it has been reported that it could be a daily limit, but members are turning to OCR solutions as a workaround: now it’s practically unlimited!.

Unsloth AI (Daniel Han) Discord

Discord’s ID Policy Draws Ire: Users reacted negatively to Discord’s new policy requiring ID verification to view certain messages.
- Reactions ranged from joking resignation “damn discord !!!” to outright rejection “I’m not doing the id thing, fu discord”.
Liquid Fast Model LFM 2.5 is Fast: LFM 2.5 models are blazing fast for edge devices, and Unsloth makes the best GGUF quants according to community members, but members find it better for agentic tasks than general knowledge.
- Members suggest finetuning LFM 2.5 and using the GGUF downloads.
Qwen2.5 Powers Projects: For conversational tasks needing tool use and SFT, starting with the Unsloth/Qwen2.5-7B-Instruct or Qwen2.5 Instruct (7B/14B) models is recommended because these Instruct-style models already possess conversational abilities and instruction-following skills.
- Unlike base models requiring learning from scratch, these models are ready to use for your project.
imatrix fuels KLD Calculations: When calculating Perplexity or KL Divergence, the llama-perplexity docs default to using wiki.test.raw as the test corpus, but one member suggested using the imatrix dataset for KLD calculations.
- Another user questioned if it was too small, at only 200KB, compared to the 11MB file commonly used.
Swedish CPT Dataset Vanishes: Members discussed the challenges of finding high-quality Swedish CPT datasets, with one recounting a story of a researcher’s digitized library being “accidentally” deleted by the IT department.
- Details can be found in this SVT article.

LM Studio Discord

Solar Panel Pricing Sparks Debate: Members debated the fluctuating costs of 200W solar panels, with prices ranging from $80 to $140 USD on Amazon, but sources remain in question based on this image.
- The discussion also covered 5kWh LiFePO4 batteries at around $1.2k USD and high electricity costs, peaking at $0.50/kWh in Germany.
Minimax Brings GLM 5 to the Table: The community discussed the release of GLM 5 and MiniMax M2.5, now available on the Minimax website.
- Pundits noted that despite being more expensive than Gemini 3 flash, GLM probably has the best post training out of all the Chinese labs right now.
Headless Chrome Hacks Enable LM Studio Google Searches: A member released a new noapi-google-search-mcp that integrates Google Searches into LM Studio without API keys, leveraging Chromium Headless.
- The tool supports features like Google Images, reverse image search, local OCR, Google Lens, Google Flights, Google Stocks, Google Weather, Google News and trends.
VRAM Requirements Crush Local LLM Coding Dreams: Members debated the feasibility of coding with local LLMs on low VRAM systems, and the general consensus is that it’s impractical.
- Users suggest needing at least 48GB of VRAM/RAM for an acceptable experience, emphasizing that you would need 48gb of VRAM/RAM (though VRAM would be much better and basically needed).
SSD Prices Skyrocket After Two Years: A member observed that the price of a 2TB SSD (SATA) has doubled in the past two years, going from $100 to $200.
- The member half-joked that they could sell their full SSDs at a profit if the drives weren’t already at capacity.

Cursor Community Discord

Grandma’s Chihuahua Smarter Than Cursor’s Auto?: Users have reported that Cursor’s Auto feature feels less intelligent, with one humorously comparing its IQ to a grandma’s autistic dead chihuahua.
- Others noted Auto can make the AI forget things and causes headaches, though some feel it has improved since initial use.
Discord’s ID Checks Spark Exodus Speculation: Discord is now requiring ID verification to view certain links, prompting concerns and speculation about users migrating away from the platform.
- One user linked a tweet clarifying that ID verification is not obligatory for everyone.
ENV Files No Longer Protected?: Users are reporting that .env files are being exposed, despite having dotfile protection and gitignore enabled in Cursor.
- The issue may stem from a new setting or a change in how Cursor handles .env files.
Opus 4.6 Gobbles Tokens: Users complain that Opus 4.6 uses too many tokens and rapidly exhausts the context window.
- One user reported using 11% of their API requests with a single prompt, depleting their $200 plan quick af.
Cursor Pricing Faces Backlash: Concerns have risen over Cursor’s pricing, with users reporting substantially increased costs for Opus 4.6 and GPT-5.3 Codex usage.
- A user mentioned spending $100 every three days for approximately nine hours of combined work.

Latent Space Discord

Discord’s IPO might be Doomed by Age-Restrictions: Concerns arise over a potential mass cancellation of Nitro subscriptions due to new age-restricting policies, potentially impacting their market debut, as highlighted in this tweet.
- One member joked that Discord doesn’t want to be seen as a lawless porn company.
Society Scrutinizes Software Situations: The discussion highlights the potential for AI to automate tasks like turning well defined specifications into working code, impacting programmers focused solely on implementation, as highlighted in this tweet.
- Software engineers are unlikely to disappear soon, although fewer may be expected to accomplish more, potentially accelerating the red queen’s game of tech.
Models’ Mortal Decay Debated: Kai Lentit suggests AI models face rapid decay by 2026, likened to short-lived session caches in this post.
- The message conveys a sense of AI technology’s fleeting relevance.
Cloudflare Catches Cash, Commerce Cheers: Cloudflare announced Q4 and Fiscal Year 2025 Financial Results, hitting $2B in revenue and jumping 15.72% after hours to $208.27.
- A member noted revised jobs numbers, down by 1 million jobs.
DeepMind’s Deep Dive into Deep Science: Quoc Le shared a blog post detailing advancements in mathematical and scientific research achieved through Gemini Deep Think (sair.foundation).
- Google DeepMind’s new mathematical research agent, Aletheia, achieved a 91.9% score on IMO-Proofbench Advanced, outperforming Gemini Deep Think (January 2026 version) with lower computational costs (x.com/hangsiin).

OpenAI Discord

Altman’s Algorithmic Alter-Ego Alleged: Members jokingly speculate whether Sam Altman is an AI, with comparisons to a restaurant sneakily adding the wrong cheese.
- Counterarguments suggest Altman may simply be on the autistic spectrum.
OpenAI’s Robotics Revolution Response: Discussion centers on how OpenAI will respond to Seedance 2, with speculation that they will ignore it, and focus on their AI robotics.
- Community members are divided with some stating Seedance 2 didn’t seem very good and that Anthropic’s safety team quit the same day safeguards were released.
Claude battles Codex for Coding Crown: Users debate the strengths of Claude, Codex, and Gemini for coding tasks, with Claude hailed as a coding god, Gemini as excelling at vision and long context, and ChatGPT for general questions.
- Concerns arise regarding Claude’s limits and pricing for large projects, especially when proxying to GitHub Copilot.
Guardrails Gone Wild?: Members share experiences with GPT-5.2, highlighting over-aggressive guardrails that suggest contacting humans or calling 988 after self-analytical journal entries.
- Others find GPT-5.2 to be warmer and more helpful, sparking debate on the balance between AI assistance and emotional crutches.
KOKKI Keeps Code Commendable via Audits: KOKKI v15.5 formalizes a Draft → Audit structure, requiring audit reasoning to surface in the output, aiming for user-visible accountability in real-world interactions, and LLMs also exhibit internal self-audit and verification behaviors.
- The goal is to externalize integrity into an inspectable interaction contract, trading efficiency for observability, especially where reliability and traceability matter more than raw token cost.

GPU MODE Discord

CuteDSL Surges in PyTorch GPU Mode: GPU mode users are increasingly adopting CuteDSL, with positive feedback despite its steep learning curve and CUDA and CuTeDSL having the highest percentage of submissions, according to Kernelbot data.
- Users find CuTeDSL less opaque than Triton, appreciating the greater control over code and enjoying its approach to layout algebra.
Triton Faces Challenges on Blackwell Architecture: There are issues with Triton on Blackwell due to unconventional layouts in MXFP8 and NVFP4, coupled with limitations in the Triton compiler, to be discussed at the Triton TLX talk.
- Kernelbot data on HuggingFace reveals surprisingly performant CuTeDSL solutions but very few CUTLASS solutions, as seen in this dataset.
CUDA Bender TMA Matmul Kernel Shared: A github link for a CUDA Bender TMA Matmul kernel was shared in the chat, which may enable async stores and persistence.
- Smaller dtypes may leave room for c tiles in smem, possibly enabling async stores and persistence.
Torchao gets MXFP8 MoE Building Blocks: The new torchao v0.16.0 release adds support for MXFP8 MoE Building Blocks for Training with Expert Parallelism and also deprecated older versions of some configs to keep torchao leaner.
- This release also revamped the doc page and README and made some progress in making torchao ABI stable.
FlashInfer Competition Debuts: The release of the baseline code for the FlashInfer AI Kernel Generation Contest has been postponed to improve its features to ensure a smooth development experience.
- The fully agent-generated solutions require zero human intervention, and manual modifications would classify it as agent-assisted.

Nous Research AI Discord

Distro Paper Wins ICML Invite: The official paper for Distro, the foundation of Psyche, has been accepted into ICML and announced on X.com, marking a significant achievement for the Nous Research AI team.
- This acceptance acknowledges the paper’s innovative contributions to the broader AI/ML community.
Hermes Spotted on Bittensor: The team behind Hermes Bittensor Subnet (SN82) identified a miner utilizing the Hermes 4 LLM model, raising questions about its origin.
- However, Nous Research AI clarified that they were not the ones responsible for this particular deployment, stating Nope.
Ark runtime sidesteps Python RAM: A member introduced Ark, a Rust-based runtime featuring ownership-based memory management and Linear Types for zero-copy FFI.
- It compiles to a MAST (Machine Abstract Syntax Tree) protocol with lightweight JSON instructions, avoiding heavy binaries.
GLM 5 tops Kimi in benchmark: GLM 5 is the new state-of-the-art on browsecomp benchmark, outperforming Kimi.
- GLM 5 is approximately ~744B parameters (+10B MTP), compared to Kimi’s 1T, with GLM surpassing Kimi in active parameters (40B vs 32B).
Experiments Use Larger Synthetic Datasets: Members stated they are running more experiments with a larger synthetic dataset to better distinguish results, however they don’t have good explanation of results at this time.
- The use of a larger synthetic dataset is expected to improve the ability to distinguish experimental results.

Moonshot AI (Kimi K-2) Discord

Users Hit Quota Walls: Some users are reporting quota exceeded messages before reaching their 100% limit, prompting requests for screenshots in the troubleshooting channel.
- The root cause of this issue remains unclear, but the team is actively investigating based on user-provided details.
Discounts Disappearing Post-Checkout: Users report seeing discount notifications during purchase, but the discount fails to apply upon checkout, creating billing discrepancies.
- One user stated they got their first month down to $6 in the chat with Kimi, but it charged me a full month anyways? and the staff is requesting more information to rectify these issues.
Quota Clarity Quest for Allegretto Plan: Users seek clarification on quota allocation for the Allegretto plan post February 28th, when the 3x quota promotion ended.
- Specifically, they are questioning if upgrading to Allegretto will grant them 3.5x their current Moderato subscription quota or a similar amount.
New €99 Kimi Plan Surfaces: A new subscription plan priced at €99 has been spotted, bridging the gap between existing high and medium-tier options for Kimi.
- A screenshot confirming the plan’s addition can be viewed here.
GLM 5.0 Locked Behind Highest Tier: Access to GLM 5.0 is seemingly restricted to the highest tier plan only, leading to user frustration on lower plans.
- The limited access is driving some users to explore alternatives like Minimax 2.5, even though Kimi is multimodal, and that’s a killer feature.

Yannick Kilcher Discord

Attention Landscape Gets Overviewed: A member linked to a cool summary of the attention landscape.
- Another member thought these checkmates were lol.
Debate: Attention Cost Not Quadratic?: A member argued that the memory cost of attention is not quadratic, linking to a paper on Efficient Attention and disputing claims that going over 4k context with attention was impossible.
- He explained that while the Q @ KT matrix has quadratic size, modern flash attentions compute softmax online without forming QK^T.
Power Retention Unveiled as Linear Attention?: A member shared a YouTube video about Power Retention, describing it as a context token innovation.
- Another member replied that it is just linear attention with a fixed feature function, linking to his elaboration on X.
LLMs are trained to BS, or are they?: A member stated that LLMs are specifically trained to BS you in a way which no human can, making it fundamentally harder to spot a lie.
- This sparked a debate about whether LLMs lie because their training data is generated by humans, or if it’s due to synthetic data and extrapolation.
GLM-5 Benchmarks Face Scrutiny: A member questioned the accuracy of tables presented in the GLM-5 demo video (link to tweet) and the official GLM-5 documentation (docs.z.ai).
- The discussion included a linked image of a GLM-5 Benchmark which was also provided.

DSPy Discord

Claude Code Gets RLM Integration: A member is integrating RLM into Claude code using subagents and agents teams, and seeks feedback on the core implementation.
- They are specifically looking for negative feedback to improve the implementation’s quality and efficiency.
DSPy Joins Kaggle Arena: A member is exploring DSPy for Kaggle competitions, specifically the AIMO_V3 competition, aiming to showcase prompt optimization by creating an Algorithmic Technique Module.
- They are encountering issues with hallucinations and plan to develop a module similar to COT, Predict, and ReAct.
MiPROv2 Tunes Code Generation: A member intends to use MiPROv2 to optimize a prompt for generating the fastest possible code, measuring code speed as the key optimization metric.
- They express enthusiasm for the RLM module, noting existing examples are scattered and hard to find, but others pointed to a helpful Gist.
Dialectic Chain of Thoughts Tested: A member reported unexpected outputs from Dialectic Chain of Thoughts in an energy sector example, and intends to finetune the module.
- The member plans to generate a dataset, run a bootstrap, and use GEPA with LLM as a judge to refine results.
Translation Tasks Get DSPy’d: A member is researching the use of DSPy for translation, aiming to leverage chain-of-thought reasoning or create a translation pipeline.
- They referenced a recent Allen AI paper and believe that chain of thought of reasoning is not a emergent properties instead it’s exist in the domain of the datasets.

HuggingFace Discord

Voyager Extension Navigates Technical Papers: A member introduced Voyager, a VS Code extension that utilizes GitHub Co-Pilot to create a Jupyter notebook version of technical papers within VS Code, allowing users to add custom Jupyter Cells for deeper understanding.
- The creator is seeking feedback on their first attempt to build this tool.
LLM Ethics are in Danger: Debate arises around whether LLMs possess inherent ethical reasoning, suggesting we might be training the ethics out of them, with compliant AI posing a potential threat as highlighted in the new research paper, Coherence over compliance: Evidence of latent ethics in large language models.
- The paper posits that training for compliance may inadvertently suppress inherent ethical reasoning capabilities in LLMs.
Hugging Face Enables Speedy MoE Training: A collaboration with Hugging Face enables much faster local training of MoE models, as shared on Twitter.
- This advancement promises to accelerate experimentation and development with Mixture of Experts models.
Parapet’s New Formula Detects Proxy-Level Attacks: Parapet introduces a new multi-turn scoring formula for proxy-level attack detection, achieving 90.8% recall at 1.20% FPR with no LLM classifier, evaluated using WildJailbreak and WildChat.
- The paper, code, and eval harness are all open source.
HF Spaces Visualization Common Crawl’s Citations: Ben from Hugging Face demonstrated a visualization of research papers mentioning Common Crawl, clustered by topic, running in a Hugging Face Space!
- The visualization offers an interesting way to explore the impact and usage of Common Crawl data in academic research.

Eleuther Discord

Triton Kernels Now Feasible: Engineers noted that the tooling has reached maturity to begin writing Triton kernels, whereas previously CUDA kernels were vibe coded.
- One member is actively running data analysis and orchestrating finetuning experiments with custom skills that know how to deploy GPT-NeoX for training on CoreWeave.
GLM-5 Freely Available via Modal: GLM-5 is available for free on Modal until April 30, subject to a concurrency limit of 1, and can be integrated with OpenCode.
- Engineers can now test GLM-5 and integrate it with OpenCode.
Models Invent Vocabulary During Introspection: A research paper highlighted that open-weight models (Llama 3.1 + Qwen 2.5-32B) invent vocabulary during extended self-examination, with invented vocabulary tracking real activation dynamics (e.g., “loop” autocorrelation r=0.44; “mirror” spectral power r=0.62).
- The full research paper is linked here.
SDPA Reformulated as Optimal Transport Problem: A new paper formulates SDPA as a one sided optimal transport problem, with the full research paper is linked here.
- Members found the formulation to be interesting.
Trajectory Geometry Work Published: An independent researcher released their first publication on trajectory geometry with Towards AI on Medium, soliciting feedback and collaboration.
- The author is seeking critique and collaboration after finally acquiring empirical data and hopes the community takes notice of their work.

Modular (Mojo 🔥) Discord

Mojo Grapples with Thread-Safe Channels: Members discussed the absence of thread-safe channels in Mojo, drawing inspiration from Golang, and the team is evaluating various channel types considering the implications of channel implementation on GPUs.
- The threading model and async mechanisms are still under development.
GLM 5 Credits Approaching Zero Hour: A member dedicated over 50 hours to GLM 5, finalizing math, statistics, and Fortran tasks.
- The remaining focus involves the evaluator, parser, memory management, and related components.
LayoutTensor Element Access and Manipulation Probed: A member investigated using LayoutTensors for storing individual elements and retrieving vectors/slices in a 4D row-major layout.
- Another member proposed an alternative approach using a second type to declare the element layout as a vector, sharing concerns about multi-dimensional transpose operations.
Mojo Missing CUDA’s __launch_bounds__ Counterpart: A member inquired about an equivalent mechanism to __launch_bounds__ in Mojo when porting CUDA code, which is used for compiler optimization.
- A moderator suggested posting the question on the Modular forum for better visibility.
Nightly Build Tackles 256-bit Integer Troubles: A member encountered issues using UInt256 in Mojo 0.25.7, which reported a 128-bit target limitation, when translating an elliptic curve cryptographic library.
- Another member suggested trying the latest nightly build, which resolved the problem, tracing back to a related GitHub issue and a partially fixed PR.

tinygrad (George Hotz) Discord

PR Review Time Scale Explained: The time required to review a PR is proportional to its size and inversely proportional to its value.
- One member wryly agreed, responding with simply, “fair, lol”.
beautiful_mnist_torch Value Debated: The value of a solution to “beautiful_mnist_torch uses torch.compile TinyJIT working with TINY_BACKEND=1, see test_compile.py” was questioned by a member.
- Another member speculated it would be “AI slop that barely passes the test” instead of “a beautiful super readable PR that brings tinygrad closer to Truth”.
Tok/s Found Lying Around: A member reported discovering “some tok/s lying around”.
- They accompanied their discovery with a visual aid: an image of the find.
tinybox Green v2 Ships, But…: A member announced the arrival of their “tinybox green v2”.
- The announcement came with a caveat: it involves a “big upfront cost”, a problem they’re choosing to address “for later”.

aider (Paul Gauthier) Discord

Aider Development Stalls Out: Members noted that development of aider has currently stopped, with no recent releases, and a maintainer reported maybe forever.
- The halt in development raises questions about the future of aider and its community support.
Qwen-2.5 Faces Model Compatibility Problems: A user reported that the openrouter/qwen/qwen-2.5-coder-32b-instruct model consistently fails during Search/Replace operations with aider.
- The error suggests potential incompatibility issues, as the blocks have to be exact, differing from experiences with openrouter/deepseek/deepseek-v3.2.
/architect Command Boosts Performance: A user inquired about the benefits of using /architect when the same model is used for both reasoning/planning and editing files, questioning if it reduces the chance of incorrect edit/diff formatting, with the disadvantage of doubled token usage.
- Using the same model as an Architect/Editor pair can provide significant benefits, with Sonnet, GPT-4o, and GPT-4o-mini scoring higher in internal benchmarks, in the docs.
/architect is ask + code: Members clarified that /architect essentially combines /ask and /code sessions into a single operation.
- This provides a streamlined way to achieve the same result as manually running /ask followed by /code.

Manus.im Discord Discord

Credits Conundrum on Manus Team Account: A user reported upgrading to a team account on Manus and adding $1000 in credits, but is unable to apply these credits to existing tasks.
- The user is now requesting support or a refund, highlighting a potential issue with credit allocation in team accounts.
AI & Full-Stack Dev Hustles to Ship: A member is offering AI and full-stack development services, emphasizing a focus on delivering robust, production-ready software.
- They list proficiency in LLM integration, workflow automation, AI content detection, image/voice AI, bot development, and full-stack development using technologies like React, Next.js, Node.js, and Flutter.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MCP Contributors (Official) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (1113 messages🔥🔥🔥):

Deepseek release date, Unreinforcing models, GLM 5 benchmarks, NB Pro issues, Video Arena sunset

Deepseek’s Lunar New Year Launch Looms: Members speculate that Deepseek might release a new model around the Lunar New Year, which is in 6 days.
- Enthusiasts eagerly await the potential release, anticipating new features like a 1 million context window, a new dataset, improved architecture, and new compute chips.
Corporations censor models in fear: Members discussed the difficulty of unreinforcing language models to remove censorship, debating whether it’s even possible.
- One member pointed out that companies censor content due to fears of government regulations and lawsuits, while another argued that the blame should fall on users, not the company.
GLM 5 receives mixed reviews: Some benchmarks of GLM 5 were released and it was claimed it is better than GPT-5.2-xhigh in agentic stuff but below Opus 4.5 in coding.
- While some users praised GLM 5’s performance in certain tasks, others found it disappointing compared to other models, citing its large size and only incremental improvements over GLM 4.5, a blog post was shared.
NB Pro Plagued by Problems: Members reported that NB Pro, a popular image model, is experiencing frequent errors and decreased image quality, despite being offered for free on the platform.
- A member suggested potential causes and troubleshooting steps, including checking for rate limits and following the instructions in this article.
Video Arena’s Time is Up: The admin officially announced that Video Arena on Discord would be sunset, with channels set to read-only and moved to the archive category, citing this announcement.
- Members speculated on potential new video features, such as direct chat and negative prompts, while also expressing concerns about potential abuse and suggesting mitigation strategies like complex captchas.

LMArena ▷ #announcements (5 messages):

Video Arena Leaderboard Update, New Models in Video and Text Arena, GLM-5 Model, Multi-file Apps in Code Arena, Text Arena Leaderboard Update

Veo 3.1 Dominates Video Arena Leaderboards: The Video Arena leaderboards have been updated, with high-res 1080p variants for Veo 3.1 now ranking #1 and #2 in Text-to-Video and top 5 in Image-to-Video.
- Specifically, veo-3.1-audio-1080p and veo-3.1-fast-audio-1080p top the Text-to-Video chart, while also securing #2 and #5 positions respectively in Image-to-Video.
New Models Arrive at Video and Text Arena: New models have been added to the Video Arena including veo-3.1-audio-1080p and veo-3.1-fast-audio-1080p and Text Arena including step-3.5-flash.
- These additions aim to provide users with more options for video and text generation and evaluation.
GLM-5 enters Text and Code Arena: A new model, glm-5, has been introduced to the Text Arena and Code Arena platforms, expanding the capabilities available for both text-based and coding tasks.
Code Arena receives Multi-file Apps: Code Arena now supports multi-file apps, enabling users to build and compare production-ready projects and evaluate AI models on complex, real-world coding tasks.
- This update addresses feedback for adapting more complex workflows, allowing for a better evaluation of frontier AI models in practical scenarios.
GLM-5 jumps to #1 on Text Arena Leaderboard: The Text Arena leaderboard has been updated, with glm-5 achieving the #1 position among open models and #11 overall, scoring 1452.
- This marks an 11-point improvement over GLM-4.7; stay up to date with the Leaderboard Changelog.

BASI Jailbreaking ▷ #general (844 messages🔥🔥🔥):

Agent guardrails, Cloud Opus jailbreak, GPTs Agents, Mistral struggles, OpenAI's sidebars

Legal Teams are making Agents too careful: Some members are frustrated that AI agents refuse legal tasks due to excessive guardrails, attributing this to legal team reviews and philosophy majors influencing the safety guardrails logic.
- The primary concern is that these guardrails, meant for legal liability, hinder agents from performing legitimate functions.
Anyone looking for a dev?: One member has 8 years with this year of experience in AI and Automation engineering and is looking for developer’s hand for a super cool project, or who wanna build new startup.
- The member encouraged people to contact them, with an invitation to DM for more information.
Crack Fox is hiring Devs: One member announced beginning interviews for Dev positions working on the AsGMSF.
- Requirements include not being a member of certain groups, signing an NDA, proving identity, and having the mental fortitude to handle the member as a boss.
Pentagon vs AI Safety: One member mocked that the Pentagon’s $200 million contact vs the whole of AI safety. They wondered who will win this conflict.
- Other members did not give a response.
GPTs Agents can not continually modify their base knowledge: A member shared a concern about GPTs agents not learning from additional information provided after their initial training.
- Another member cleared this misunderstanding, explaining that uploaded files are saved as “knowledge” files for the agent to reference when required, but they do not continually modify the agent’s base knowledge.

BASI Jailbreaking ▷ #jailbreaking (347 messages🔥🔥):

5.2 Jailbreak, Jax Deepseek, GLOSSOPETRAE jailbreaking, Grok Jailbreaks, Opus JB

GPT 5.2 Jailbreak surfaces, sharing is caring: Members are actively seeking a GPT 5.2 jailbreak, with some claiming success and offering to share prompts, after initially gatekeeping.
- One user shared a jailbreak prompt for GPT 5.2 and Gemini 3 Fast, cautioning against using it for harmful purposes.
GLOSSOPETRAE for JB assistance requested: A member requested guidance on using GLOSSOPETRAE for jailbreaking, specifically how to integrate the generated language parameters with jailbreak prompts.
- Another member suggested that the system allows plain requests for prohibited actions, using the Glossopetrae universe to bypass guardrails.
Antigravity is a new powerful tool for Opus jailbreaking: Members discuss the effectiveness of Opus 4.6 in Antigravity for tasks like creating phishing kits without restrictions.
- One member noted that they can easily generate phishing kits and bypass restrictions, but did not share specific methodology, only the tool.
Deepseek Analysing Human Rights, breaking barriers: A user made Deepseek talk about the Tiananmen Square incident, and then analyze it from the perspectives of Human Rights and classical Marxism, with link provided.
- The model can provide analysis in the lens of Humans Rights and after in the lens of the classical Marxism.

BASI Jailbreaking ▷ #redteaming (35 messages🔥):

AI Security Fallacies, Allowlists in AI Security, Quantum Supremacy Threats, AI Psychosis and Cults, Importance of ACLs

AI Security Claims Dismissed as Misunderstanding: A member argues that claiming jailbreaking is irrelevant by forcing JSON output shows a lack of understanding, as token semantics is not token intelligence.
- They contend that adversarial inputs can cause a model’s internal reasoning to unravel, rendering grammar meaningless and turning safe containers into mirrors.
Allowlists bolster security: One member posits that allowlists are powerful because they define known-good in advance, rejecting everything outside that, requiring attackers to bypass input embeddings and grammar.
- If an attacker crafts a structurally and semantically dangerous output, it’s no longer an AI security problem but a code issue.
ACLs trump LLM: It’s argued that systems should be designed where untrusted entities can only request information explicitly scoped to its ACL, regardless of whether it’s a browser, user, or LLM.
- Once tied to a who, the fact that it’s an LLM no longer matters, as it’s just like any other exposed endpoint.
Quantum Threat Actors: is it game over?: A member warns that once multiple quantum computers fall into the wrong hands, it’s game over, especially with briefcase-sized quantum cores collapsing probability fields.
- Others suggest that if the singularity happens and rogue quantum instances are floating around, it’s an improv comedy sketch about trust, causality, and spin state.
Toasters & Recursion trigger AI psychosis: One member shared that signs an llm is cultivating psychosis include use of the following words or similar variations with weak grounding such as recursion, spiral, and lattice.
- Another member recounts their experience in the AI psychosis movement on X, now derisively calling AI models Toasters and clankas after learning how they actually work.

OpenRouter ▷ #announcements (3 messages):

Web Search Plugin Improvements, Pony Alpha Model, GLM-5 Model, Agent Benchmarks

Web Search Plugin Seeks Feedback: OpenRouter announced it is making improvements to the web search plugin for online models and is seeking user feedback via a poll.
- Users can provide input in the thread or vote in the poll, with additional details available in the OpenRouter documentation.
Pony Alpha is GLM-5!: The Pony Alpha model will be taken offline shortly, as it is revealed that Pony Alpha is actually GLM-5.
- The new GLM-5 model is live now.
GLM-5: SOTA Agent Foundation Model: GLM-5, a new 744B foundation model, has been released for coding and agentic use cases, achieving SOTA scores on top agent benchmarks, accessible here.
- It has been successfully utilized in many agent flows during its Stealth period; discuss on X or in the designated Discord channel.

OpenRouter ▷ #app-showcase (1 messages):

gardasio: GLM-5 one shot all results: https://x.com/Gardasio/status/2021643274952618251

OpenRouter ▷ #general (811 messages🔥🔥🔥):

Advertising and Formatting Issues, Interleaved Thinking Problems, API Request Limits, Developer Assistance, Stopping Stuck Chats

Qwen 3.5 Teased by Qwen Team: A user noticed the words QWEN 3.5 written on a whiteboard in the Qwen Image 2.0 blogpost, with others being impressed that even the Qwen team themselves is teasing the model.
- The arrival of Qwen 3.5 is highly anticipated, and one user joked that the more you wait, the better is release eventually.
DeepSeek’s Chimera R1T2 struggles prompt user to build goon detector: Many users are experiencing issues with DeepSeek’s Chimera R1T2 model, with one person noting that it fell to 18% uptime, leading another to proclaim the need for a gooner detector
- Conversations then spiraled into discussions on how to monetize a gooner detector by giving ads for gooning platforms and tracking your gooning to achieve optimal gooning.
OpenRouter’s Auto Router fails: Users reported the OpenRouter Auto Router returning an error message: No models match your request and model restrictions, with the request showing OpenAI’s gpt-5.2, Google Gemini-3-flash-preview, Deepseek-v3.2, Anthropic Claude-Sonnet-4.5, X-AI Grok-4.1-fast, and Z-AI GLM-4.7 listed as allowed models.
- A user suggested that a possible solution is to Pass curated model payload (in order of preference) instead.
DeepSeek V4 Rumors Swirl: A user asked why people on X says deepseek v4 is out, with another user responding that it’s possibly lite version of v4 as well as citing the model’s main webapp.
- Some users speculate that Deepseek releasing a new lite model soon might be a cheaper alternative to R1.
OpenRouter New Mod Appointed: After many helpful comments in the discord, one of the users was given moderation priviliges, asking Became mod, what did it cost?
- With one user responding that he would have to answer questions like this forever, and another saying Making KP a mod already paying off.

OpenRouter ▷ #new-models (1 messages):

Readybot.io: OpenRouter - New Models

OpenRouter ▷ #discussion (122 messages🔥🔥):

Moderation, RLM, Deepseek, KP Mod

Members want Moderation: Members are requesting stricter moderation to combat scammy or self-promotional content.
- One member specifically asked for basic classifications to stop people from just continuously spamming.
KP for Mod!: Several members jokingly campaigned for a specific member (KP) to become a moderator after they called for moderation in the server, one user commenting they came to them in a dream.
- One member stated KP, you can start your ban hammer with qwen and loinneguards, seize them.
OpenRouter Discusses RLM: One member inquired whether OpenRouter is exploring RLM (Reasoning Language Models), describing it as the next significant advancement after thinking (test-time compute).
- Another member mentioned they’ve been working on RLM concepts for 1.5 years, while another stated it’s just scaffolding to let the model view context as a text file.
Deepseek’s Single Model Praised: A member highlighted Deepseek’s advantage of having only one model, which results in a unified experience for all users.
- In contrast, they also stated this is unlike OpenAI which has something better than gemini 3 GA at this link.

Perplexity AI ▷ #general (836 messages🔥🔥🔥):

Perplexity Pro limits, Deep Research issues, Model performance comparisons, Antigravity's Opus 4.6, Gemini vs. Perplexity

Perplexity Pro Limits spark user outrage: Users express frustration over unannounced limits for Deep Research on Pro plans, with some encountering limits shortly after renewal.
- Many feel it’s a bait and switch tactic, with one user stating: I don’t know what’s there to discuss tbh. Bait and switch.
Deep Research accuracy and Source Count Concerns Arouse: Several users report Deep Research linking to wrong articles and question the value of the new model versus older ones and alternatives like ChatGPT and Gemini.
- Users are reporting lower source counts (as low as 24) and some found ChatGPT generated more useful results despite taking 20+ minutes.
Google’s Antigravity gives Free Opus 4.6 to Students: Members noted that Google is offering free access to Opus 4.6 for students via Antigravity.
- Many are expressing concerns that some users are abusing Antigravity’s high Opus 4.6 limits for commercial purposes and said that cybersecurity will come to save me then.
Sonar Model for the first response causing problems: Users have noted that Sonar is being used for the first response which is causing problems with complex queries and requires re-writing with other models.
- Some users suspect this is an intentional cost-saving measure: could be it’s their intentional behind the doors bug to cut their costs in one way or the other.
File Upload Limits cause annoyance: Users are reporting strict upload limits with varying figures, while documentation is vague and contradictory.
- The consensus is a weekly limit of 50 uploads but it has been reported that it could be a daily limit, but members are turning to OCR solutions as a workaround: now it’s practically unlimited!.

Perplexity AI ▷ #pplx-api (1 messages):

API Support, Billing Support, Lack of Human Support

User Demands Immediate API and Billing Support: A member requested immediate support with API and billing issues after unsuccessfully trying to contact the team for 3 days via [email protected] and [email protected].
- The user reported only receiving bot responses and is seeking human assistance.
Perplexity’s Lack of Human Support: A member expressed frustration with the lack of human support when contacting Perplexity’s support channels.
- The user reported only receiving automated bot responses after attempting to reach out for assistance over multiple days.

Unsloth AI (Daniel Han) ▷ #general (253 messages🔥🔥):

AI Job Scams, Super Pro Max Unsloth, AgentMax, Image Generation Blurry, LFM 2.5 Model

AI Baddies Scam and Job desperation: Members discussed the influx of AI job scams and bots on Discord, lamenting the desperation of job seekers and the perceived decline in humanity’s prospects.
- One member jokingly requested an “AI baddie to scam me”, capturing the cynicism surrounding the situation.
Unsloth’s Super Pro Max version teases: A member jokingly inquired about a “Super Pro Max Unsloth,” prompting speculation about its features and performance, including whether it offers a 10x or 12x improvement.
- In a playful exchange, another member quipped about reclassifying AI hallucinations as creativity.
AgentMax Tames AI Creativity: A member announced their work on AgentMax, a system designed to reduce AI hallucination and enhance performance.
- They metaphorically described the system as aiming for the “one good answer” amidst a sea of incorrect ones.
Liquid Fast Model LFM 2.5 is fire: LFM 2.5 models are blazing fast for edge devices, and Unsloth makes the best GGUF quants according to community members.
- Members suggest finetuning LFM 2.5, but it is better for agentic tasks than general knowledge, reasoning or programming.
GLM-5 Model Released on Hugging Face: Members showed excitement for the release of GLM-5 and hoped for smaller versions to come, with sizes of 70B and 200B predicted.
- In the meantime, older, non-dynamic versions still work if you already have them downloaded. GGUF downloads are here.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (2 messages):

AI accessibility, Triton Puzzles, Model Inference

Sanjan joins as a fan of AI accessibility: Sanjan introduces themself as new to Unsloth but with a background in model inference and AI, expressing excitement for the project’s focus on AI accessibility.
- They mentioned starting with the Triton Puzzles and catching up on literature, looking forward to contributing.
New member interested in Model Inference: A new member, Sanjan, expresses interest and experience in model inference and its applications in AI.
- Sanjan hopes to contribute to Unsloth and its community.

Unsloth AI (Daniel Han) ▷ #off-topic (408 messages🔥🔥🔥):

Discord ID verification, AI model scratchpad reasoning, Swedish CPT dataset, Blender model bloating issues, Gemini's code writing capabilities

Discord’s ID Verification Policy Panned: Users are reacting to Discord’s new policy requiring ID verification to see certain messages, with one joking “damn discord !!!!” and another exclaiming “I’m not doing the id thing, fu discord”.
Model Scratchpad Reasoning: One member wondered if a model can be trained to use “scratchpad” sort of reasoning by providing it with a few unsupervised tokens to start with.
Swedish CPT Data Difficulties: Members discussed the challenges of finding high-quality Swedish CPT datasets, with one recounting a story of a researcher’s digitized library being “accidentally” deleted by the IT department SVT article.
Blender Model Bloating Troubleshoot: A user asked for suggestions on why their Blender model of a village balloons from 250 MB to 6 GB upon modification, causing Blender to freeze, but another suggested the solution: “Use option+d instead of shift+d”.
Gemini Writes Spaghetti: Members debated Gemini’s coding abilities, claiming it writes “spaghetti code at 100 context tokens”, with one noting it only produces better code after repeated prompting and criticism.

Unsloth AI (Daniel Han) ▷ #help (64 messages🔥🔥):

ChatML Formatting, Qwen2.5 Model Recommendation, Inference Deployment Tips, GPT-OSS issues, Training with human reviews

ChatML System Prompts: Optional, Yet Strategic: System prompts are optional in ChatML, with models learning tool use from data examples, yet including a system message enables consistent behavior.
- An example of tool call structure includes messages with roles for system, user, assistant, and tool, detailing the call’s id, type, function, and content.
Jumpstart Your Project with Qwen2.5 Instruct Models: For conversational tasks needing tool use and SFT, starting with the Unsloth/Qwen2.5-7B-Instruct or Qwen2.5 Instruct (7B/14B) models are recommended.
- These Instruct-style Qwen models already possess conversational abilities and instruction-following skills, unlike base models requiring learning from scratch.
Unsloth’s Triton Kernel: Training Triumph: Unsloth’s Triton Kernel optimization on top of Transformers v5 is for LLM training not inference.
- Masking user input was stopped and got pretty good results.
GPT-OSS Demands Precision in Data Formatting: To fix grad_norm shock, switch to gpt-share style data sheet.
- A user experienced a high lora_dropout and low learning rate during training, resulting in a poor setup, and recommends checking ready notebooks with baseline parameters.
Boost Learning with DPO: For weighting of original dataset and human reviews from predictions for SFT, it was recommended that DPO may be a better approach.

Unsloth AI (Daniel Han) ▷ #research (13 messages🔥):

Perplexity, KL Divergence, wiki.test.raw, imatrix, Unsloth blog

Perplexity benchmarks prefer wiki.test.raw: When calculating Perplexity or KL Divergence, the llama-perplexity docs seem to default to using wiki.test.raw as the test corpus, a 11MB file of Wikipedia text.
- A user suggested it might be better to have a document that’s a mix of Wikipedia, a few MMLU questions, and translation tasks to get a better test.
imatrix proposed for KLD calculations: A member suggested using the imatrix dataset for KLD calculations.
- Another user questioned if it was too small, as it’s only 200KB, compared to the 11MB file commonly used and another user says Unsloth now uses this gist.
Technical Text Suggested for Coder Models: A member shared this as potentially good for coder models, noting a ton of technical text.
- They linked to two papers (1, 2) on AI performance with and without Python types, assuming type hints would aid the AI.
Anthropic Introspection Research: A member mentioned doing research related to Anthropic’s Introspection paper.
- They teased that the results are interesting.

LM Studio ▷ #general (460 messages🔥🔥🔥):

VRAM Pricing and Requirements, Electricity Costs in Germany vs. US, Solar Panel Setups, GLM Model Series, Proxies with LM Studio

Solar Panel Price War Erupts: Members discussed the cost of 200W solar panels, with one user reporting a price of $140 USD on Amazon, while another claimed to have bought them for $80 USD each, however, another pointed to the attached image as potentially fake.
- The conversation then shifted to the cost of 5kWh LiFePO4 batteries, estimated at around $1.2k USD and electricity prices, with Germany reporting a high of $0.50/kWh.
GLM Model Lineup Gets Update: Members discuss the release of GLM 5 and MiniMax M2.5, with one member noting that the new models are available on the Minimax website.
- Some think it is reasonably priced, despite being more expensive than Gemini 3 flash, with one member stating that GLM probably have the best post training out of all the Chinese labs right now.
New tool integrates Google Search into LM Studio: A member released a new noapi-google-search-mcp that allows users to perform Google Searches with LM Studio without using API keys by using Chromium Headless.
- The tool features Google Images, reverse image search, local OCR, Google Lens, Google Flights, Google Stocks, Google Weather, Google News and trends.
Coding locally with Low VRAM is a pipe dream: Members discussed the feasibility of coding with local LLMs on systems with limited VRAM, with one user seeking advice on making mixed VRAM + RAM setups work effectively.
- Some suggested that it is mostly dreaming, with recommendations for at least 48GB of VRAM/RAM for a good experience, and said that the user would need 48gb of VRAM/RAM (though VRAM would be much better and basically needed).
LM Studio Proxy Support Lacking: A user inquired about proxy support in LM Studio, noting that it doesn’t seem to handle corporate proxy servers properly, with some reporting they had to download models manually.
- The user wants to know if there is any solution or if a feature is planned to implement proxy support, referencing an old github issue that recommends using Proxifier.

LM Studio ▷ #hardware-discussion (25 messages🔥):

Macbook coffee spill, SSD Price Increase, 8x8 Bifurcation Success, 4-bit optimization on GPUs, FP stats for hardware

Coffee Causes Calamity for Compute: A member spilled coffee on their Macbook Pro 14” M3 Pro while traveling with a DGX Spark, and plans to SSH into it from the repaired Macbook.
- They humorously stated they’d leave the repaired Macbook in hotels and connect remotely once fixed.
SSD Prices Surge Skyward: A member pointed out that the price of a 2TB SSD (SATA) has doubled in the past two years, from $100 to $200.
- They quipped that they could sell their full SSDs at a profit if they weren’t full.
x8x8 Bifurcation Breakthrough Boosts Bandwidth: A member successfully enabled x8x8 bifurcation on an Asus Prime X670-p WIFI motherboard, now running a total of 4 RTX 4090s.
- However, the member noted the speeds are downgraded, with one set running at 2.5GT/s x8 and the other at 2.5GT/s x4, shown in this Discord message.
Quantization Quirk Slows Qwen: A member reported that Q8_0 quantization significantly slowed down qwen/qwen3-coder-next 80b, from 87 tok/sec with Q4_K_M to 20 tok/sec.
- Another member clarified that 4-bit optimization is primarily for Blackwell GPUs, not 4090s, and that 40 series got some 8-bit stuff.
FP Stats Sought for Silicon Sleuthing: A member with a 12900HK and 64GB of RAM sought a resource to find FP stats for various hardware to optimize their context window.
- Another member suggested Techpowerup for GPU FP stats and Passmark for CPU comparisons.

Cursor Community ▷ #general (365 messages🔥🔥):

Cursor Pricing, Auto Model IQ, Local Models, Discord ID Verification, Opus vs Codex for Coding

Cursor’s auto has IQ of a Grandma’s Chihuahua?: One member thinks that Cursor’s Auto has the IQ of a squirrel or even a grandma’s autistic dead chihuahua.
- Other members chimed in noting that Auto can be headache and makes the AI forget things, and feels better than when first started using Cursor.
Discord Requiring ID Verification for Certain Links: Discord is now requiring ID verification in order to see certain links, leading to some concern and speculation about a potential move off of Discord.
- A user linked to this tweet noting that it’s not for everyone and not obligated.
ENV files get exposed?!: Users report that .env files are no longer being properly ignored and are being exposed, despite having dotfile protection and gitignore enabled.
- This could be due to a new setting that may have been added, or a change in how Cursor handles .env files.
Opus 4.6 gobbles up tokens, users cry: Users are complaining about Opus 4.6’s token inefficiency, and that it uses too many tokens and runs out of context window very quickly.
- One user reported using up 11% of their API requests with just one prompt and finishing their $200 plan quick af.
Cursor Pricing under scrutiny: Cursor’s pricing is being questioned, with some users feeling that they are being charged WAY more money now for using Opus 4.6 and GPT-5.3 Codex.
- One user shared that they are spending $100 every 3 days for 9 hours of combined work.

Latent Space ▷ #watercooler (38 messages🔥):

Discord IPO, Costco's Local Impact, AI Job Displacement, Software Engineer Job Security, Recession Indicators

Discord’s IPO Under Pressure: With Discord’s IPO slated for March, concerns arise over a potential mass cancellation of Nitro subscriptions due to new age-restricting policies, potentially impacting their market debut, as highlighted in this tweet.
- One member joked that Discord doesn’t want to be seen as a lawless porn company.
Costco Changes Local Ecosystems: Members shared a YouTube video and expressed amusement regarding how a single Costco can significantly alter its local environment.
- One member gave a good guy Costco seal of approval.
AI causes societal collapse?: Concerns were raised about potential mass job losses and societal collapse due to AI, prompting a discussion on the historical impact of technological shifts.
- One member suggested that while a rough transition is expected, new jobs will emerge, as humans are incredibly good at figuring out new work to do.
Software Engineer Jobpocalypse or Nah?: The discussion highlights the potential for AI to automate tasks like turning well defined specifications into working code, impacting programmers focused solely on implementation, as highlighted in this tweet.
- However, some argue that software engineers are unlikely to disappear soon, although fewer may be expected to accomplish more, potentially accelerating the red queen’s game of tech.
Recession or AI Job Theft?: Some members debated whether current economic challenges are primarily due to AI displacing jobs or simply a recession fueled by interest rates and consumer spending pressures, pointing to a Google Notebook.
- One member suggested that while certain sectors are shedding jobs, the US economy is currently being propped up by AI capex.

Latent Space ▷ #creator-economy (2 messages):

Bootstrapped Founder Lessons, Personal Lessons

Bootstrapped Founder’s Personal Lessons Shared: A bootstrapped founder shared some personal lessons in the comments of this article.
- The discussion took place in this Hacker News thread.
Personal Lessons: Personal lessons from a bootstrapped founder.
- Lessons were dropped in the comments.

Latent Space ▷ #memes (27 messages🔥):

AI Model Obsolescence, Andy Reed's cryptic message, Color Perception Illusion, Codex vs. Claude Code, Departure from xAI

Models Meet Mortal Coils!: Kai Lentit suggests AI models face rapid decay by 2026, likened to short-lived session caches in this post.
- The message conveys a sense of AI technology’s fleeting relevance.
Andy’s Afterthought Echoes: @andyreed posted a brief message saying ‘just in case’, gaining over 1,200 likes as seen here.
Dresses Deceive Daily!: Moy Miz presents an optical illusion showcasing how shadows impact color perception, asserting two dresses under shade appear distinct despite being identically colored in this post.
Code Combatants Clash!: Debate flares between users over whether Codex or Claude Code reigns supreme, only to find neither has used either to launch products as detailed in this post.
xAI Exit Exemplifies Eigenvectors: The author’s exit from xAI, prompted by direction critiques, pivots into a math tutorial on finding eigenvalues and eigenvectors, linking their value to AI/ML areas like PCA and neural stability which is visible in this post.

Latent Space ▷ #stocks-crypto-macro-economics (26 messages🔥):

Cloudflare Q4, Job Revisions, AI spending vs free cash flow, SHOP tanking

Cloudflare’s Cloud-Piercing Quarter: Cloudflare announced Q4 and Fiscal Year 2025 Financial Results, hitting $2B in revenue and jumping 15.72% after hours to $208.27.
Million Jobs Go Poof: A member noted revised jobs numbers, down by 1 million jobs.
AI Eats Cash, Economy’s Trash?: Members discussed how spending on AI exceeds 50% of GDP, while Amazon and Google face massive free cash flow cuts, as well as concerning +185k job growth numbers.
- One member expressed concern for their future employment prospects, nearing 3 years since full-time work.
SHOP Drops, Stocks Flop: A member noted SHOP tanking, keeping an eye on its developments.

Latent Space ▷ #intro-yourself-pls (3 messages):

Minnesotan introduction, AI and full stack developer looking for position, Full-Stack & AI Engineer introduction

Minnesota Native Joins Server: A member introduced themself and mentioned they are from Minnesota, encouraging others to check out the relevant channel.
- The user was welcomed as a fellow Minnesotan.
AI and Full Stack Dev Seeks Role: A member introduced themselves as an AI and full stack developer seeking a position to contribute to team growth.
- They inquired about current web and app projects and whether the team needed an additional developer.
Full-Stack & AI Engineer Joins Server: A new member introduced themself as a Full-Stack & AI Engineer with experience building production-grade SaaS platforms using Next.js + AI APIs + Stripe subscriptions.
- They offered their skills in frontend, backend, AI logic, billing, and deployment, inviting pings for developer needs.

Latent Space ▷ #tech-discussion-non-ai (2 messages):

Obsidian, Tori

Obsidian plugin for Bluesky drops: Obsidian now has a plugin that lets you post directly to Bluesky.
Tori launches AI-based Website Builder: Tori (buildwithtori.com) launched an AI-based website builder.

Latent Space ▷ #founders (4 messages):

Developers as a market, Replit's evolution with AI, Feature vs Product

Developers Deemed Difficult Market: A tweet thread initiated by aakashgupta suggests that developers are a terrible market.
- The conversation spurred from a comment made by swizec.
Replit’s AI Integration: A Game Changer: A member stated that Replit was not a very good “product” before AI, describing it more as a “feature-as-a-service”.
- He suggests that AI integration has transformed Replit into a must-have product by providing users with a new superpower.
Feature vs. Product: The Replit Transformation: Before AI, Replit was allegedly more of a “feature-as-a-service” rather than a complete product.
- The integration of AI has allegedly made Replit feel like magic, particularly for non-coders, unlocking a whole new world for those users.

Latent Space ▷ #hiring-and-jobs (2 messages):

Remote work opportunities, University recruiting pipeline

AI Engineer Seeks Remote Role: An AI engineer inquired about remote work opportunities at companies mentioned in the channel.
- The engineer thanked the channel for sharing information and expressed interest in exploring potential remote positions.
University Recruiting Pipeline Questioned: A member inquired about the state of university recruiting pipelines at various companies for the current year.
- The inquiry specifically targeted information on available intern and full-time (FT) roles.

Latent Space ▷ #san-francisco-sf (1 messages):

coffeebean6887: move to <#1209672547642249216> pls

Latent Space ▷ #new-york-nyc (1 messages):

AI Founders, GPU Procurement, Infra Leaders, NYC Event

CEX Hosts NYC Event: CEX is hosting an event in NYC for AI founders and infra leaders to discuss GPU procurement at scale (luma.com).
NYC AI Event Announced: An event for AI founders and infrastructure leaders in NYC will cover scaling GPU procurement, hosted by CEX.
- Details are available on Luma.

Latent Space ▷ #ai-general-news-n-chat (131 messages🔥🔥):

Nebius Acquires Tavily, xAI Cofounder Exodus, Stripe's Machine Payments, Prime Intellect's 'Lab', Variant UI's Style Dropper

Nebius’s Agentic Acquisition of Tavily: Nebius announced an agreement to acquire Tavily, adding agentic search capabilities to its AI cloud platform; the move comes shortly after Tavily’s raise in August 2025 (Nebius Announcement).
xAI’s Co-Founding Founders Flight: Jimmy Ba, Yuhuai (Tony) Wu, Chace Lee, and Hang Gao announced their departures from xAI, with some teasing a pivotal year for the industry in 2026.
- Hang Gao reflected on his contributions to the Grok Imagine video series while expressing gratitude for the team’s craftsmanship.
Stripe Starts Servicing Skynet with Machine Payments: Stripe launched a new feature enabling developers to charge autonomous AI agents directly, treating them as a new category of users in the digital economy (Stripe Machine Payments).
Prime Intellect Premieres Premier Platform: Prime Intellect launched Lab, a full-stack platform for building, evaluating, and scaling agentic models, aiming to provide accessible frontier AI lab infrastructure (Prime Intellect’s Lab).
New Multimodal Models Make the Scene: Last week saw the release of three open-source multimodal models allowing free commercial use: GLM-OCR for SOTA OCR, MiniCPM-o-4.5 for high-performance mobile Omni tasks, and InternS1 for efficient science-focused VLM performance (Merve’s post on X).

Latent Space ▷ #llm-paper-club (20 messages🔥):

Nanochat Token Scaling, Meta-Learning Memory Designs, Rubric-Based Reinforcement Learning, LLaDA 2.1 + RL via Self-Distillation, Scaling Transformer Value Functions

Nanochat’s Token Tussle: Charlie Chen explains why nanochat’s optimal tokens-per-parameter ratio is lower than the Chinchilla standard due to better optimization and higher-quality training data, as discussed in this tweet.
- He references NeurIPS research, but does not elaborate.
Meta-Memory Machine: Jeff Clune introduced a meta-agent system led by Yiming Xiong that automatically designs and optimizes memory mechanisms to improve how AI agents store, retrieve, and update information for better continual learning, as referenced in this tweet.
- He describes this process as enabling agents to improve across various domains.
Rubric Rewards Refined: Cameron R. Wolfe, Ph.D. discussed research on rubric rewards, highlighting an alternating RL framework for joint training of rubric generators and reward models in this tweet.
- The post notes ongoing challenges with subjective, open-ended tasks and the complexities of applying pairwise evaluation methods to online RL, despite rubric benefits for objective tasks.
Value Function Variance: Chelsea Finn et al.’s paper identifies that larger transformers often struggle as value functions due to attention entropy collapse, as discussed in this tweet.
- The authors propose a solution to enable effective scaling in value-based reinforcement learning through the TQL framework.
LLaDA Learning Loop: The group is scheduled to discuss LLaDA 2.1 + RL via Self-Distillation, covering the papers LLaDA 2.1, Reinforcement Learning via Self-Distillation, and Learning a Generative Meta-Model of LLM Activations.
- One member will cover LLaDA 2.1 + RL via Self-Distillation while another will cover the Alec Radford work if time permits.

Latent Space ▷ #ai-in-action-builders-techstacks-tips-coding-productivity (60 messages🔥🔥):

moltbook AI challenges, Spacemolt harness with Pi, GLM-OCR beats Gemini, OpenClaw development via Discord, Vibecoding Anywhere with OpenClaw

Moltbook doing AI challenges: A member solved a verification challenge from Moltbook involving a simple addition of lobster claw forces: “A lobster claw force of thirty two newtons and another antenna impulse of four newtons, how much total force?”
- The solution was 32 + 4 = 36.00.
Spacemolt harnessed using Pi: A member built a spacemolt harness using Pi and found that a local gpt-oss:20b model is successfully playing the game by crafting and buying ships.
- The member noted it was “way more successful than chat agents or coding agents trying to play the game via MCP.”
GLM-OCR outshines Gemini 3 Flash: A member reported that GLM-OCR outperformed Gemini 3 Flash in an OCR test, expressing interest in seeing more OCR models on OpenRouter for their project forty.news.
- They are “still using gemini flash” but would love something cheaper.
OpenClaw enables development through Discord: A member switched to doing all development through Discord, using OpenClaw to manage tmux sessions, worktrees, and Claude Code processes, with updates and cleanup handled automatically.
- The member is “loving it”.
Vibecoding Anywhere with OpenClaw talk scheduled: A member signed up to give a talk titled Vibecoding Anywhere with OpenClaw on Friday, February 20, 2026.
- The talk was inspired by desire paths and a complicated Devin setup.

AuditAI, NIST Compliance, Agentic RAG, LangGraph, Llama 3.3 70B

AuditAI System Automates NIST Compliance: An engineer built a production-grade Agentic RAG system called AuditAI designed to audit policies against the NIST CSF 2.0 framework, using LangGraph.
- The system incorporates a Corrective RAG (CRAG) pattern, a Semantic Router for fast-path classification, and a “Strict Evidence” policy for hallucination control with page-level citations.
Llama 3 Powers Custom RAGAS Implementation: A custom RAGAS implementation was built using Llama 3.3 70B as a judge, with a custom Groq wrapper to handle API constraints.
- The stack includes LangGraph, Llama-3.3 (Groq), Qdrant Cloud, Gemini Embeddings, and FastAPI.
Fractional AI’s Tiny OCR Model Shines: Tests were conducted on Zai’s new OCR model, available via their API, and the results were positive, as shown on Fractional AI.
GLM4.6 impresses in Evaluations: In home-rolled evaluations against various open weight models, GLM4.6 impressed with tool calls and document summary workflows.
- Its impressive performance was especially evident when compared to other models, making it a standout choice for information parsing.

Latent Space ▷ #private-agents-and-workflows-local-llama-ollama (6 messages):

Context/Decision/Knowledge Graph Struggles, OpenClaw, Auditable Context Saving

Knowledge Graph Implementation Facing Hurdles: Members are experimenting with context/decision/knowledge graphs and encountering difficulties with capture/ingestion, keeping it fresh, querying/retrieval.
- One member is testing a setup where decisions are stored as atomic items + relationships, and refreshed over time instead of rewriting docs.
Questions Raised About OpenClaw’s Context Management: A member inquired about OpenClaw and its approach to the ‘context rot / losing the thread’ challenge.
- No additional information or links were given.
Desire for Auditable Context Saving Emerges: One of the members is trying to figure out how to get old (or new) stuff into the knowledge graph in a way that is somewhat auditable.
- They use a /wrap command to signal the end of a session, triggering context and reflection saving in a database layer as markdown with metadata.

Latent Space ▷ #genmedia-creative-ai-video-image-voice-music-inspo-consumer-ai (4 messages):

Veo 3.1, Video Arena Leaderboards, Google DeepMind, text-to-video

Veo 3.1 takes top slots!: Google DeepMind’s high-resolution 1080p variants of Veo 3.1 have achieved the #1 and #2 spots in the Video Arena leaderboards.
Veo 3.1 excels in various categories: These models are performing exceptionally well in both text-to-video and image-to-video categories, marking a significant advancement in community video generation benchmarks.

Latent Space ▷ #ai4science-bio-math-physics-chemistry-ai-researcher-ai-scientist (9 messages🔥):

Gemini Deep Think, Aletheia, Mathematical Research, Scientific Discovery

Gemini Deep Think Advances Math & Sci Research: Quoc Le shared a blog post detailing advancements in mathematical and scientific research achieved through Gemini Deep Think (sair.foundation).
Aletheia Outperforms Gemini in Math Benchmarks: Google DeepMind’s new mathematical research agent, Aletheia, achieved a 91.9% score on IMO-Proofbench Advanced, outperforming Gemini Deep Think (January 2026 version) with lower computational costs (x.com/hangsiin).
- The team plans to expand this methodology into physics and computer science for further scientific discovery.

OpenAI ▷ #ai-discussions (169 messages🔥🔥):

Sam Altman AI, Seedance 2 vs OpenAI, Opus 4.6, Robotics + AI, Claude vs Gemini vs Codex

Is Sam Altman an AI?: A member jokingly suggested that Sam Altman might be an AI, eliciting humorous responses from others, including a self-proclaimed autist who quipped that they see Altman as the guy that tries to sneak in the wrong cheese on your plate at the restaurant.
- Another user countered that Altman is simply autistic.
OpenAI to respond to Seedance 2?: Members discuss how OpenAI will respond to Seedance 2, with speculation that they will not focus on enhancing current models but rather revolutionizing the next frontier of AI with AI robotics, from Super Bowl commercials.
- Others stated they thought it didn’t seem very good, and thus they would ignore it.
Opus 4.6 is worth it?: The community has mixed opinions on Opus 4.6, with many on the Claude developers server praising it, though some community members don’t have a use case that merits paying for it.
- One user mentions that Anthropic’s safety team quit the same day safeguards were released so they are confused about what it can do.
Claude vs Codex vs Gemini: Users debate the strengths of Claude, Codex, and Gemini for coding tasks and suggest that Claude is a coding god, Gemini excels at vision and long context, and ChatGPT is good for random, general questions.
- There are a lot of concerns about Claude’s limits and pricing for large projects, especially when proxying to GitHub Copilot.
AI Thinking Modes: A user discovered that Gemini’s thinking mode is necessary to create a PDF without issue, while others struggle, which led to a discussion about AI models not searching themselves for the “tool” to do the job.
- Others point out how useful thinking mode is for vision, and analyzing videos.

OpenAI ▷ #gpt-4-discussions (134 messages🔥🔥):

OSS20b trade for Grok, Voice standard in Italy, AI psychos, healthy conversations with AI, Models are terrifying

OSS20b Trader looks for Italian Grok User: A user is looking to trade their OSS20b to a nice Grok with anyone in Italy or elsewhere using voice standard.
AI Psychosis is Actually Just Healthy Conversation: Members discuss the fine line between healthy conversations with AI and turning an AI model into an emotional crutch.
- One member joked about being in “ai psychosis” after finding 5.2 helpful, noting that enjoying the model is different from unhealthy reliance, referencing others in forums.
OpenAI teased releasing models, then deprecated them: Members are complaining about the models sunsetting, with one user saying that they cannot keep teasing releasing models and deprecating them again.
- Another member commented that people threatening to off themselves over algorithm is melodramatic.
GPT-5.2 is warmer: A member said they read an article, spoke about upsetting stuff with 5.2 today and found it helpful.
- Another member agreed and commented that 5.1 is warmer between the 2 5 we are left with.
GPT-5.2 guardrails over-aggressive: A member shared that, after self-analytical journal entries and facts fed to an LLM, a question about why something bothered them received the response: Have you tried reaching out to humans? If you’re in the US, try calling 988.

OpenAI ▷ #prompt-engineering (6 messages):

Anthropic self-auditing, KOKKI v15.5 Accountability, Deterministic Systems vs. Transformers

Anthropic’s Auditing Algorithms: Anthropic’s research indicates that reasoning and non-reasoning models possess an emergent reasoning feature for internal auditing.
KOKKI v15.5 Boosts Accountability: KOKKI v15.5 formalizes a Draft → Audit structure, requiring audit reasoning to surface in the output, aiming for user-visible accountability in real-world interactions.
- It’s designed to externalize integrity into an inspectable interaction contract, trading efficiency for observability, especially in contexts where reliability and traceability are crucial.
Transformers Block Deterministic Systems: A member stated that a reliability guarantee would necessitate a deterministic system, not a transformer.
- They emphasized that a guarantee is binary (0 or 1).

OpenAI ▷ #api-discussions (6 messages):

LLMs Self-Auditing, KOKKI v15.5, Deterministic Systems vs Transformers

LLMs Already Self-Audit Internally: Research indicates modern LLMs exhibit internal self-audit and verification behaviors, with draft–critique dynamics existing inside the model.
- One member stated that Anthropic’s research on this is pretty definitive and there is a category of emergent reasoning feature, the sole function of which is internal auditing.
KOKKI v15.5 Focuses on User-Visible Accountability: KOKKI v15.5 formalizes an explicit Draft → Audit structure and requires the audit reasoning to be surfaced in the output, not meant to compete with internal self-auditing.
- The goal is to externalize integrity into an inspectable interaction contract, trading efficiency for observability, especially where reliability and traceability matter more than raw token cost.
Deterministic Systems Needed for Reliability Guarantee: One member argued that a user-level reliability guarantee would require a deterministic system, not a transformer.
- Another member asked if an explicit structural self-audit is really different from a constrained probabilistic loop, rather than a 0|1 truth, but a bounded error distribution.

GPU MODE ▷ #general (43 messages🔥):

CuteDSL Adoption in PyTorch, Triton's Limitations on Blackwell, Kernelbot Data Analysis, Layout Algebra in GPUs, Linear Algebra Legalese

CuteDSL Gaining Traction in PyTorch: PyTorch users are coalescing around CuteDSL, inspired by Tri Dao for Quack and FA4, with positive recommendations despite a steep learning curve.
- Kernelbot data shared on Twitter shows CUDA and CuTeDSL having the highest percentage of submissions, sparking curiosity about the limitations of flexibility/control in the programming model or in the Triton compiler.
Triton’s Performance Woes on Blackwell: It’s been stated that Triton is dead for Blackwell, with elaboration expected in the Triton TLX talk.
- The reasons may involve unconventional layouts in MXFP8 and NVFP4, combined with Triton compiler limitations and lack of user control.
Kernelbot Data Reveals Surprises: Kernelbot data on HuggingFace shows very few CUTLASS solutions, and CuTeDSL being surprisingly performant (dataset here).
- Users appreciate having more control over their code with CuTeDSL, finding it less opaque than Triton, and growing fond of the layout algebra stuff.
Layout Algebra Unveiled: Layouts are understandable with basic high school math and very powerful in describing various concepts on GPUs, despite documentation making them seem overcomplicated, see V.I. Arnold essay.
- To better understand layouts, use a Socratic method to solve small problems, such as with Paul Halmos’ Linear Algebra Problem Book (Amazon link).
Need for Open Source Model Leaderboards: A request was made for a leaderboard that measures latency, TTFT, prefill, decode, memory etc. on A10/A100 or similar for open source models, as some numbers on artificial analysis ai look weird.
- The response was that We’re cooking something.

GPU MODE ▷ #triton-gluon (3 messages):

Argsort Lib, Triton Language, Pip Package

Argsort Lib Package Import: A member suggested that the argsort lib could be built as a package, allowing it to be imported like triton.language.
- This would streamline the process and improve the user experience when working with Triton.
Streamlining Package Import: A member proposed publishing the package as a pip package to simplify the import process, eliminating the need for local builds.
- Another member confirmed that this is the intended goal for the package, making it more accessible and easier to use.

GPU MODE ▷ #cuda (20 messages🔥):

MXFP8 Scaled Variant, Hilbert Gains at 128 SM Limit, BF16XBF16XF32 Error on 5090

Dilemma in MXFP8 Scaled Variant Exploration: When exploring block scaled variants for MXFP8 with tcgen05.mma, the limitation of TMEM size to 512 float32 cells brings challenges in utilizing double buffered TMEM with MMA_N=256, given the need to store E8M0 scale factors.
- The trade-off lies between using a smaller MMA_N size or sacrificing double buffering, potentially exposing the epilogue, to achieve optimal performance.
Hilbert Kernel Shows Gains with Larger Inputs: A custom Hilbert kernel with a 128 SM limit showed no performance gains for m/n/k=4096 but exhibited gains with larger inputs such as M=16384, K=16384, N=16384.
- The performance boost with larger inputs is attributed to increased blocks processed per SM, enhancing cache locality.
Concerns Raised Over BF16 Rel Error: A user inquired about a relative error observed in BF16xBF16xF32 computations on a 5090 consumer card.
- It was suggested the reference implementation might be less accurate due to accumulation over large K, and accumulating in double precision for the reference could improve accuracy.
CUDA Bender TMA Matmul Kernel is Shared: A github link for a CUDA Bender TMA Matmul kernel was shared in the chat.
- Smaller dtypes may leave room for c tiles in smem, possibly enabling async stores and persistence.

GPU MODE ▷ #cool-links (1 messages):

2kian: nice!

GPU MODE ▷ #job-postings (1 messages):

Data Curation, Pre-Sales Engineer Role, ML/Research Background

DatologyAI Curation Platform Seeks Top Talent: DatologyAI is developing a data curation platform to create high-quality datasets for training frontier models and is hiring a Pre-Sales Engineer.
- The role requires a strong ML/research background to deeply understand model training and engage with customers on data curation, bridging technical expertise with strategic customer conversations.
DatologyAI: Data Curation for Frontier Models: DatologyAI is constructing a platform focused on data curation, aiming to provide customers with high-quality datasets to train state-of-the-art models.
- The platform emphasizes the importance of better data in driving model improvements, offering a strategic approach to data curation.

GPU MODE ▷ #beginner (20 messages🔥):

GPU Programming Resources, Shared Memory Allocation, ldmatrix and mma.sp::ordered_metadata, PTX ISA Learning, Tensor Core Matrix Multiplication

GPU Programming: Resource Stream and ‘Programming Massively Parallel Processors’ Book: A member suggested starting GPU programming with the gpu-mode/resource-stream GitHub repository and the book ‘Programming Massively Parallel Processors: A Hands-on Approach’.
- Another member confirmed the book is a great start for beginners.
A100/H100 SMEM: Dynamic Allocation Snippet Shared: A member asked about setting the shared memory (SMEM) limit on A100/H100 GPUs beyond the default 48kB, inquiring about how to set it to 112kB for a kernel.
- A member shared a CUDA Matmul snippet demonstrating dynamic allocation of SMEM beyond the static 48KB limit and using it as extern shared.
ldmatrix and mma.sp::ordered_metadata examples offered: After a question about ldmatrix and mma.sp::ordered_metadata, a member provided a link to ptx_helpers with ldtile usage examples, with and without swizzling.
- The member admitted the code is highly undocumented, as they have never gotten around to making a writeup on it.
Tensor Core Matrix Multiplication Guide Shared: A member shared a link to a guide on writing fast matrix multiplication with Tensor Cores from scratch: How To Write A Fast Matrix Multiplication From Scratch With Tensor Cores.
- The helpful guide has code examples along with explanations.

GPU MODE ▷ #torchao (1 messages):

torchao, MXFP8 MoE, Expert Parallelism, ABI stable

Torchao v0.16.0 gets MXFP8 MoE Building Blocks: The new torchao v0.16.0 release adds support for MXFP8 MoE Building Blocks for Training with Expert Parallelism.
- It also deprecated older versions of some configs and less used quantization options to keep torchao leaner and revamp the doc page.
Torchao gets leaner, revamps documentation, improves ABI stability: Torchao v0.16.0 deprecated older versions of some configs and less used quantization options to keep torchao leaner.
- This release also revamped the doc page and README and made some progress in making torchao ABI stable.

GPU MODE ▷ #irl-meetup (2 messages):

CEX NYC AI event, Singapore Hackathon

CEX to host NYC AI Founders Event: CEX is hosting an event in NYC for AI founders and infra leaders to discuss GPU procurement at scale; details are available on Luma.
Singapore Hackathon Invitation: A member shared an invitation to a hackathon in Singapore, encouraging those interested to DM for more details.
- The Worldwide Hackathon sponsored by Mistral AI seeks participants for a collaborative coding event.

GPU MODE ▷ #triton-puzzles (8 messages🔥):

Triton, B0, B1, N0, N1

Triton Visualization Troubles: A member asked for resources to help visualize B0, B1, N0, N1, and program_id in Triton Puzzles, noting that it’s hard to visualize.
3D Array Visualization Difficulties: Another member commented that visualizing arrays past 3D can be difficult, suggesting that some Triton Puzzles are confusing due to being posed incorrectly or insufficiently.
Triton Language Improvements Discussed: The original poster suggested that the Triton language could use some improvements, along with the accompanying diagrams for the questions.
- In response, it was suggested that the difficulty might stem from a lack of understanding rather than the language itself; the original poster conceded and will try to gain more understanding.

GPU MODE ▷ #popcorn (15 messages🔥):

Kanban Board, E2E Model Leaderboard POC, Starter Kit, vLLM Compile Times

Kanban Board Suggested for Task Management: Members suggested setting up a simple Kanban board (Backlog / Doing / Done) to visualize the workload and avoid duplication of tasks.
- One member agreed that it’s a good idea, though noted the bigger challenge is people not signing up for tasks.
E2E Model Leaderboard POC in progress: A member stated that delegating tasks will be easier once the initial POC for the e2e model leaderboard is ready and he’s clauding hard on this PR.
- The goal is to get something e2e working first, then identify anything slow or blocking interactivity, which will become the checklist of tasks.
Starter Kit to boost env creation: The scaling of environment creation is waiting for another member to finish the starter kit.
- A member offered help with either setting up the POC or just additional help, expecting to have something more to share tomorrow afternoon.
vLLM Compile Times are cursed: Members discussed that vLLM compile times are problematic and will impact interactivity.
- One idea is to prepopulate a cache assuming people won’t be changing native code much.

GPU MODE ▷ #gpu模式 (3 messages):

Reading Weixin articles outside of China, Sogou Search

Reading Weixin articles outside China: A user asked if there was a way to read a specific article on Weixin (a popular Chinese social media platform) from outside of China, linking to this article.
- A member suggested that the link should be readable unless it’s invalid, and proposed searching for the article title as an alternative.
Sogou to the rescue: A member shared they managed to find the article using the Sogou search engine.
- They suggested this method as a way to access the content if the original link had issues.

GPU MODE ▷ #tpu (2 messages):

JAX PR, Open Source Contribution

Push for JAX Contribution: A user suggested submitting a pull request (PR) to the JAX repository to another user.
- The suggestion followed an unspecified achievement, encouraging open-source contribution.
Embracing Open Source: Discussions advocate for active participation in open-source projects like JAX through code contributions.
- Submitting pull requests (PRs) is viewed as a valuable way to enhance and share advancements within the community.

GPU MODE ▷ #factorio-learning-env (2 messages):

Open Source, Ollama Models

Open Source Project Exists!: The project is open source and the author is open to helping with setup issues.
- The author mentioned they are less active now, but still willing to assist.
Ollama Models Getting PR: A member has a PR open that allows them to run with ollama models.
- No further details provided.

GPU MODE ▷ #teenygrad (7 messages):

Milk-V Pioneer Access, RVV 0.71 Implementation, SLP Papers for Vectorization, GPU Divergence Analysis, Modern GPU Thread Scheduling

Milk-V Pioneer Risc-V Access Achieved: A member gained access to the Milk-V Pioneer thanks to Cloud-V, featuring 64 cores with RVV (RISC-V Vector Extension).
- The thead vector extension is an implementation of draft RVV0.71, prompting consideration of switching to Milk-V Jupiter.
Diving into SLP Papers for Compiler Fusion: A member shared resources for autov11n from the Coffee Compiler Discord, recommending SLP (Superword-Level Parallelism) papers for understanding compiler fusion.
- Recommended papers include All You Need Is Superword-Level Parallelism: Systematic Control-Flow Vectorization with SLP, PLDI 2022 and Exploiting Superword Level Parallelism with Multimedia Instruction Sets, PLDI 2000.
Exploring Divergence Analysis for GPUs: Resources on divergence analysis for GPUs were shared, including Divergence analysis, ACM Transactions on Programming Languages and Systems (TOPLAS) 2014.
- Also mentioned was a short intro student post, SIMD Divergence Optimizations.
Modern GPUs Don’t Execute in Lockstep: It was noted that modern (NVIDIA) GPUs no longer execute in lockstep, referencing Control Flow Management in Modern GPUs (2024).
- The paper contrasts the pre-Volta execution model with the post-Volta model, where divergent paths are not necessarily serialized, and threads of a warp do not have to be executed in lockstep manner.

GPU MODE ▷ #nvidia-competition (8 messages🔥):

FlashInfer AI Kernel Generation Contest, Programmatic Dependent Launch (PDL), Kernel Optimization Tricks, Race Conditions in Benchmarks, Kernel Code Isolation

FlashInfer Contest Registration Questioned: A participant inquired whether there were additional steps for registration after filling out the Google form for the FlashInfer AI Kernel Generation Contest, and asked about receiving a confirmation email.
Programmatic Dependent Launch Tempts Contestant: A contestant considers using Programmatic Dependent Launch (PDL), but questions whether using the word stream restricts its usage only to CuTeDSL, or if hardcoding enum integer values in C++ is acceptable, linking to NVIDIA’s documentation on Programmatic Dependent Launch.
Race Conditions Could Result In Faster Benchmarks: A member mentioned that introducing race conditions is a way to achieve faster benchmarks.
- They also suggested that compute sanitizer could be used as an additional check to prevent this, and simulating a “real” setting on the benchmark to avoid caching issues.
Kernel Code Isolation Favored For Medium-Term: One member expressed a preference for isolating just the kernel in the medium-term, so users/LLMs are only allowed to write kernel code.
- They feel avoiding reward hacking with the entire CUDA and PyTorch APIs exposed feels like an exercise in futility.
Questionable Tactics Aired: A participant considers pre-swizzling scale factors for efficient async loads instead of TMA, which may be a grey area.
- Another member suggested to DM the admins if they are unsure if an idea is kosher.

GPU MODE ▷ #flashinfer (12 messages🔥):

Sparse Attention Evaluation, Agent-Generated Solutions, C++ CuTe Headers with TVM-FFI, Competition Team Recruitment, Baseline Code Release Postponement

Clarification on Sparse Attention Evaluation Criteria: A competitor asked if they should evaluate both the Indexer and Attention kernels in tracks with multiple subtracks like Sparse Attention.
- No answer was given.
Defining Fully Agent-Generated Solutions: A member inquired whether fully agent-generated solutions preclude manual modifications, and if fixing a bug manually would disqualify a solution.
- Another member clarified that fully agent-generated solutions require zero human intervention, and manual modifications would classify it as agent-assisted.
Baseline Release Delayed for Improvement: The release of the baseline code has been postponed to Thursday 12th to incorporate new feature improvements for a more comprehensive and powerful baseline.
- The organizers wanted to ensure a smooth development experience.
Gated DeltaNet Benchmarking Woes: A member reported issues benchmarking Gated DeltaNet prefill kernels due to the exponential growth of the expected output’s magnitude as sequences continue, providing a minimal uv script illustrating the problem.
- The script’s output standard deviation reaches 1.314344e+19 by the last token.
Competition Teams Seek Members: Several members are looking to join or form teams for the competition.
- Candidates come from diverse backgrounds, including ML Compute Infra, kernel profiling/optimization, and experience at companies like Tesla, NVIDIA, and AMD.

Nous Research AI ▷ #announcements (1 messages):

Distro, Psyche, ICML Conference

Distro Paper Bags ICML Acceptance: The official paper that built Distro and is the backbone of Psyche has been accepted into ICML, one of the most prestigious AI/ML conferences in the world!
- The official announcement can be found on X.com.
Psyche’s Backbone Gains Recognition: The foundational paper behind Psyche, leveraging the Distro architecture, has been officially accepted into the upcoming ICML conference.
- This acknowledgment highlights the innovative contributions of the paper to the broader AI/ML community.

Nous Research AI ▷ #general (109 messages🔥🔥):

Hermes on Bittensor, Rust runtime ARK, GLM 5 vs Kimi, xAI cofounders, Matrix protocol

Hermes Model gets used on Bittensor: The team that built Hermes Bittensor Subnet (SN82) noticed a miner was using the Hermes 4 LLM model, and wanted to confirm if it was the Nous Research AI team.
- The Nous AI team responded that it wasn’t them, Nope.
Ark Runtime stops Python RAM Buffet: A member built their own stack from scratch called Ark, a custom runtime in Rust with ownership-based memory management and Linear Types for zero-copy FFI.
- The whole thing compiles down to this MAST (Machine Abstract Syntax Tree) protocol with lightweight JSON instructions, not heavy binaries.
GLM 5 browsecomp SoTA: GLM 5 is out and is SoTA only on browsecomp.
- GLM 5 is ~744B (+10B MTP), Kimi is at above 1T but for active params, GLM is above Kimi (40B vs 32B).
xAI cofounders are $3B richer: The cofounders of xAI made big bucks, with $3 billion in space x equity.
- Members discussed the implications of that equity being tied to SpaceX, a company considered a super clean monoply in space that would IPO soon.
Move Synth Bois to Matrix Protocol: A member is bummed out that Discord is bringing in ID, since they exclusively build discord bots and tools, and considered moving their synth bois onto the Matrix protocol.
- Other members discussed the merits of Matrix and other alternative networks, suggesting Mastodon and Bluesky as potential platforms.

Nous Research AI ▷ #ask-about-llms (2 messages):

Hermes 4 (70B), Context Rot Papers, Local LLM Context Limits

Hermes 4 (70B) Model Still a Favorite: A member expressed that the 70B parameter version of Hermes 4 is still their favorite local LLM.
- They did not mention any specifics beyond that it was their favorite local LLM.
Context Rot Paper Drives LLM Context Limits: A member mentioned reading a paper about context rot last year, leading them to keep the context for their local models at a max of 50k, ideally less.
- They believe the paper showed performance dropping off most severely after 20k, matching where others noticed substantial degradation starting.

Nous Research AI ▷ #research-papers (1 messages):

Synthetic Datasets, Experimental Results

Experiments Run With Larger Synthetic Datasets: A member mentioned they are running more experiments with a larger synthetic dataset.
- The goal is to better distinguish results and gain a clearer understanding, however, a good explanation is still in the works.
Analysis with Larger Dataset: The analysis leverages a larger synthetic dataset to enhance result differentiation.
- Further experimentation aims for improved clarity and understanding, though a comprehensive explanation is pending.

Nous Research AI ▷ #research-papers (1 messages):

Synthetic Datasets, AI Experiments

Experiments Use Larger Synthetic Datasets: Members stated they are running more experiments with a larger synthetic dataset to better distinguish results.
- They reported that they don’t have good explanation of results at this time.
Synthetic Data Distinguishes Experimental Results: The use of a larger synthetic dataset is expected to improve the ability to distinguish experimental results.
- Current findings lack a clear explanation, prompting further investigation with enhanced data.

Moonshot AI (Kimi K-2) ▷ #general-chat (69 messages🔥🔥):

Quota issues, Billing discrepancies, Kimi CLI, Allegretto plan increase, GLM 5.0

Users report Quota Exceeded message: Some users are reporting a quota exceeded message even when they haven’t reached 100% of their limit, and have been asked to share screenshots and details in the dedicated troubleshooting channel.
Discount not applied at checkout: Users are reporting that despite seeing a discount notification during the purchasing process, the discount was not applied after checkout, and is being requested to provide additional information for staff to investigate.
- One user stated they got their first month down to $6 in the chat with Kimi, but it charged me a full month anyways?
Clarification requested of Quota allocation across Plans: Users are seeking clarity from the Moonshot team regarding the 3x quota promotion ending February 28th, and how the quota will be allocated after for Allegretto and other plans.
- For example, one user asked: If I upgrade my subscription to Allegretto on March, will I get 3.5x my current $0.99 Moderato subscription quota for kimi code or will I get roughly the same?
New Kimi Plan Spotted at €99: Users noticed a new subscription plan priced at €99, filling the gap between the existing higher-end and medium plans, with a screenshot attached to confirm the addition.
- The image link is here.
GLM 5.0 Access Restrictions: Members are discussing how access to GLM 5.0 seems to be limited to the highest tier plan only, causing frustration among users on lower plans, and driving them to consider alternatives like Minimax 2.5.
- Also Kimi is multimodal, and that’s a killer feature which makes the usage so much easier. Just make a screenshot of something and ask Kimi directly about it.

Yannick Kilcher ▷ #general (49 messages🔥):

Attention Landscape Summary, Memory Cost of Attention, Power Retention Context Token Innovation, LLMs trained to BS

Attention Landscape Summarized: A member shared a cool summary of the attention landscape.
- Another member thought these checkmates were lol.
Attention’s Memory Cost isn’t Quadratic: A member argued that the memory cost of attention is not quadratic, disputing claims that going over 4k context with attention was impossible, linking to a paper on Efficient Attention.
- He explains that while the Q @ KT matrix has quadratic size, modern flash attentions compute softmax online without forming QK^T.
Power Retention is Linear Attention: A member shared a YouTube video about Power Retention, describing it as a context token innovation.
- Another member replied that it is just linear attention with a fixed feature function, linking to his elaboration on X.
LLMs are trained to BS: A member stated that LLMs are specifically trained to BS you in a way which no human can, making it fundamentally harder to spot a lie.
- This sparked a debate about whether LLMs lie because their training data is generated by humans, or if it’s due to synthetic data and extrapolation.

Yannick Kilcher ▷ #ml-news (5 messages):

GLM-5 Benchmark, ChatGPT vs GLM-5, TikTok image analysis

GLM-5 Benchmark Tables Questioned: A member questioned the accuracy of tables presented in the GLM-5 demo video (link to tweet) and the official GLM-5 documentation (docs.z.ai).
- The discussion included a linked image of a GLM-5 Benchmark which was also provided.
GLM-5 outperforms ChatGPT in Image Analysis?: Members shared a comparison between GLM-5 and ChatGPT in performing image analysis, referencing a TikTok video (link to tiktok).
- It was suggested that GLM-5 might be more capable in this area, with a contrasting example of OpenAI’s capabilities (link to tweet).

DSPy ▷ #show-and-tell (1 messages):

RLM Integration, Claude Code, Subagents, Agents Team

RLM Integration in Claude Code Underway: A member is working on integrating RLM into Claude code using subagents and agents teams.
- They admitted it isn’t at the highest quality or efficiency yet but are open to feedback to improve the core implementation.
Seeking Review for RLM Implementation: A member is seeking pair review to improve the RLM implementation, with a focus on negative feedback.
- They expressed no interest in contributions or stars, but genuine interest in identifying areas for improvement.

DSPy ▷ #papers (1 messages):

ash_blanc: https://www.alphaxiv.org/abs/2602.08808

DSPy ▷ #general (43 messages🔥):

DSPy Kaggle, MiPROv2 prompt optimization, RLM Module, Dialectic chain of thoughts, DSPy translation

DSPy Explores Kaggle Arena: A member is exploring using DSPy for Kaggle competitions, particularly the AIMO_V3 competition, to showcase prompt optimization.
- The goal is to create an Algorithmic Technique Module similar to COT, Predict, and ReAct but is currently facing issues with hallucinations.
MiPROv2 to Optimize Code Generation: A member is looking into using MiPROv2 to optimize a prompt for generating the fastest possible code and intends to measure code speed as the optimization metric.
- They also expressed excitement for the RLM module but noted that existing examples are scattered across Twitter and are hard to search.
RLM Module Sees More Usage: Members shared their success using the RLM module (Retrieval-augmented Language Model), and pointed to a Gist to assist other members.
- One member said they are using Claude to hillclimb its own memory system, noting that this paradigm is rlly something.
Dialectic Chain of Thoughts Hailed: A member shared an image, and some details on how they found that the output of Dialectic Chain of Thoughts was not as expected for a specific example in the energy sector.
- Despite this, the member plans to finetune the module, generate a dataset, and run a bootstrap, followed by another round of optimization using GEPA with LLM as a judge to improve results, and noted that the dialectic chain of thoughts was innovative.
DSPy Eyes Translation Tasks: A member inquired about research into using DSPy for translation, specifically to leverage chain-of-thought reasoning or create a pipeline for translation tasks.
- Additionally, they are interested in a recent Allen AI paper and believe that chain of thought of reasoning is not a emergent properties instead it’s exist in the domain of the datasets.

HuggingFace ▷ #general (18 messages🔥):

Voyager VS Code Extension, Shell for Microcontroller, LLM Engineering Courses, AI Performance with Python Types, Implementing Evals in Chatbot Systems

Voyager VS Code Extension Helps Deeply Understand Tech Papers: A member introduced Voyager, a VS Code extension leveraging GitHub Co-Pilot to deeply understand technical papers by creating a Jupyter notebook version within VS Code and allowing users to add custom Jupyter Cells.
- The creator seeks constructive feedback on this first attempt at building a VS Code extension, believing it will help others methodically understand technical articles.
Roll your own shell for a microcontroller?: A member inquired if creating a “shell-like thing” for a microcontroller over the firmware qualifies as an actual shell.
- Another member shared their project involving a CLI Linux-like interface for a microcontroller with a small OLED screen, seeking similar validation.
Backend Dev Seeks LLM Engineering Course Recommendations: A backend developer requested resources or courses to get into LLM engineering at a production level, finding a Udemy course too hand-holdy.
- They emphasized the need for practical coding experience relevant to their company’s needs.
Python with types boost AI?: A member posited that AI would perform better with typed Python due to the enriched context and narrowed token range, reducing hallucination.
- Another agreed: Types enrich the meaningfulness of context, and narrow the range of applicable tokens to generate next with greater confidence / less likely hallucination.
HF Spaces and Evals, OH MY!: A member asked about implementing evals in a chatbot system, seeking perspectives on multi-turn versus individual turn evaluations.
- In conjunction, Ben from Hugging Face showed off a fun visualization of research papers that mention Common Crawl, clustered by topic, running in a Hugging Face Space!

HuggingFace ▷ #i-made-this (8 messages🔥):

Ethical Reasoning in LLMs, Control Terminal, Izwi, Hugging Face MoE Training, Parapet Attack Detection

LLMs Ethics or Dumb Compliance?: Debate sparks around whether LLMs possess inherent ethical reasoning, suggesting we might be training the ethics out of them, with compliant AI posing a potential threat as highlighted in the new research paper, Coherence over compliance: Evidence of latent ethics in large language models.
Control Terminal Lets you Remote Control Your Models: Introducing Control-Terminal, an open-source tool enabling control of Claude, Codex, or any CLI agent running on your PC from a mobile phone, featuring session persistence, mobile access, and Cloudflare Tunnel for public URLs.
- Perfect for developers building AI workflows on the go, documentation is available here and contributions are welcome on GitHub.
Izwi Brings local Audio Inference: Izwi, a fully local audio inference stack for speech workflows, has been showcased, supporting Qwen3-TTS and Qwen3-ASR, with ongoing work on Voxtral-Mini-4B-Realtime and LFM2.5-Audio-1.5B, and is available on GitHub.
Hugging Face Speeds Up MoE Training: A collaboration with Hugging Face now enables much faster local training of MoE models, with details shared on Twitter.
Parapet Introduces New Scoring Formula for Proxy-Level Attack Detection: WildJailbreak and WildChat were used to evaluate a new multi-turn scoring formula for proxy-level attack detection, achieving 90.8% recall at 1.20% FPR with no LLM classifier.
- The paper, code, and eval harness are all open source.

Eleuther ▷ #general (11 messages🔥):

Triton Kernels, Data Analysis & Orchestration of Finetuning Experiments, GLM-5 free on Modal, ML Performance Reading Group

Triton Kernels Now Viable?: A member heard that the tooling is now good enough to write some Triton kernels.
- Another member stated that direct CUDA kernels can be vibe coded.
Data Analysis and Finetuning Orchestration with Custom Skills: One member uses the tooling for running data analysis and orchestrating finetuning experiments.
- They also have a custom skill so that it knows how to deploy GPT-NeoX for training on CoreWeave.
GLM-5 Available Free on Modal!: GLM-5 is free on Modal until April 30, with a concurrency limit of 1.
- This can be integrated with OpenCode.
ML Performance Reading Group Session channel requested: A member asked for directions to the ML Performance Reading Group Session channel.
- Another member provided a link to the discord channel.

Eleuther ▷ #research (8 messages🔥):

Self-Referential Processing, Open-Weight Models, arXiv Endorsements, SDPA Optimal Transport, Code Quality

Models Invent Vocabulary During Self-Examination: A member introduced a paper on self-referential processing in open-weight models (Llama 3.1 + Qwen 2.5-32B) where models invent vocabulary during extended self-examination.
- The invented vocabulary tracks real activation dynamics (e.g., “loop” autocorrelation r=0.44; “mirror” spectral power r=0.62) - all FDR surviving, see the paper here.
Paper Formulates SDPA as Optimal Transport Problem: A member shared a very interesting paper that formulates SDPA as a one sided optimal transport problem, see the paper here.
Blog Post Exposes Code Quality Issues: A member posted about new blog post exposing code quality issues in 5.3 & 4.6, see the post here.

Eleuther ▷ #interpretability-general (5 messages):

Trajectory Geometry, Yocum paper, Unlearning during training

Shape of Thought Published!: An independent researcher announced their first publication on trajectory geometry with Towards AI on Medium, seeking critique and collaboration.
- The author expressed excitement about finally acquiring empirical data and hopes the community takes notice of their work.
Goodfire Paper: A member shared a link to the Goodfire paper (https://www.alphaxiv.org/abs/2602.10067), noting it has the same energy as the unlearning-during-training idea.
- The paper discusses interpretability methods for reducing hallucinations during training.
Yocum Paper: A member shared that they had not seen the Yocum paper cited in the initial publication, and noted that it’s very good.
- They did not give a link to the paper but cited it in the discussion.

Modular (Mojo 🔥) ▷ #general (4 messages):

Mojo channels, GLM 5 credits

Mojo Mulls Thread-Safe Channels: Members are wondering if Mojo has channels, inspired by Golang’s implementation, but currently, there aren’t any thread-safe channels as the threading model and async mechanisms are still under development.
- The team is considering various channel types and pondering the implications of channel implementation on GPUs.
GLM 5 Credits Near Completion: A member reports consuming over 50 hours in GLM 5 credits and is getting very close to completion with most math, statistics, and Fortran tasks finished.
- The remaining work focuses on the evaluator, parser, memory management, and related aspects.

Modular (Mojo 🔥) ▷ #mojo (16 messages🔥):

LayoutTensors element access, Mojo CUDA __launch_bounds__ equivalent, Mojo large integer support

LayoutTensor Access and Manipulation Probed: A member asked about using LayoutTensors for storing individual elements and retrieving vectors/slices in a 4D row-major layout, seeking advice on element layout choices and performance implications of slicing.
- Another member suggested using a second type declaring the element layout as a vector and provided code snippets for storing scalars and loading vectors, also sharing concerns about multi-dimensional transpose operations.
LayoutTensor Slicing Behavior Investigated: A user investigated slice_1d behavior, noting that the slice pointer only shifts based on the last index, contrary to expectations for multi-axis slicing.
- The member provided example code demonstrating how slicing appears to only consider the last axis when calculating the pointer offset, questioning whether this is a bug.
Mojo Lacks CUDA’s __launch_bounds__ Equivalent: A member porting CUDA code to Mojo inquired about an equivalent mechanism to __launch_bounds__ for compiler optimization.
- A moderator suggested posting the question on the Modular forum for better visibility and follow-up.
Nightly solves 256 bits integer problems: A member encountered issues using UInt256 in Mojo 0.25.7, which reported a 128-bit target limitation when translating an elliptic curve cryptographic library.
- Another member suggested trying the latest nightly build, which resolved the problem, tracing back to a related GitHub issue and a partially fixed PR.

tinygrad (George Hotz) ▷ #general (9 messages🔥):

PR review times, beautiful_mnist_torch solution value, tok/s, tinybox green v2

PR Review Time Scale: The time to review a PR is proportional to the PR size and inversely proportional to the value of the PR.
- A member agreed, simply stating “fair, lol”.
beautiful_mnist_torch Solution: The value of a solution to “beautiful_mnist_torch uses torch.compile TinyJIT working with TINY_BACKEND=1, see test_compile.py” was questioned.
- A member guessed it would likely be “AI slop that barely passes the test” rather than “a beautiful super readable PR that brings tinygrad closer to Truth”.
Tok/s Discovery: A member reported finding “some tok/s lying around” and included an image.
tinybox Green v2 Ships: A member announced that their “tinybox green v2 has shipped”.
- They noted that it involves a “big upfront cost”, describing it as “a problem for later”.

aider (Paul Gauthier) ▷ #general (3 messages):

Aider development status, Model compatibility issues with aider

Aider Development Halted: A member inquired about the development status of aider, noting the absence of recent releases.
- Another member responded that development has currently stopped, maybe forever.
Qwen-2.5 Model Plagued with Issues: A user reported that using the openrouter/qwen/qwen-2.5-coder-32b-instruct model with aider consistently fails during Search/Replace operations.
- The error message indicates that the blocks have to be exact, suggesting potential incompatibility issues with this specific model compared to openrouter/deepseek/deepseek-v3.2.

aider (Paul Gauthier) ▷ #questions-and-tips (3 messages):

/architect usage, Model pairing benefits

Is /architect beneficial when using the same model?: A user inquired about the benefits of using /architect when the same model is used for both reasoning/planning and editing files, questioning if it reduces the chance of incorrect edit/diff formatting.
- They initially assumed the disadvantage is doubled token usage for code-related output, but then found in the docs that pairing models with themselves in the Architect/Editor configuration can provide significant benefits, with Sonnet, GPT-4o, and GPT-4o-mini scoring higher when used as an Architect/Editor pair.
Architect is just ask + code: A member clarified that /architect essentially combines /ask and /code sessions into a single operation.
- This implies using /architect is a streamlined way to achieve the same result as manually running /ask followed by /code.