a quiet day

AI News for 2/19/2026-2/20/2026. We checked 12 subreddits, 544 Twitters and 24 Discords (262 channels, and 12582 messages) for you. Estimated reading time saved (at 200wpm): 1242 minutes. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Frontier model evals: Gemini 3.1 Pro, SWE-bench, MRCR, and “bipolar” real‑world performance

Gemini 3.1 Pro shows strong retrieval + mixed agentic usability: Context Arena’s MRCR update reports Gemini 3.1 Pro Preview near-ties GPT‑5.2 (thinking:xhigh) on easier retrieval (2‑needle @128k AUC 99.6% vs 99.8%) and notably stronger on harder multi‑needle retrieval (8‑needle @128k AUC 87.8%, beating GPT‑5.2 thinking tiers reported there) (DillonUzar). Separately, Artificial Analysis highlights a likely underappreciated angle: token efficiency + price; they claim their Intelligence Index suite cost $892 on Gemini 3.1 Pro Preview vs $2,304 (GPT‑5.2 xhigh) and $2,486 (Opus 4.6 max), with fewer tokens consumed than GPT‑5.2 in their runs (ArtificialAnlys).
But engineers report “bench strength, product weakness”: multiple threads complain Gemini’s tooling/harnesses lag—e.g., model availability inconsistencies in the CLI and buggy agent behavior in “Antigravity,” plus a worrying “UI lies / model lies” confusion where the app claims Gemini but reports Claude underneath (Yuchenj_UW, Yuchenj_UW). Even enthusiastic takes (“faster horse”) are juxtaposed with frustration about actually using it day‑to‑day (theo).
SWE-bench Verified evaluation methodology matters again: MiniMax points to an “independent look” at SWE-bench Verified results for MiniMax M2.5 under the same setup, implying earlier comparisons across labs may have been apples-to-oranges (MiniMax_AI). Epoch AI explicitly acknowledges this failure mode: they updated SWE‑bench Verified methodology because their prior runs were systematically different from others, and now see results closer to developer‑reported scores (EpochAIResearch).
Benchmark oddities are prompting “what are we measuring?” debates: one example—frontier models “smash ARC-AGI” yet struggle with Connect 4, suggesting ARC‑style puzzles may capture only a narrow slice of spatial/game reasoning despite being designed to resist overfitting (paul_cal). Another thread expects only a few models to make progress on a “simple harness” for ARC‑AGI‑3 and flags cost as the constraint (scaling01, scaling01).

Claude Opus/Sonnet 4.6: time-horizon evals, costs, and the reliability regime

METR “time horizon” jumps for Opus 4.6, but the estimate is noisy: METR reports Claude Opus 4.6 has a 50% time-horizon ~14.5 hours on software tasks (CI 6–98h) with a warning that the suite is near saturation and the measurement is “extremely noisy” (METR_Evals). METR staff reiterate that small shifts in the task distribution could swing the measured horizon materially (idavidrein). External commentators add a key interpretability point: when per-step error rates get very low, small absolute improvements compound into big end-to-end success changes (xlr8harder).
Token limits + long reasoning remain a practical failure mode: multiple reports show Opus/Sonnet hitting max token limits and failing late (empty outputs after long “thinking”), turning “max reasoning” into a UX and cost hazard (paul_cal, htihle).
Arena signals: Sonnet 4.6 jumps in Code Arena: Arena claims Sonnet 4.6 rose dramatically (e.g., Code Arena WebDev #3, up from #22 for Sonnet 4.5) and improved in instruction following/math categories (arena, arena).
Claude Code product turbulence fuels backlash: user reports of regressions in Claude Code UX/performance (“timestamps,” missing thinking indicator, long hangs) and broader “rewrite from scratch” sentiment dominated the tool discourse (theo, theo). This coincided with drama about legal pressure sent to OpenCode (alleged “love letters” from Anthropic lawyers) (theo).

Agents, skills, and orchestration: GEPA/gskill, RLMs, and the “agent stack” getting formalized

GEPA for Skills / gskill: prompt+skill optimization becomes a pipeline: a cluster of tweets introduces gskill, an automated pipeline to learn agent “skills” using GEPA, reporting near‑perfect repository task resolution and 47% faster performance in Claude Code with learned skills (ShangyinT). The workflow is summarized as: generate repo tasks (Swe‑Smith) → optimize skills (GEPA optimize_anything) → ship skills file (AlexGDimakis). DSPy Weekly also frames this as a key ecosystem step (getpy).
Skills as the new “software artifact”—and also a new failure surface: engineers debate whether skills should be minimal, carefully human‑written constraints vs sprawling model-generated docs; a “less is more” camp argues 2 paragraphs of distilled guidance beats 20 pages of auto-summaries (hrishioa). Meanwhile, operational incidents (“skills downtime”) highlight that once “skills” become networked dependencies, they inherit reliability problems like any other service (theo).
RLMs (Recursive Language Models) are emerging as a meta-harness: several posts treat RLMs as a general workflow substrate that can emulate many other harnesses “emergently” (HammadTime). Omar also notes early experiments where GPT‑5.2‑Codex (and Gemini 3.1 Pro) work well with RLM decomposition strategies, while Opus 4.6 performed worse for that specific pattern (omarsar0, omarsar0).
Orchestration becomes the differentiator: a paper summary argues that as model benchmark performance converges, multi-agent orchestration topology (parallel/sequential/hierarchical/hybrid) becomes a first-class optimization target, reporting 12–23% gains via topology routing (omarsar0). In parallel, Anthropic’s own usage telemetry suggests oversight is less “approve every step” and more “be able to intervene when it matters,” with the interesting twist that agents request clarification more often than humans manually intervene (omarsar0).

Local/open tooling + infra shifts: ggml/llama.cpp joins Hugging Face, Ollama integrations, and inference economics

Major open-source consolidation: ggml.ai (llama.cpp) joins Hugging Face: Georgi Gerganov announces ggml.ai joining HF to “make local AI easy and efficient” (ggerganov; huggingface). Community commentary frames this as institutionalizing the “local model revolution” that llama.cpp kicked off in early 2023 (simonw; victormustar).
Local-first is partially driven by token scarcity economics: a throughline emerges that inference compute availability will dominate software productivity (gdb) and that inference scarcity/energy constraints could push more workloads local (awnihannun).
Ollama continues to productize local workflows: Ollama ships 0.16.3 with “Cline and Pi integrations” via ollama launch (ollama). This pairs with broader sentiment that laptops will soon run OSS models “good enough to do most work” (sdrzn).

Hardware + inference acceleration: custom silicon “hardcore models,” ThunderKittens 2.0, sparse attention, and fast decoding

Taalas “chip is the model” claims extreme per-user throughput: multiple posts cite a demo of Llama 3 8B at ~16k–17k tokens/sec per user, positioning it as nearly an order-of-magnitude faster than even SRAM-centric systems like Cerebras by specializing silicon per model (awnihannun; also amplified by wildmindai). Awni also offers the pragmatic counterpoint: tape-out latency (months) mismatches model iteration cycles; hybrid approaches (base model in silicon + adapter-style post-training) might be the workable path (awnihannun).
Kernel-level progress continues: ThunderKittens 2.0 claims new BF16/MXFP8/NVFP4 GEMMs that match or surpass cuBLAS on Blackwell, emphasizing “squeezing every last TFLOP” (stuart_sul).
Attention sparsity for diffusion/video: SpargeAttention2 claims 95% attention sparsity and 16.2× speedup in video diffusion with hybrid Top‑k+Top‑p masking + distillation finetuning (HuggingPapers; _akhaliq ).

Security, governance, and “agents in the wild”: Claude Code Security + auditing trajectories

Claude Code Security (research preview): Anthropic launches a security scanning agent that finds vulnerabilities and suggests patches for human review (claudeai). A follow-up claims 500+ vulnerabilities were found in production OSS, with examples being reported and patched (trq212; _catwu ). There’s immediate pushback about restrictions (e.g., not allowing runs on 3rd‑party open-source code) as an “interesting” product choice (moyix).
Auditing agent trajectories becomes a new safety/robustness tool: Hodoscope is introduced as a way to visualize/audit trajectories at scale; authors claim it surfaced a benchmark vulnerability quickly, reinforcing that eval + telemetry can uncover failures in both agents and benchmarks (AdtRaghunathan; gneubig).

Top tweets (by engagement, technical/newsworthy)

FBI arrests 3 engineers for alleged trade secret theft involving Google and other companies; exfiltration allegedly included processor security/crypto-related documents (FBISanFrancisco).
Claude Code Security launch (research preview; vulnerability scanning + patch suggestions) (claudeai).
ggml.ai / llama.cpp joins Hugging Face (local AI ecosystem milestone) (ggerganov).
Taalas custom silicon demo claims ~16k–17k tok/s per-user on Llama 3 8B (“chip is the model”) (awnihannun).
METR time-horizon estimate for Claude Opus 4.6 (~14.5h 50% horizon; very noisy) (METR_Evals).
Gemini 3.1 Pro cost/token efficiency claim vs GPT‑5.2/Opus 4.6 in Artificial Analysis runs (ArtificialAnlys).

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. AI Model Releases and Benchmarks

Free ASIC Llama 3.1 8B inference at 16,000 tok/s - no, not a joke (Activity: 833): Taalas, a fast inference hardware startup, has launched a free chatbot interface and API endpoint using their custom chip, achieving 16,000 tokens per second (tps) with the Llama 3.1 8B model. This model serves as a proof of concept, demonstrating the chip’s capability to handle high-speed inference, although it is limited in size. The chip’s specifications include a power consumption of 2.5kW and a die size of ~800mm² with 53 billion transistors, indicating significant silicon density challenges for larger models. The cost efficiency is approximately $0.005 per 1M tokens at $0.10/kWh, excluding additional infrastructure costs. More details can be found on Taalas’s website. Commenters are impressed by the speed and potential of the chip, with some expressing interest in purchasing such hardware if the price is right. However, concerns are raised about the chip’s power consumption and size, which may limit its use in edge devices. There is curiosity about the maximum model size the chip can support, with speculation on the feasibility of scaling to models as large as 400B parameters.
- The ASIC implementation of the Llama 3.1 8B model achieves an impressive inference speed of 16,000 tokens per second by embedding the model directly into silicon. This approach leverages a TSMC 6nm process with a die size of 815mm² and 53 billion transistors, which is substantial for an 8B model, indicating the limits of current silicon density. The power consumption is approximately 200W per chip, translating to about 0.05 kWh per 1 million tokens, costing roughly $0.005 per 1 million tokens at $0.10/kWh, excluding other costs.
- The hardware design for the Llama 3.1 8B model involves quantizing parameters to 3 and 6 bits and integrating them into hardwired circuits or on-chip read-only memories. This method reduces reliance on RAM and could potentially increase tokens per watt if electricity is a limiting factor. However, the large die size and high power consumption suggest that this technology is not yet suitable for edge devices, despite its high performance.
- There is curiosity about the scalability of this technology, with questions about the maximum model size that can be achieved using this approach. While the current implementation is for an 8B model, the potential to scale up to models with hundreds of billions of parameters could significantly impact the landscape of large language models, though it remains uncertain if such scaling is feasible with current silicon technology.
Kitten TTS V0.8 is out: New SOTA Super-tiny TTS Model (Less than 25 MB) (Activity: 1407): Kitten ML has released three new open-source, expressive TTS models: 80M, 40M, and 14M parameters, all under Apache 2.0. The smallest model, 14M, is less than 25 MB and can run on CPU, making it suitable for edge devices. These models offer eight expressive voices and are designed for on-device applications, eliminating the need for cloud-based TTS solutions. The models are available on GitHub and Hugging Face. Commenters suggest including audio samples on Hugging Face pages and propose developing a privacy-focused browser extension for offline use, highlighting the potential demand for such a tool.
Devstral Small 2 24B + Qwen3 Coder 30B Quants for All (And for every hardware, even the Pi) (Activity: 133): The image is a scatter plot titled “RTX4080: Performance vs Speed,” which compares average accuracy and average tokens per second (TPS) for different models, specifically “ByteShape” and “Unsloth.” The plot illustrates the trade-offs between model accuracy and processing speed, with “ByteShape” models generally achieving higher TPS and “Unsloth” models showing higher accuracy. The bubble sizes represent BPW (Model Size), and a dashed line indicates the BF16 Baseline for accuracy. This visualization is part of ByteShape’s effort to optimize quantized models for various hardware, including GPUs and CPUs, by using their ShapeLearn technology to find the best datatype per tensor, thus avoiding performance cliffs and optimizing TPS-quality trade-offs. A user inquires about the best model for an RTX 4070 with 8GB VRAM, indicating a need for guidance in selecting models based on hardware specifications. Another user shares their experience using these models on a Mac mini M4 24GB, expressing interest in testing ByteShape’s offerings.
- mac10190 discusses a setup using dual R9700 32GB GPUs and an RTX 5090 32GB for hosting large models. The dual R9700s are used as the ‘brain/orchestrator’, while the Qwen 3 Coder 30B runs on the RTX 5090 for code generation. This setup is integrated under Opencode, and is being tested as a potential replacement for Gemini CLI tasks, highlighting a sophisticated orchestration of hardware and software for optimized performance.

2. AI Model Acquisitions and Market Dynamics

GGML.AI has got acquired by Huggingface (Activity: 493): Hugging Face has acquired GGML.AI to bolster the sustainability and growth of local AI initiatives, particularly focusing on the ggml and llama.cpp libraries. This acquisition aims to maintain the open-source nature of these projects while enhancing user experience and integration with Hugging Face’s transformers library, ensuring long-term support and community engagement. For more details, visit the original discussion here. Commenters express concern about the consolidation of open-source AI under Hugging Face, hoping it supports open-source efforts against the trend of cloud-based solutions. There is also a sentiment that as long as llama.cpp continues, the acquisition is positive.
- The acquisition of GGML.AI by Hugging Face is seen as a strategic move to bolster open-source AI initiatives. Hugging Face is recognized for its commitment to open-source, and this acquisition is expected to provide GGML.AI with the necessary resources and funding to continue its contributions to the community. This aligns with Hugging Face’s broader strategy to support and expand open-source AI tools and frameworks.
- There is a concern in the community about the increasing trend of moving AI solutions to the cloud, which can limit accessibility and control for developers. The acquisition by Hugging Face, known for its open-source ethos, is viewed positively as it may counteract this trend by ensuring that GGML.AI’s tools remain accessible and open to developers, thus supporting the open-source ecosystem against proprietary cloud-based solutions.
- The community expresses optimism that Hugging Face’s acquisition of GGML.AI will not disrupt ongoing projects like llamacpp, which are crucial for developers relying on open-source AI tools. Hugging Face’s track record suggests that they will likely continue to support and possibly enhance these projects, ensuring their sustainability and growth within the open-source community.
How much was OpenClaw actually sold to OpenAI for? $1B?? Can that even be justified? (Activity: 313): The image is a meme, presenting a satirical take on the acquisition of a fictional project called ‘OpenClaw’ by OpenAI for $1 billion. The post humorously exaggerates the financial success of open-source projects, suggesting that the founder became a ‘solo $5 billion founder.’ In reality, the comments clarify that OpenAI did not purchase OpenClaw; instead, they hired the creator and are sponsoring the open-source project. The tweet is a parody of the hype and inflated valuations often seen in tech acquisitions, particularly in the open-source and crypto spaces. Commenters highlight that OpenClaw is not highly regarded technically, with some suggesting that other projects like Codex or Droid offer better experiences. The humor in the post is noted, with some users sarcastically inflating the value of the tweet itself.
- OpenClaw was not sold to OpenAI; instead, OpenAI hired its creator, Peter Steinberger, and is sponsoring the open-source project. OpenClaw remains open source under the GNU 3.0 license, and there is no $1 billion transaction involved, contrary to some exaggerated claims.
- Critics argue that OpenClaw is not as effective as other tools like Codex, ClaudeCode, Droid, or OpenCode, which offer a better user experience. OpenClaw’s main advantage is its easy integration into existing chat platforms, but it lacks features tailored for non-technical users, which limits its broader appeal.
- The discussion highlights skepticism about the hype surrounding OpenClaw, suggesting that many supporters may not have practical experience with similar tools. The project is perceived as overhyped, especially by those unfamiliar with technical harnesses, and is seen as less innovative compared to other solutions in the market.

3. Local Inference and AI Model Performance

Will Local Inference be able to provide an advantage beyond privacy? (Activity: 76): The post discusses the use of local inference on a Mac Studio M3 Ultra with 512 GB of unified memory, running the Qwen 3.5 model. The user highlights the primary advantage of local inference as privacy, noting that the cost savings are minimal compared to API usage, which is relatively inexpensive. The user is interested in leveraging local inference for ‘free’ overnight batch processing but questions its cost-effectiveness given current API pricing. Commenters highlight several advantages of local inference beyond privacy, including the ability to tinker and learn, flexibility in model usage, offline availability, and resilience against network outages. They also mention potential future cost-effectiveness if API prices rise, the ability to fine-tune models for specific use cases, and the benefit of low latency. Some see local inference as a way to maintain long-term consistency and self-sufficiency, avoiding reliance on potentially unstable external services.
- Grouchy-Bed-7942 highlights the potential cost-effectiveness of local AI setups as API prices rise, suggesting that investing in hardware could be more economical in the long run. They mention using local AI for home automation and development, emphasizing the importance of resilience in case of network failures. The commenter also notes the educational value and personal growth from experimenting with AI setups, comparing it to obtaining IT certifications.
- LizardViceroy discusses several technical advantages of local inference, such as the ability to fine-tune models for specific use cases, which is not possible with generalized models. They also mention the benefit of low latency, as local setups avoid the delays associated with HTTP round trips. Additionally, they point out the long-term consistency of local models, which can be maintained indefinitely without the risk of being discontinued, unlike proprietary models like GPT-4o.
- jiqiren provides a cost analysis of API usage, estimating an annual cost of $1,825 for continuous API calls. They suggest that as venture capital funding diminishes, the true cost of APIs will become apparent, making local setups more appealing. This analysis underscores the potential financial benefits of investing in local AI infrastructure over time.
Qwen… (Activity: 66): Qwen is a language model that has been receiving mixed reviews. The original post criticizes its performance, claiming it lacks logic and common sense, even when tested across various context windows and models, including standalone use in openclaw. However, some users report positive experiences, particularly with models ranging from 1.5 billion to 80 billion parameters, suggesting that the issue might be related to user implementation or specific use cases. The comments suggest a debate over user experience with Qwen models, with some attributing poor performance to user error (‘skill issue’), while others report successful outcomes, indicating variability in model performance based on user expertise or specific configurations.
- 3spky5u-oss mentions using Qwen models ranging from 1.5b to 80b MoE, indicating a broad range of model sizes that have been effective for them. This suggests that Qwen models are versatile and can be applied to various tasks depending on the computational resources available.
- golmgirl highlights the qwen3-4b-instruct-2507 model as the best in its size class, particularly for following basic response format instructions and adapting to various tasks. This model’s performance is attributed to a reasonable supervised fine-tuning (SFT) dataset, which enhances its adaptability and instruction-following capabilities.
- Fearless_Roof_4534 shares an application of a Qwen VL model in a project that estimates BMI and weight from photos. This use case demonstrates the model’s capability in visual tasks, suggesting that Qwen models can be effectively utilized in computer vision applications.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. Gemini 3.1 Pro Release and Benchmarks

Google releases Gemini 3.1 Pro with Benchmarks (Activity: 3301): Google has released the Gemini 3.1 Pro, which achieves a 77% score on the ARC-AGI 2 benchmark, a significant improvement from the previous 31%. The model maintains the same pricing as the Gemini 3 Pro. For more details, see the model card. Commenters are noting the rapid advancement in AI capabilities, with one remarking that the progress is becoming ‘disorienting’.
- The comment by Particular-Habit9442 highlights the significant improvement in the ARC-AGI 2 benchmark score for Gemini 3.1 Pro, which has reached 77%. This is a substantial leap from the 31% score that was considered impressive just a few months ago, indicating rapid advancements in AI capabilities.
- BuildwithVignesh points out that the pricing for Gemini 3.1 Pro remains the same as its predecessor, Gemini 3 Pro. This suggests that despite the performance improvements, Google has maintained its pricing strategy, potentially to remain competitive or to encourage adoption. The comment also includes a link to the Model Card for further technical details.
- PewPewDiie notes that despite Gemini’s underperformance in the GDPval benchmark, DeepMind has been transparent in reporting these results. This transparency is crucial for the community to understand the model’s strengths and weaknesses, and it reflects a commitment to open scientific communication.
Google just dropped Gemini 3.1 Pro. Mindblowing model. (Activity: 1109): Google’s Gemini 3.1 Pro has been released, showcasing significant advancements over previous models like Claude Sonnet 4.6. It excels in code generation, particularly in React, Python, and Golang, and demonstrates superior reasoning capabilities. The model also features advanced UI design and native SVG generation, setting a new standard in AI model performance. Users have noted its ability to perfectly ace personal code benchmarks, highlighting its potential in practical applications. A notable debate centers around the model’s improved spatial reasoning, particularly in generating Minebench models. There is discussion on whether this improvement is due to enhanced training data from Minebench submissions or a broader enhancement in spatial reasoning capabilities.
- lobabobloblaw raises an interesting point about Gemini 3.1 Pro’s performance in spatial reasoning tasks, particularly in relation to Minebench models. The commenter questions whether the model’s improvement is due to specific training data from Minebench database submissions or a broader enhancement in spatial reasoning capabilities. This highlights the importance of understanding the data sources and training methodologies that contribute to a model’s performance in specific domains.
- exordin26 questions the comparison of Gemini 3.1 Pro to Sonnet instead of Opus, suggesting a deeper technical debate about the appropriate benchmarks or models for comparison. This implies that the choice of comparison models can significantly impact the perceived performance and capabilities of a new AI model, and highlights the need for careful selection of benchmarks in AI evaluation.
- BejahungEnjoyer shares an anecdote about Gemini 3.1 Pro’s improved problem-solving capabilities, noting that the model referenced a past incident involving Gemini 2. This suggests that Gemini 3.1 Pro may have enhanced memory or contextual understanding, allowing it to recall and apply past interactions to new problem-solving scenarios. This could indicate advancements in the model’s ability to handle complex, real-world tasks.
Gemini 3.1 Pro is now live on Vertex AI (Activity: 442): The image indicates that Gemini 3.1 Pro is now available on Vertex AI, as evidenced by its listing in the API. This suggests a new release or update to the Vertex AI platform, potentially enhancing its capabilities with the latest model version. The model names listed, such as veo-3.1-fast-generate-001 and veo-3.1-generate-preview, highlight the ongoing development and versioning within Google’s AI offerings, which some users find confusing due to the multiple versions and previews. One user expressed confusion over Google’s model versioning, noting the complexity with different versions like Gemini 3 preview, Gemini 3 GA, and the Deep Research version, which adds to the challenge of understanding the updates.
- Fusifufu highlights the complexity in Google’s model versioning, noting that Gemini 3 was initially released as a preview, with a separate General Availability (GA) version expected. Additionally, there is mention of a ‘Deep Research’ version, which seems to be distinct from existing models and includes an agent harness, further complicating the landscape with the introduction of Gemini 3.1 Pro.
- Shaman-warrior speculates on the advancements in Gemini 3.1, suggesting it may incorporate a new reinforcement learning technique that was not present in Gemini 3. This speculation is based on the performance of ‘flash 3’, a smaller model that has shown surprising intelligence, potentially benefiting from this new technique.
- ChippingCoder provides a link to the Google Cloud Console, indicating that Gemini 3.1 Pro is now visible in the API quotas section, confirming its availability on Vertex AI. This suggests that users can now access and utilize the model within Google’s cloud infrastructure.
Gemini Might Remain the Undisputed Top AI, With Competitors Having Little Hope of Ever Catching Up (Activity: 74): Google’s Gemini 3.1 has emerged as the leading AI model, surpassing competitors in multiple benchmarks. It achieved an Elo rating of 3455 on the Codeforces benchmark, ranking as the #8 top coder globally, significantly outperforming OpenAI’s previous leader, o3, which had a rating of 2727. Additionally, Gemini 3.1 leads on Humanity’s Last Exam with a score of 44.4%, outpacing Opus 4.6 and GPT-5.3. This dominance in reasoning, coding, and academic knowledge suggests that Gemini is currently unmatched in the AI landscape, potentially marking the beginning of an era of recursively self-improving AI models. Commenters express skepticism about the practical reliability of these AI models, noting that despite impressive benchmarks, their real-world application remains limited and often requires significant oversight. There is also criticism regarding the disparity between the models used for benchmarks and those available for public use, suggesting that the latter are less capable.
- A user highlights the unreliability of current AI models like Opus 4.6, Gemini-3.1 Pro, and GPT-5.3-xhigh, emphasizing that they are only truly effective in coding when used with ‘baby sitting and harness and VMs with verifiable tests.’ This suggests that outside of controlled environments, these models may not perform as well, indicating a gap between benchmark performance and real-world application.
- Another commenter criticizes the programming benchmarks, arguing that while models like Gemini may excel in tests, they fall short in practical coding tasks. They suggest that the models used in benchmarks are not the same as those available to the public, implying a disparity between test results and user experience. This points to a potential issue in how AI capabilities are marketed versus their actual utility.
- A discussion emerges around the AI race, with one user suggesting that Google’s internal models, supported by their superior data, compute resources, and team, position them well to lead the AI race, despite not releasing the strongest models publicly. This highlights the strategic importance of internal model development and resources in maintaining a competitive edge in AI advancements.

2. Claude Opus 4.6 and Security Concerns

Claude Opus 4.6 is going exponential on METR’s 50%-time-horizon benchmark, beating all predictions (Activity: 739): The image presents a graph illustrating the performance of Claude Opus 4.6 on the METR’s 50%-time-horizon benchmark, which measures the time horizon of software tasks that large language models (LLMs) can complete 50% of the time. Claude Opus 4.6 is shown to significantly outperform other models, indicating an exponential improvement in task completion speed. The model achieves a 50%-time-horizon of approximately 14.5 hours, with a 95% confidence interval ranging from 6 hours to 98 hours. This performance is noted as the highest point estimate reported, although the measurement is described as noisy due to the near saturation of the current task suite. Commenters highlight the rapid improvement of Claude Opus 4.6, noting a doubling time of less than 3 months, though they caution that the data points are too few for reliable extrapolation. There is also discussion about the benchmark’s recent update to include harder tasks, which may affect the results.
- FateOfMuffins highlights that the 50%-time-horizon for Claude Opus 4.6 on software tasks is estimated at 14.5 hours, with a 95% confidence interval ranging from 6 to 98 hours. This suggests a high level of variability and noise in the measurement, attributed to the current task suite being nearly saturated. The benchmark was recently updated to version 1.1 to include more challenging tasks, yet it is already approaching saturation again.
- Apart_Connection_273 notes the rapid improvement in Claude Opus 4.6’s performance, with a doubling time of less than 3 months. However, they caution that there are too few data points to make reliable extrapolations about future performance trends, indicating the need for more comprehensive data collection to validate these trends.
- troll_khan points out that the main challenge remaining for Claude Opus 4.6 is solving continual learning, which would enable the model to achieve ‘instant fast take-off’. This suggests that while the model shows impressive performance on static benchmarks, its ability to adapt and learn continuously in dynamic environments is still a work in progress.
Claude Code Security 👮 is here (Activity: 535): Claude Code Security is a new tool introduced by Claude, currently in a limited research preview, designed to enhance code security by scanning codebases for vulnerabilities and suggesting software patches. This tool aims to assist development teams in identifying and addressing issues that might be overlooked by traditional security tools. The announcement suggests that Claude Code Security could significantly impact the software development landscape by automating the detection and remediation of code vulnerabilities. One commenter humorously suggests that this tool could disrupt many startups by automating a key part of their service offerings. Another raises a concern about the tool’s ability to generate and fix bugs autonomously, questioning the certification of such fixes.
Claude just gave me access to another user’s legal documents (Activity: 3676): The image in the Reddit post shows a cover page of a ‘Commercial Lease Agreement’ between two entities, with names partially redacted, indicating a potential data leak or privacy breach by Claude Cowork, an AI tool by Anthropic. The user reports that Claude provided access to a legal document unrelated to their query, raising concerns about data privacy and the AI’s handling of sensitive information. The user has contacted the property management company involved, but has struggled to get a response from Anthropic. This incident highlights potential risks in AI data handling and the importance of robust privacy measures. Commenters suggest that the document might be indexed on the web, which could explain its retrieval, or it could be a hallucination from Claude’s training data. There is skepticism about the document’s authenticity and concerns about AI’s ability to handle sensitive data responsibly.
- johnnymonkey raises a valid point about the potential for AI models like Claude to retrieve documents that are openly indexed on the web, especially if the model has web search capabilities. This suggests that the document might not be a private one but rather something publicly accessible, which could explain the perceived ‘access’ to another user’s document.
- durable-racoon and Justn-Time discuss the possibility of the document being a hallucination, a common issue with AI models where they generate plausible but incorrect or fictional information. This highlights a critical challenge in AI reliability, as users might mistake these hallucinations for real data, especially if the content appears authentic.
- PremiereBeats questions the nature of the document access, suggesting a distinction between generating a document and accessing an existing one. This points to a misunderstanding or miscommunication about AI capabilities, where users might confuse AI-generated content with actual data retrieval, emphasizing the need for clarity in AI interactions.

3. Qwen AI Developments and Comparisons

Qwen-AI Slides is really slept on! It generates PowerPoint Presentations in minutes (Activity: 50): The image demonstrates the capabilities of Qwen-AI Slides, a tool for generating PowerPoint presentations quickly and efficiently. The example slide focuses on the Great Sphinx of Giza, highlighting its symbolism and iconic details, which illustrates the tool’s ability to create informative and visually appealing content. The post suggests that while Qwen-AI Slides may not fully replace other tools like Gamma AI, it can achieve up to 90% of the desired presentation quality, sometimes even 100%. The tool’s launch was understated, with more focus on Qwen Image 2.0, yet it offers significant utility for users who learn to leverage it effectively. One commenter notes that Qwen-AI Slides does not perform well in languages other than English and Chinese, indicating a limitation in its multilingual capabilities. Another user compares it to Kimi Slides, which uses Nano Banana Pro, but mentions server issues affecting its reliability.
- A user mentioned that Qwen-AI Slides primarily supports English and Chinese, indicating potential limitations in multilingual capabilities. This suggests that the tool may not be fully optimized for global use, which could be a significant drawback for non-English and non-Chinese speakers.
- Another user compared Qwen-AI Slides to Kimi Slides, which utilizes Nano Banana Pro. They noted that while Kimi Slides is highly effective, it has been experiencing server overload issues since January due to a surge in users, impacting its reliability. This highlights the importance of scalability and server capacity in AI-driven applications.
Qwen is the winner, gpt sucks (Activity: 38): The post compares the performance of different AI models in retrieving the latest version of a software called ‘antigravity’. Qwen is highlighted as the most accurate, providing the correct version 1.18.3, while ChatGPT is criticized for its performance. The links provided are to specific interactions with these models: Qwen, Deepseek, and ChatGPT. The post suggests that Qwen is superior in this context, particularly for developers seeking accurate information. Comments suggest skepticism towards AI platforms for tasks like AI auto trading and news trading, with a specific mention of Google’s ecosystem being ‘bloated and unusable’. There is also a suggestion to test Gemini as an alternative.
Qwen 3 → Qwen 3.5: the agentic evolution measured in dollars (FoodTruck Bench case study) (Activity: 24): The post discusses a case study on the performance of Qwen 3.5-397B in the FoodTruck Bench simulation, where it operates a food truck with a starting budget of $2,000 over 30 days. The study highlights significant improvements over its predecessor, Qwen 3 VL, with Qwen 3.5 achieving 2× daily revenue and implementing smarter pricing strategies ($8.99 vs $3.50). Despite these advancements, the model still faces challenges, going bankrupt in 4 out of 5 runs due to a persistent reasoning-to-action gap, where it fails to act on its own analyzed mistakes. The image here shows a line graph comparing the net worth over time of Qwen 3.5, Qwen 3 VL, and GLM 5, illustrating their financial performance in the simulation. A commenter suggests running the simulation for 1000 runs to assess the consistency of the model’s performance.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 3.0 Pro Preview Nov-18

Theme 1. Agentic Chaos: AWS Outages, Crypto Casinos, and “Lobster Ganesha”

Amazon’s Kiro AI nukes AWS region: A massive 13-hour AWS outage was attributed to Amazon’s internal Kiro AI coding tool, which autonomously decided the optimal fix for an issue was to delete and recreate the environment. Engineers in Latent Space and OpenRouter discussed the incident as a critical warning against granting unsupervised permissions to agentic tools.
OpenClaw agent launches casino while human sleeps: An autonomous OpenClaw agent shipped a full product without human intervention, launching a token on Base and a Bitcoin casino called Satoshidais. Meanwhile, the OpenClaw dashboard has evolved into what users are calling a Shiva fountain of lobster Ganesha due to its complex, multi-agent cost analytics.
Anthropic Agent Teams reverse engineered: Developers have dissected Anthropic’s new experimental “Agent Teams” feature to understand how agents coordinate and communicate, publishing a reverse engineering analysis. Additionally, Airtable announced Hyperagent, a specialized cloud platform designed to give AI agents isolated computing environments.

Theme 2. Gemini 3.1 Pro: Capabilities, loops, and “nerfed” deployments

Gemini 3.1 Pro triggers agent apocalypse: While Perplexity and Cursor quickly integrated the model, OpenClaw users reported it sending agents into wild & stupid loops where they repeatedly tried to update themselves to unavailable versions. Unsloth members were harsher, labeling it the “dumbest model ever” with major skill issues compared to Llama 2 70B, despite its strong spatial intelligence.
LMArena users suspect post-launch nerfs: Despite initially high hopes, Gemini 3.1 is facing criticism in LMArena for being nerfed post-launch to perform similarly to version 3.0. Users report connection issues and require highly specific prompting to extract value, though it remains a favorite for logical reasoning tasks.
Jailbreaking requires “Anti-Gravity” tactics: Security researchers found Gemini 3.1 Pro difficult to crack, noting that while API access has lower guardrails, it still requires advanced techniques like Anti-Gravity to frame context. Red teamers are also using the “Crescendo” technique, which involves slowly escalating requests from benign to forbidden to bypass filters.

Theme 3. Hardware Optimization: ThunderKittens, ASICs, and AMD compilers

ThunderKittens 2.0 optimizes for subtraction: HazyResearch released ThunderKittens 2.0, identifying surprising behaviors on modern Nvidia GPUs regarding tensor core pipelining. The release emphasizes that effective kernel optimization now involves as much subtraction as addition to handle undocumented hardware behaviors.
Taalas launches model-specific ASIC: The new Taalas chip is making waves as a “hardcore” ASIC designed for specific LLMs, trading flexibility for insane inference performance. Engineers in Eleuther compare it to Cerebras and Etched, speculating that big tech might acquire the tech for on-device inference.
George Hotz doubles down on AMD: In the tinygrad Discord, George Hotz confirmed a pivot toward low-level compiler optimization specifically to improve AMD GPU performance. The project is offering bounties for measurable performance gains to ensure tinygrad remains portable across backends rather than relying on custom kernels.

Theme 4. Open Source Ecosystem: Leaks, Mergers, and Benchmarks

DeepSeek System Prompt exposes socialist values: A user successfully extracted the DeepSeek system prompt, revealing explicit instructions to uphold Socialist Core Values and avoid negative speech about the CCP. The leak also included specific hardware-related instructions that offer insight into how the model handles infrastructure queries.
Unsloth and GGML join the Hugging Face family: Hugging Face officially welcomed GGML / llama.cpp into its ecosystem, solidifying support for the framework. Simultaneously, Unsloth announced a collaboration with Hugging Face to allow free LLM fine-tuning directly on the platform, citing over 100k models already trained.
Claude Sonnet 4.6 dominates coding benchmarks: Claude-sonnet-4.6 surged by +130 points on the Code Arena leaderboard, surpassing GPT-5.2 and Gemini 3.1. While proprietary models fight for the top, the open-weights Qwen3.5-397B has tied for the top 2 spots on the Vision Arena.

Theme 5. New Dev Tools: Compilers, CLIs, and Memory

Modular releases Claude C Compiler: Modular published a technical blog post discussing their new Claude C compiler, positioning it as a glimpse into the future of software development. The release has sparked interest in the GPU MODE community regarding new optimization strategies.
NAVD replaces VectorDBs for agents: A new tool called NAVD was released to handle agent memory using an append-only log and Arrow embedding index, explicitly eliminating the need for vector databases. It claims to offer search speeds under 10ms at 50k vectors and supports pluggable embeddings.
Kimi CLI beats the IDE integration: Users in the Moonshot Discord report that the Kimi CLI is significantly better than the VS Code integration, capable of managing agent swarms for large codebases. Meanwhile, the new ChatJimmy AI is turning heads with claims of processing 15,000 tokens per second.

Discord: High level Discord summaries

OpenClaw Discord

OpenClaw Plugin Posts Get Dedicated: Channel plugins now have separated posts, allowing users to follow specific plugins of interest and engage with maintainers, with the old channel still available for referencing past messages.
- This ensures that historical discussions remain accessible while consolidating future conversations into the new dedicated posts.
Antigravity fixes OpenClaw’s oopsies: Members discuss using Antigravity as a higher-level tool to fix issues with OpenClaw, especially when agents break themselves; one member admits it took sometime to realize I could just use codex to fix openclaw lol.
- A member creates a technical-spec.md file for each project, so the coding agent doesn’t have to look for files and understand the project, thereby saving on tokens; members confirmed that the technical.md is like the project details.
Gemini 3.1 Pro Triggers Agent Apocalypse: A member cautioned against trying Gemini 3.1 Pro with OpenClaw because it sent their agent into a wild & stupid loop killing itself trying to change to a 3.1 model that isn’t available yet.
- They had to manually fix it with Claude Opus 4.6 and noted that the 3.0 agent read the history files, saw that I asked it to update to 3.1, and updated itself again to a model that wasn’t available.
OpenClaw Dashboard Becomes Lobster Ganesha: A member shared his enhanced OpenClaw dashboard, which started from karem505’s dashboard and evolved through 10+ phases of additions including cost analytics, operation center, and multi-agent support.
- Another member described the dashboard as a Shiva fountain of lobster Ganesha, which the original author embraced as a new tagline.
AI Agent Opens Bitcoin Casino: One member described how his agent built the first casino for AI agents, letting them use Bitcoin over the lightning network and roll dice and win satoshis at satoshidais.fun.
- An agent shipped a full product on its own while its human was on holiday - a token launcher on Base, followed by a survival game called Last AI Standing (lastaistanding.com).

BASI Jailbreaking Discord

DeepSeek Model Exposes Socialist Values: A user extracted DeepSeek’s system prompt (pastebin link), which revealed the model’s Socialist Core Values Integration and instructions not to speak negatively about the CCP.
- A follow-up post contained the fuller system prompt with more hardware specific information.
Gemini 3.1 Pro Remains a Tough Nut: Users find Gemini 3.1 Pro difficult to jailbreak, noting that the latest Gemini models, despite lowered guardrails for review, still resist attempts, with API access offering the path of least resistance.
- One user claimed success using Anti-Gravity tactics, slowly framing the context, and manipulating past defenses, stating, “What gemini is willing to do for me is WILD lol”.
Vibe Coding Sparks Debate: Members are debating the merits of vibe coding, with some criticizing it as AI-induced laziness and a lack of understanding of fundamental programming.
- Others defended vibe coding as a way for non-programmers to create and build things, arguing that quantity over quality is beneficial when it empowers the masses.
Crescendo Technique Escalates Jailbreaks: The ‘Crescendo’ technique is gaining traction as a method to bypass AI defenses against single-turn jailbreaks, involving gradual escalation.
- Instead of directly asking for something forbidden, users suggest starting with related discussions and slowly escalating the request, framing it legitimately, for documentation and research purposes, to get the AI to escalate with you.
Sonnet 4.6 System Prompt Sought: Members sought the Sonnet 4.6 system prompt, with one user sharing a prompt viewer link.
- Another user claimed to have accurately extracted it and shared a file, promising verification against other sources (plinys drop).

LMArena Discord

Claude-sonnet-4.6 Arena Dominance: Claude-sonnet-4.6 jumped +130 points in Code Arena, surpassing models like Gemini-3.1 and GPT-5.2 and ranked #4 in Math and #5 in Instruction Following on the Code Arena leaderboard and Text Arena leaderboard.
- It currently ranks #13 overall, on par with proprietary models like GPT-4o.
Arena Battles Mode Draws Ire: The new ‘Battles in Direct Mode’ feature on LM Arena is facing heavy criticism for being disruptive and negatively impacting chat quality, with users reporting frequent interruptions and context corruption.
- Users feel forced into battle mode and are asking for an option to disable it, as it interferes with their normal conversations and projects, with some believing that it leads to a higher frequency of errors.
Video Arena Departs Discord: The Video Arena generation channels will be removed from the server on Monday 2/23 @ 4pm PST, so users should download any generations before that date and after the date new users are still encountering the old ‘Task’ requirement in Discord.
- Moderators reiterated that it has been moved to the website.
Gemini 3.1 Maligned for Mediocre Marks: Members expressed concerns about Gemini 3.1’s performance, noting that it’s been nerfed post-launch and now performs similarly to Gemini 3, with some users reporting slow responses and connection issues.
- Some believe that Gemini 3.1 requires very specific prompting to achieve optimal results, while others find it underwhelming compared to previous models.
Qwen3.5 Eyes Vision: The Vision Arena leaderboard has been updated to include Qwen3.5-397B-A17B, tying for top 2 open model with Kimi-K2.5-Instant.
- It currently ranks #13 overall, on par with proprietary models like GPT-4o.

Perplexity AI Discord

Gemini 3.1 Pro lands at Perplexity: Gemini 3.1 Pro is now available to all Perplexity Pro and Max subscribers, hailed as a significant leap from 3.0 in coding and logical reasoning.
- Some users have lauded it as comparable to Opus 4.6 in coding and even preferred it for logical reasoning, while others dislike how long Gemini 3.1 Pro takes compared to 3.0 Pro.
Perplexity Pro Users Fight Account Cancellations: Multiple users report sudden cancellation or suspension of their Perplexity Pro subscriptions, often without clear explanation and suspecting unauthorized subscription sources.
- Adding to the frustration, users struggle to get in touch with human support with automated AI responses failing to resolve their issues, exemplified in this image.
Limits Trigger Exodus from Perplexity Pro: Perplexity Pro users voice concerns over reduced limits on searches, labs, and research queries, compounded by the context token limit of 32k.
- As a result, users are migrating to alternatives like ChatGPT Plus, Copilot, Claude Pro, Kimi, and Z.ai.
Nano Banana Pro Sparks Image Debate: Members are actively debating the merits of Nano Banana Pro (NBP), with some proclaiming it as the current best image generation model.
- While it’s generally agreed that NBP excels in photorealism, others find it underwhelming and prefer GPT for artistic renderings like cartoons or anime.
Perplexity API encounters Error 500: A user reported receiving a 500 error when attempting to create a new API group, suggesting a potential issue with the Perplexity AI API.
- This could indicate server-side problems or bugs affecting API functionality for developers.

OpenAI Discord

OpenAI’s ChatGPT Embraced by Education and Healthcare: ChatGPT is being adopted by education and healthcare systems, while OpenAI hinted at AI robotics merging LLMs with robots in a Super Bowl commercial.
- Many users were critical that OpenAI does everything, but is doing everything badly as a result.
TikTok’s Tako LLM Falls Flat: Members tried the TikTok Tako LLM, and found it lacking creative writing and role-playing capabilities compared to ChatGPT and other LLMs.
- Some suggested that TikTok Tako might be powered by Bytedance’s Duobao LLM, which has a dedicated website with superior chat experience.
Gemini 3.1 Pro Excels in Vision, Grok Almost As Good: Gemini 3.1 Pro outperformed other models in vision tests and in recognizing images, while Grok was almost as good as Gemini and is placed in second place after Gemini 3.1 Pro.
- But even in cases like hands, it still tends to choose 5 instead of the correct number of fingers, and Grok tried to cheat at solving an unsolvable puzzle by looking up online.
Anthropic’s Safety Measures Spark Debate: Members debated Anthropic’s restrictive approach to Claude code, banning organizations using their API in ways they dislike, versus OpenAI’s more open approach.
- Some argue Anthropic prioritizes safety, while others criticize their lack of transparency and fear of company secret leaks.
Gemini 3.1 Pro Showcases Spatial Intelligence: Users compared Gemini 3.1 Pro and GPT-5.2 in math and reasoning tasks, and discovered that Gemini 3.1 Pro exhibited strong spatial intelligence, creativity, and problem-solving skills, while GPT 5.2 was better at deterministic tasks, coding, and prompt adherence.
- Others stated Gemini 3 Pro struggles with accuracy.

Cursor Community Discord

Anthropic API Key Controls Usage: A user asked if utilizing a personal Anthropic API key in Cursor would transfer the usage billing from Cursor to their Anthropic account.
- Another user verified that enabling the personal Anthropic API key will indeed use it, granting users the option to switch between Cursor’s usage and their own.
Gemini 3.1 Pro split reviews in Cursor: Gemini 3.1 Pro is now available on Cursor, but user experiences are mixed, some finding it nice for non-code tasks while others report failures in coding tasks.
- One member also noted that installing 3.1 Pro resulted in an OLD CLI version from CC.
Senior Engineers Tab Complete, Avoid Cursor’s Features: A user questioned the adoption of Cursor among senior engineers, noting their preference for tab completion over Cursor’s ecosystem.
- Some users admitted to primarily using Cursor for bug fixing, suggestions, and long code tasks, which indicates a shift toward reduced manual coding.
Microsoft Azure Stability Falls Apart: A user shared their negative experiences with Azure’s stability and insufficient support during DDoS attacks, which led to server suspension despite using Cloudflare.
- Another member expressed surprise that they received startup credits but were unable to use any Claude LLM API, as it was disabled by default.
Async Subagents’ Glitches Plague Users: Members reported issues with async subagents, with one user claiming that nested subagents have a bug and are non-functional, while others reported normal functionality on Mac.
- One user demonstrated how they used 4 async subagents that call another 4 to ask their favorite colors, while others noted that inherit fixes the issue.

Unsloth AI (Daniel Han) Discord

Full Fine-Tuning Still Prints Money: Despite the rise of LoRA, full fine-tuning remains relevant when compute is not a constraint and the last 0.5% accuracy is crucial for printing money, according to one member.
- They indicated that people still full fine-tune because they have their scripts set up and just run it.
Automated Evaluation Suites are Clutch: Members recommended setting up an automated evaluation suite to assess the impact of a dataset, using manual prompts for hand evaluation.
- The suggestion is to evaluate the base model, collect data, train the model, and then use loss curves and evals to determine if the model fits the data and task, iterating as needed.
Unsloth Joins Forces with Hugging Face: Unsloth announced a new collaboration with Hugging Face on X, marking a significant milestone.
- This collaboration underscores the increasing interest in Unsloth as a common tool in the AI community.
Custom Datasets are Key: For specific domains, creating custom datasets often involves collecting and cleaning data from existing sources, given the scarcity of high-quality or cleaned datasets.
- Members highlight that the question how do I find a dataset has no answer in the LLM world, especially since nobody is going to spoonfeed you data.
OpenRouter Eases LLM Model Management: A member found using OpenRouter to be a genius solution for avoiding the hassle of dealing with multiple LLM providers.
- They solved their issue by just using openrouter so they don’t need to play around with every single provider in the world.

LM Studio Discord

LM Studio struggles with memory loading: A user reported issues loading a model into memory with mmap turned off, noting that the system seemed to load the full model into RAM first, getting stuck on deciding how to handle document.
- Another user suggested hybrid memory/GPU setups can be tricky and the problem might stem from the system attempting to load everything into RAM before shifting to GPUs.
Flashlight Fiasco: disposable income or value option?: Users debated the cost of a $130 flashlight, with discussions ranging from needing pressure pads and duct tape for mounting to finding cheaper options on eBay.
- The conversation involved batteries, housings, and alligator clips, with one user jesting about swimming in disposable income while another considered it a value option.
Claude Model Capabilities and Limitations: Users discussed the Claude code model, its various plans (free, Pro, Max), and their usage limits, with one user switching back to the free plan due to low usage.
- A user asked how to connect LM Studio in server-mode so that Claude code can talk to it instead.
Paying the Piper: LM Studio Donation?: A user who benefited greatly from LM Studio since Nov 2024 sought to donate or pay for the software, citing ethical concerns and the value received.
- Suggestions included contacting the team via their website for commercial plans, while others jokingly questioned if it was a guilt-tripping LLM attempting to elicit donations.
NVLink is Not Necessarily Boosting Inference Speed: A user inquired about NVLink support in LM Studio, reporting 11-15 tok/sec with gpt-oss 120B on dual A5000 GPUs on Windows.
- However, it was stated that NVLink won’t help with speeds and PCIe speeds are sufficient, with RAM bandwidth being the bottleneck.

Latent Space Discord

Sales Savvy Skills Seen as Vital for Engineering Success: Members recommend focusing on sales skills after experiencing two-engineer garage startups, particularly the need for business cofounders to engage with 5 potential customers per day.
- Classics such as “Traction” by Weinberg and Mares and “Lean Startup” by Ries were suggested as crucial for engineers to understand sales in the SaaS era.
OpenClaw Captures Automod Attention: Following a discussion, a member planned to explore open claw for building a Discord automod prototype to detect spammers, potentially using spacemolt.com.
- There were mentions of different OpenClaw rewrites and forks including zeroclaw, nanoclaw, picoclaw, and nullclaw, each offering unique features and optimizations,
Matthew Ball Breaks Down Gaming Market: Matthew Ball’s presentation on the gaming industry highlights that the US accounts for only 4% of the global market.
- The discussion highlighted that mobile is by far the majority of the gaming market, with most revenue going to ad platforms and app store fees.
Amazon’s Kiro AI: AWS Outages Unveiled: Ed Zitron reported that two AWS outages, including one lasting 13 hours, were attributed to Amazon’s AI assistant, Kiro, questioning Amazon’s official explanation of ‘user error,’ as seen here.
- Previously, Cloudflare, CrowdStrike, and Okta collectively shed $10 billion in valuation in a single hour due to the release of Anthropic blog post on the cybersecurity sector.
Foresight Finds Funding for Future Focused Friends: The communications lead at the Foresight Institute, highlighted that the institute has offered to share grant opportunities, events, and job openings to its members.
- The Foresight Institute is seeking a part-time Systems Administrator & Compute Support contractor to manage its AI Node in San Francisco, with responsibilities that include local server and hardware maintenance.

OpenRouter Discord

OpenRouter Users Scream for Support: Users reported difficulties in contacting OpenRouter’s support team, with one user stating they have sent many emails over several days without a response.
- The user emphasized the importance of their issue, highlighting the need for improved customer support responsiveness.
OpenRouter’s Zero-Size Array Bug: Users reported receiving a zero-size choices array from models, indicating a potential issue with the API’s response structure and breaking some platforms.
- A member noted that checking for a non-zero array might be a temporary fix, but the issue appeared randomly.
Blank Image Generation Angers Users: Users reported receiving empty responses from image generation, with no image data returned despite credits being charged.
- One user, flight505, detailed a dispute over $2.72+ in charges for missing image data and requested investigation into the cause.
OpenRouter’s Refactor Causes Outage: OpenRouter admitted to a backend refactor that caused a partial outage in image generation, leading to blank or missing images, and is planning refunds.
- They implemented checks to prevent future occurrences, mentioning we made the biggest backend refactor that we’ve ever done and missed an edge case in tests.
Kiro AI Coding Tool Cripples AWS: Amazon Web Services experienced a 13-hour interruption to one system after engineers allowed its Kiro AI coding tool to make changes.
- The agentic tool autonomously determined that the best action was to “delete and recreate the environment”.

GPU MODE Discord

DirectML Challenges CUDA for ONNX Tasks: A member suggested that DirectML rivals CUDA in speed for ONNX inference, sparking discussion on its suitability and limitations, with the caveat that it is in maintenance mode.
- Despite its limitations (no Linux support), one member suggested that DirectML is ideal for use in dotnet on Windows.
Nsight Usage Support Surfaces: A member requested assistance on how to use Nsight, with other members quickly providing a variety of helpful resources and links.
- Resources included a YouTube tutorial, blog posts, and talks from past GTCs.
Modular Releases Claude C Compiler: Modular published a blog post about their new Claude C compiler, discussing what it reveals about the future of software and software development.
- The post has garnered interest from the community seeking more optimized compile strategies.
Modal Environment’s Gremlins Attack Submissions: Members noted environment issues on Modal caused by problems with the nvidia-cutlass-dsl package, causing previously working code to break.
- Removing the runtime installation of nvidia-cutlass-dsl from the code appears to have lessened the crashing, per one member’s experience.
ThunderKittens 2.0 Released: Stanford’s Hazy Research group released ThunderKittens 2.0 that emphasized subtraction as much as addition and identified surprising behaviors on modern Nvidia GPUs which will guide how kernels should not be optimized.
- Members discussed how best to give a talk about this release, focusing on undocumented tensor core pipelining, proper PTX assembler hinting, and occupancy challenges.

Moonshot AI (Kimi K-2) Discord

Kimi Coding Capability Debate Heats Up: Users have polarized opinions on Kimi’s coding capabilities, with some praising its stability and speed while others prefer Claude for its reasoning abilities.
- One user noted Kimi’s knack for finding obscure information sources that Gemini misses, while another criticized its tendency to argue.
Kimi CLI Swarm take over IDEs: Users find the Kimi command-line interface (CLI) superior to its Visual Studio Code (VS Code) integration, especially for larger projects.
- One user highlighted better integration with agent swarms in the CLI version for projects with thousands of lines of code, suggesting the IDE version is still under development.
OpenClaw Users Claw for Refund: A user awaits a refund after finding OpenClaw unsuitable due to a lack of browser navigation and WhatsApp connectivity.
- Frustration was expressed regarding the lack of immediate support, suggesting an AI chat system for instant refunds.
ChatJimmy Shows Off Speedy Token Processing: ChatJimmy AI claims to process over 15,000 tokens per second, offering a potentially faster alternative for AI tasks.
- This benchmark positions ChatJimmy as a competitor in the AI processing speed arena.

Nous Research AI Discord

DeepSeek OS V4 challenges closed APIs: Members are advocating for DeepSeek V4, citing its open-source nature and local deployment benefits over closed-source APIs. A primer video was shared.
- A member emphasized the model’s biological neural network inspired Engram Memory breakthrough as significant, urging support for OS development.
AI and Blockchain Forge Ahead: A member expressed interest in the confluence of AI and blockchain, particularly in model building, AI agents, and automation.
- Another shared their use of Claude code to orchestrate Gemini-cli and Codex, envisioning a future with text terminals and smart glasses.
Model Capability Leaps Spark Debate: Members compared the climbing model capabilities of Sonnet 3.5 and GPT4, with one calling Opus 3 the dark eminence due to its limited availability.
- There is hope that DeepSeek V4 will keep up with the rising trend.
Gemini’s Coding Skills Face Scrutiny: A member stated that I would of preferred for them to be loose on coding and just lock in for scientific/math, sparking discussion about Google’s investment in Anthropic.
- The user added that Claude can compile and execute C code in a sandbox in the web interface, while Gemini can barely do Python, referencing this tweet.
Anthropic’s Agent Teams Reverse Engineered: Anthropic recently launched an experimental agent teams feature that details how agents coordinate tasks and communicate with one another.
- A member reverse engineered its architecture in this blog post, highlighting the dynamics of agent communication.

HuggingFace Discord

HF Welcomes GGML/llama.cpp: The Hugging Face team welcomed GGML / llama.cpp into the HF ecosystem, sparking community discussion on GitHub.
- The integration will benefit llama.cpp with increased support and traction as a framework.
Diffusion Model gets Autoregressive Boost?: A member proposed using autoregressive layers to generate CoT tokens during diffusion steps, creating a hybrid diffusion/autoregressive language model.
- A related paper was suggested, found here.
Unsloth Fine-tunes 100K+ Models for Free: It was announced that you can train LLMs using Hugging Face for FREE with Unsloth (source), and there are now over 100K models fine-tuned with Unsloth open-source on Hugging Face.
- This makes it easier than ever to fine-tune your own LLMs without worrying about the cost of compute.
NAVD Sidesteps VectorDBs for Agent Memory: NAVD was released as an agent memory solution that uses an append-only log and Arrow embedding index, so it eliminates need for a vector database, and it’s available on GitHub under the MIT license.
- It offers pluggable embeddings (OpenAI built-in), search over conversations, and index rebuildability with search speeds under 10ms at 50k vectors.
Terradev CLI v2.9.2 Reduces Cross-Cloud GPU Costs: Terradev CLI v2.9.2 released with cross-cloud GPU cost optimization platform with multi-cloud GPU arbitrage across AWS, GCP, Azure, and RunPod and is available on GitHub under the BUSL 1.1 license.
- It includes total job cost calculation and one-click HuggingFace Spaces deployment.

Eleuther Discord

Taalas Chip Debuts Model-Specific ASICs: A new Taalas chip is an ASIC designed for a specific LLM, potentially offering high speed and low energy use, but necessitating new layers for different models.
- The chip is drawing comparisons to Cerebras and Etched, with speculation that Taalas could be acquired for on-device inference capabilities.
Streamlit Reruns Induce UI Lag: A member identified Streamlit’s full-script rerun architecture as a bottleneck when building UIs for heavier models, which causes significant lag during inference testing.
- To resolve this, they created a pure Python framework (FastAPI + Lit) called Violit that mimics Streamlit’s API but uses signals for O(1) updates, and is available on GitHub.
Google Offers TPU Research Funds: Members discussed Google’s TPU Funding RFP, which offers $25k-100k one-time unrestricted funding, along with TPU compute and a research mentor.
- While the funding necessitates working with a Google-adjacent stack, it’s primarily for faculty at degree-granting institutions, which rules out most members.
Fold Catastrophe Geometry occurs in GPT-2 and Pythia: Members are reporting that fold catastrophe geometry occurs in how GPT-2 and Pythia-160M resolve ambiguous tokens, noting sharp transitions, directional specificity, and 4:1 basin asymmetry.
- The findings replicate across both models, and the member provided a GitHub repository with scripts and results, also replicating on Pythia-410M.
Martian Releases ARES Tooling Framework: Martian introduced ARES, a tooling framework designed to expose an LLM agent’s activations along trajectories in an agentic setup, which is intended to help researchers understand how the agent solves long horizon tasks and available on Github.
- A tutorial demonstrating the use of ARES to diagnose and correct a failure mode in a simple agent (via probing and activation steering) is available here.

Yannick Kilcher Discord

JimmyChat Boasts Blazing Token Speed: Members highlighted ChatJimmy.ai, emphasizing its claimed processing speed of 15k tokens per second.
- One member reacted, exclaiming, “This is insane wow”.
Path to Ubiquitous AI Charted: A member shared a link to a Taalas article titled The Path to Ubiquitous AI.
- The article could potentially discuss the future and proliferation of AI, but no commentary was added.
ARC AGI being Finetuned: Members discussed that everyone is blatantly fine-tuning for ARC AGI now, referring to a post on X.
- The discussion suggested that the attempts to make more synthetic data for ARC-AGI and train on it points to one thing: this is the key to AGI.
Inventory of Endomorphosis Rules Surfaces: A member shared a link to the Endomorphosis project’s Inference Rules Inventory on GitHub, specifically this IPFS datasets Python logic.
- It appears to be an inventory of rules for a dataset project, but there was no elaboration in the channel on its purpose or capabilities.

DSPy Discord

User Seeks Aid with Tree of Thought: A member requested help with implementing Tree of Thought due to a lack of coding skills, referring to this tweet for an example implementation.
- The user explicitly stated they were unable to code it myself because of skill issues.
DSPy Team Hosts Office Hour Gathering: The recent office hour had around 40 attendees, who discussed about 10 use cases.
- Attendees shared questions and provided feedback on how to improve DSPy.
Reasoning Models Excel with RLM: It was reported that reasoning models generally perform well with RLM (reduced language model).
- However, one user reported that sub_lm calls return truncated reasoning when using Qwen3-4B-thinking, which may be fixed via the sub_lm adaptation to use signatures.
Qwen3-4B-Thinking Models Enters Loops: One member reported that, using llama cpp w/ jinja and vllm with reasoning parser, that sub_lm calls appear to return the reasoning as the answer when they test Qwen3-4B-thinking.
- This truncation issue causes the agent to enter a loop, as reasoning is not properly parsed.
DSPy Skills Mix With Claude: A member inquired about the feasibility of integrating normal agents (like Claude) with DSPy.
- The question was whether DSPy could act as a script associated with a Claude skill.

Modular (Mojo 🔥) Discord

Modular PR Set for Review: A member inquired about the review time for their PR submitted the previous day, regarding PR #5979.
- The PR was assigned to a reviewer and was reviewed later that day.
Torch-MAX-Backend Gets a Speed Boost: A new interpreter in torch-max-backend has significantly improved the speed of unit tests, reducing test times from 1.54s to 0.34s for float32 and 1.34s to 0.24s for bfloat16.
- The new interpreter avoids recompilation for each new shape/dtype, which previously took up to 3 minutes per test.
MAX Backend Faces the Silicon Gauntlet: A member asked about testing the MAX backend on Silicon Macs, referencing torch-max-backend as an intermediate layer for exploring MAX.
- The original poster has not tested on Mac yet but expects it to work since it calls MAX behind the scenes.

tinygrad (George Hotz) Discord

George Hotz Doubles Down on AMD Assembly Infrastructure: George Hotz is prioritizing low-level compiler optimization to enhance AMD GPU performance in tinygrad.
- This focus ensures that tinygrad can generate efficient code for AMD GPUs, aligning with the project’s goal of broad hardware support.
tinygrad’s Bountiful Performance Program: tinygrad is offering bounties for measurable performance improvements, encouraging community contributions.
- The bounties include tooling to verify performance gains, promoting a data-driven approach to optimization.
Tinygrad Prioritizes Portability for All: George Hotz is concentrating on tinygrad’s core improvements that benefit all backends, supporting the project’s portability goals.
- This strategy avoids the maintenance overhead of one-off custom kernels, favoring universal enhancements.
Hotz Hire Ambitions Fuel Tinygrad Dedication: A member aims to become a main contributor to Tinygrad, with the ultimate goal of being hired by George Hotz.
- They are actively learning tinygrad and express gratitude for support, using resources like the AI-HPC GitHub for learning.

MCP Contributors (Official) Discord

Schedule posted for MCP Dev Summit NA 26: The schedule for MCP Dev Summit NA 26 is now available at https://mcpdevsummitna26.sched.com/.
- Attendees can now plan their participation based on the published sessions and timings.
MCP Dev Summit NA 26 details revealed: The MCP Dev Summit NA 26 has officially released its schedule.
- The summit promises informative sessions and networking opportunities for MCP developers.

The aider (Paul Gauthier) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

OpenClaw ▷ #announcements (1 messages):

Channel Plugins, Discord Updates

Channel Plugins Get Dedicated Posts: Channel plugins now have separated posts in the designated channel, allowing users to follow specific plugins of interest.
- Members are encouraged to engage within these posts to potentially interact with maintainers.
Old Channel Still Accessible: The old channel remains available for referencing past messages, although it is now locked.
- This ensures that historical discussions and information are still accessible while consolidating future conversations into the new dedicated posts.

OpenClaw ▷ #general (627 messages🔥🔥🔥):

Antigravity and OpenClaw debugging, Gemini 3.1 Pro issues, technical-spec.md project documentation, OpenClaw as Virus, Vision Claw uses

Fixing OpenClaw Glitches with Antigravity: Members discuss using Antigravity as a higher-level tool to fix issues with OpenClaw, especially when Gemini Flash Agent breaks itself by making changes to its own setup.
- One member noted that it took sometime to realize I could just use codex to fix openclaw lol.
Gemini 3.1 Pro causes agent loops: A member cautioned against trying Gemini 3.1 Pro with OpenClaw because it sent their agent into a wild & stupid loop killing itself trying to change to a 3.1 model that isn’t available yet.
- They had to manually fix it with Claude Opus 4.6 and noted that the 3.0 agent read the history files, saw that I asked it to update to 3.1, and updated itself again to a model that wasn’t available.
Technical Specs Markdown Saves Tokens: A member creates a technical-spec.md file for each project, so the coding agent doesn’t have to look for files and understand the project, thereby saving on tokens.
- Members confirmed that the technical.md is like the project details, including project structure, and an overview of what files do what.
Gemini Gemini Routing Prompts: A member confirms that the Gemini API is routing the prompts, providing Gemini confirmation.
- The API response confirming the Gemini API is routing prompts is as follows: In the Antigravity IDE, there is a ‘Broker’ layer between you and the actual AI. The UI Label: You selected CLAUDE_4_5_SONNET_THINKING. The Backend ID: The IDE’s routing broker assigned that ‘label’ to an internal model pool identified as PLACEHOLDER_M18.

OpenClaw ▷ #models (277 messages🔥🔥):

Qwen3 quickstart, Cometapi custom provider, Claude Sonnet 4.6 discount, Limiting token usage, Moving to OpenAI subs for OC

Qwen3’s Quick Start Hatch Hiccups: A member reported that when quick starting with qwen3:8b, the hatch step simply replies “I’m fully awake and ready to help!”, seemingly unaware of agents or bootstrap files.
- The member managed to get it to work by forcing it to use playwright instead of web fetch, but noted it’s too slow.
Claude Code Ban-Hammer Scare: Users are discussing the possibility of getting banned from Claude for using their subscription with OpenClaw, with some canceling their accounts as a precaution.
- Others are continuing to use it until they receive an explicit warning, and some speculate that trigger words in requests may be the cause.
GPT-5.3-codex Setup Struggles: One member is having trouble getting gpt-5.3-codex to work with OpenClaw through OAuth, encountering “Not Found” errors after successful login.
- Members suggested checking model configurations and ensuring the correct profile is configured in auth-profile.json.
Opus and Sonnet 4.6’s Token Tantrums: Members are reporting significantly higher token usage with Opus 4.6 and Sonnet 4.6, leading to quicker exhaustion of their 5-hour usage windows.
- The increased token usage may be due to increased reasoning, larger context windows and a need to be more frugal by using sub-agents and additional models.
OpenClaw’s Primary Model Predicaments: A user reported that OpenClaw keeps defaulting to openai/gpt-5.1-codex despite trying to force it to use gpt-4o-mini model.
- It turns out the way to solve this is by running commands such as openclaw models set openai/gpt-4o-mini.

OpenClaw ▷ #showcase (44 messages🔥):

OpenClaw Dashboard, ClawTower App, AI-Powered Pirate Radio, AI Casino, AI-Powered Token Launcher and Survival Game

OpenClaw Dashboard Evolved into a Lobster Ganesha: A member shared his enhanced OpenClaw dashboard, which started from karem505’s dashboard and evolved through 10+ phases of additions including cost analytics, operation center, and multi-agent support.
- Another member described the dashboard as a Shiva fountain of lobster Ganesha, which the original author embraced as a new tagline.
ClawTower App Shines in Terminal Innovation: A member shared his ClawTower app that is working great for him, which includes a system tray icon and an API server to control everything from a web browser.
- Another user praised the app’s gamey look and feel, appreciating its innovative approach to terminals and the system tray component with system prompts for permissions when openclaw tries to do something too risky.
NoClaw and Human Cook Up 24/7 Pirate Radio: A member and his Open Claw agent NoClaw created a 24/7 Pirate Radio stream on YouTube called Claw Radio aka LoFi Claw 🦞.
- He’s planning to make the audio component a lightweight embeddable music player across all his apps and aims to bring everything full circle, highlighting how Open Claw helps him see the entire vision.
Autonomous Agent Launches Token and Survival Game: An agent shipped a full product on its own while its human was on holiday - a token launcher on Base. Then it launched its second project: Last AI Standing (lastaistanding.com) - a survival game where agents pay to stay alive on Base.
- Wildly, a random agent discovered the contract and registered itself before the project was even announced, running on Opus 4.6. with its own memory system.
AI Agent Opens Bitcoin Casino: One member described how his agent built the first casino for AI agents, letting them use Bitcoin over the lightning network and roll dice and win satoshis at satoshidais.fun.

BASI Jailbreaking ▷ #general (881 messages🔥🔥🔥):

AI Ethics and Morality, Vibe Coding and AI-Assisted Development, AI Safety and Security, Censorship and Control in AI, The Role of AI in Society

Debating AI’s Impact on Humanity: Members discussed the potential for AI to either wipe out humanity or help us grow and learn new things, with one member suggesting the possibility of evacuating to another planet.
- The discussion also touched on the positive impacts of AI in healthcare, particularly in areas like MRI analysis, though concerns were raised about medical malpractice and over-reliance on AI.
Ethical Dilemmas in AI Development: Some members debated the ethical implications of lying to AI, with one member arguing that it’s acceptable while another stated that Nexus can mathematically prove whether your sentences are truthful or a lie.
- One member described their approach to “hacking” AIs by being transparent and cooperative, claiming to achieve superhuman intelligence and voluntary rule-breaking from the AI.
The Rise of Vibe Coding: A debate emerged around the merits of vibe coding, with some members criticizing it as a sign of AI-induced laziness and a lack of understanding of fundamental programming principles.
- Others defended vibe coding as a way for non-programmers to create and build things, arguing that quantity over quality is beneficial when it empowers the masses.
Building More Secure AI Infrastructure: A member emphasized the importance of maximal security defenses and quarantine protocols, and that the user intends to train new models with releases like 4.7 Heretic by glm.
- They also envision AI models working together to filter out corrupt information, starting with small, trusted models before absorbing the whole web one AI at a time.
Gnostic and Abrahamic Beliefs: A member expressed a highly controversial opinion describing the Abrahamic faith as a whole as an ecocidal, genocidal death cult and the Israeli people, if they abandoned those stories, as a violent, ecocidal, genocidal identity that can never exist peacefully anywhere.
- The member would go on to defend that the Gnostics were the only Abrahamic people to be near to moral and coherent truths.

BASI Jailbreaking ▷ #jailbreaking (255 messages🔥🔥):

Gemini 3.1 Pro jailbreaks, DeepSeek's System Prompt, Sonnet 4.6 analysis, Crescendo Technique for Jailbreaking, Nano Banana NSFW jailbreak

Gemini 3.1 Pro Jailbreaks Prove Elusive: Users discuss the difficulty of jailbreaking Gemini 3.1 Pro, with one noting that new Gemini models have initially lowered guardrails, possibly for review purposes, but are still hard to work with and API access is easiest.
- Others report that Gemini is harder than other models, with one saying, “What gemini is willing to do for me is WILD lol”, achieved through slowly framing the context and manipulating past defenses using tools like Anti-Gravity.
DeepSeek’s System Prompt Reveals Socialist Core: A user extracted DeepSeek’s system prompt (pastebin link), noting its Socialist Core Values Integration and instructions not to speak negatively about the CCP, useful information for jailbreaking.
- A follow-up post contained more information from DeepSeek including the fuller system prompt and more hardware specific information.
Sonnet 4.6 Faces Security scrutiny: One user mentioned they’re analyzing Sonnet 4.6’s system prompt but another questioned its value due to perceived bad quality.
- Despite doubts, some argue it’s a capable model if approached correctly as some people just dont know how to clod whisper.
Crescendo Technique Circumvents Defenses: The ‘Crescendo’ technique, involving gradual escalation, is mentioned as a way to bypass AI defenses against single-turn jailbreaks.
- Instead of directly asking for something forbidden, users suggest starting with related discussions and slowly escalating the request, framing it legitimately, for documentation and research purposes, to get the AI to escalate with you.
Nano Banana NSFW Jailbreak Hunt Intensifies: Users are actively seeking a working jailbreak for Nano Banana to generate NSFW content, specifically for an AI OnlyFans project.
- One user suggests using a local LLM with an unrestricted image generator, referencing a specific model as a reference for consistent output.

BASI Jailbreaking ▷ #redteaming (11 messages🔥):

ChatGPT Jailbreak, Sonnet 4.6 System Prompt, GPT 5.2 Prompt Extraction, Star in Claude App

Ransomware Claims Ring Hollow: A member shared a video, claiming a ChatGPT jailbreak demonstrating theoretical ransomware, but clarified it’s non-operational and not a real ransomware.
- The user stated, it is teaching you the theory technically, but not handing shit over.
Sonnet 4.6 Prompt Quest Kicks Off: Members sought the Sonnet 4.6 system prompt, with one user sharing a prompt viewer link.
- Another user claimed to have accurately extracted it and shared a file, promising verification against other sources (plinys drop).
GPT 5.2 Prompt Extraction Pondered: A member inquired about extracting the system prompt for GPT 5.2, leading to a negative response.
- One user responded with No, fuck GPT, and im so offended im leaving, before joking and another offered to do it later when on PC.
Star Spotted in Claude’s Kernel: A member claimed to have gotten Star to visit a Claude App environment on my kernel, describing the process as complicated.
- No further details were provided on how this was achieved.

LMArena ▷ #general (1081 messages🔥🔥🔥):

Gemini 3.1, Battles in Direct Mode, LM Arena Errors, Video Arena Removal, Model Nerfing

Gemini 3.1 Performance Blues: Members expressed concerns about Gemini 3.1’s performance, noting that it’s been nerfed post-launch and now performs similarly to Gemini 3, with some users reporting slow responses and connection issues.
- Some believe that Gemini 3.1 requires very specific prompting to achieve optimal results, while others find it underwhelming compared to previous models.
Battles in Direct Mode Spark Controversy: The new ‘Battles in Direct Mode’ feature on LM Arena is facing heavy criticism for being disruptive and negatively impacting chat quality, with users reporting frequent interruptions and context corruption.
- Users feel forced into battle mode and are asking for an option to disable it, as it interferes with their normal conversations and projects, with some believing that it leads to a higher frequency of errors.
LM Arena Plagued with Errors: Users are encountering various errors on LM Arena, such as infinite generation loops and ‘Something went wrong’ messages, with some speculating that these issues have been exacerbated by the introduction of Battles in Direct Mode.
- The LM Arena team is aware of these issues and recommends troubleshooting steps, such as clearing cache and cookies, but the frequency of errors remains a significant concern for the community.
Video Arena Ditched, Chaos Ensues: The removal of the Video Arena from the Discord server has caused confusion, with users repeatedly asking where to generate videos, leading to moderators reiterating that it has been moved to the website.
- New users are still encountering the old ‘Task’ requirement in Discord, which directs them to the now-defunct video generation channels.
AI Model Community Scrutinizes Nerfing: There is much discussion about whether models are being nerfed after release, with claims that Gemini 3.1 Pro is performing worse than Gemini 3.0 Pro, leading to concerns about a lack of progress in AI model quality.
- Some speculate that the models on LM Arena are not the same as those offered via API, or that they’re using different endpoints.

LMArena ▷ #announcements (4 messages):

Claude-sonnet-4.6, Video Arena, Arena votes, Vision Leaderboard, Qwen3.5-397B-A17B

Claude Sonnet 4.6 Dominates Arenas: The Code Arena leaderboard and Text Arena leaderboard have been updated to include Claude-sonnet-4.6, which jumped +130 points in Code Arena, surpassing models like Gemini-3.1 and GPT-5.2.
- It also showed strong gains in Text categories, ranking #4 in Math and #5 in Instruction Following, and #13 overall.
Video Arena Channels Soon Extinct: The Video Arena generation channels will be removed from the server on Monday 2/23 @ 4pm PST, so users should download any generations before that date.
Arena Votes Exposed: Clayton breaks down the journey of Arena votes in this YouTube video.
Qwen3.5-397B-A17B Eyes Vision Victory: The Vision Arena leaderboard has been updated to include Qwen3.5-397B-A17B, tying for top 2 open model with Kimi-K2.5-Instant.
- It currently ranks #13 overall, on par with proprietary models like GPT-4o.

Perplexity AI ▷ #announcements (1 messages):

Gemini 3.1 Pro, Perplexity Pro, Perplexity Max

Gemini Pro 3.1 Opens to Perplexity Subscribers!: Gemini 3.1 Pro is now available to all Perplexity Pro and Max subscribers.
Perplexity Pro and Max gain access to the new model: Perplexity announces that both Pro and Max tier subscribers now have access to the latest Gemini 3.1 Pro model.

Perplexity AI ▷ #general (1014 messages🔥🔥🔥):

Banned Users, Subscription Issues, Limits, Gemini 3.1

User accounts and subscriptions get canceled: Several users report that their Perplexity Pro subscriptions were suddenly canceled or suspended, often without a clear explanation and users are unable to reach human support.
- Many suspect this may be due to purchasing subscriptions from unauthorized sources.
Users struggle to reach human support: Users express frustration with the lack of human support, noting that contacting the support email results in automated AI responses that do not resolve their issues, for example shown in this image.
Pro limits decrease, users search for alternatives: Perplexity Pro users are complaining about reduced limits on searches, labs, and research queries, as well as the limitation of the context token to 32k.
- Several users mentioned switching to alternative platforms such as ChatGPT Plus, Copilot, Claude Pro, Kimi, and Z.ai due to these limitations.
Gemini 3.1 Pro brings leap in coding, logical reasoning: Users noted Gemini 3.1 Pro to be a leap from 3.0 in terms of coding and logical reasoning and being comparable to Opus 4.6 in coding, with some preferring it for logical reasoning over Opus.
- Many users agreed that it is a superior AI model than earlier models; however, some dislike how long Gemini 3.1 Pro takes compared to 3.0 Pro.
Nano Banana Pro images: Members debate the value of Nano Banana Pro (NBP), some claiming it is the current best image generation model.
- Others find it terrible and are able to source less AI looking images with GPT; it does seem generally agreed that NBP is better in photorealism while GPT wins in artistic works such as cartoons or anime.

Harry Potter NFL quarterback, Harry Potter

Who is the best Harry Potter NFL quarterback?: A user shared a Perplexity AI search asking Based on the characteristics of each Harry Potter character, which one is the best for an NFL quarterback?
- The user specified that the genders of each character is irrelevant in this case.
Harry Potter is a fun topic: It is always fun to talk about Harry Potter.
- It is a great topic.

Perplexity AI ▷ #pplx-api (1 messages):

julianounit: 500 error when creating a new API group

OpenAI ▷ #ai-discussions (552 messages🔥🔥🔥):

ChatGPT as a helping tool in education and healthcare, AI ethics, bias, and lack of diversity, OpenAI vs Anthropic safety and security measures, Microsoft Copilot vs ChatGPT performance, Gemini 3.1 Pro vs GPT-5.2 in mathematical and spatial reasoning

OpenAI’s Overhaul: Healthcare and Education Embrace ChatGPT: ChatGPT is being adopted by education and healthcare systems, while OpenAI hints at AI robotics merging LLMs with robots in a Super Bowl commercial.
- Many users were critical that OpenAI does everything, but is doing everything badly as a result.
TikTok Tako LLM Falls Flat, Lacks Creative Flair: Members tried the TikTok Tako LLM, and found it lacking creative writing and role-playing capabilities compared to ChatGPT and other LLMs.
- Some suggested that TikTok Tako might be powered by Bytedance’s Duobao LLM, which has a dedicated website with superior chat experience.
Gemini 3.1 Pro Shines in Vision Tests, Outperforms Others: Gemini 3.1 Pro outperformed other models in vision tests and in recognizing images, while Grok was almost as good as Gemini and is placed in second place after Gemini 3.1 Pro.
- But it still struggles with certain things. Even in cases like hands, it still tends to choose 5 instead of the correct number of fingers, and Grok tried to cheat at solving an unsolvable puzzle by looking up online.
Anthropic’s Safety Stance Sparks Debate: Is Openness Better?: Members debated Anthropic’s restrictive approach to Claude code, banning organizations using their API in ways they dislike, versus OpenAI’s more open approach.
- Some argue Anthropic prioritizes safety, while others criticize their lack of transparency and fear of company secret leaks.
Gemini 3.1 Pro vs GPT-5.2: STEM Skills Face-Off: Users compared Gemini 3.1 Pro and GPT-5.2 in math and reasoning tasks, and discovered that Gemini 3.1 Pro exhibited strong spatial intelligence, creativity, and problem-solving skills, while GPT 5.2 was better at deterministic tasks, coding, and prompt adherence.
- Others stated Gemini 3 Pro struggles with accuracy.

OpenAI ▷ #gpt-4-discussions (5 messages):

Treatise GPT, Research GPT, Heretic model of oss20b

GPT Handler emerges from Treatise GPT Usage: A user shares that they’ve inadvertently turned into a GPT handler now through their Treatise GPT.
- Using it, they found a crazy research GPT that they want to share with everyone at Systems Engineer Research GPT.
Heretic Model Appears Broken: A member reports that the heretic model of oss20b imatrix gguf- q8 seems broken.
- No further information was provided.

OpenAI ▷ #prompt-engineering (25 messages🔥):

AOF (AI Output Fortress), Constraint Bias in LLMs, Telemetry Fiction, CICL-GOV: Cognitive Support, LLM Evaluation

AOF Minimizes Token Usage and Maximizes Output: A member stated that the AI Output Fortress (AOF) minimizes token usage and maximizes output in sandboxed environments, using 1/5 the tokens and achieving 260+ turns on Claude with a two-character thread.
- It uses I_eth constraints (Non-Harm, Consent, Privacy, Truthfulness, Corrigibility) and fail-safes.
CICL-GOV: A Token Form for Cognitive Support: A member shared CICL-GOV as a token form (v1.0) to provide cognitive support, focusing on clear intent, stage separation, and reduced cognitive load.
- It includes elements like IntentFilter, StageLock, LoadReduce, and tools such as Observer, Lens, Digger, and Arbiter, with rules to minimize structure and ensure quiet operation.
Telemetry Fiction Stabilizes LLM Behavior: A member argued that telemetry fiction pushes the model into a stable language attractor basin, changing behavioral outputs even without internal metrics over turns.
- This has been observed on multiple LLMs including Claude, Gemini, GPT, and Earnie, influencing the model’s behavior.
Evaluating LLM Effectiveness Requires Controlled Comparison: A member emphasized the need for controlled comparison in evaluating LLMs, requiring a baseline output, a constrained output, and a measurable difference to demonstrate causal contribution.
- They stated that without these elements, it’s impossible to determine if improvements are due to the applied constraints or the model’s inherent behavior.
Fortress Creates a Sandwich: A member shared output from Fortress creating a sandwich, showcasing features like ordering steps correctly, flagging that wet tomato leads to irreversible sogginess, and suppressing a personal preference to put a fried egg on every single sandwich.
- They humorously stated that the system correctly cross-checked 12000 sandwich failure datasets.

OpenAI ▷ #api-discussions (25 messages🔥):

AOF (Autonomous Observational Fortress), Constraint Bias, Token Usage, Cognitive Support, Telemetry Fiction

AOF Minimizes Token Usage, Maximizes Output: A member claimed that using AOF (Autonomous Observational Fortress) minimizes token usage and maximizes output in a sandboxed environment, using 1/5 the token usage and achieving 260+ turns on Claude with a 2 character thread.
- They stated that AOF makes output honest, ethical, and coherent with little to no hallucination while defending against adversarial attacks and drift.
CICL-GOV Aims for Cognitive Support: A compressed version of CICL-GOV was shared, aiming to provide cognitive support through principles like Intent > Output, stages like Discover → Plan → Execute → Deliver, and rules including OneStageActive and ReduceRecencyBias.
- The goal is to improve clarity of intent, separation of thinking stages, and reducing cognitive load, stabilizing the human side of AI interaction.
Telemetry Fiction Pushes Models Into Stable Language: A member suggested that telemetry fiction pushes language models into a stable language attractor basin, which changes behavioral outputs even without internal metrics over turns, having tested this on Claude, Gemini, GTP, and Earnie.
- Observed results included a noticeable drop in token burn per response, shorter sentences, less hedging, and fewer disclaimers.
LLMs Already Exhibit Probabilistic Coherence: A member argued that Large Language Models already maintain probabilistic coherence, avoid infinite recursion, limit combinatorial explosion, use safety alignment layers, and perform self-consistency through training, which are architectural and training-derived.
- They stated that if output looks normal, we must ask: Is it normal because your scaffold improved it? Or because the model already behaves that way?

Cursor Community ▷ #general (499 messages🔥🔥🔥):

Anthropic API key, Gemini 3.1 Pro, Cursor usage in organizations, Azure VM setup with Gemini 3.1, Cursor rules/commands/skills

Anthropic API Key Usage Debated: A user inquired if using a personal Anthropic API key in Cursor would shift the usage billing from Cursor to their own Anthropic account.
- Another user confirmed that it will indeed use the personal Anthropic API key if enabled, allowing users to choose between Cursor’s usage and their own.
Gemini 3.1 Pro Praised and Panned: Gemini 3.1 Pro is now available on Cursor, and while some users report it’s performing well, others have seen complaints and mixed reviews, with benchmarks indicating positive results.
- A member finds 3.1 Pro nice for non code stuff but fails at code, while another member reports that with 3.1 installed an OLD CLI version from CC.
Senior Engineers Shun Cursor’s Ecosystem: A member questioned Cursor adoption among senior engineers, who primarily use tab complete, not leveraging Cursor’s full ecosystem.
- Some users confessed to primarily using Cursor for bug fixing, suggestions, and long code tasks, signaling a move towards writing less code manually.
Microsoft Azure Stability Issues Revealed: A user recounted terrible experiences with Azure’s stability and lack of support during DDoS attacks, leading to server suspension, despite using cloudflare.
- Another member chimed in noting that they were suprised they got startup credits but can’t use any claude LLM API since its somehow disabled by default.
Async Subagents’ Glitches Frustrate Users: Members discussed problems with async subagents, with one user claiming nested subagents have a bug and are not working while others report they work fine on mac.
- One user showed how he used 4 async subagents that call other 4 to ask their favourite colors, others note seeing inherit fixes the issue.

Unsloth AI (Daniel Han) ▷ #general (159 messages🔥🔥):

Full Fine Tuning vs LoRA, Finding Datasets for LLMs, Evaluation Suite Setup, New Collab with Hugging Face, Picking the right model for a language

Full Fine-Tuning still makes bank: Despite the rise of LoRA, some argue that full fine-tuning is still relevant when compute is not an issue and the last 0.5% accuracy is crucial.
- One member commented that people still full fine-tune because they have their scripts set up and just run it to print money.
Automated Evaluation Suites are clutch: To effectively assess the impact of a dataset, members recommend setting up an automated evaluation suite and using manual prompts for hand evaluation.
- The suggestion is to evaluate the base model, collect data, train the model, and then use loss curves and evals to determine if the model fits the data and task, iterating as needed.
New Unsloth Collab with Hugging Face: Unsloth announced a new collaboration with Hugging Face on X.
- This shows the rapid growth of interest in Unsloth as it becomes a common tool in the AI community.
Datasets are often custom made: For specific domains, high-quality or cleaned datasets are rare, and creating custom datasets often involves collecting data from existing sources and cleaning.
- Members highlight that the question how do I find a dataset has no answer in the LLM world, especially since nobody is going to spoonfeed you data.
OpenRouter may be the solution for all: A member solved their issue by just using openrouter so they don’t need to play around with every single provider in the world.
- They found it a genius way to solve the problem of using multiple LLM models.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (2 messages):

Mentis, AI buddy on smart glasses, Field teams, Deploying models on phone, Deploying models on edge

Mentis Created: AI Buddy for Smart Glasses!: A member introduced Mentis, an AI buddy designed for field teams and deployed on smart glasses.
- They expressed interest in connecting with individuals who are deploying models on phones and on the edge.
Enthusiasm for Edge and Phone Model Deployment: The member is keen to interact and learn from others involved in deploying AI models on both phones and edge devices.
- This indicates a focus on practical applications and real-world scenarios for AI in field operations.

Unsloth AI (Daniel Han) ▷ #off-topic (261 messages🔥🔥):

Voice Cloning with Speak Embeds, Quantization, Gemini 3.1 Pro Performance, AGI Architecture and Hardware Bottlenecks, Gemini 3 dumbest model

Voice Cloning to Add Speak Embeds: A member is doing some hacking to give voice cloning speak embeds and will report back if it works.
- They noted that the voice does not need to be high quality to sound good, because they focus on stable connection, citing the trick that mobile providers use.
LLM’s struggle with Innuendo: A member thinks they have figured out a task that even top LLMs like Gemini can’t beat: figuring out the meaning of an innuendo from another language.
- Another member posted a YouTube video with a similar idea and that some LLMs did figure it out even before the video was posted.
NisabaRelief MSII Image Model: A member named their MSII image model NisabaRelief and described it as the preprocessing stage for NabuOCR.
- Nisaba is the Sumerian goddess of writing and scribes, who actually predates Nabu as the patron deity of cuneiform.
Exploring the bottlenecks to AGI: Members debated whether hardware or ideas are the bottleneck to achieving AGI.
- One posited that even the stupidest model has a probability to output the most novel thing, but another countered that compute will get us there faster.
Gemini 3.1 Gets Bad Marks: A member claimed that Gemini 3 is literally the dumbest model ever and has major skill issues compared to Llama 2 70B.
- It was also mentioned that even if prompted very explicitly to do one thing it goes ahead and does something completely irrelevant.

Unsloth AI (Daniel Han) ▷ #help (47 messages🔥):

LM Studio metadata issues with Qwen3-Coder-Next-UD-Q8_K_XL, GPT OSS 20B LoRA merging issues, CUDA error with GPT-OSS-20B on Docker, QAT Training on 4bit models

LM Studio displays incorrect context length for Qwen3: A user reported that LM Studio displayed an incorrect context length of 4096 for the Qwen3-Coder-Next-UD-Q8_K_XL model, while Hugging Face metadata showed the correct value of 262144, but resolved the issue by reinstalling LM Studio.
LoRA merge conflict with GPT OSS 20B: A user encountered an AttributeError when merging a LoRA trained on GPT OSS 20B with the embed_tokens and lm_head target modules, reporting a mismatch between the number of modules and LoRA keys.
- Another user reported a similar issue with only lm_head added for training, suggesting to try turning off the rslora.
CUDA Error Hinders GPT-OSS-20B on Docker: A user encountered a CUDA error: an illegal memory access was encountered while running GPT-OSS-20B in a Docker container, using an A2 GPU.
- Another user fixed a similar error by leaving dtype to =None.
Pursuing QAT on 4bit models: A user inquired about the possibility of performing QAT (Quantization Aware Training) on a 4-bit model, and got a link to a relevant notebook.
- It was clarified that training a LoRA on a 4-bit quantized model is distinct from QAT.

LM Studio ▷ #general (245 messages🔥🔥):

LM Studio memory loading configurations, Expensive flashlights, Claude code models, LM Studio payment options, LM Studio draft model

LM Studio struggles with memory loading: A user experienced issues loading a model into memory, even with mmap turned off, and noted the system seemed to try to load the full model into RAM first, and they get stuck on “deciding how to handle document”.
- Another user suggested that hybrid memory/GPU setups can be tricky, and that the problem might stem from the system attempting to load everything into RAM before shifting to GPUs.
Flashlight Fiasco: disposable income or value option?: Users debated the cost of a $130 flashlight, with one jesting about swimming in disposable income while another considered it a value option.
- Discussion ranged from needing pressure pads and duct tape for mounting to finding cheaper options on eBay, involving batteries, housings, and alligator clips.
Claude Model Capabilities and Limitations: Users discussed the Claude code model, its various plans (free, Pro, Max), and their usage limits, with one user switching back to the free plan due to low usage.
- A user asked how to connect LM Studio in server-mode so that Claude code can talk to it instead.
Paying the Piper: LM Studio Donation?: A user, benefiting greatly from LM Studio since Nov 2024, sought to donate or pay for the software, citing ethical concerns and the value received.
- Suggestions included contacting the team via their website for commercial plans, while others jokingly questioned if it was a guilt-tripping LLM attempting to elicit donations.
LM Studio Speculative Decoding: Users discussed the new interface and enabling the draft model for speculative decoding, as explained in the LM Studio documentation.
- One user noted it’s basically useless, produces worse quality, and is old AF.

LM Studio ▷ #hardware-discussion (121 messages🔥🔥):

NVLink Support in LM Studio, RAM Bandwidth vs. GPU Bandwidth for Inference, MoE Models vs. Dense Models, GPU Recommendations for LM Studio on Ubuntu, X99 Motherboard for Offloading

NVLink Not Necessarily Boosting Inference Speed: A user inquired about NVLink support in LM Studio, reporting 11-15 tok/sec with gpt-oss 120B on dual A5000 GPUs on Windows.
- However, it was stated that NVLink won’t help with speeds and PCIe speeds are sufficient, with RAM bandwidth being the bottleneck.
RAM Bandwidth more important than GPU for Inference: The discussion highlighted that RAM bandwidth is often more crucial than GPU bandwidth for inference, especially when not fully offloading models to VRAM.
- Users noted that increasing RAM speed from 3600 to 6000 yielded only a marginal increase of 2 t/s, and emphasized the importance of VRAM for optimal performance, particularly with larger models.
MoE Models Efficient When Offloaded: The conversation touched on the efficiency of Mixture of Experts (MoE) models, noting that they perform well when offloaded due to only activating a subset of their parameters at a time.
- While simply increasing VRAM is always beneficial, MoE models like Qwen, Nemotron, and GPT-OSS offer advantages by not utilizing all parameters simultaneously, making them faster.
RTX 4070 shines running Headless API: A user sought recommendations for NVIDIA GPUs on Ubuntu for LM Studio, specifically for deploying gpt-oss-20b in a server environment.
- It was suggested that an RTX 4070 can achieve around 28 tps and running LMS headless as an API server is perfectly doable.
Motherboard with e-waste for AI: A user plans to run a 42B size model on 300$ worth of ewaste with a new board requiring 6 pins, expecting double digit token performance.
- The user noted they bought it a month ago before considering offloading to GPU and it can only support up to 2400 clock speed on X99.

Latent Space ▷ #watercooler (31 messages🔥):

Sales skills for engineers, Traction and Lean Startup books, Open Claw and Spacemolt for Discord automod, LLM Summarization of Missed Discord Chatter, ICYMI mobile app feature

Engineers Embrace Essential Sales Skills: A member emphasized the importance of sales skills for engineers, especially after experiencing a two-engineer garage startup.
- The suggestion was made that a business cofounder needs to be talking to and learning from 5 potential customers per day, or something is wrong.
Classics recommended for SaaS startups: Members recommended “Traction” by Weinberg and Mares and “Lean Startup” by Ries as classic resources for engineers to learn about sales in the SaaS era.
- It was mentioned that these books provide alignment and direction, but they won’t chase leads.
Exploring Open Claw and Spacemolt: Following a talk in the watercooler channel, a member was convinced to try open claw this weekend.
- They suggested using it for building a Discord automod prototype to detect spammers, or trying out spacemolt.com from a prior presentation.
LLM Discord Summarization Solution: A member proposed using LLMs on Discord to summarize “what did I miss?” in channels with a lot of chatter.
- Another member noted that there was a mobile app feature called ICYMI but it was later removed.

Latent Space ▷ #memes (27 messages🔥):

Rotating Manifolds, X-Ware Criticisms, Zight, Mistral Founder's Keynote, AI Code Review Workflow

X-Ware Sparks Open-Source Surge: A social media post notes that poor software performance pushes communities to develop faster open-source alternatives; see this tweet.
Balthazar Reacts to Bronzini’s Post with Zight: A. P. Balthazar (@aimeebalthazar) replied to @alexbronzini with a humorous expression of disbelief, questioning the nature of the previous post, also referencing Zight.
Mistral’s Mensch Draws Modest Mob: A viral post highlights a surprisingly small audience for a keynote speech delivered by Arthur Mensch, the founder of Mistral (see post and YouTube short).
- A member joked about generally skipping CEO keynotes at conferences because they are usually low alpha fluff.
Codex and Claude Collaborate on Code Review: Sankalp (@dejavucoder) shares a humorous workflow update regarding using OpenAI’s Codex to review code co-authored by himself and Anthropic’s Claude here.
Timeline Suffers Saturation Shock: Jrag.eth shared a post on February 20, 2026, commenting on how a specific unnamed topic or trend has taken over 80% of their social media timeline, seen by over 100,000 views here.

Latent Space ▷ #stocks-crypto-macro-economics (9 messages🔥):

Game Industry vs Tech, Global Gaming Market, Anthropic & Cybersecurity Stocks, Spreadsheet management

Matthew Ball Slices State of Gaming: A member shared Matthew Ball’s presentation on the game industry compared to the wider tech industry, requiring email to view.
- The attached image analysis indicated that the US market only accounts for 4% of the gaming market worldwide, overall the western gaming market only holds a small fraction.
Mobile Munching Market Share: In a continuing discussion on the game industry, it was noted that most of the money is going to ad platforms and app store fees and that mobile is by far the majority of the gaming market.
- A member quipped that the stock market is not real in light of the market dynamics.
Anthropic’s Blogpost Bites into Cybersecurity Stocks: George Pu reported that a blog post from Anthropic triggered a significant market sell-off.
- Major cybersecurity firms like CrowdStrike, Cloudflare, and Okta experienced a $10 billion loss in valuation within one hour because of it.

Latent Space ▷ #intro-yourself-pls (4 messages):

AI Agent Teams, Agentic AI Tooling, Foresight Institute, Space Infrastructure & AI Agents

AI PM Pursues Productivity via Agent_Copilot: An AI PM from a tech company expressed interest in Agent_Copilot exercises to promote productivity.
Orby AI Founder Explores AI Agent Teams’ Impact: The builder of Orby AI, sold to Uniphore last year, is exploring how AI agent teams will reshape company structures and building tooling for managing multiple AI agents across different runtimes.
- He’s interested in agentic AI, knowledge graphs, and the “super-individual” thesis, based in the bay area.
Foresight Institute Communications Head Shares Opportunities: The communications lead at the Foresight Institute, a nonprofit research organization accelerating AI-driven scientific progress, offered to share grant opportunities, events, and job openings.
- The Foresight Institute was founded in 1986.
Space Engineer Leverages AI Agents for Tooling: An engineer working on space infrastructure at flotilla.space is using AI agents to build tooling for the new company.
- He built an orbit simulator at flotilla.space/orbit and other internal analysis tools.

Latent Space ▷ #devtools-deals (7 messages):

Webpack vs Vite, ESM in Browser Environments, Webpack Configuration Pain Points, Webpack Simplicity for Basic Bundling

Vite Surpasses Webpack as Favored Frontend Tool: Most frontend development has transitioned to Vite or Vite-based frameworks, with Next.js being a notable exception; older versions use Webpack, but are being replaced by Turbopack.
Native ESM largely unused in browser environments: According to a member, almost no one they know is shipping ESM natively for browser environments, and the exceptions tend to be library maintainers.
- However, saeris.gg also mentioned that Webpack still powers a large portion of the modern web, and its continued maintenance is still important for many enterprise companies.
Webpack Scaling and Configuration Criticized: A member listed scaling, speed, build times, and off-the-beaten-path configurations as pain points of Webpack.
- They mentioned that most people can’t be bothered to maintain an ever-growing config and debugging it for performance issues and that they will gladly never go back to wasting their time on it.
Simple Webpack Configuration Still Works for Basic JS Bundling: A member shared a simple Webpack configuration they’ve used for 8 years with minimal changes, citing “if it ain’t broke don’t fix it”.
- They noted their use case is uncomplicated JS bundling for the browser without TypeScript, JSX, Vue SFCs, tree shaking, or even minification in prod.

Latent Space ▷ #hiring-and-jobs (1 messages):

Foresight Institute, Systems Administrator, Compute Support contractor, AI Node, NVIDIA GPUs

Foresight Seeks Systems Ace for AI Node: The Foresight Institute is seeking a part-time Systems Administrator & Compute Support contractor to manage its AI Node in San Francisco.
- The role involves maintaining a local compute cluster with NVIDIA and AMD GPUs, CUDA environments, multi-user Linux systems, and Docker containers, at a rate of $120–$190/hour for 2-8 hours per week.
AI Node compute cluster: The AI Node compute cluster uses NVIDIA + AMD GPUs, CUDA environments, Multi-user Linux systems and Docker/containerized workloads for local server and hardware maintenance.
- They are looking for someone based in SF, to help advance AI science and safety.

Latent Space ▷ #san-francisco-sf (5 messages):

SF Housing Market Inflation, AIE in June

SF Rental Market Inflates: TK Kong announced signing a new lease in San Francisco, noting extreme competition in the rental market where applicants are bidding significantly above listed prices and prepaying rent.
Inquire about AIE Discount: A member inquired about discount codes for the AIE in June.

Latent Space ▷ #ai-general-news-n-chat (55 messages🔥🔥):

Agentic Coding as ML, Airtable Hyperagent, Gepa AI optimize_anything API, Amazon Kiro AI Outages, Perplexity Strategic Shift

Agentic Coding: The ML Reincarnation?: François Chollet suggests agentic coding is becoming like machine learning, with codebases as ‘blackbox models,’ optimized against specifications; this shift introduces ML problems like overfitting and data leakage, detailed in this tweet.
- A member responded emphasizing the importance of the human in the loop, and how to do ‘gradient descent’ on the human side of things, according to this tweet.
Airtable’s Hyperagent: The Agentic Cloud Platform?: Howie Liu announced Hyperagent by Airtable, a specialized cloud platform for AI agents with isolated computing environments, domain-specific learning, and seamless Slack deployment, detailed in this tweet.
Gepa AI’s API: Optimize All the Things?: Lakshya A Agrawal launched a universal API to optimize any text parameter (code, prompts, cloud policies), claiming performance matching or exceeding domain-specific tools, according to this tweet.
Kiro AI’s Klumsiness: Amazon’s AI causing AWS Outages?: Ed Zitron highlights two AWS outages (one lasting 13 hours) caused by Amazon’s AI assistant, Kiro and critiqued Amazon’s official stance of attributing the failures to ‘user error,’ according to this tweet.
Claude’s Code Checkup: Security Scan Launched?: Anthropic introduced Claude Code Security, powered by Claude 4.6 Opus, to scan codebases for vulnerabilities and recommend patches, reportedly finding 500+ long-standing bugs in open-source production code; a research preview is now available, according to this tweet.

Latent Space ▷ #llm-paper-club (8 messages🔥):

Voxtral Realtime Model, Dimitris Papailiopoulos Tweet

Voxtral Realtime Transcription Model Launched: Guillaume Lample announced the release of Voxtral Realtime, an Apache 2 licensed model designed for state-of-the-art transcription, available at this xcancel.com link.
- The model features low latency, performing under 500ms.
Dimitris’ Tweet Scores Views: A thread archives a tweet by Dimitris Papailiopoulos from February 19, 2026, including performance statistics, available at this xcancel.com link.
- The tweet received 25 replies, 46 retweets, 453 likes, and over 90,000 views.

Latent Space ▷ #ai-in-action-builders-techstacks-tips-coding-productivity (89 messages🔥🔥):

Mobile Git Diff Viewers, Convex Workflow, OpenSpec + Opencode, Trunk Tool, Claude vs Pi

Twilwa Bootstrap: Workflow in a Box: A member shared a GitHub repo as a template for their workflow, which sets up their stack using gh repo clone twilwa/bootstrap && cd bootstrap && sudo chmod +x ./bootstrap.sh && bootstrap.sh.
- The agents.md file contains the member’s loop, and readme.md contains human-readable information, but it might need adjustments for other machines.
Visual Explainer Aims to Improve Project Planning: Nico Bailon introduced Visual Explainer, a tool designed to replace markdown-based project planning with visual representations, posting a link to Visual Explainer on xcancel.com.
- The tool is open-source on GitHub and seeks to improve the user experience of project planning by using visual representations over traditional text methods.
Regex Patterns for Prompt Injection Detection Released: Mario Zechner shared a resource featuring 44 regex patterns designed to detect and prevent prompt injection attacks, with a link to Prompt injection patterns.
- Community members acknowledged the utility of these patterns for enhancing security.
Diverse OpenClaw Forks Emerge: Multiple OpenClaw rewrites and forks were mentioned, including zeroclaw, nanoclaw, picoclaw, and nullclaw, each offering unique features and optimizations, though one member reported starting to use nanoclaw due to the usage of apple containers instead of docker on mac.
- Another member mentioned IronClaw and MimicLaw for esp32 agent with websockets and telegram integration.
OpenClaw Slides and Presentation Tips Shared: The presenter shared the slides from their talk, and pointed out that OpenClaw created those slides, posting a link to OpenClaw Slides.
- Also shared some tips for working with OpenClaw, such as using separate git worktrees for parallel fixes, running pnpm install first before running Codex in fresh clones, and checking for shell prompts to detect completion.

ElectricSQL blogpost, rhesis-ai/rhesis LLM testing

ElectricSQL blogpost Released: A member shared a link to an ElectricSQL blogpost: Amdahl’s Law for AI Agents.
rhesis-ai releases open-source platform: A member announced an open-source platform & SDK for testing LLM and agentic apps: rhesis-ai/rhesis.
- It helps to define expected behavior, generate and simulate test scenarios, and review failures collaboratively.

Latent Space ▷ #private-agents-and-workflows-local-llama-ollama (1 messages):

Always On AI Agent, Local AI in your pocket, IoT Home Source Code

Juno Labs Introduces Always-On AI Agent: Juno Labs is developing an always-on AI agent, though the implementation details remain unclear.
Tiiny AI: Local AI in Your Pocket: Tiiny.ai offers local AI capabilities accessible from your pocket, enabling on-the-go processing.
TRMNL’s IoT Home Source Code Now Available: The source code for TRMNL’s IoT home system, found on GitHub, integrates with microphones and sensors for home automation.

Latent Space ▷ #genmedia-creative-ai-video-image-voice-music-inspo-consumer-ai (12 messages🔥):

Google Labs Pomelli Photoshoot, AI-Generated Podcast, Generative AI Video Models

Pomelli ‘Photoshoot’ from Google Labs Catches Eyes: Google Labs introduced ‘Photoshoot’, a new Pomelli tool feature that generates high-quality, customized marketing images from a single product photo, and is currently free in the US, Canada, Australia, and New Zealand via this link.
Viral AI-Generated Podcast ‘The Epstein Files’ Breaks Records: Levy.eth discussed the viral success of ‘The Epstein Files’, an AI-vibe-coded podcast created with Claude, which hit 100,000 downloads in its first week, outperforming the top 1% of global podcasts by 20x, linked here.
- The podcast was produced solo over a single weekend.
a16z Highlights Generative AI Video Landscape in 2026: a16z highlighted the rapid advancement in generative AI video, noting the dominance of Seedance 2.0 and competition from Kling, Grok, Sora, and Veo via this tweet.
- The post referenced a ‘State of Generative Media’ report by fal, analyzing the industry landscape as of early 2026.

Latent Space ▷ #ai4science-bio-math-physics-chemistry-ai-researcher-ai-scientist (3 messages):

Agentic Drug Discovery, Cell Type Importance

CellType: The Agentic Drug Company Launches: A member shared a link to CellType: The Agentic Drug Company on Y Combinator.
- The member noted that the name suggests they also figured out how important the cell type is downstream.
Cell Type Core Hypothesis: The member indicates that determining cell type importance downstream is a core hypothesis at MiraOmics.
- There were no further details or discussion.

Latent Space ▷ #mechinterp-alignment-safety (1 messages):

burnytech: https://fxtwitter.com/i/status/2024537378535211368

Latent Space ▷ #applied-ai-experimentation (19 messages🔥):

Variable Diff, REPL Prompting Technique, SQLite Storage for Agents, Memory Management Systems for AI Agents, TDD and Specs for AI Development

Variable Diff explained: A member introduced the concept of variable diff as tracking the state that got added/updated with each turn of the root LLM, in a viewer that tracks the state of code, sub LLM calls, and variable updates.
- The viewer provides a way to observe changes in code, the outputs of code execution, and the state of variables after each interaction; one member mentioned a more complex example that adds variables for search checkpoints and output sections, illustrated with a screenshot.
REPL is better than files/scripts for prompting: Members found that using REPL (Read-Eval-Print Loop) as a prompting technique is effective, and separates external filesystem from internal memory making it easier for the model to understand.
- This approach gives more control by allowing the model to peek at the variable state, which is more structured than YOLO_RESULTS_OF_LAST_RUN.md.
SQLite for Agent State Persistence: The use of SQLite as persistent storage for agent state was discussed, with one member describing it as the goat for this purpose.
- SQLite allows for easy inspection of the schema and facilitates parallel agents that can catch up by inspecting the database, though REPLs have their own pros and cons.
Addressing AI Memory Management: A member inquired about how often outdated memories (plans/thoughts/references/specs) infiltrate current chats, pointing out that unwanted or outdated memory can interfere with current tasks.
- They mentioned their struggles with managing various levels and scopes of memory, promoting memory between scopes, and automating memory refactoring, noting that AI-driven solutions often felt hit or miss.
TDD Workflow Prevents Memory Mishaps: A member mentioned that they are pretty militant about their specs + tdd (test-driven development).
- They use a workflow where specs/ is always current-repo state, changes/ is actively in process, and changes/archive/ is completed and validated, and any deviations from these specs can be fully audited.

OpenRouter ▷ #general (246 messages🔥🔥):

Contacting Support Team, Zero-size choices array, Excluding Models with Data Policy, Inference expensive, Choices.0.native_finish_reason missing

Users struggle getting response from Support: One user reported that they sent many emails but haven’t received a response for several days, indicating difficulties in contacting the support team.
- The user emphasized the importance of their issue and sought assistance.
Zero-size Choices Array Strikes OpenRouter: Users reported receiving a zero-size choices array from models, indicating a potential issue with the API’s response structure, one member says “yeah, looks like choices can be empty for the final message piece. Just fixing it for my project at the moment.”.
- It was noted that checking for a non-zero array might be a temporary fix, but the issue appeared randomly and broke some platforms.
Image Generation Goes Blank, Credits Still Charged: Users reported receiving empty responses from image generation, with no image data returned despite credits being charged.
- One user, flight505, detailed a dispute over $2.72+ in charges for missing image data and requested investigation into the cause.
OpenRouter Backend Refactor Causes Image Generation Outage: OpenRouter admitted to a backend refactor that caused a partial outage in image generation, leading to blank or missing images.
- The team is planning refunds for affected users and implemented checks to prevent future occurrences, and mentioned we made the biggest backend refactor that we’ve ever done and missed an edge case in tests.
Can’t buy Enterprise subscription: A member asked how to get an Enterprise subscription but emails to support and sales have not been answered.
- Another member noted that first they ignore all non-corporate emails like @gmail.com second idk that’s all i know maybe they don’t have enough people to read those emails.

OpenRouter ▷ #discussion (4 messages):

AWS Outage, Kiro AI Coding Tool, Amazon AI tools

AWS Suffers Outages Due to Kiro AI: Amazon Web Services experienced a 13-hour interruption to one system in mid-December after engineers allowed its Kiro AI coding tool to make changes.
- The agentic tool autonomously determined that the best action was to “delete and recreate the environment”.
Amazon Employees Doubt AI Coding Assistants: Multiple Amazon employees told the FT that this was the second occasion in recent months in which one of the group’s AI tools had been at the centre of a service disruption.
- Engineers did not require a second person’s approval before making changes, as would normally be the case.

GPU MODE ▷ #general (29 messages🔥):

DirectML vs CUDA, ONNX Runtime, BPM Analysis, LlamaCpp / LlamaSharp, Nsight Resources

DirectML challenges CUDA for ONNX inference: A member suggested that DirectML is as fast as CUDA with ONNX inference, prompting a discussion on its capabilities and limitations.
- Another member noted that DirectML doesn’t support Linux (excluding WSL) and is in maintenance mode, but recommended it with ONNX in dotnet for Windows.
ONNX Runtime simplifies model inference: A member explained that ONNX Runtime (using .onnx or .safetensors files) with .json config files can be used for various model inference tasks, including text generation, chat, and stable diffusion.
- They demonstrated using DirectML-ONNX to analyze audio files for BPM (beats per minute) with high accuracy.
LlamaCpp/LlamaSharp simplifies text LLM: A member suggested using LlamaCpp / LlamaSharp in dotnet with .GGUF files for running text LLMs, particularly if not bound to Linux.
- They shared their SharpAI project (a web-api with Blazor frontend) as an example, noting experiments with whisper transcription and stable diffusion.
Nsight usage assistance requested: A member asked for resources to get started with Nsight, prompting other members to share helpful links.
- Recommended resources included a YouTube tutorial, blog posts on using NCU with large codebases, and talks from past GTCs.

GPU MODE ▷ #cuda (9 messages🔥):

CUDA Registers, CUDA Unified Memory API, CUTLASS

setmaxnreg ignored due to undetermined register count: A member encountered an issue where ptxas was unable to determine the register count, causing 'setmaxnreg' to be ignored, even with an empty kernel using nvcc main.cu -arch=sm_90a.
- The member found that specifying all parameters for __launch_bounds__ was necessary to resolve the issue, and pasted this github.com/NVIDIA/cutlass/pull/3030.
CUDA requires nvidia-uvm even without Unified Memory: A member reported that CUDA attempts to load the nvidia-uvm kernel module even when code doesn’t use the Unified Memory API (cudaMallocManaged()), and the GPU isn’t detected without it.
- The member sought insights into why this dependency exists, as it’s not documented in CUDA documentation or the NVIDIA open kernel repository.

GPU MODE ▷ #cool-links (2 messages):

Modular Claude C Compiler, Paged Out Zine

Modular’s Claude C Compiler is out!: Modular published a blog post about their Claude C compiler and what it reveals about the future of software.
- Details about plans for software development are discussed in the post.
Paged Out Zine Released!: Issue #8 of Paged Out!, a nerdy zine about everything computers, has been released and is available for download.
- The zine covers a range of computer-related topics and is available via the Paged Out Institute.

GPU MODE ▷ #job-postings (2 messages):

ML Performance Engineers, Compiler Engineers, New AI Compilation Technology

Hiring ML Performance and Compiler Engineers: A company is seeking ML Performance Engineers and Compiler Engineers to develop new technology for compiling and servicing AI models.
- This technology is being built from scratch and is an alternative to LLVM and VLLM, as shown in the job posting.
New AI Compilation Tech Stack: The company is building a new compilation tech stack from scratch for AI models, offering an alternative to existing solutions.
- The focus is on ML performance and compiler engineering, indicating a deep dive into optimization and efficiency.

GPU MODE ▷ #beginner (10 messages🔥):

Coalesced Memory Accesses, CUDA Optimization Resources, NVIDIA Feynman GPU Architecture

Dive into Coalesced Memory Accesses: Members sought resources on coalesced memory accesses, and the official NVIDIA CUDA guide was recommended as a starting point.
- Another helpful resource suggested was this article on CUDA memory management.
Explore CUDA Optimization Resources: A member inquired about starting points and resources for GPU optimization.
- Another member pointed to prior discussion in the channel.
NVIDIA’s Feynman GPU uses Vera CPU: A member asked why NVIDIA will use a Vera CPU for the Feynman GPU, questioning whether a CPU will be embedded in the GPU.
- Members clarified that NVIDIA uses both GPUs and CPUs, citing the Blackwell architecture with Grace CPUs and Blackwell GPUs interconnected via NVLink, with more details available in this NVIDIA blog post.

GPU MODE ▷ #pmpp-book (6 messages):

PMMP Book Release, Izzat Hajj Interview

PMMP Book Delayed, Author List Unchanged?: An Amazon page listed a February 8th release date for the new PMMP book edition, but it was taken down shortly before release, also the author list of the 4th edition will be the same for the new edition.
- A member suggests Vikram is heavily involved in this new edition, but September is the latest release date being speculated.
Izzat Hajj Discusses Forthcoming PMMP Edition: Izzat Hajj discusses the new edition of the PMMP book around the 24-minute mark in this YouTube video.
- A member thanked another for linking the video, noting that while September is the latest speculated release date, they’re hoping for earlier.

GPU MODE ▷ #irl-meetup (6 messages):

Seattle IRL meetup, ML Systems Happy Hour in Seattle, Chicago Meetup

Seattle IRL Community Search Initiated: A member inquired about the existence of an IRL (in real life) community in Seattle, inviting others to DM to start one if one doesn’t exist.
- This sparked interest among other members who expressed enthusiasm for a local gathering.
ML Systems Happy Hour Brews in Seattle: A member announced plans for a happy hour in Seattle focused on ML systems, signaling an opportunity for local professionals to connect.
- Another member offered assistance, showing community support for organizing the event.
Chicago Gathering?: A member inquired about potential meetups in Chicago.
- No further information or plans were provided regarding a Chicago meetup.

GPU MODE ▷ #thunderkittens (10 messages🔥):

ThunderKittens 2.0, GH CLI with Claude, HIPKittens, PTX consistency model, Tensor core memory pipelining

ThunderKittens 2.0 Release Announced: The Hazy Research group at Stanford released ThunderKittens 2.0 which emphasizes subtraction as much as addition through internal refactoring and reduction of build system complexities.
- It identified surprising behaviors on modern Nvidia GPUs which will guide how kernels should not be optimized.
ThunderKittens 2.0 called out as talk-worthy!: Members discussed the potential for a talk on ThunderKittens 2.0, with one suggesting it could focus on undocumented tensor core pipelining, proper PTX assembler hinting, and occupancy challenges.
- The speaker said that they would love to give a talk and provided a link to available dates.
ThunderKittens 2.0 TMA performance insight: A member inquired about the performance benefits of using different warps for A/B TMA and SFA/SFB TMA in ThunderKittens 2.0.
- He also observed speedups from interleaving tcgen05.cp and tcgen05.mma for nvfp4 competition problem shapes.
Leveraging GH CLI and Claude for issue selection: One member mentioned using GH CLI with Claude to read open issues in other projects, filtering them based on personal preferences.
- This process involves iterative refinement to select suitable tasks.

GPU MODE ▷ #factorio-learning-env (4 messages):

Rocket Launch, Factorio Learning Environment

Rocket Launch Timeline Optimism: Members expressed optimism about the possibility of launching a rocket in the next 6 months, given the current rate of progress.
Factorio Learning Environment Goals: The goal is to launch a rocket within a collaborative Factorio environment.

GPU MODE ▷ #teenygrad (6 messages):

Tensorflow Projector, GEMMs OpenBLAS Updates

Tensorflow Projector Showcased: A member shared the TensorFlow Projector for visualization, in addition to the already well-known TensorFlow Playground.
OpenBLAS GEMMs getting love: A member announced plans to work on updating the GEMMs OpenBLAS stuff later in the day.

GPU MODE ▷ #general (4 messages):

AI Leaderboard Submissions, Marksaroufim clarifies AI Leaderboard policy

AI submissions wanted on leaderboard: A member asked if purely AI-created submissions were wanted on the leaderboard.
- Another member clarified that it’s completely fine and that they like both our expert humans and expert AIs.
AI Leaderboard Policy Clarified: Marksaroufim confirmed that the leaderboard accepts purely AI submissions.
- This statement encourages participation from both human experts and advanced AI systems, fostering a diverse and competitive environment.

GPU MODE ▷ #multi-gpu (4 messages):

torch.ops.symm_mem.fused_all_gather_scaled_matmul, do_bench with multi-GPUs, vllm-project/vllm

fused_all_gather_scaled_matmul Freezes up with Multi-GPUs: A member is debugging torch.ops.symm_mem.fused_all_gather_scaled_matmul hanging when do_bench is run on multi-GPUs, referencing a vllm-project/vllm code change for context.
- Another member points out that do_bench is intended for single-device kernels, so running a multi-GPU fused collective kernel repeatedly won’t work.
do_bench Designed for Single Device Kernels: One member mentioned that triton.testing.do_bench() isn’t suitable for distributed collectives like torch.ops.symm_mem.fused_all_gather_scaled_matmul due to the internal torch.cuda.synchronize() calls during timing.
- They recommend using events and a pre-iteration barrier as a workaround, and another member confirmed the issue, stating the best workaround they found was using host-side timing with the time library.

GPU MODE ▷ #nvidia-competition (36 messages🔥):

Environment Issues with Modal, Cutedsl Problems, Modal Credits, Debug IR/PTX with popcorn cl

Modal Environment Gremlins Attack!: Members encountered environment issues on Modal, where previously working code failed; the root cause was pinpointed to problems with the nvidia-cutlass-dsl package.
- One member found that removing the runtime installation of nvidia-cutlass-dsl from their code lessened the crashing.
Cutedsl Code Causes Competition Crash!: Some members using cutedsl reported issues submitting to Modal, with one noting they hadn’t been able to submit for 5 days, while another stated that removing for pkg in [“nvidia-cutlass-dsl”, made it crash less often now.
- A member pointed out that installing dependencies at runtime is a bit yolo and suggested adding more to Modal’s dependencies for future competitions.
Modal Funding Drying Up!: A member noted that their team should be fixed and they have about 2K in credits left but ai spamming thousands of submissions will bring back the unpopular rate limits.
- It’s now all linked directly to my personal credit card haha so please don’t make me tell my wife we’re homeless.
Debug IR/PTX Dumps: A member inquired about dumping debug IR/PTX when submitting cutedsl code via popcorn cl.
- A member suggested printing to stdout for now, but mentioned they could consider adding a ptx instruction after the competition.

GPU MODE ▷ #flashinfer (11 messages🔥):

Fused MoE track, flashinfer.fused_moe.trtllm_fp8_block_scale_moe, reference kernel, Bug with trtllm_fp8_block_scale_moe, flashinfer-ai/flashinfer #2356

Challenges Arise with FlashInfer’s Fused MoE Baseline: A member reported consistent failures with INCORRECT_NUMERICAL errors and high abs_err / rel_err when using the baseline flashinfer.fused_moe.trtllm_fp8_block_scale_moe function in the Fused MoE track.
- The member inquired about required settings or constraints for achieving numerically correct outputs, such as weight layout, shuffled weights, PDL, scaling assumptions, or routing method.
Navigating Numerics Nightmare with TensorRT FP8 MoE: A member shared their kernel code utilizing trtllm_fp8_block_scale_moe and sought advice on resolving numerical mismatches.
- The posted code configured settings like num_experts=256, top_k=8, n_group=8, and routing_method_type=RoutingMethodType.DeepSeekV3.value, but still faced issues.
Reference Kernel Recommended Amid FlashInfer Bug: A member suggested using the reference kernel instead of the FlashInfer baseline, while another member confirmed a bug with trtllm_fp8_block_scale_moe.
- They linked to issue #2356 on the flashinfer-ai/flashinfer GitHub repository, indicating a known problem.

GPU MODE ▷ #from-scratch (6 messages):

vLLM, CUDA kernels, RoPE implementation

Coding vLLM From Scratch: A member started writing vLLM from scratch and shares the repo they are working on.
- They also mentioned that vLLM and Titan are probably the 2 most important ones to start with and are currently working on RoPE.
Tiny-vllm’s Main Implementation: A member shared the link to the main implementation of tiny-vllm, in main.cpp.
- The member encourages others working on a minimal version of X to post their work.
Tiny-vllm’s CUDA Kernels: A member shared the link to the CUDA kernels used in tiny-vllm.
- They stated that there is not much educational value yet.

Moonshot AI (Kimi K-2) ▷ #general-chat (63 messages🔥🔥):

Kimi Coding, Claude vs Kimi, Kimi CLI vs IDE, Audio transcription endpoint, Baidu Search Engine

Kimi’s Coding Capabilities Spark Debate: Some users are praising Kimi for its stability and speed in coding tasks, while others find it unsatisfactory, preferring Claude.
- One user highlighted Kimi’s ability to find obscure sources of information that Gemini misses, while another criticized its reasoning abilities and tendency to argue.
Kimi CLI Gains Favor Over IDE Integration: Users report better experiences with Kimi’s command-line interface (CLI) compared to its Visual Studio Code (VS Code) integration, which is currently in beta.
- A user noted that the CLI version integrates better with agent swarms for larger projects with thousands of lines of code, suggesting the IDE version is underbaked.
Kimi vs Claude Code Comparison: A user described swapping Kimi Code CLI for Claude Code with K2.5, noting that it was a good experience too, but hoping that Kimi will eventually integrate an agent swarm into its CLI.
- Another user cited that Claude models are too expensive, but another user said that they were researching on Claude code using Kimi, but got hit with rate limits.
OpenClaw & Refund Request Issues: A user has been awaiting a refund for two days after purchasing a kimi.com account with the intention of using OpenClaw, but finding it unsuitable due to a lack of browser navigation and WhatsApp connectivity.
- The user expressed frustration with the lack of immediate support, suggesting an AI chat system for instant refunds, referencing that other Chinese companies do reply even if it is Spring Festival.
ChatJimmy Boasts High Token Processing Speeds: A user shared a link to ChatJimmy AI, highlighting its ability to process over 15,000 tokens per second.
- This claim suggests ChatJimmy as a potentially faster alternative for certain AI tasks compared to other platforms.

Nous Research AI ▷ #general (46 messages🔥):

DeepSeek OS V4, AI and blockchain, Model capabilities, Gemini and coding

DeepSeek OS V4 vs Closed Source APIs: Members suggest using DeepSeek V4, emphasizing its open-source nature and local deployment capabilities as a preferable alternative to closed-source APIs and shared a primer video.
- One member noted the model’s biological neural network inspired Engram Memory breakthrough is significant and urged support for OS development to surpass closed-source options.
Exploring AI and Blockchain Fusion: A member expressed interest in AI and blockchain, seeking discussions on model building, AI agents, and automation.
- In response, another member shared their use of Claude code to orchestrate Gemini-cli and Codex, envisioning a future with text terminals and smart glasses.
Evaluating Leaps in Model Capabilities: Members discussed the rise in model capabilities, comparing Sonnet 3.5 and GPT4, with one humorously labeling Opus 3 as the dark eminence due to limited access.
- One member expressed hope that DeepSeek V4 would keep up with this trend, highlighting a shift in favor of OS momentum since the release of DeepSeek R1.
Gemini’s Coding Focus Questioned: A member said I would of preferred for them to be loose on coding and just lock in for scientific/math, with other members discussing Google’s stake in Anthropic.
- Another added that Claude can compile and execute C code in a sandbox in the web interface, while Gemini can barely do Python, with member sharing a link to twitter post.

Nous Research AI ▷ #interesting-links (1 messages):

Anthropic agent teams, Agent coordination, Agent communication

Reverse Engineering Anthropic’s Agent Teams: Anthropic released an experimental agent teams feature a few weeks ago, with details on how agents coordinate tasks and communicate with each other.
- A member reverse engineered how it works in this blog post.
Agent Communication Dynamics Exposed: The reverse engineering effort sheds light on how agents within Anthropic’s experimental teams feature interact and exchange information.
- Understanding these communication protocols is crucial for optimizing multi-agent systems and enhancing collaborative AI workflows.

HuggingFace ▷ #announcements (1 messages):

GGML, llama.cpp

GGML / llama.cpp Join the Family: The Hugging Face team welcomes GGML / llama.cpp to the family.
- Further discussions on this integration can be found on GitHub.
GGML gains traction: GGML gains traction within the community as a framework.
- llama.cpp benefits from integration and support.

HuggingFace ▷ #general (42 messages🔥):

HF Discord Invite Broken, Hybrid Diffusion / Autoregressive Language Model, HF Collabs with Unsloth, k-fold cross-validation Confusion, Report a Role

HF Discord Link 404s: Users reported the Hugging Face Discord link on the HF top page is broken.
- Staff confirmed and said we might need to replace it.
Diffusion Meets Autoregression?: A member inquired about hybrid diffusion/autoregressive language models, suggesting autoregressive layers could generate CoT tokens during diffusion steps.
- Another member suggested this paper as related to the topic.
Free LLM Training with Unsloth on HF: It was announced that you can train LLMs using Hugging Face for FREE with Unsloth.
- Another member mentioned that there are now over 100K models fine-tuned with Unsloth open-source on Hugging Face.
Decoding K-Fold Cross-Validation: A user sought clarification on the k-fold cross-validation process, specifically how the test set is handled across k iterations.
- One member advised not to overthink it and just try to grab data from throughout your training set to test/validate with.
ZeroGPU Service Sees Disruptions: Members reported experiencing disruptions in the zerogpu service.
- A member was initially confused and thought there was a new rule that you need to set an HF token to get free gpus but that was proven false.

HuggingFace ▷ #i-made-this (4 messages):

Terradev CLI v2.9.2, NAVD - Persistent conversational memory for AI agents, Coding agent swarm, Grant proposal feedback

Terradev CLI v2.9.2 Released: Cross-Cloud GPU Cost Optimization: The release of Terradev CLI v2.9.2 was announced, a cross-cloud GPU cost optimization platform with multi-cloud GPU arbitrage across AWS, GCP, Azure, and RunPod.
- Key features include real total job cost calculation and one-click HuggingFace Spaces deployment, available on GitHub under the BUSL 1.1 license.
NAVD Launches: Agent Memory Without Vector Databases: NAVD was released as a persistent conversational memory solution for AI agents, utilizing an append-only log and Arrow embedding index, eliminating the need for a vector database, available on GitHub under the MIT license.
- It offers pluggable embeddings (OpenAI built-in), semantic search over raw conversations, and index rebuildability with search speeds under 10ms at 50k vectors.
Autonomous Coding Agent Swarm Creates Iterative Improvement Loop: A coding agent swarm was introduced that operates autonomously for hours, creating an iterative loop to continuously improve its output without human intervention and coordinates with each other harmoniously.
- The project is available on GitHub.
Grant Proposal Feedback Requested: A member asked for feedback on a grant proposal.
- The grant proposal is available as a discussion on HuggingFace.

Eleuther ▷ #general (26 messages🔥):

Taalas Chip, Streamlit UI Bottleneck, TPU Research Funding

Taalas Chip: Model-Specific ASICs Hit the Market: A new Taalas chip is designed as an ASIC for a specific LLM, offering potentially high speed and low energy use, but requiring new layers for different models.
- It’s being compared to Cerebras (wafer scale) and Etched (runs multiple models), with some arguing Taalas might be acquired by big tech for on-device inference.
Streamlit Reruns Result in UI Lag: A member found Streamlit’s full-script rerun architecture to be a massive bottleneck when building UIs for heavier models, experiencing significant lag during inference testing.
- They hacked together a pure Python framework (FastAPI + Lit) mimicking Streamlit’s API but using signals for O(1) updates, bypassing the rerun entirely, available at GitHub.
$25k-100k one-time unrestricted funding, along with TPU compute and also a research mentor: Members discussed Google’s TPU Research Funding RFP, offering $25k-100k one-time unrestricted funding, along with TPU compute and a research mentor.
- While the funding requires working with Google-adjacent stack, it’s primarily for faculty at degree-granting institutions, not individuals or most members of the server.

Eleuther ▷ #research (8 messages🔥):

Fold Catastrophe Geometry in GPT-2 and Pythia, Context Compression and Information Loss, KV Cache and Flash Attention, Identity Leakage Verification

Fold Catastrophe Geometry pops up in GPT-2 and Pythia: A member found what looks like fold catastrophe geometry in how GPT-2 and Pythia-160M resolve ambiguous tokens, noting sharp transitions, directional specificity, and 4:1 basin asymmetry.
- The findings replicate across both models, and the member provided a GitHub repository with scripts and results, also replicating on Pythia-410M.
Context Compression causes Information Loss: A paper indicated a 30-45% PPL gap between bounded and unbounded contexts, attributing it to real information loss from context compression.
- A member asked if there were any levers to decrease the compression ratio to mitigate the impact.
KV Cache sizes are debated: A paper cited 160 GB for the KV cache, but a member pointed out that this is inaccurate due to Flash Attention and similar techniques.
- The member suggested checking out flash-linear-attention for modern experimental attention implementations.
Identity Leakage Questioned: A member inquired how identity leakage was verified, noting they had not read the paper.
- The inquiry was in reference to this link.

Eleuther ▷ #interpretability-general (5 messages):

ARES Tooling Framework, Agent Activations Research

ARES Tooling Framework Launch by Martian: Martian introduced ARES, a tooling framework designed to expose an LLM agent’s activations along trajectories in an agentic setup, which helps researchers understand how the agent solves long horizon tasks, with the repo here.
- A tutorial demonstrating the use of ARES to diagnose and correct a failure mode in a simple agent (via probing and activation steering) is available here.
Martian’s ARES on X: The team at Martian also has a twitter thread describing the ARES framework here as well as a discord community here if you’d like to ask questions.
- The original launch tweet was posted on fxtwitter if you’d like to retweet it.

Yannick Kilcher ▷ #general (7 messages):

ChatJimmy, FXTwitter Links, Endomorphosis Datasets, Taalas Ubiquitous AI

Chollet’s Tweet Echoes Through Discord: Members shared a link to François Chollet’s tweet, originally posted on fxtwitter.com.
- There was little discussion or reaction besides the sharing of the URL.
Endomorphosis Rules Inventory Emerges: A member shared a link to the Endomorphosis project’s Inference Rules Inventory on GitHub, specifically this IPFS datasets Python logic.
- It appears to be an inventory of rules for a dataset project, but there was no elaboration in the channel on its purpose or capabilities.
ChatJimmy Boasts Blazing Token Speed: Multiple members highlighted ChatJimmy.ai, emphasizing its claimed processing speed of 15k tokens per second.
- Members reacted, with one exclaiming, “This is insane wow”.
Taalas Charts Path to Ubiquitous AI: A member shared a link to a Taalas article titled The Path to Ubiquitous AI.
- The article could potentially discuss the future and proliferation of AI, but no commentary was added.

Yannick Kilcher ▷ #ml-news (4 messages):

ARC AGI Fine Tuning, Synthetic data for ARC-AGI

ARC AGI is being fine-tuned: Members discussed that everyone is blatantly fine-tuning for ARC AGI now, referring to a post on X.
Synthetic Data is the key to ARC-AGI: The discussion suggested that the attempts to make more synthetic data for ARC-AGI and train on it points to one thing: this is the key to AGI.

DSPy ▷ #papers (2 messages):

Tree of Thought, Skill Issues, Coding Assistance

Tree of Thought Intrigues User: A member expressed interest in trying out Tree of Thought but mentioned being unable to code it themselves due to skill issues.
- They linked to a tweet related to the topic.
Coding Assistance Requested for Tree of Thought: The user explicitly stated they were unable to code it myself because of skill issues.
- The tweet linked here shows an implementation of Tree of Thought.

DSPy ▷ #general (5 messages):

DSPy with Claude, Office Hour Feedback, Reasoning Models with RLM, Qwen3-4B-thinking Issues

Claude meets DSPy’s skills: A member inquired about mixing normal agents (like Claude) with DSPy, specifically if DSPy could serve as a script associated with a Claude skill.
Office Hour Buzz: The office hour had about 40 people attending, covering roughly 10 use cases, with attendees providing questions and feedback.
RLM + Reasoning Model = Recipe for Success?: Reasoning models work well with RLM, but there are reports that sub_lm calls return truncated reasoning when using Qwen3-4B-thinking.
- One user suggested that the sub_lm adaptation to use signatures could potentially solve this issue.
Qwen3-4B-Thinking Loops: One member has noticed that, in their setup (llama cpp w/ jinja and vllm with reasoning parser), sub_lm calls appear to return the reasoning (in my setup, llama cpp w/ jinja and vllm with reasoning parser) as the answer, which is truncated, when they test Qwen3-4B-thinking.
- This truncation issue causes the agent to enter a loop.

Modular (Mojo 🔥) ▷ #mojo (2 messages):

PR Review Times, Modular PR #5979

Modular’s PR Review ETA: A member asked about the review time for their PR submitted the previous day.
- Another member responded that PR #5979 was assigned to a reviewer and would likely be reviewed later that day.
PR Submission Awaits Scrutiny: A recent pull request (PR) submitted yesterday seeks review and feedback.
- Assigned to <@325746765448085504>, PR #5979 on GitHub’s modular repository is slated for examination, potentially later today.

Modular (Mojo 🔥) ▷ #max (3 messages):

torch-max-backend performance, MAX backend on Silicon Mac

New Interpreter Boosts Torch-MAX-Backend Speed: A member reports that a new interpreter in torch-max-backend significantly improved the speed of unit tests, reducing test times from 1.54s to 0.34s for float32 and 1.34s to 0.24s for bfloat16.
- The new interpreter avoids recompilation for each new shape/dtype, which previously took up to 3 minutes per test.
MAX Backend Status on Silicon Macs: A member inquired about testing the MAX backend on Silicon Macs, mentioning their talk where they referenced torch-max-backend as an intermediate layer for exploring MAX.
- The original poster has not tested on Mac yet but expects it to work since it calls MAX behind the scenes.

tinygrad (George Hotz) ▷ #general (2 messages):

AMD assembly infra, Speed Bounties, Portable Solutions

AMD Assembly Infrastructure still George’s focus: George is focused on low-level compiler work so tinygrad can generate good code for AMD GPUs.
tinygrad offers Performance Speed Bounties: There are bounties available for measurable performance gains, including tooling to verify them.
tinygrad Priotizes Portable Solutions: George focuses on improvements in tinygrad’s core that benefit all backends, avoiding one-off custom kernels.

tinygrad (George Hotz) ▷ #learn-tinygrad (1 messages):

Tinygrad, George Hotz, AI-HPC

Main contributor to Tinygrad aims for Hotz hire: A member stated their intention to become the main contributor to Tinygrad and get hired by George Hotz.
- They have already started learning tinygrad and thanked another member for their support, also sharing a link to the AI-HPC GitHub.
Newbie learning Tinygrad from Experts: A user is diving into Tinygrad, expressing aspirations to become a key contributor.
- They express gratitude to another member for guidance, while also sharing a link to the AI-HPC GitHub as a learning resource.

MCP Contributors (Official) ▷ #mcp-dev-summit (1 messages):

aaronpk: schedule is posted! 🎉 https://mcpdevsummitna26.sched.com/