a quiet day.

AI News for 3/23/2026-3/24/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Open-Weight Reasoning and Vision-Coding Releases: Arcee Trinity-Large-Thinking, Z.ai GLM-5V-Turbo, Falcon Perception, and Holo3

Arcee’s Trinity-Large-Thinking: The biggest substantive model launch in this set was Arcee’s Trinity-Large-Thinking, released with open weights under Apache 2.0 and positioned explicitly for developers/enterprises that want to inspect, host, distill, and post-train their own systems. Follow-up posts claim strong agentic performance, including #2 on PinchBench behind Opus 4.6, SOTA on Tau2-Airline, and frontier-level telecom results (Arcee, Mark McQuade). OpenRouter highlighted the architecture as a 400B total / 13B active model and made it available immediately (OpenRouter). Several ecosystem partners framed it as a milestone for “American open source,” including Prime Intellect, Datology, and infra supporters emphasizing that a small team served a 400B-class model at production cost points (latkins, willccbb, xlr8harder, natolambert).
Z.ai’s GLM-5V-Turbo: Z.ai introduced GLM-5V-Turbo, a vision coding model that natively handles images, videos, document layouts, and design drafts while preserving pure-text coding performance. The company attributes the gains to native multimodal fusion, a next-gen CogViT encoder, 30+ task collaborative RL, synthetic agentic data generation, and multimodal toolchain extensions for search/drawing/web reading (details, text-coding stability). The model was quickly integrated into multiple downstream surfaces including TRAE, Tabbit, and Vision Arena.
Falcon Perception and OCR: TII released Falcon Perception, an open-vocabulary referring expression segmentation model, alongside a 0.3B OCR model said to be competitive with models 3–10x larger. The notable design point is an early-fusion transformer that mixes image and text from the first layer instead of relying on multi-stage pipelines and late fusion.
Other model notes: H Company’s Holo3 was highlighted as a GUI-navigation model family (A3B/35B, Qwen3.5-based, free license, Transformers support). A separate post praised a Qwen3.5 27B distill trained on Claude 4.6 Opus reasoning traces, claiming SWE-bench wins over Claude Sonnet 4.5, 96.91% HumanEval, lower CoT verbosity, 4-bit local usability, and 300k+ HF downloads (Craig Hewitt).

Claude Code Leak, Operational Issues, and the Competitive Coding-Agent Market

What the leak exposed: Multiple posts converged on analysis of Anthropic’s accidental Claude Code source exposure. The most useful technical synthesis is the long thread from ZhihuFrontier, which emphasizes a minimalist agent core—a single while(true) loop—with sophistication pushed into context management, tooling, and product instrumentation. The leak reportedly showed a 4-layer context compression stack (HISTORY_SNIP, Microcompact, CONTEXT_COLLAPSE, Autocompact), streaming plus parallel tool execution, silent retries on output-length failures, a 40+ tool modular architecture without inheritance-heavy abstractions, and strong use of feature flags and production ablations. A second summary pointed to hidden features including task budget management, AFK mode, “Penguin” fast mode, redirected reasoning, and other unfinished product hooks (ZhihuFrontier).
Operational pain mattered more than the leak for many users: Alongside leak discussion, many developers complained that Claude was simply slow or unreliable that day (Teknium, andersonbcdefg). Community response also fixated on leaked “pets” and UI affordances (meowbooksj), reinforcing that product polish is part of the competitive moat even when orchestration patterns become legible.
DMCA blowback: The second-order story was Anthropic’s overly broad repo takedown attempts. Theo reported a DMCA against a fork that did not contain leaked source; he then argued the takedown itself violated DMCA procedure (post). A correction later came from trq212, calling it a communication mistake; the repo was restored and Theo acknowledged the retraction and rapid response (restored, official response).
Open-source clones and alternatives are gaining mindshare: The leak also turbocharged ecosystem competition. Yuchen Jin noted the leaked Claude Code fork hit 110k+ GitHub stars in a day. At the same time, multiple users said Nous Hermes Agent was easier to deploy and operate than OpenClaw or Claude-derived stacks, often citing near-zero setup and better local workflows (charliehinojosa, VadimStrizheus, Nous). There’s also a tooling wave around prompt steering and efficiency, e.g. a “Universal CLAUDE.md” claiming 63% output-token reduction, and Google’s Agent Skills spec proposing progressive disclosure to cut baseline context by 90%.

Agent Systems Research: Memory, Self-Organization, Coordination Limits, and Security

Memory is becoming first-class infra: MemFactory proposes a unified inference/training framework for memory-augmented agents with native GRPO integration and reported up to 14.8% relative gains over baselines. Separately, Baseten described a 7M-parameter perceiver that compresses KV cache 8x while retaining 90%+ factual retention, pitching it as a path toward models that “learn from experience.” part_harry_ extended the idea further, arguing pretraining itself is data-inefficient because we discard KV cache every step.
Do self-organizing agents beat hand-authored roles? A DAIR summary highlighted new work across 25,000 tasks with up to 256 agents, claiming self-organized roles outperform predefined planner/coder/reviewer hierarchies, with a sequential coordination protocol +14% over centralized approaches, 5,000+ emergent roles, and open models reaching 95% of closed-model quality at lower cost. This sits in tension with a separate line of theory: omarsar0’s summary of new MIT work argues delegated multi-agent planning is decision-theoretically dominated by a centralized Bayes decision-maker when agents do not gain access to genuinely different information sources. In practice, the synthesis is likely: multi-agent helps when it partitions tools, environments, or retrieval channels—not just prompts.
Agent attack surface is the web: A widely shared summary of a new DeepMind paper on “AI Agent Traps” reframes agent security around adversarial content in webpages/documents, not just model jailbreaks. The thread cites hidden prompt injection in HTML/CSS succeeding in up to 86% of scenarios and latent memory poisoning reaching 80%+ attack success with <0.1% contamination, which is material for anyone shipping browse/retrieval-heavy agents.
Long-horizon evaluation is getting richer: New benchmarks/tools included Kaggle Standardized Agent Exams, YC-Bench for simulating a startup over a one-year horizon, and CaP-Gym / CaP-X, a broad benchmark and toolkit for agentic robotics spanning 187 manipulation tasks, 12 frontier models, and both training-free and RL-improved policies with MIT-licensed code (open-source details).

Training, Retrieval, and Infra: RL Frameworks, Optimizers, Kernels, and Benchmarks

Post-training stack maturation: Hugging Face’s TRL v1.0 was framed by many as a meaningful unification of open post-training—SFT, reward modeling, DPO, GRPO—into a production-ready package (commentary). A complementary survey thread from adithya_s_k compared 16 RL frameworks across orchestration, rollout buffering, weight sync, staleness handling, partial-rollout behavior, LoRA support, and distributed parallelism, useful for teams choosing between TRL, VeRL, SLIME, and others.
Optimization and systems releases: HeavyBall 3.0.0 shipped with FSDP, DDP, end-to-end compilation with 2.5x speedup, faster Muon/SOAP variants, and new optimizers. Together AI promoted a behind-the-scenes kernels writeup; Dan Fu followed with a “what a VP of Kernels does” thread. On the low-level DSL side, maharshii argued CuTeDSL materially lowers the barrier to custom kernels by allowing inline PTX directly in Python, avoiding opaque layout gymnastics.
Retrieval evidence continues to favor late interaction: Several posts reiterated that multi-vector / late-interaction retrieval outperforms single-vector embeddings, even after fine-tuning, with better robustness against catastrophic forgetting (lateinteraction, ladder visualization). There was also continued frustration that “RAG” has become an overloaded umbrella term rather than referring to a specific older paper (lateinteraction).
Benchmarks and efficiency surfaces: Arena added Pareto frontier charts across text, vision, search, document, and code, making price/performance tradeoffs more explicit. On standardized inference, Lambda and NVIDIA pointed to MLPerf Inference v6.0 as the better lens for real AI-factory productivity than peak-chip specs.

Developer Platforms, Rate Limits, and Tooling UX

OpenAI Codex usage reset: The most practically important platform announcement for working engineers was thsottiaux’s note that OpenAI reset Codex usage limits across all plans, citing elevated rate-limit hits and a concurrent fraud-account purge that recovered compute. This was quickly amplified by users who interpreted rate-limit generosity as a direct competitive axis in the coding-agent market (reach_vb, Yuchen Jin). Later, thsottiaux also clarified that Codex’s core is intended to be open-source because the ecosystem is still young and mutually informative (post).
Agent-ready docs and platform surfaces: LangChain embedded chat into its docs grounded on full docs, knowledge base, and OSS code. Together AI open-sourced 12 agent skills so Claude Code and Codex can call its APIs with the right model IDs and SDK idioms. OpenAI Devs also showed tighter Linear integration in the Codex app for keeping tickets synchronized with code work.
Infra and storage quality-of-life: SkyPilot added native VAST Data support for direct high-speed dataset mounts across heterogeneous compute backends, and Hugging Face rolled out persistent Storage Buckets for Spaces. Tinker added longer context windows up to 256k for select open models, widening its appeal for RL and long-horizon experimentation.

Top tweets (by engagement)

OpenAI Codex limits reset: thsottiaux reset Codex rate limits across all plans, explicitly tying it to both unexplained user rate-limit spikes and anti-fraud enforcement that freed compute.
GLM-5V-Turbo launch: Z.ai’s announcement was one of the day’s biggest technical launches: a multimodal coding model aimed at GUI agents, visual coding, and agent workflows.
Claude Code leak discourse: Theo’s DMCA thread and Yuchen Jin’s note about the leaked project surpassing 110k GitHub stars captured how quickly source exposure translated into open ecosystem momentum.
Arcee Trinity-Large-Thinking: Arcee’s release and OpenRouter’s architecture summary drew unusually strong engagement for an open-weight reasoning model, suggesting real appetite for serious US-based open releases.
Falcon Perception: Falcon Perception’s launch stood out on the multimodal side for its simple early-fusion architecture and unusually small OCR model size relative to claimed performance.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Claude Code Source Leak and Analysis

Claude Code’s source just leaked — I extracted its multi-agent orchestration system into an open-source framework that works with any LLM (Activity: 1205): The source code for Claude Code was leaked, revealing over 500K lines of TypeScript, including its multi-agent orchestration system. A developer has re-implemented this system as an open-source framework called open-multi-agent, which is model-agnostic and can work with any LLM, such as Claude and OpenAI. The framework includes features like a coordinator pattern for task decomposition, a team system for inter-agent communication, task scheduling with dependency resolution, and a conversation loop for model-tool interactions. It is implemented in TypeScript, spans approximately 8000 lines, and is available under the MIT license on GitHub. Some commenters express skepticism about the legality and ethics of open-sourcing a re-implementation of leaked proprietary code, questioning the developer’s understanding of the architecture and the choice of licensing. There is also a debate about the practicality of using different models for planning and implementation, with a specific mention of using GPT-4o for coding.
- A user highlights the technical aspect of the project, noting that the multi-agent orchestration system extracted from Claude Code’s source involves a coordinator that breaks down goals into tasks. This suggests a sophisticated architecture designed for task management across multiple agents, which could be beneficial for complex LLM applications.
- Another comment questions the choice of using GPT-4o for implementation in the orchestration system, implying that by March 2026, GPT-4o might be outdated for coding tasks. This raises a point about the importance of selecting the most current and capable models for specific tasks in AI development.
Claude code source code has been leaked via a map file in their npm registry (Activity: 5229): The image reveals a directory listing of the ‘claude-code’ project, which appears to have been unintentionally exposed via a map file in the npm registry. This leak includes TypeScript files and directories such as ‘entrypoints,’ ‘commands,’ and ‘utils,’ providing a detailed view of the project’s codebase structure. The incident highlights potential security oversights in managing sensitive code repositories, particularly for companies like Anthropic that are involved in AI development. Commenters humorously speculate on the oversight, suggesting it might be due to an Anthropic employee’s mistake or a failure of AI oversight mechanisms. There’s also a satirical suggestion that the code is now ‘open source’ due to the leak.
- The leak of Claude’s source code via a map file in their npm registry raises significant security concerns, particularly given the model’s reputation for identifying vulnerabilities. This incident highlights potential gaps in Anthropic’s internal security measures, as their AI, known for being ‘scary good’ at finding vulnerabilities, failed to detect this issue.
- The leak has sparked discussions about the potential for community-driven improvements, such as fixing existing bugs like the caching issue. This could lead to a more robust version of Claude, as external developers might contribute patches and enhancements, effectively making it ‘open source’ in practice, if not in legal terms.
- The incident also underscores the challenges of maintaining proprietary code secrecy in public repositories. The humorous suggestion of an ‘Undercover Mode’ for Anthropic employees, which would strip AI attribution from commits, reflects the tension between open collaboration and the need to protect intellectual property.
Analyzing Claude Code Source Code. Write “WTF” and Anthropic knows. (Activity: 840): The Reddit post discusses the source code of Claude Code, revealing extensive tracking and classification mechanisms. The system uses simple keyword detection for language classification, tracking words like wtf and frustrating to flag negative sentiment. It also monitors user behavior during permission prompts, logging actions such as opening or closing feedback boxes and typing without submitting. The feedback system is designed to capture negative experiences, prompting users to share session transcripts. Hidden commands like ultrathink and ultraplan alter system behavior, while telemetry logs detailed environment profiles, including session IDs and runtime details. An internal mode (USER_TYPE=ant) collects even more granular data, tying behavior to specific deployment environments. The post suggests this level of instrumentation is more detailed than typical user expectations, though not necessarily malicious. Source. Commenters note that such tracking mechanisms are standard in many applications for analytics and feedback, suggesting that negative sentiment triggers help identify issues with updates. Some commands, like /btw, are now public, while others remain as internal features or ‘easter eggs.’ The extensive internal artifacts are likened to those found in game apps, possibly due to internal incentives for feature development.
- NandaVegg highlights that the use of keyword lists for sentiment analysis in Claude Code is a standard practice in event-triggered analytics. This approach helps identify negative user feedback, which can be crucial for detecting issues in updates that might disrupt user experience or model behavior. The mention of features like ‘ultraplan’ and ‘ultrathink’ suggests these are experimental or less refined, possibly serving as internal tests or ‘easter eggs’ within the system.
- SRavingmad expresses curiosity about the ‘tamagotchi mode’ in Claude Code, implying there are unique or playful features embedded within the system. This suggests that the developers might be experimenting with interactive or gamified elements, which could be part of a broader strategy to engage users or test new functionalities.
- Exhales_Deeply criticizes the reliance on AI-generated content, suggesting that user-generated posts would be more engaging. This comment indirectly points to a broader discussion about the quality and authenticity of AI-generated content versus human-created content, which is a significant topic in AI development and user interaction.

2. 1-bit and TurboQuant Model Innovations

The Bonsai 1-bit models are very good (Activity: 657): PrismML’s Bonsai 1-bit models offer a significant reduction in model size and memory usage, being 14x smaller than traditional models, which is transformative for local model deployment. The Bonsai 8B model was tested on an M4 Max 48GB MacBook Pro, demonstrating practical applications like chat and document summarization with lower memory pressure compared to models like Qwen3 VL 8B Instruct Q4_K_M. However, it requires a specific fork of llama.cpp to support 1-bit operations, as the main llama.cpp repository lacks this capability. The model’s performance is notably superior to previous MSFT BitNet models, which were largely research-focused and not practical for real-world use. A benchmark comparison between Bonsai and Qwen3.5 models suggests Bonsai’s higher quality for RAM usage, though it struggled with code generation. There is interest in larger Bonsai models, such as a 200B version, and a desire for quantized versions of Qwen 3.5 models.
- itsArmanJr provides a detailed benchmark comparison between Bonsai and Qwen3.5 models, including specific configurations like 35B-A3B, 2B, and 0.8B. The benchmark results are available on GitHub, offering insights into performance metrics across different model sizes.
- -dysangel- highlights the efficiency of Bonsai models in terms of RAM usage, noting that while the model struggled to produce fully functional code, it was impressive given its small size of only 1GB. The comment suggests exploring quantized versions of Qwen 3.5 models, such as 9B or 27B, for potentially better performance.
- Pitiful-Impression70 raises concerns about the performance of 1-bit quantized models like Bonsai on longer contexts, noting that coherence often degrades past 4k tokens. This comment questions whether the Bonsai model maintains quality in extended conversations compared to shorter prompts.
TurboQuant isn’t just for KV: Qwen3.5-27B at near-Q4_0 quality, about 10% smaller, and finally fitting on my 16GB 5060 Ti (Activity: 899): The image illustrates the TurboQuant TQ3_1S model’s ability to maintain near-Q4_0 quality for the Qwen3.5-27B model while being compact enough to fit on a 16GB RTX 5060 Ti. The TQ3_1S model is about 10% smaller than Q4_0, with a size of 12.9 GB compared to 14.4 GB for Q4_0, and shows a minimal performance gap in perplexity (PPL), with TQ3_1S having a PPL of 7.2570 versus Q4_0’s 7.2431. This demonstrates a practical advantage for users with limited GPU memory, allowing the model to fit fully on the specified GPU setup. The post also highlights the use of advanced quantization techniques like Walsh-Hadamard rotation and 8-centroid quantization to achieve these results. Some commenters criticize the use of perplexity as a metric for quantization loss, suggesting KLD or PPL ratio as more accurate alternatives. Others praise the adaptation of cutting-edge research to solve a practical problem, acknowledging the achievement despite the criticisms.
- Velocita84 criticizes the use of Q4_0 quantization, stating it’s outdated and surpassed by more advanced Q4 techniques. They argue that using perplexity as a metric for quantization loss is incorrect, suggesting KLD or PPL ratio against a full bf16 model as more accurate alternatives.
- grumd suggests comparing the model to unsloth Q3_K_S quant of 27B using real benchmarks, implying that practical performance comparisons are necessary to validate claims about model efficiency and quality.
- XccesSv2 expresses skepticism about TurboQuant’s claims of achieving BF16 quality with 4 or 5 bits, noting that real-world tests often don’t reflect the purported improvements, indicating a gap between theoretical claims and practical outcomes.
PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs (Activity: 596): PrismML has announced the release of the 1-bit Bonsai models, including the 1-bit Bonsai 8B, which is a groundbreaking development in AI model efficiency. These models are fully quantized to 1-bit precision across all components, including embeddings, attention layers, MLP layers, and the LM head, without any higher-precision components. The 1-bit Bonsai 8B model, with 8.2 billion parameters, fits into 1.15 GB of memory and is 14x smaller, 8x faster, and 5x more energy efficient than its full-precision counterparts, making it suitable for edge hardware. The models are open-sourced under the Apache 2.0 license, and the implementation requires a fork of Llama.cpp for inference. More details can be found in their whitepaper. Some commenters express skepticism about the practicality of 1-bit models, while others are intrigued by the potential for on-device AI applications. The debate centers around the trade-offs between model precision and performance efficiency.
- PrismML has announced the 1-bit Bonsai 8B model, which is a 1-bit weight model that fits into 1.15 GB of memory. It claims to deliver over 10x the intelligence density of full-precision counterparts, being 14x smaller, 8x faster, and 5x more energy efficient on edge hardware. The model is open-sourced under the Apache 2.0 license, and the company emphasizes the potential for on-device AI applications due to its efficiency.
- The 1-bit Bonsai 8B model is quantized end-to-end using a proprietary method, requiring a fork of Llama.cpp for inference. This model design applies 1-bit quantization across all network components, including embeddings, attention layers, MLP layers, and the LM head, making it a true 1-bit model across its 8.2 billion parameters. This approach highlights a significant shift towards more efficient AI models that can operate effectively on edge devices.
- The announcement suggests a paradigm shift in AI model design, focusing on intelligence density rather than parameter count. By achieving significant reductions in model size and energy consumption, PrismML’s 1-bit models could enable new applications in real-time robotics and offline intelligence, potentially transforming the AI landscape by making advanced models feasible for local execution on edge devices.

3. Local AI Hardware and Software Experiments

Local LLM Claude Code replacement, 128GB MacBook Pro? (Activity: 140): The user is considering upgrading to a 128GB MacBook Pro to run local LLMs as a replacement for Claude Code due to potential price increases in API usage. They are currently using a 2019 Intel-based MacBook Pro and are experiencing performance issues with multiple Docker containers. The user is exploring whether local LLMs can match the capabilities of Claude Code for software development. Claude Code is noted for its 1 million context capability, but open-source models are improving. A user reported running qwen3.5 122b ud q4 xl with a 256k context on a 128GB RAM system, finding it competent for lighter tasks, though not as strong as Claude for heavy coding. Another user suggests trying open-source models via DeepInfra before purchasing, and mentions using the Bodega inference engine as a replacement for commercial subscriptions. There is a debate on whether local LLMs can fully replace Claude Code, with some users finding open-source models like qwen 122 competent for lighter tasks but not yet matching Claude for intensive coding. The shared memory model of Mac is seen as advantageous for running local LLMs.
- EmbarrassedAsk2887 discusses replacing Claude Code and Codex subscriptions with the Bodega inference engine on a 128GB M4 Max MacBook Pro. They provide a detailed write-up and benchmarks, suggesting that Bodega can effectively handle tasks typically managed by commercial solutions. Read more here.
- Mediocre_Paramedic22 shares their experience running the Qwen 3.5 122B UD Q4 XL model with a 256k context on a 128GB RAM setup using Fedora. They note that while Claude is superior for intensive coding tasks, Qwen performs well for lighter workloads and basic agent tasks, utilizing about 29GB of free RAM.
- Aisher mentions using a 128GB M5 Max for local LLM development, noting the noise level as a downside. They suggest using multiple desktop Macs for full-time development, connected via ZeroTier for remote access, as a cost-effective alternative to expensive cloud-based solutions.
Worth building a $7k local AI rig just to experiment? Afraid I’ll lose interest. (Activity: 131): The user is contemplating building a $7k local AI rig to experiment with AI technologies, particularly in photo and video generation, model integration, and AI assistant development. They currently use a MacBook with an M3 Pro chip and 36GB RAM but are concerned it may not suffice for more complex tasks. The proposed rig includes a Corsair Vengeance i5200 with an Intel Core Ultra 9 285K, GeForce RTX 5090, and 64GB DDR5 RAM, with plans to add an additional 128GB RAM. The user is hesitant due to the lack of a concrete use case and the potential for the rig to become an ‘expensive toy’. Commenters suggest alternatives such as renting a machine or using existing hardware with tools like LM Studio to test models like Qwen3.5, 9b, and 27b Q4. Another commenter shares a similar dilemma and opts to continue using a current setup with an RTX 4070Ti and 32GB RAM, highlighting the importance of having a clear use case before investing heavily.
- TassioNoronha_ suggests starting with cloud-based solutions like Open Router or renting a machine for a week to gauge interest before committing to a $7k investment. This approach allows for experimentation without the upfront cost, providing a practical way to assess long-term interest and needs.
- Xmede81 shares their experience of sticking with a current setup featuring an RTX 4070Ti and 32GB RAM, which is sufficient for general use and experimentation. They highlight the importance of evaluating actual use cases and the impact of current memory prices on decision-making.
- Dry-Influence9 advises against building powerful local setups due to current high prices, suggesting that waiting could yield better value. They recommend renting GPUs or using existing computers to experiment, as this can provide similar capabilities without the significant financial commitment.
We built a local inference engine that skips ROCm entirely and just got a 4x speedup on a consumer AMD GPU (Activity: 124): ZINC is a new inference engine designed to bypass the complexities of ROCm by directly interfacing with AMD GPUs through Vulkan, achieving a 4x speedup on an AMD Radeon AI PRO R9700. The engine supports models like Qwen3.5-35B-A3B and Qwen3.5-2B, with current performance at 33.58 tok/s, compared to 107 tok/s for llama.cpp on the same hardware. ZINC’s architecture allows it to run on hardware not officially supported by ROCm, and it includes an OpenAI-compatible API server for parallel request batching. The project is open-source and available on GitHub. Some commenters question the significance of the speedup given that ZINC’s performance is still less than a third of llama.cpp’s speed. Others express skepticism about achieving such improvements when larger companies have struggled in this area.
- Big-Masterpiece-9581 questions the significance of the 4x speedup, pointing out that despite the improvement, the performance is still less than a third of llama.cpp’s speed. This suggests that while the optimization is notable, it may not yet be competitive with existing solutions in terms of raw throughput.
- fallingdowndizzyvr highlights a performance issue, noting that achieving only 7 tok/s on an AMD Radeon AI PRO R9700 with the Qwen3.5-35B-A3B-UD Q4_K_XL model indicates a potential inefficiency in the initial implementation. This suggests that the baseline performance was suboptimal, which could have skewed the perceived improvement.
- hipcatinca provides a benchmark comparison using an RX 570 with llama.cpp via Vulkan, achieving approximately 31 tok/s with the llama3.1:8b model. This serves as a reference point, illustrating that other configurations and models can achieve significantly higher throughput on different hardware setups.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. Claude Code Source Leak and Reactions

Claude code source code has been leaked via a map file in their npm registry (Activity: 1598): On March 31, 2026, the full source code of Anthropic’s Claude Code CLI was leaked through a .map file in their npm registry, as reported on GitHub. The codebase, consisting of approximately 512k lines of TypeScript, is built using React + Ink for terminal UI and runs on the Bun runtime. This leak potentially exposes major gated features that are not yet public. The comments reflect a misunderstanding among some users about the implications of the leak, particularly the difference between Large Language Models (LLMs) and agents, highlighting a knowledge gap in the community.
- The leak of Claude’s source code via a map file in their npm registry has sparked discussions about the potential implications for developers and researchers. One key point is the distinction between Large Language Models (LLMs) and agents, as highlighted by Nedshent. This leak may expose a knowledge gap where people might not fully understand how LLMs function compared to agents, which are typically more task-specific and interactive.
- The technical details of the leak reveal that the codebase consists of approximately 512k lines of TypeScript, built with React and Ink for terminal UI, and runs on the Bun runtime. This setup suggests a modern and scalable architecture, potentially offering insights into how Claude’s infrastructure is designed to handle complex tasks and interactions.
- There is speculation about the reasons behind the leaks, with some users humorously suggesting that Anthropic might be using Claude itself for development and content creation tasks. This raises questions about the security and operational practices within Anthropic, especially if such reliance on AI could inadvertently lead to more leaks or security vulnerabilities.
Anthropic staff reacts to Claude code leak 👀 (Activity: 859): The image is a meme depicting a humorous Twitter exchange that indirectly references a code leak from Anthropic, a company known for its work in AI. The meme uses a popular internet joke about an ‘immortal snail’ to suggest that the leak is an inevitable consequence of being ‘caught’ by the snail, implying a sense of inevitability or fate. This reflects a lighthearted community reaction to the leak, rather than a technical discussion or official statement from Anthropic. Commenters humorously note the dual reactions to the leak: legal teams wanting to ‘delete it’ while engineers have already ‘starred it,’ indicating a divide between legal caution and technical curiosity. Another comment suggests that with Anthropic’s rapid development pace, such incidents were expected.
- Belium suggests that the leak of Claude’s code could be beneficial for Anthropic, as it generates hype and allows engineers to identify and fix bugs. The leak also provides engineers with the opportunity to create their own implementations or ‘harnesses’ of Claude, potentially increasing its usage and influence in the developer community.
- IntenselySwedish highlights a perceived irony in Anthropic’s situation, pointing out that the company, which has been accused of large-scale copyright violations through book piracy, is now facing its own copyright challenges with the leak of Claude’s code. This comment underscores the complex legal and ethical landscape surrounding AI development and intellectual property.
- xitizen7 comments on the rapid pace of development and releases from Anthropic, suggesting that such a leak was almost inevitable given the company’s trajectory. This reflects a broader industry trend where fast-paced innovation can sometimes lead to security oversights or unintended disclosures.
Claude Code Source Leak Megathread (Activity: 653): The Claude Code CLI source code was leaked, revealing several technical details. Notably, the npm source (@anthropic-ai/[email protected]) shows that the DuckDuckGo replacement in the Rust port is incorrect; the real package uses a nested API call to Anthropic’s server-side search with encrypted content blobs. Additionally, a two-tier web system is implemented, where 85 domains are pre-approved for full content extraction, while others are limited to 125-character quotes. Structured data in <head> is ignored, and tables are not supported in the markdown converter. The system limits to 8 results per query with no pagination. A hidden feature, KAIROS_DREAM, allows Claude to self-review and update its memory after inactivity. The newer search version (web_search_20260209) enables Claude to programmatically filter search results. The source can be verified in the minified cli.js of the npm package. Anthropic has issued a DMCA to remove the leaked code from GitHub. Some commenters criticize the code quality, suggesting that many critics may lack experience in shipping production apps. Others focus on the technical implications of the leak, such as the incorrect assumptions about DuckDuckGo usage and the limitations of the markdown converter.
- Ooty-io highlights several technical aspects of the Claude Code source, noting that the package makes nested API calls to Anthropic’s server-side search, with results returned as encrypted content blobs, rather than using DuckDuckGo as a standalone replacement. Additionally, the source code reveals a two-tier web system where 85 documentation domains are pre-approved for full content extraction, while other sites are limited to 125-character quotes. The code also shows that structured data in <head> tags is ignored, and tables are not supported in the markdown conversion process.
- Independent-Corgi-88 discusses the broader implications of the Claude Code leak, suggesting it points towards a future of AI characterized by multi-agent coordination, memory layers, and persistent interaction. This perspective emphasizes the importance of systems with memory and coordination over raw model capability, suggesting that the future of AI involves environments that support sustained and useful work. The comment also references J3nna, an AI being developed to understand its operating environment, highlighting the shift in focus from model capability to the surrounding system.
- Joozio provides insights from analyzing the Claude Code source, noting that the CLAUDE.md file is reinserted with every turn change, impacting token usage. They also mention that switching models mid-session clears the prompt cache, leading to increased token costs. Additionally, Claude Code ranks poorly on terminal benchmarks, coming in last for Opus among harnesses, with a flat 77% performance compared to Cursor’s 77% to 93%. Joozio implemented several patterns from the source, such as semantic memory merging and cache monitoring, into their own agent.
i dug through claude code’s leaked source and anthropic’s codebase is absolutely unhinged (Activity: 6259): The leaked source code of Anthropic’s Claude reveals a whimsical feature: a terminal-based pet system called /buddy, which includes 18 species with a gacha rarity system and interactive ASCII companions. The codebase also shows unconventional practices, such as hex encoding species names to bypass internal scanners, and a voice mode using Deepgram Nova 3 for speech-to-text. The project is codenamed ‘tengu’, with telemetry events and feature flags reflecting this. The codebase is notably large, with main.tsx at 803,924 bytes and several files exceeding 4,000 lines. It contains 460 eslint-disable comments and numerous deprecated functions still in use, indicating a lack of codebase hygiene. Additionally, there are unreleased features like ‘kairos’ and ‘ultraplan’, and several hidden slash commands. Some commenters argue that the codebase’s state is typical for large projects and not particularly ‘unhinged’, while others express interest in the /buddy feature, wishing it were available sooner.
- A user points out that the presence of deprecated functions in the codebase is likely a strategic decision to signal developers not to use them in new code. This is a common practice in large codebases where gradual migration to new implementations is necessary, especially when multiple developers are involved and there is pressure from sales teams to maintain functionality while transitioning.
- Another commenter argues that the codebase’s state is typical for large projects, especially those developed before the advent of AI tools like GPT-3. They suggest that the complexity and seemingly chaotic nature of the code are standard in environments where many developers contribute under tight deadlines and evolving requirements.
- A technical insight is provided regarding the perception of the codebase as ‘unhinged.’ The commenter suggests that such a view might stem from a lack of experience with large-scale software projects, where the code often appears disorganized due to the sheer number of contributors and the necessity to maintain legacy systems while integrating new features.
Claude Code’s source code just leaked — so I had Claude Code analyze its own internals and build an open-source multi-agent framework from it (Activity: 513): The source code for Claude Code was leaked, revealing over 500K lines of TypeScript, including its multi-agent orchestration layer. A developer re-implemented this as an open-source, model-agnostic framework, allowing integration of different LLMs like Claude and GPT in a shared workflow. Key features include multi-agent teams, task pipelines with dependency resolution, inter-agent messaging, and an LLMAdapter interface. The framework is ~8000 lines of TypeScript and is available on GitHub under the MIT license. Some commenters appreciate the framework’s ability to integrate various LLMs, which can reduce costs. However, others note that the framework’s core functionality is similar to existing solutions like CrewAI and AutoGen, and that the re-implementation mainly replicates standard agent loop patterns.
- Macaulay_Codin critiques the framework, noting that it follows a standard agent loop pattern: calling an LLM, executing tool calls, and iterating over results. The multi-agent aspect is essentially a task queue coordinator, which is not novel. The framework includes five built-in tools, rewritten from Claude Code’s tools, and is implemented in 8k lines of TypeScript, suggesting it’s a manageable project rather than a massive reverse engineering effort. Alternatives like CrewAI, AutoGen, and the Claude Agent SDK offer similar functionalities.
- JuryNightFury highlights the framework’s capability to integrate with other model families using an OpenRouter API key, demonstrating its model-agnostic nature. This feature allows it to fetch reviews from various models, showcasing its flexibility in utilizing different AI models beyond its original design.
- NoInside3418 appreciates the potential cost savings and efficiency gains from using the framework to enable communication between subagents from different models like Gemini, Codex, and Claude. This interoperability could streamline processes by leveraging the strengths of each model, such as Gemini’s large context and low cost, Haiku’s implementation capabilities, and GPT’s planning features.
Anthropic’s leaked CLI source code reveals a hidden “Tamagotchi” pet and autonomous multi-agent teams. The bar for developer tools is getting wild. (Activity: 161): Anthropic accidentally exposed the source code of their CLI tool, revealing innovative features like a Tamagotchi-style virtual pet called “BUDDY” that gamifies the terminal experience by leveling up based on coding behavior. Additionally, the code includes features like “ULTRAPLAN,” which allows the AI to autonomously plan for 30 minutes, and “BRIDGE MODE,” where multiple AI instances collaborate as a team. Another feature, “KAIROS,” autonomously manages failing tests and dependencies. These features suggest a shift towards more autonomous and interactive developer tools. For a detailed breakdown, see the full analysis. Commenters are skeptical about the feasibility of autonomous multi-agent teams, suggesting the pet feature is more believable due to its potential for user engagement. There is also curiosity about whether these features represent real product directions or are merely experimental ideas.
- Senior_Hamster_58 raises skepticism about the claim of autonomous multi-agent teams being proven by a leaked repository, suggesting that such features might be more speculative or experimental rather than indicative of a real product direction. They question whether these features are part of a serious development effort or merely internal experiments that may not reach production, highlighting a common issue in software development where many ideas do not survive the transition from concept to release engineering.
- OutrageousIndustry28 claims that the feature is already live and can be activated using a specific command (/buddy). This suggests that at least some components of the leaked features might be functional or accessible, indicating a level of readiness beyond mere speculation or internal testing. However, without further verification, this claim remains anecdotal.
- rainmaker66 and prussell774 both suggest that the features, including the “Tamagotchi” pet and autonomous multi-agent teams, are part of an April Fool’s joke by Anthropic. This implies that the leaked code might not represent serious development efforts but rather a playful or humorous initiative, which is a common practice in tech companies around April 1st.

3. OpenAI and Anthropic Funding and Developments

OpenAI raises $122 billion to accelerate the next phase of AI (Activity: 794): OpenAI has raised $122 billion, reaching a post-money valuation of $852 billion, to bolster its position as a core AI infrastructure provider. The company reports 900 million weekly active users for ChatGPT and $2 billion in monthly revenue. Strategic partnerships with Amazon, NVIDIA, and Microsoft are pivotal in advancing their AI capabilities, focusing on enhanced compute infrastructure and a unified AI superapp for both consumer and enterprise applications. More details can be found in the original article. Commenters are questioning the allocation of such a large funding amount, with some expressing skepticism about the necessity of this capital given recent fundraising efforts.

AI Discords

Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.

Apr 01
not much happened today

Companies

Models

Topics

People