a quiet day.

AI News for 3/23/2026-3/24/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!


AI Twitter Recap

Agent Infrastructure, Computer Use, and Design-to-Action Tooling

  • Anthropic’s agent harness and “computer use” shift the product surface: A recurring theme today was that agent capability is increasingly about the harness, not just the base model. Anthropic published a new engineering writeup on how it uses a multi-agent harness for frontend design and long-running software tasks, emphasizing orchestration over one-shot prompting (AnthropicAI). Multiple developers independently argued that “computer use” matters because it lets models act in messy software environments with no reliable APIs (glennko), though others noted this is still slow and likely transitional until more tools expose APIs/CLI surfaces (Yuchenj_UW). The broader operational takeaway was captured well by kerrsee: retries, rollbacks, webhooks, structured logging, and recovery paths remain the unglamorous bottlenecks in production agent deployment.
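kerrsee's list is mundane but concrete; here is a minimal sketch of one piece of it, with every function name and log field invented for illustration rather than taken from any product:

```python
import json
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Retry a tool call with exponential backoff, emitting one structured
    log line per failed attempt -- the unglamorous plumbing in question."""
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:
            print(json.dumps({"event": "retry", "attempt": i + 1, "error": str(exc)}))
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)  # back off: 10ms, 20ms, 40ms, ...

calls = {"n": 0}

def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("tool timeout")
    return "ok"

print(with_retries(flaky_tool))  # succeeds on the third attempt
```

A production version would also distinguish retryable from fatal errors and record a rollback handle before each side-effecting call.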

  • Figma/MCP/Cursor make design canvases directly agent-editable: The strongest concrete workflow launch was Figma’s MCP server and direct AI editing on the canvas, now in open beta (figma). GitHub highlighted that this works through Copilot CLI and other clients via MCP (github), and Cursor immediately extended the pattern to generating components/frontends in Figma using a team’s design system (cursor_ai). This is one of the clearest examples of tool-calling becoming product-native rather than chat-wrapper-native. LangChain also pushed in the same direction with framework-native tool rendering and Slack-native Fleet workflows, including custom Slack bots and an Inbox for human approvals (LangChain_JS, LangChain, hwchase17).
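For context on what "via MCP" means operationally: client wiring typically reduces to one JSON entry in the client's MCP configuration. The shape below follows the common `mcpServers` convention used by several clients; the server name and URL are illustrative assumptions, not Figma's documented values.

```json
{
  "mcpServers": {
    "figma": { "url": "http://127.0.0.1:3845/mcp" }
  }
}
```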

Open Agent Platforms, Benchmarks, and RL Environment Stacks

  • Hermes Agent v0.4.0 is becoming a full personal-agent runtime: Nous released a substantial Hermes Agent v0.4.0 update with roughly 300 merged PRs in a week, adding an OpenAI-compatible Responses API backend, background self-improvement loops, broader messaging integrations, improved context compression, and more CLI ergonomics (Teknium, Teknium, NousResearch). The most technically interesting feature is the post-response review agent that decides what to retain as reusable memory/skills (Teknium). Community reactions focused less on benchmark claims and more on operational value: exposing a personal coding/ops agent behind a standard API makes it usable from Open WebUI, LobeChat, or any OpenAI-compatible client (witcheer).
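What "OpenAI-compatible Responses API" means in practice is that clients need no Hermes-specific SDK; they aim the standard wire format at a different base URL. The endpoint and model id below are assumptions for illustration only:

```python
import json

BASE_URL = "http://localhost:8000/v1/responses"  # assumed local Hermes endpoint

# The same request body any OpenAI-compatible client (Open WebUI, LobeChat,
# or a raw HTTP call) would send; only the base URL changes.
payload = {
    "model": "hermes-agent",  # assumed model id, for illustration
    "input": "Summarize last night's failed cron jobs.",
}
body = json.dumps(payload)
print(body)
```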

  • Open agent ecosystems are converging around environments, skills, and reproducible evals: AI2 released MolmoWeb, an open-source browser agent built on Molmo 2 in 4B and 8B sizes, claiming open-weight SOTA across four web-agent benchmarks and even surpassing some proprietary agents (allen_ai). In parallel, GenReasoning launched OpenReward, a platform exposing 330+ RL environments, autoscaled environment compute, and 4.5M+ unique RL tasks through one API—explicitly targeting the often-missing “environment compute” layer of agentic RL (GenReasoning, rosstaylor90). Zhipu contributed ZClawBench, a benchmark with 116 real-world agent tasks spanning office automation, coding, and analysis (HuggingPapers). Together, these point to a stack maturing from “agent demos” toward standardized environment serving + benchmarkable task suites + reusable harnesses.

Inference, Storage, and Systems Optimizations

  • vLLM and Transformers both reported material inference/runtime gains: vLLM’s GTC recap highlighted several systems upgrades: Model Runner V2 with GPU-native Triton kernels, a hybrid memory allocator, encoder prefill disaggregation with up to 2.5x P99 throughput gains for multimodal workloads, and modular MoE kernels (vllm_project, vllm_project). Separately, Hugging Face/Transformers-side optimization work claimed continuous batching plus torch.compile tuning now reaches 95% of vLLM throughput for 8K generation, effectively closing the previous gap for synthetic data generation workloads (remi_or_).
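Continuous batching, one of the techniques behind both sets of numbers, is worth seeing in miniature: finished sequences leave the batch every step and queued requests join immediately, instead of waiting for a static batch to drain. A toy step-counter, not vLLM's or Transformers' actual scheduler:

```python
from collections import deque

# (request id, decode tokens remaining) -- lengths chosen arbitrarily
queue = deque([("r0", 2), ("r1", 5), ("r2", 1), ("r3", 4)])
batch, MAX_BATCH, steps = [], 2, 0

while queue or batch:
    while queue and len(batch) < MAX_BATCH:   # admit new work every step
        rid, left = queue.popleft()
        batch.append([rid, left])
    for seq in batch:                         # "decode" one token per sequence
        seq[1] -= 1
    batch = [s for s in batch if s[1] > 0]    # finished sequences free a slot
    steps += 1

print(steps)  # 7 with continuous admission; static batches of 2 would take 9
```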

  • hf-mount is a notable agent/data primitive: Hugging Face released hf-mount, which lets users mount Hub datasets, models, and storage buckets as a local filesystem, including examples with a 5TB FineWeb slice (julien_c, ClementDelangue). This matters beyond convenience: several engineers pointed out that agents are unusually good at filesystem operations, making mounted remote storage a natural substrate for agent memory, scratchpads, team artifact storage, and lazy access to large corpora (Vtrivedy10, victormustar). This is one of the more practical infrastructure launches of the day because it reduces the friction between local tooling and cloud-scale data.
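The "agents are good at filesystem operations" observation is concrete: once remote storage presents as a path, ordinary path tooling works on it unchanged. In this sketch a local temp directory stands in for the mount point (no real hf-mount invocation is shown):

```python
import tempfile
from pathlib import Path

mount = Path(tempfile.mkdtemp())        # stand-in for a mounted remote store
scratch = mount / "agent-scratchpad"
scratch.mkdir()
(scratch / "notes.md").write_text("- fetched 3 shards of the corpus\n")

# Any agent that can list/read/write paths now has durable shared memory.
files = sorted(p.name for p in scratch.iterdir())
print(files)
```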

  • Moreau and TurboQuant show optimization pressure moving below the model layer: Optimal Intellect introduced Moreau, a GPU-native solver from the CVXPY team claiming orders-of-magnitude speedups over existing tools (opt_intellect). Google Research announced TurboQuant, a KV-cache compression algorithm reporting at least 6x memory reduction and up to 8x speedup with no accuracy loss (GoogleResearch). The common pattern: high-value gains are increasingly coming from runtime, memory, and systems layers, not just from larger model checkpoints.
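The memory arithmetic behind KV-cache compression claims is easy to make concrete: storing fp32 values as int8 plus one scale cuts storage roughly 4x, with rounding error bounded by half a quantization step. This toy round trip illustrates the principle only and is unrelated to TurboQuant's actual algorithm:

```python
def quantize(xs):
    """Symmetric per-tensor int8 quantization: value ~= q * scale."""
    scale = (max(abs(x) for x in xs) / 127) or 1.0  # avoid a zero scale
    return [round(x / scale) for x in xs], scale

def dequantize(qs, scale):
    return [q * scale for q in qs]

xs = [0.5, -1.25, 3.0, 0.0]
qs, scale = quantize(xs)
err = max(abs(a - b) for a, b in zip(xs, dequantize(qs, scale)))
print(err <= scale / 2)  # True: rounding error is at most half a step
```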

Security, Supply Chain Risk, and Guardrails for Agentic Software

  • The LiteLLM PyPI compromise dominated infra/security discussion: Multiple posts warned that LiteLLM 1.82.8 on PyPI had been compromised, with malicious payloads attempting to exfiltrate credentials and replicate across environments (hnykda). simonw noted the package was later quarantined on PyPI, but the incident quickly became a broader conversation about software supply-chain fragility. karpathy gave the most detailed summary, listing possible exfiltration targets including cloud creds, SSH keys, Kubernetes configs, CI/CD secrets, wallets, and shell history, while noting transitive risk to packages like DSPy. The most important systems-level implication came from DrJimFan: in an agentic world, the entire filesystem becomes part of the attack surface, since any file likely to enter context can become a vector.

  • “De-vibing” and permissioning are becoming first-class product requirements: Several posts effectively converged on a new design principle: autonomous coding tools need stronger shells, better permission defaults, and fewer broad dependencies. Yuchen called the incident “nightmare fuel” for --dangerously-skip-permissions style workflows (Yuchenj_UW); Anthropic’s new Claude Code auto mode became controversial for exactly this reason, despite enthusiasm over the productivity jump (alexalbert__, kimmonismus). The practical response from many builders was a renewed preference for minimal bespoke routing, tighter audited deps, and stronger human approval loops.
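A minimal shape for the "human approval loop" these builders describe, with every name invented for illustration: auto-allow a small command allowlist and route everything else through an approver callback (in practice a Slack message or inbox item):

```python
import shlex

ALLOWLIST = {"ls", "cat", "git"}   # illustrative; real defaults need care

def gate(command: str, approver) -> bool:
    """Return True only if the command is allowlisted or a human approves."""
    prog = shlex.split(command)[0]
    if prog in ALLOWLIST:
        return True
    return approver(command)        # block until a human says yes/no

deny_all = lambda cmd: False
print(gate("ls -la", deny_all))         # True: allowlisted
print(gate("rm -rf /tmp/x", deny_all))  # False: approver refused
```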

Labs, Org Moves, and Product Strategy Shifts

  • AI2 loses leadership to Microsoft; Microsoft AI continues talent concentration: The clearest org move was Microsoft poaching part of the AI2 leadership team, with Ali Farhadi, Hanna Hajishirzi, and Ranjay Krishna reportedly joining Microsoft Superintelligence (eliebakouch, NandoDF). The subtext in technical circles was concern over whether open research institutions can continue competing with hyperscalers for top talent and frontier-scale work (stanfordnlp).

  • OpenAI is reallocating resources hard: $1B Foundation spend, Sora wind-down, “Spud” coming: OpenAI announced its Foundation will spend at least $1B over the next year, with Wojciech Zaremba moving to lead AI resilience and additional hires across disease, civil society, and operations (sama, woj_zaremba, btaylor). At the same time, reports circulated that OpenAI had finished initial development of its next major LLM, codenamed “Spud,” and was winding down Sora’s app/product footprint to free compute (steph_palazzolo, kimmonismus). For engineers, the signal is straightforward: OpenAI appears to be narrowing product focus around core general models/infrastructure, even at the cost of cutting side products.

Top tweets (by engagement)

  • LiteLLM supply-chain compromise: karpathy gave the most technically complete and highest-signal breakdown of the PyPI attack and its blast radius.
  • Anthropic’s harness engineering post: AnthropicAI was one of the day’s most important engineering reads on how frontier labs are actually structuring long-running agent workflows.
  • Figma MCP launch: figma and github showed perhaps the cleanest mainstream example yet of agents acting directly on a production design surface.
  • OpenAI Foundation $1B commitment: sama and woj_zaremba marked a major organizational and safety/resilience shift.
  • Hermes Agent v0.4.0: Teknium / NousResearch stood out as one of the biggest open-agent runtime releases of the day.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Security and Malware Concerns in AI Tools

  • LM Studio may possibly be infected with sophisticated malware. (Activity: 1822): The image in the Reddit post shows a Windows Security alert indicating that a severe threat, identified as “Trojan:JS/GlassWorm.ZZ!MTB,” was quarantined from the LM Studio directory. This raised concerns about a potential malware infection in LM Studio. However, LM Studio and Microsoft have since confirmed that this was a false positive, likely due to Defender’s heuristic definitions conflicting with LM Studio’s obfuscated Electron bundle. The community discussion highlights the importance of security audits and the potential risks of obfuscation techniques that resemble malware patterns. Despite the false alarm, users are advised to take precautionary measures to secure their data. The comments reflect a consensus that the malware detection was a false positive, supported by historical instances of similar false alarms and VirusTotal’s low detection rate. However, there is criticism of LM Studio’s code obfuscation practices, which can inadvertently trigger such alerts and complicate security assessments.

    • Yags from LM Studio confirmed that the malware alert was a false positive, verified by Microsoft, and no longer appears in VirusTotal. Despite this, LM Studio is auditing their build machine scripts and environments to prevent any genuine security incidents in the future.
    • Denoflore_ai_guy provided a detailed analysis suggesting the malware alert was likely a false positive due to Defender’s heuristic updates conflicting with LM Studio’s obfuscated Electron bundle. However, they noted that LM Studio’s code obfuscation for IP protection could resemble malware techniques, which complicates detection.
    • Denoflore_ai_guy also outlined steps to mitigate potential risks if GlassWorm malware was indeed present, including changing passwords, moving crypto funds, and checking for malicious Chrome extensions. They emphasized the importance of a clean OS install and credential rotation to ensure security.
  • [Developing situation] LiteLLM compromised (Activity: 380): The LiteLLM library has been compromised, as detailed in GitHub issue #24512. The attack exploits a .pth file vulnerability, which executes code on interpreter startup without requiring imports, making it difficult to detect through standard code reviews. Users of version 1.82.8 are advised to rotate credentials immediately if used in production environments, as the compromise could expose sensitive information. A notable comment highlights the effectiveness of using Docker containers for isolating host secrets, which can mitigate some security risks. Another comment emphasizes the stealthy nature of the .pth file trick, which bypasses typical security scans.

    • The .pth file trick is highlighted as a significant security vulnerability. This method allows code execution on interpreter startup without needing imports, making it nearly invisible to standard code reviews. Users who ran LiteLLM versions 1.82.8 or 1.82.7 are advised to rotate credentials immediately due to potential exposure.
    • Aider, a tool that uses LiteLLM for LLM access, is reportedly safe as it operates on an older version (1.82.3) of LiteLLM, which is not compromised. The compromised versions are identified as 1.82.8 and 1.82.7, emphasizing the importance of version control and monitoring for security vulnerabilities.
    • The discussion touches on the use of Docker containers for security isolation. While typically not considered a security measure, in this case, Docker effectively isolated host secrets, demonstrating its potential utility in mitigating certain types of security breaches.
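The .pth trick referenced above is a standard CPython feature, not attacker magic: when the site module processes a directory, any line in a .pth file that begins with `import` is executed verbatim. A harmless demonstration, using site.addsitedir to stand in for interpreter startup:

```python
import os
import site
import tempfile

d = tempfile.mkdtemp()
# One line starting with "import" -- the site module exec()s it as code.
with open(os.path.join(d, "demo.pth"), "w") as f:
    f.write("import os; os.environ['PTH_DEMO'] = 'ran'\n")

site.addsitedir(d)   # real startup processes site-packages the same way
print(os.environ.get("PTH_DEMO"))  # 'ran' -- nothing was ever imported by us
```

This is why rotating credentials, not just reviewing imports, is the right response: the payload runs before any application code.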
  • Litellm 1.82.7 and 1.82.8 on PyPI are compromised, do not update! (Activity: 441): Litellm versions 1.82.7 and 1.82.8 on PyPI have been compromised, as confirmed by a blog post. The attack appears to be a supply chain compromise, potentially affecting thousands of users. The malicious versions were uploaded to PyPI, posing a significant risk to CI/CD pipelines that automatically update dependencies. The attack was executed through the GitHub account of the LiteLLM CEO, which was hacked, as evidenced by unauthorized commits and repository updates claiming ‘teampcp owns BerriAI’. Commenters emphasize the importance of pinning dependency versions to avoid such supply chain attacks, highlighting the risk of automatic updates in production environments. There is also concern about the potential for increased frequency of such attacks on AI tooling.

    • GroundbreakingMall54 highlights the critical importance of pinning dependency versions and avoiding auto-updates in production environments. They emphasize the risk of supply chain attacks, especially in AI tooling, as evidenced by the compromised Litellm versions on PyPI, which could have been automatically integrated into CI/CD pipelines overnight.
    • Gremlation and JockY discuss the breach by ‘teampcp’, who compromised the CEO’s GitHub account to inject malware into Litellm. This malware, embedded in versions 1.82.7 and 1.82.8, is designed to steal secrets upon startup. They note that versions <= 1.82.6 remain unaffected, and provide links to GitHub commits showing the unauthorized changes made under the CEO’s account.
    • kiwibonga points out a specific malicious payload in the compromised Litellm versions that executes a destructive command (rm -rf /) if the system’s timezone is set to Asia/Tehran. This highlights the severity and targeted nature of the attack, suggesting a broader geopolitical context to the cyber threat landscape.
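Pinning can go one step beyond `==`: pip's hash-checking mode refuses any artifact whose digest differs from the recorded one, which would have blocked a swapped wheel even at an unchanged version number. A sketch of a requirements.txt in that mode (the digest is a placeholder, not a real hash):

```
# install with: pip install --require-hashes -r requirements.txt
litellm==1.82.6 \
    --hash=sha256:<digest-recorded-when-the-release-was-audited>
```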

2. Local LLM Development and Performance Enhancements

  • I built Fox – a Rust LLM inference engine with 2x Ollama throughput and 72% lower TTFT. (Activity: 212): Fox is a Rust-based local LLM inference engine designed as a drop-in replacement for Ollama, offering significant performance improvements. It features PagedAttention, continuous batching, and prefix caching, achieving 72% lower TTFT and 111% higher throughput on an RTX 4060 with the Llama-3.2-3B-Instruct-Q4_K_M model. The engine supports multi-model serving with lazy loading and LRU eviction, and provides a dual API compatible with both OpenAI and Ollama. The official Docker image is available, and the system supports hardware autodetection across CUDA, Vulkan, Metal, and CPU. The project is in beta, with thorough testing on Linux and NVIDIA, but less so on other platforms and configurations. GitHub and Docker Hub links are provided for access. A top comment highlights the impressive technical achievement of implementing vLLM-level features in Rust, noting the significant performance gains from prefix caching and continuous batching. There is a request for LoRA hot-swapping capabilities to further differentiate Fox from Ollama. Another comment expresses skepticism about the project’s authenticity and security, suggesting the need for independent verification and code auditing.

    • No_Strain_2140 highlights the technical achievements of Fox, noting its use of PagedAttention, continuous batching, and prefix caching, which contribute to its impressive performance metrics such as 87ms P50 on a 4060 with Q4_K_M. The commenter contrasts Fox’s approach with Ollama’s sequential processing, emphasizing Fox’s advanced features like multi-turn KV reuse that enhance throughput and reduce TTFT. They also inquire about the potential for LoRA hot-swapping, which could allow serving a base model with multiple LoRA adapters, positioning Fox as more than just a faster alternative to Ollama.
    • PettyHoe raises concerns about the security and credibility of the project, suggesting the need for independent verification and code audits to ensure there are no risks of exfiltration. They express skepticism about the project’s authenticity due to the AI-generated nature of the descriptions and comments, emphasizing the importance of cautious evaluation before adoption.
    • AIDevUK asks about Fox’s capability to operate over multiple GPUs, which is a critical consideration for scaling and performance in large-scale deployments. This question points to the need for understanding Fox’s architecture and its ability to leverage multi-GPU setups for enhanced computational efficiency.
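Prefix caching, one of the features credited for Fox's TTFT numbers, has a small core idea: find the longest already-computed token prefix and run prefill only on the suffix. A toy version, with a dict of token tuples standing in for per-block KV tensors:

```python
cache = {}

def longest_cached_prefix(tokens):
    """How many leading tokens already have cached 'KV' state."""
    for end in range(len(tokens), 0, -1):
        if tuple(tokens[:end]) in cache:
            return end
    return 0

def prefill(tokens):
    """Return how many tokens needed fresh compute for this request."""
    cached = longest_cached_prefix(tokens)
    cache[tuple(tokens)] = True   # remember the full prefix for later reuse
    return len(tokens) - cached

print(prefill([1, 2, 3, 4]))        # 4: cold start, everything computed
print(prefill([1, 2, 3, 4, 5, 6]))  # 2: the shared 4-token prefix is reused
```

Real engines key the cache per fixed-size block and manage eviction; multi-turn chat, where each request extends the last, is where this pays off most.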
  • RYS II - Repeated layers with Qwen3.5 27B and some hints at a ‘Universal Language’ (Activity: 695): The post discusses findings from experiments with the Qwen3.5 27B model, revealing that LLMs may process information in a ‘universal language’. This is evidenced by the similarity in latent representations of the same content across different languages, such as Chinese and English, during the middle layers of the model. The author also found that repeating blocks in the middle of the transformer stack enhances performance. The models are available on Hugging Face. The author suggests that fine-tuning these models, especially the RYS-Qwen3.5-27B-FP8-XL, could set a new state-of-the-art (SOTA) for models of this size. Additionally, there is ongoing work to optimize VRAM usage by keeping duplicated layers as copies, which could be beneficial for future implementations. Commenters appreciate the rigorous approach and potential implications of the research, noting its relevance to performance improvements seen in complex model merges. There is interest in how these findings might influence open-source tuning practices, particularly in creative writing and self-merging techniques.

    • ArsNeph discusses the intriguing performance improvements observed in self-merges like Goliath 120B, noting that not all models benefit equally. They reference historical discussions about VRAM-less duplicated layer inference, highlighting ongoing work on EXL3. The comment suggests that open-source tuners, particularly those focused on EQ performance, might find these insights valuable, especially in creative writing contexts where complex merge trees have shown significant improvements.
    • Kwigg reflects on past experiences with ‘frankenmerging’ during the llama2 era, questioning the efficiency of such methods with newer models that have advanced attention mechanisms. They note that older frankenmerges were memory inefficient, implying that modern models might handle these techniques differently, potentially leading to better performance outcomes.
    • TomLucidor suggests expanding the language testing of Qwen3.5 to include Japanese, Thai, French, German, and Italian. They also propose a comparative analysis between Qwen3.5 and other models like Nemotron-3, known for its speed and linear attention, and Granite-4.0, which offers a similar size variety but is less optimized. This could provide insights into the relative performance and optimization of these models.
  • FlashAttention-4: 1613 TFLOPs/s, 2.7x faster than Triton, written in Python. What it means for inference. (Activity: 364): FlashAttention-4 achieves 1613 TFLOPs/s on the Blackwell B200 GPU, utilizing 71% of its theoretical peak performance. It is 2.1-2.7x faster than Triton and up to 1.3x faster than cuDNN 9.13. The implementation is entirely in Python using NVIDIA’s CuTeDSL, which compiles in 2.5 seconds compared to 55 seconds for C++. This version supports GQA and MQA and is integrated into vLLM 0.17.0. However, it is limited to Hopper + Blackwell architectures, specifically H100/H800 and B200/B100 GPUs, due to reliance on specific hardware features like TMEM, 2-CTA MMA, and async TMA. The article also discusses how softmax has become the bottleneck and how selective rescaling optimizes performance. Commenters express frustration with NVIDIA’s marketing of GPUs as ‘Blackwell’ when they lack full compatibility with FlashAttention-4, highlighting a discrepancy between advertised and actual hardware capabilities.

    • JockY expresses frustration with NVIDIA’s marketing of the RTX 6000 Pro as ‘Blackwell’ when it is not fully compatible with Blackwell features, specifically mentioning that FlashAttention-4 (FA4) and NVFP4 are only supported on SM100 architectures. This highlights a discrepancy between NVIDIA’s product naming and actual hardware capabilities, which can mislead early adopters expecting full feature support.
    • Daemontatox points out that the issue with NVIDIA’s RTX 6000 Pro being marketed as ‘Blackwell’ lies in the Streaming Multiprocessor (SM) generation rather than the name or overall architecture. The RTX 6000 Pro and DGX systems are sold under the ‘Blackwell’ name but actually use the SM120 architecture, which lacks some expected features, leading to consumer dissatisfaction.
    • STNKMyyy questions the relevance of such high-performance advancements like FlashAttention-4 for consumer-grade GPUs, implying that while these technologies are groundbreaking, they may not be accessible or beneficial for typical consumer hardware users. This reflects a common concern about the gap between cutting-edge research and practical consumer applications.
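The "softmax as bottleneck" discussion rests on one identity: scores arrive block by block, so the kernel keeps a running max and rescales the running sum whenever the max grows (FA4's "selective rescaling" reportedly skips the rescale when it doesn't need to). A plain-Python version of the identity, nothing like the actual kernel:

```python
import math

def online_softmax(scores, block=4):
    """Streaming softmax: running max m, running sum s of exp(x - m)."""
    m, s = float("-inf"), 0.0
    for i in range(0, len(scores), block):
        blk = scores[i:i + block]
        m_new = max(m, max(blk))
        # rescale the old partial sum when the running max grows
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in blk)
        m = m_new
    return [math.exp(x - m) / s for x in scores]

xs = [0.1, 2.0, -1.5, 0.7, 3.2, 0.0]
top = max(xs)
den = sum(math.exp(x - top) for x in xs)
direct = [math.exp(x - top) / den for x in xs]
print(all(abs(a - b) < 1e-9 for a, b in zip(online_softmax(xs, block=2), direct)))
```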
  • Created a SillyTavern extension that brings NPC’s to life in any game (Activity: 499): The post describes a new extension for SillyTavern that integrates NPCs into any game by using Cydonia as the role-playing (RP) model and Qwen 3.5 0.8B as the game master. This setup allows for dynamic NPC interactions by downloading a game’s wiki and feeding it into SillyTavern, enabling NPCs to have detailed lore and respond contextually. The system uses voice cloning from game files and provides NPCs with game state information, such as player stats and location. The RP model operates locally, ensuring low latency and strong narrative capabilities. A secondary model, Qwen 3.5, interprets RP interactions to trigger in-game actions, enhancing the realism and depth of older games without needing conversational input. The post highlights the effectiveness of specialized RP models over base models in gaming applications. Commenters express surprise and enthusiasm about the potential of AI in gaming, noting the innovative use of AI for NPC interactions and questioning why such technology isn’t already standard in games.

    • A user highlights the impressive use of a 0.8B parameter model for bringing NPCs to life in games, questioning if the project is open source. This suggests a lightweight model capable of running efficiently in real-time gaming environments, which is significant for integration into existing games without heavy computational demands.
  • Which local model we running on the overland Jeep fellas? (Activity: 459): The image depicts a Waymo self-driving car, highlighting the technological advancements in autonomous vehicle systems. The discussion centers on the prediction that future cars will require 300GB of RAM, a significant increase from current standards. This prediction is likely based on the assumption that more complex models, possibly involving real-time data processing and AI-driven decision-making, will be integrated into vehicles. The comments reflect skepticism about this prediction, with users questioning the basis of the 300GB figure and noting that current vehicles operate efficiently on far less RAM.

    • ForsookComparison questions the necessity of high RAM requirements for automotive models, noting that their car operated efficiently with just 16GB of RAM over a 600-mile journey. They challenge the assumption that 300GB is needed, suggesting that such figures might be based on models that require extensive tool-calls, which may not be applicable in all scenarios.
    • txdv highlights the potential cost implications of high RAM requirements in vehicles, expressing concern over the feasibility of 128GB upgrades. They point out that automotive pricing is sensitive, and a 5k cost for RAM could be prohibitive for consumers, indicating a need for balancing performance with affordability.

3. Chinese LLM Market and Model Evaluations

  • The current state of the Chinese LLMs scene (Activity: 639): The Chinese LLM landscape is dominated by major players like ByteDance, Alibaba, Tencent, and Baidu, each with proprietary and open-weight models. ByteDance leads with its dola-seed model, akin to OpenAI, and its Seedance T2V model is popular for video generation. Alibaba excels in open-weight models, particularly small ones, and is strong in T2I and T2V. Tencent’s Hunyuan model is noted for 3D mesh generation, though its latest versions are not open-sourced. Baidu’s Ernie model is less used, with a stronger focus on autonomous driving. Other notable players include Xiaomi with Mimo V2 Pro, Ant Group with Ling 2.5 1T, and Meituan with LongCat-Flash-Chat, which uses a dynamic MoE approach. Deepseek is highlighted for its innovation in attention mechanisms like MLA and DSA. The “Six AI Small Tigers” such as Zhipu and Minimax focus on releasing large open-weight models to gain recognition. Government-funded initiatives like BAAI and Shanghai AI Lab are also contributing, though with varying reputations. Commenters note the rapid pace of open-weight model releases in China compared to the US, with some labs releasing more in a quarter than US companies in two years. Tencent is recognized for its investment in game development-specific models, with Hunyuan 3.1 being state-of-the-art for 3D mesh generation.

    • Tencent is heavily investing in game development-specific models, such as Hunyuan 3.1 for 3D mesh generation and HY-Motion for text-to-animation, which are considered state-of-the-art. Initially, Tencent open-sources these models to build brand recognition, but transitions to closed weights once they reach commercial viability, as seen with the latest Hunyuan 3D models.
    • A list of popular models by token usage on OpenRouter over the last 7 days highlights the dominance of Chinese models, with Xiaomi MiMo-V2-Pro leading at 1.77 trillion tokens. Notably, only three Western labs are ranked, and the ‘Small Tigers’—smaller companies advancing AI rapidly—are prominent, indicating a shift in innovation dynamics.
    • Despite ByteDance’s significant contributions to AI, they have not released any open weight models, as confirmed by the absence of such models on Hugging Face. This contrasts with other Chinese labs that frequently release open weights, accelerating competition in the AI space.
  • So cursor admits that Kimi K2.5 is the best open source model (Activity: 629): The image is a tweet from Aman Sanger discussing the evaluation of base models, specifically highlighting that Kimi K2.5 emerged as the strongest model based on perplexity-based evaluations. The tweet notes that the model’s strength is attributed to continued pre-training and high-compute reinforcement learning, which enhance the capabilities of the Composer-2 model. The tweet also acknowledges an oversight in not mentioning the Kimi base in their blog, with plans to rectify this in future communications. One comment critiques the use of perplexity-based evaluations between models, noting that scores can be influenced by factors like dictionary size. Another comment questions the claim about the proportion of training done by Kimi K2, citing reports from Workshop Labs that suggest Fireworks’ K2 training code is not optimized for hyperscaled training, contrasting with claims of its efficacy.

    • The claim that Kimi K2.5 is the best open-source model is questioned due to the methodology of evaluation, particularly the use of perplexity scores which can be misleading as they depend on factors like dictionary size. This raises concerns about the validity of such comparisons between models.
    • There is skepticism about the training claims made by Fireworks regarding Kimi K2.5. Workshop Labs, known for optimizing training code, reported that Fireworks’ code is not optimized for hyperscale training, being only marginally better than basic implementations like HF Transformers 4.x. This suggests potential inefficiencies in Fireworks’ approach to training Kimi K2.5.
    • The assertion that Kimi K2.5 is the best ‘base model’ is attributed to its large parameter count and use of a standard attention mechanism rather than a linear one. This implies that the model’s architecture and scale contribute significantly to its performance, rather than any novel training techniques.
  • China’s open-source dominance threatens US AI lead, US advisory body warns (Activity: 922): A US advisory body has raised concerns about China’s growing influence in the open-source AI sector, suggesting it could threaten the US’s leadership in AI. The report highlights China’s strategic investments and advancements in open-source AI models, which are becoming increasingly competitive with US counterparts. The advisory body suggests that the US needs to bolster its open-source initiatives to maintain its competitive edge. Commenters argue that the US is lagging in open-source AI, with Chinese models being more cost-effective and efficient. There is also criticism of US models like Opus, GPT-5.4, and Gemini 3.1 Pro for their perceived dysfunctionality, contrasting with China’s contributions to AI freedom despite its authoritarian regime.

    • EffectiveCeilingFan highlights the competitive edge of Chinese AI models, noting that they are not only cheaper but also outperform US models in open weights. The commenter criticizes the performance of US models like Opus, GPT-5.4, and Gemini 3.1 Pro, suggesting that the US is lagging in terms of open-source AI development.
    • Lissanro emphasizes the importance of open research in AI development, citing the ‘Attention is All You Need’ paper as foundational. They mention that models like Kimi K2.5 owe their existence to open research shared by companies like DeepSeek. The comment also notes that large companies, such as Cursor AI, are adopting Chinese models like Kimi K2.5 for their products, indicating a preference for these open-source models in the industry.
    • Global_Estimate7021 provides a detailed analysis of why the US might be falling behind in AI, citing a significant AI acceptance gap (87% in China vs. 32% in the US) and the volume of AI research publications where China leads. They also mention the strategic advantage of China’s cheaper electricity and grassroots AI literacy initiatives, which contrast with the US’s top-down approach.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. AGI Achievements and Claims

  • The man who originally coined the acronym “AGI” now says that we’ve achieved it exactly as he envisioned. (Activity: 926): The image is a tweet by Mark Gubrud, who claims to have coined the term “AGI” (Artificial General Intelligence). He asserts that AGI has been achieved as he envisioned, with current models performing at a high-human level in language and general knowledge, while being much faster. However, there is debate about the originality of his claim, as the term “artificial general intelligence” is documented as early as 1989, attributed to G. Simons. Gubrud’s definition of AGI involves systems that match or surpass human brain complexity and speed, capable of reasoning with general knowledge in various operations. There is skepticism in the comments about Gubrud’s claim to have coined the term “AGI,” with some suggesting he misremembers the history. The Oxford English Dictionary attributes the earliest use of the term to 1989, in the writings of G. Simons, not Gubrud.

    • The term ‘artificial general intelligence’ (AGI) is documented as early as 1989, with the Oxford English Dictionary citing G. Simons as the earliest source. However, M. Gubrud is often credited with popularizing it in scientific literature, though he did not coin the term himself.
    • The original definition of AGI by its coiner describes it as systems that match or surpass human brain capabilities in complexity and speed, capable of handling general knowledge across various domains, including industrial and military operations. This definition suggests a broad and versatile intelligence, though there is skepticism about whether current systems meet this standard.
    • There is debate about the significance of achieving AGI without recursive self-improvement, which was expected to trigger a technological singularity. The lack of such transformative advancements leads to skepticism about the current excitement surrounding AGI developments.
  • Jensen Huang (NVIDIA) claims AGI has been achieved (Activity: 2562): In a recent interview, Jensen Huang, CEO of NVIDIA, claimed that Artificial General Intelligence (AGI) has been achieved, a statement that has sparked significant debate. The interview, available on YouTube, lacks detailed technical evidence to support this claim, leading to skepticism among experts. Huang’s assertion is seen as potentially influenced by his role in promoting NVIDIA’s products, which are heavily invested in AI technologies. The top comments reflect skepticism towards Huang’s claim, highlighting a distrust in business leaders’ statements about their own products. Commenters suggest that such claims may be more about marketing than factual advancements in AI.

    • Sweaty_Rub4322 points to a core issue in the AGI debate: there is no universally accepted definition of AGI. Without an agreed standard in academia or industry, claims that AGI has been achieved cannot be meaningfully assessed, underscoring the need for a clear, shared definition before such announcements carry weight.

2. Claude Code Features and Updates

  • Claude can now use your computer (Activity: 2106): Claude, an AI developed by Anthropic, is now capable of using your computer to perform tasks via Claude Cowork and Claude Code. This feature, currently in research preview, allows Claude to open applications, navigate browsers, and manage spreadsheets, effectively automating tasks typically done manually. It prioritizes using connected apps like Slack and Calendar, but can also directly interact with apps on your screen with permission. This functionality is available on Pro and Max tiers for macOS users, requiring an updated desktop app paired with a mobile device. More details can be found here. Concerns were raised about the security implications of allowing an AI to control a computer, with some users expressing apprehension about potential job displacement. Others noted this as a strategic move by Anthropic in response to competitors like OpenAI.

    • A key concern is the security implications of giving Claude control of a user’s computer, including unauthorized data access or manipulation if permissions are not properly enforced. The rapid pace of feature releases may compound the risk, since new functionality can ship before it has been thoroughly vetted for vulnerabilities.
    • The introduction of Claude’s ability to use a computer is seen as a competitive response to OpenAI’s advancements, particularly in the context of AI models like GPT-4. This move by Anthropic could be aimed at maintaining parity or gaining an edge in the AI capabilities race, highlighting the competitive dynamics in the AI industry.
    • There is a sentiment that the rapid development and release of new features by Claude could lead to job displacement. As AI models become more capable of performing complex tasks traditionally done by humans, there is a growing concern about the impact on employment, especially in sectors heavily reliant on routine cognitive tasks.
  • Claude Code can now /dream (Activity: 1953): Claude Code has introduced a feature called Auto Dream, designed to enhance the agent’s memory management by mimicking human REM sleep processes. This feature reviews past session transcripts, identifies relevant information, prunes outdated or contradictory data, and consolidates it into organized files. It operates in the background, triggering after 24 hours and five sessions since the last consolidation, and ensures no conflicts by using a lock file. This approach aims to improve performance by managing memory more intelligently, rather than just expanding context windows. Some commenters express skepticism about the feature, suggesting it might lead to unnecessary token usage and questioning the AI’s self-promotion style. Others humorously suggest additional commands to manage AI hallucinations and errors.

    • AutoDream is a feature for Claude Code that acts like a ‘sleep cycle’ for its memory system, addressing the memory bloat issue introduced by the Auto Memory feature. Auto Memory, released in v2.1.59, allows Claude to take notes on projects, but over time, these notes can accumulate noise and contradictions, degrading performance. AutoDream mitigates this by periodically consolidating memories, similar to human REM sleep, through a four-phase process: Orient, Gather signal, Consolidate, and Prune & index.
    • The AutoDream process involves four phases: Orient, which scans existing memory to understand stored data; Gather signal, which identifies outdated memories and performs targeted searches; Consolidate, which merges new information and resolves contradictions; and Prune & index, which maintains a concise index and removes stale data. This process only triggers after 24+ hours and 5+ sessions since the last consolidation, ensuring it doesn’t interfere with active work.
    • AutoDream operates read-only on project code, modifying only memory files and not the actual codebase. This ensures safety and integrity of the code while managing memory efficiently. The full system prompt for this feature is available on GitHub under agent-prompt-dream-memory-consolidation.md, providing transparency and allowing users to understand its operation.
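The trigger-and-lock behavior described above (consolidate only after 24+ hours and 5+ sessions, and never run two consolidations concurrently) can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's actual implementation; the state-file layout, function names, and lock-file path are all assumptions:

```python
# Hypothetical sketch of AutoDream-style trigger logic. The state file,
# lock file, and phase names are illustrative assumptions, not Claude
# Code's real internals.
import json
import os
import time
from pathlib import Path

MIN_AGE_SECONDS = 24 * 60 * 60   # consolidate only after 24+ hours
MIN_SESSIONS = 5                 # and only after 5+ sessions

def should_consolidate(state_file: Path) -> bool:
    """Check the time and session thresholds recorded in a state file."""
    if not state_file.exists():
        return False
    state = json.loads(state_file.read_text())
    age_ok = time.time() - state["last_consolidation_ts"] >= MIN_AGE_SECONDS
    sessions_ok = state["sessions_since_consolidation"] >= MIN_SESSIONS
    return age_ok and sessions_ok

def try_consolidate(state_file: Path, lock_file: Path) -> bool:
    """Run the four phases under a lock file so concurrent runs can't conflict.

    Returns True if consolidation ran, False if thresholds weren't met
    or another run already holds the lock.
    """
    if not should_consolidate(state_file):
        return False
    try:
        # O_CREAT | O_EXCL makes lock acquisition atomic: the open fails
        # if the lock file already exists.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another consolidation is in progress
    try:
        os.close(fd)
        for phase in ("orient", "gather_signal", "consolidate", "prune_and_index"):
            pass  # each phase would read and rewrite memory files only, never code
        state_file.write_text(json.dumps({
            "last_consolidation_ts": time.time(),
            "sessions_since_consolidation": 0,
        }))
        return True
    finally:
        lock_file.unlink(missing_ok=True)
```

Resetting the session counter after a successful run means a second call immediately afterwards is a no-op, matching the "doesn't interfere with active work" behavior described above.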

3. Sora Shutdown Announcements

  • Sora is officially shutting down. (Activity: 854): The image is a screenshot of an announcement from the Sora app’s official account on X.com, stating that Sora is shutting down. The message thanks users for their engagement and promises more details on the shutdown timeline for the app and API. This indicates a significant change in the app’s lifecycle, likely due to strategic shifts or financial unsustainability, as suggested by comments noting high costs and low engagement. Comments suggest that Sora’s shutdown is due to its unsustainable business model, particularly after changes to copyright handling that increased costs and reduced user engagement. The app was initially innovative but became a liability.

    • Chasemania argues Sora was fundamentally unsustainable: high operational costs paired with low user engagement. Increasingly strict copyright handling further eroded user interest, turning the platform from an asset into a liability.
    • The discussion centers on the tension between copyright compliance and engagement: Sora’s initial appeal faded once strict enforcement limited what users could create, and the app never recovered its early momentum.
    • Commenters reflect that a product with heavy compute costs can only survive with strong engagement, and that Sora’s trajectory from novel launch to shutdown shows how quickly that balance can tip into financial instability.
  • Sora is officially shutting down. (Activity: 1429): The image is a social media announcement from the Sora team about the shutdown of the Sora app. The post expresses gratitude to the community and promises to provide more details soon regarding the app’s and API’s timelines and how users can preserve their work. This indicates a planned and structured shutdown process, aiming to minimize disruption for users. Comments reflect skepticism about the app’s impact and user base, with some users expressing surprise at the app’s longevity given its perceived lack of financial viability.

AI Discords

Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.