a quiet day.
AI News for 3/23/2026-3/24/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
ARC-AGI-3 Launch, Scoring Debate, and What It Claims to Measure
- ARC-AGI-3 resets the frontier for "general" agentic reasoning: @arcprize and @fchollet introduced ARC-AGI-3, a new interactive benchmark built around puzzle/game-like environments where humans reportedly solve 100% of tasks while current frontier models score under 1%. Chollet framed the benchmark as measuring whether a system can approach new tasks without human intervention, with human-like learning efficiency, rather than excelling via task-specific harnesses or prior exposure (1, 2, 3). The project also shipped with substantial productization around the eval itself, including a replay system for verified scores highlighted by @mikeknoop.
- The immediate controversy is the scoring protocol, not the core task design: a large share of technical discussion focused on ARC-AGI-3's efficiency-based scoring, which compares agents against the second-best human action count and heavily penalizes extra steps. @scaling01 argued this makes the headline "<1%" difficult to compare with prior ARC versions and potentially harsher than a plain completion metric; related threads criticized the cap on superhuman efficiency and the exclusion of richer agent harnesses or longer-thinking modes (1, 2, 3, 4). Chollet responded that this is intentional: the benchmark is explicitly about zero-preparation generalization, not how well humans can custom-build systems around a task (1, 2). A useful outside critique came from @_rockt, who pushed back on claims that ARC-AGI-3 is the only unsaturated agent benchmark, citing NetHack.
- Early read from the community: even critics generally seem to agree the benchmark surfaces a real weakness of current LLM agents in interactive, sparse-feedback environments. Supportive takes came from @mark_k, @andykonwinski, and @bradenjhancock; more skeptical-but-positive reactions came from @jeremyphoward and @togelius, who distinguished "general game playing" from the overloaded notion of AGI.
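The scoring dispute is easier to see with a toy formula. Below is a purely hypothetical sketch of an action-efficiency metric (the actual ARC-AGI-3 scoring protocol is more involved and is not reproduced here): credit scaled by how close the agent's action count is to a human reference, capped so superhuman efficiency earns nothing extra — the cap being one of the criticized design choices.

```python
def efficiency_score(agent_actions: int, reference_actions: int) -> float:
    """Hypothetical action-efficiency score, NOT the real ARC-AGI-3 formula.

    Full credit (1.0) at or below the human reference action count;
    extra actions shrink the score proportionally. The cap at 1.0 means
    superhuman efficiency earns no bonus, mirroring one criticism in
    the scoring debate.
    """
    if agent_actions <= 0 or reference_actions <= 0:
        raise ValueError("action counts must be positive")
    return min(1.0, reference_actions / agent_actions)
```

Under a metric like this, an agent that completes a task but takes 10x the reference actions scores only 0.1, which is why a completion-only metric and an efficiency-weighted metric can diverge so sharply on the same runs.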
Agent Infrastructure, Harnesses, and Enterprise Productization
- The agent stack is getting more opinionated and more deployable: several launches converged on the same theme of packaging reusable skills, harnesses, and sandboxes as first-class product primitives. @LangChain launched Fleet shareable skills, a registry for codifying reusable domain knowledge across agents, with related commentary from @BraceSproul, @hwchase17, and @caspar_br. @AnthropicAI published how Claude Code auto mode works, describing classifier-mediated approval as a middle ground between full manual confirmations and unconstrained autonomy; @_catwu noted the feature is now broadly used internally and available to Team users.
- Browser, coding, and workflow agents are becoming trainable systems rather than prompt wrappers: @browserbase partnered with Prime Intellect to let users train custom browser agents on BrowserEnv, with a follow-up from @PrimeIntellect and support for BrowserEnv inside verifiers from @willccbb. @cursor_ai launched self-hosted cloud agents, keeping execution and code inside a customer's own network. @imbue_ai introduced Keystone, a self-configuring agent that generates dev containers for arbitrary repos; @SierraPlatform launched Ghostwriter, an "agent for building agents" for customer experience flows spanning chat, telephony, multilingual interaction, tool use, and guardrails.
- The "agent = app" thesis is increasingly infrastructure-backed: multiple posts described agents as software entrypoints rather than mere assistants. @Base44 emphasized event-driven app behavior across Gmail/Calendar/Drive/Outlook. @weaviate_io shipped Agent Skills so coding agents can use current Weaviate APIs instead of hallucinating outdated syntax. @ben_burtenshaw showed a practical pattern for giving Codex/Claude a shared persistent workspace backed by Hugging Face buckets. A more strategic framing came from @gneubig, who argued there is now a genuine co-dependence between LLMs as infra and agent harnesses as apps, analogous to the earlier hardware/architecture coupling.
Model and Research Releases: Multimodality, World Models, and Self-Improvement
- Google expanded Lyria 3 into a fuller music-generation platform: @Google, @GoogleDeepMind, and @GeminiApp announced Lyria 3 Pro, which extends generation from 30 seconds to up to 3 minutes, adds better control over song structure like intros/verses/choruses/bridges, and is available both in Gemini and via Google AI Studio / Gemini API. @_philschmid summarized pricing as $0.08/song for Pro and $0.04/song for Clip, with tempo control, time-aligned lyrics, image-to-music input, and SynthID watermarking.
- LongCat-Next is a notable open multimodal release from Meituan: @Meituan_LongCat introduced LongCat-Next, a 68.5B total / 3B active MoE discrete-native autoregressive multimodal model covering language, vision, and audio in a unified token space. The release emphasizes native discrete multimodality, an any-resolution vision tokenizer (dNaViT), OCR/GUI/document understanding, image generation, and speech understanding/synthesis. Independently, @teortaxesTex highlighted the report's architectural ideas around a unified latent/token pathway even while sounding less impressed by its image-generation quality.
- World models and self-improving agents were the day's standout research themes: @BrianRoemmele highlighted LeWorldModel, a compact JEPA-style world model trained from raw pixels with just two loss terms, reportedly using 15M parameters, a single GPU, and yielding much faster latent-space planning; the claimed simplification is that SIGReg stabilizes training without the usual JEPA hack stack. On the agent side, @omarsar0 and @fancylancer3991 surfaced Hyperagents, where the self-improvement process itself becomes editable; reported gains included paper review accuracy from 0.0 to 0.710 and robotics reward design from 0.060 to 0.372. Related memory work came from @dair_ai on MemCollab, which tries to separate universal task knowledge from model-specific biases for cross-agent memory sharing.
- Sakana AI's "AI Scientist" reached a publication milestone: @SakanaAILabs, @hardmaru, and @jeffclune noted that The AI Scientist is now published in Nature, consolidating the earlier system and v2 updates. The notable claim is not just end-to-end automation of idea generation, experimentation, drafting, and automated review, but evidence for a "scaling law of science": stronger underlying foundation models produce better machine-generated papers.
Inference, Storage, and Local Hardware Economics
- Storage and artifact movement are getting cheaper and more agent-friendly: @fffiloni teased Hugging Face's storage push with "Your disk is no longer the limit," while @LoubnaBenAllal1 and @victormustar compared HF Buckets favorably to S3 on both $/TB/month and transfer performance, citing Xet-style chunk-level deduplication as a meaningful win for datasets and checkpoints. The operational subtext showed up in a question from @francoisfleuret, asking cluster operators how hard agents are hitting I/O.
- Inference efficiency remains a fast-moving battleground across runtimes and architectures: @sudoingX reported unusually strong single-GPU long-context throughput from NVIDIA's 3B Mamba2 Nemotron Cascade 2, claiming 187 tok/s flat out to 625K context on an RTX 3090, versus 112 tok/s for Qwen 3.5 35B-A3B to 262K with KV quantization. @finbarrtimbers noted Cursor's Composer 2 report used Fireworks for RL inference due to a large efficiency gap over typical stacks like SGLang/TRT; @GoogleCloudTech published optimization guidance for frontier training on TPU v7x / Ironwood. On the quantization/compression side, @mirrokni flagged Google's TurboQuant writeup with 6x speedups, while @vllm_project highlighted 4M+ KV-cache tokens on compact hardware.
- Local AI hardware got two attention-grabbing data points: @digitalix spotlighted Intel's new Arc Pro B70 with 32GB VRAM for under $1000, which several posters framed as a potentially important VRAM-per-dollar move despite software-stack caveats (example). Separately, @xenovacom demoed a 24B model in-browser via WebGPU/Transformers.js at roughly 50 tok/s on an M4 Max, a striking signal for how quickly browser-side inference ceilings are moving.
Top tweets (by engagement)
- Personalization and memory quality: @karpathy argued that long-lived memory in assistants often overfits stale user facts, causing distracting, low-quality personalization rather than better assistance.
- Claude-as-super-app narrative: @kimmonismus and @Yuchenj_UW both pointed to Anthropic's product trajectory as increasingly resembling a super-app rather than a narrow model endpoint.
- Codex ecosystem activity: @OpenAIDevs launched a student Codex Creator Challenge with API-credit prizes and starter credits; @reach_vb also reminded developers that the Codex App Server is open source.
- Sora de-emphasis as strategic refocusing: while much of the chatter was secondhand, multiple roundup and commentary posts suggested OpenAI is winding down Sora to prioritize coding/agent products and core infra, with @TheRundownAI and @thursdai_pod treating it as one of the day's major industry signals.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Intel GPU Launch and Features
-
Intel will sell a cheap GPU with 32GB VRAM next week (Activity: 1300): Intel is set to release a new GPU with 32GB VRAM on March 31, priced at $949. The GPU offers a bandwidth of 608 GB/s and a power consumption of 290W, positioning it slightly below the NVIDIA 5070 in terms of bandwidth. This GPU is anticipated to be beneficial for local AI applications, particularly for models like Qwen 3.5 27B at 4-bit quantization. More details can be found in PCMag's article. Commenters express skepticism about the price being considered "cheap" at $949, while others compare it to the R9700 AI PRO, noting similar VRAM and bandwidth but with slightly higher power consumption. There is curiosity about how Intel's offering will compete, especially for AI and LLM applications.
- Clayrone discusses their experience with the R9700 AI PRO, highlighting its 32GB VRAM and 640 GB/s bandwidth, which they find satisfactory for their small form factor server build. They mention using llama.cpp built for Vulkan, which operates flawlessly, and note the GPU's 300W power consumption. They express curiosity about how Intel's upcoming GPU will compare, suggesting it could be a direct competitor.
- KnownPride suggests that Intel's decision to release a GPU with 32GB VRAM is strategic, as it caters to the growing demand for large language models (LLMs). This indicates a market trend where consumers are increasingly interested in hardware capable of supporting AI and machine learning workloads, which require substantial VRAM.
- wsxedcrf references a statement by NVIDIA, "Free is not cheap enough," to emphasize that the value of a GPU is not just in its price but in the entire ecosystem it supports. This suggests that Intel's success with their new GPU will depend on more than just hardware specifications; the surrounding software and support infrastructure will be crucial.
-
Intel launches Arc Pro B70 and B65 with 32GB GDDR6 (Activity: 493): Intel has launched the Arc Pro B70 and B65 GPUs, featuring 32GB GDDR6 memory. The B70 is priced at $949 and offers 387 int8 TOPS with a memory bandwidth of 602 GB/s, compared to the NVIDIA RTX 4000 PRO's 1290 int8 TOPS and 672 GB/s. The B70's power draw is 290W, higher than the RTX 4000's 180W. A 4-pack of B70s costs $4,000, offering 128GB of GPU memory, which is considered a competitive deal for local inference on 70B models. Source. Commenters highlight the collaboration between Intel and vLLM to integrate B-series support into mainline vLLM, ensuring day-one support and solid performance. The price point of $949 for 32GB is seen as favorable for local inference, making it practical for 70B models.
- Intel's collaboration with vLLM to integrate B-series support into mainline vLLM ensures that the Arc Pro B70 and B65 GPUs will have day-one support with solid performance. However, the B70's performance lags behind the RTX 4000 PRO, achieving 387 int8 TOPS compared to the 4000 PRO's 1290. The B70 offers 602 GB/s memory bandwidth versus the 4000's 672 GB/s, and while it has more VRAM (32GB vs. 24GB), it also has a higher power draw (290W vs. 180W).
- The Arc Pro B70 is priced at $949, making it an attractive option for local inference, especially for 70B models, due to its price-per-GB advantage. This positions it as a practical choice for those needing substantial memory capacity without the higher costs associated with other GPUs like the RTX 3090.
- Despite the Arc Pro B70's slower inference speed compared to the RTX 3090 and lack of CUDA support, it offers more memory and improved efficiency, which can enhance prompt processing. However, users express concerns about Intel's driver support, which could impact the overall user experience.
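Using only the figures quoted in these threads, the VRAM-per-dollar claim is quick arithmetic (list prices as reported; street pricing and the cost of comparison cards like the RTX 3090 vary, so they are left out):

```python
# Dollars per GB of VRAM, from the prices quoted above.
configs = {
    "Arc Pro B70 (32GB @ $949)": 949 / 32,
    "4x B70 (128GB @ $4,000)": 4000 / 128,
}

for name, usd_per_gb in configs.items():
    print(f"{name}: ${usd_per_gb:.2f} per GB")
```

Roughly $30/GB either way, which is the number commenters are implicitly comparing against other 24GB-and-up cards.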
2. LiteLLM Supply Chain Attack and Alternatives
-
After the supply chain attack, here are some litellm alternatives (Activity: 372): The image is a tweet by Andrej Karpathy discussing a supply chain attack on the Python package litellm, which was compromised with credential-stealing malware in versions 1.82.7 and 1.82.8. The attack highlights the risks associated with dependency management in software development, as the compromised package could have exfiltrated sensitive data like SSH keys and database passwords. The post suggests alternatives to litellm, such as Bifrost, Kosong, and Helicone, each offering different features and performance benefits, such as Bifrost's ~50x faster P99 latency compared to litellm and Helicone's extensive provider support and analytics capabilities. Commenters express concerns about the risks of large dependency trees in Python and Node.js projects, suggesting that these can lead to vulnerabilities and reliability issues. They recommend practices like restricting network access, pinning dependencies, and monitoring network traffic to mitigate risks associated with supply chain attacks.
- FullstackSensei highlights the issue of large dependency trees in Python and Node.js projects, noting that even small projects can have gigabytes of dependencies. This complexity often leads to infrequent updates due to fear of introducing bugs, which in turn can create vulnerabilities. The comment suggests a need for more discussion on managing and minimizing dependency chains to improve reliability and security.
- _realpaul discusses strategies to mitigate supply chain attacks, emphasizing the importance of restricting network access, avoiding the immediate adoption of new libraries, and pinning dependencies. They also recommend running tools in a sandbox environment and monitoring network traffic before deployment to enhance security.
- RoomyRoots and Living_Director_1454 both point out the over-reliance on third-party libraries, which increases the risk of supply chain attacks. Living_Director_1454 references a specific incident involving a compromised security scanner, Trivy, used in LiteLLM's CI/CD pipeline, illustrating the potential vulnerabilities in the software supply chain.
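The pinning advice in these comments can be partially mechanized. A minimal sketch (the function name and heuristic are mine; it only handles a plain requirements.txt and is no substitute for a real lockfile or pip's `--require-hashes` mode) that flags requirement lines lacking an exact `==` pin:

```python
import re

def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines not pinned to an exact version.

    A line counts as pinned only if it uses `==` followed by a digit
    (e.g. `litellm==1.82.6`). Comments and blank lines are ignored.
    Rough heuristic for illustration, not a security tool.
    """
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if not line:
            continue
        if not re.search(r"==\d", line):
            flagged.append(line)
    return flagged
```

Running this in CI and failing the build on any flagged line is one cheap way to enforce the "pin everything" practice the commenters recommend.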
-
Litellm 1.82.7 and 1.82.8 on PyPI are compromised, do not update! (Activity: 555): The litellm package versions 1.82.7 and 1.82.8 on PyPI have been compromised, as confirmed by FutureSearch.ai. The attack appears to be a supply chain compromise, potentially affecting thousands of users. The breach was discovered by Callum McMahon, who provided a detailed postmortem here. The attack was executed through the GitHub account of the LiteLLM CEO, which was hacked, leading to unauthorized changes in repositories, including a message stating "teampcp owns BerriAI". This incident highlights the growing risk of supply chain attacks in AI tooling, emphasizing the importance of version pinning and cautious updates in production environments. Commenters emphasize the importance of pinning dependency versions and avoiding automatic updates in production to mitigate risks from supply chain attacks. There is also concern about the potential for automated bots in discussions, as evidenced by repetitive, non-substantive comments.
- The compromised versions of LiteLLM, 1.82.7 and 1.82.8, were reportedly injected with malicious code that executes a destructive command (rm -rf /) if the system's timezone is set to Asia/Tehran. This highlights the critical risk of supply chain attacks in AI tooling, emphasizing the importance of pinning dependency versions and avoiding automatic updates in production environments.
- The attack appears to have been executed by a group known as "teampcp", who previously compromised Trivy. They gained access through the GitHub account of LiteLLM's CEO, Krrish Dholakia, and used it to push malware that steals secrets upon LiteLLM startup. This incident underscores the vulnerability of high-profile accounts and the potential for widespread impact when they are compromised.
- The GitHub repositories of LiteLLM's CEO were altered to display the message "teampcp owns BerriAI", indicating a breach. The CEO's account was used to make unauthorized commits, suggesting a significant security breach. Users are advised to use versions <= 1.82.6, as these are confirmed to be safe from the malicious code.
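A quick way to act on the "use <= 1.82.6" advice is to check what is actually installed. A small standard-library sketch (the helper names are mine; "ok" here only means "not on the reported-compromised list", not a general safety guarantee):

```python
from importlib import metadata

# Versions reported compromised in this incident.
COMPROMISED = {"1.82.7", "1.82.8"}

def classify(version: str) -> str:
    """Label a litellm version string against the reported-bad set."""
    return "compromised" if version in COMPROMISED else "ok"

def installed_litellm_status() -> str:
    """Check the locally installed litellm package, if any."""
    try:
        return classify(metadata.version("litellm"))
    except metadata.PackageNotFoundError:
        return "not installed"
```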
3. New AI Model Releases and Benchmarks
-
New open weights models: GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B (Activity: 624): GigaChat-3.1-Ultra-702B and GigaChat-3.1-Lightning-10B-A1.8B are newly released open-weight models by AI Sage, available under the MIT license on Hugging Face. The Ultra model, a 702B MoE, is optimized for high-resource environments, outperforming models like DeepSeek-V3-0324 and Qwen3-235B in benchmarks such as MMLU RU and Math 500. The Lightning model, a 10B A1.8B MoE, targets local inference, achieving high efficiency with native FP8 DPO and MTP support, and excels in multilingual tasks across 14 languages. Both models are optimized for English and Russian, with the Lightning model scoring 0.76 on the BFCLv3 benchmark. Detailed metrics show significant improvements in general knowledge, math, and coding domains compared to previous versions and competitors. Comments highlight geopolitical concerns, noting the models' development in Russia with potential state influence on training data, and the implications of using infrastructure under Russian jurisdiction, which may be subject to local intelligence access.
- Specialist-Heat-6414 highlights the technical significance of the GigaChat-3.1-Ultra-702B model, noting that a 702B MoE (Mixture of Experts) model under an MIT license is a substantial addition to the open weights ecosystem. This contribution is noteworthy regardless of the geopolitical context surrounding its development.
- The Qwen comparison is a focal point, with users suggesting that benchmarks against models like Qwen 3.5 are necessary to establish the GigaChat models' relevance. The comment suggests that simply being "better than GPT-3.5" is not a sufficient benchmark in 2026, indicating the need for more rigorous evaluation metrics.
- Investolas and others express interest in the GigaChat-3.1-Lightning-10B-A1.8B model, particularly its potential for local inference. If the modelās active parameter count is around 1.8B and it can achieve 250+ tokens per second on a single GPU while maintaining quality, it could be practical for use on commodity hardware, making it a significant development in the field.
-
DeepSeek Employee Teases "Massive" New Model Surpassing DeepSeek V3.2 (Activity: 427): A purported leak from a DeepSeek employee suggested the development of a new model surpassing the capabilities of DeepSeek V3.2. The leak, which was quickly deleted, hinted at a significant advancement in model architecture, potentially involving integrations with platforms like SillyTavern, MiniMax, ZAI, and Moonshot. However, the authenticity of this leak was later debunked as fake, as confirmed by a tweet. Commenters expressed a desire for DeepSeek to balance the timing of their releases amidst aggressive competition, and some hoped for smaller, efficient versions of the new model. There was also surprise at the mention of using multiple platforms, indicating a broad integration strategy.
- TheRealMasonMac highlights the use of multiple AI platforms by DeepSeek, including SillyTavern, MiniMax, ZAI, and Moonshot, suggesting a broad integration strategy that could enhance innovation. This indicates DeepSeek's approach to leveraging diverse AI technologies to potentially improve their models' capabilities.
- ambient_temp_xeno expresses concern about the potential resource requirements of the new model, implying that it might be too demanding for personal use. This reflects a common issue in AI development where newer models often require more computational power, limiting accessibility for individual users.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Sora Shutdown and Impact
-
OPENAI TO DISCONTINUE SORA !! (Activity: 2452): OpenAI is set to discontinue its Sora Video Platform App, which was launched the previous year. The app allowed users to insert themselves into famous movie scenes, but it was criticized for being overly restrictive and not user-friendly. Financially, it was unsustainable, reportedly losing $500k per day. The decision reflects broader concerns about resource allocation in AI projects, emphasizing the need for careful consideration of the value and impact of such technologies. Commenters largely agree that Sora was a resource-intensive project with limited practical value, highlighting the importance of evaluating the resource costs versus benefits in AI development.
- TheTeflonDude highlights a significant financial issue, noting that OpenAI was losing $500k per day on Sora, which likely contributed to the decision to discontinue the service. This underscores the high operational costs associated with maintaining such a platform, especially when it doesn't generate sufficient revenue or user engagement to justify the expenses.
- Willing_Leave_2566 discusses the broader implications of low-effort content creation enabled by platforms like Sora. They argue that without an effort barrier, users may not consider the resource costs of their creations, leading to inefficient use of compute resources. This reflects a critical perspective on the sustainability and value of open creative platforms.
- Pakh provides context on the strategic shift by referencing a previous collaboration between Disney and OpenAI, where Disney invested $1 billion and licensed over 200 characters for Sora. This partnership was expected to enhance Sora's appeal for fan-made content, making the discontinuation surprising and indicative of a major strategic pivot by OpenAI.
-
Sora is officially shutting down. (Activity: 1954): The image is a screenshot from the Sora app's official account on X.com, announcing the shutdown of the Sora app. The announcement expresses gratitude to the users and mentions that more information regarding the app and API timelines will be provided soon. This indicates a significant change for users and developers who relied on Sora's services, as they will need to transition to alternative solutions. The comments reflect a mix of humor and criticism, with one user sarcastically noting the app's comedic value and another expressing concern over the loss of functionality for generating controversial content.
-
SORA IS SHUTTING DOWN??? (Activity: 1234): OpenAI has announced the shutdown of Sora, its video generation app and API, despite its recent popularity as the #1 app on the App Store. This decision comes unexpectedly, especially after a recent blog post on Sora's safety standards. The shutdown is reportedly to reallocate compute resources towards coding and enterprise applications, possibly influenced by Anthropic's focus on coding over video. This move disrupts a significant partnership with Disney, which included collaborations with Marvel, Pixar, and Star Wars. The AI video space is expected to experience a shift as creators migrate to other platforms like Runway and Kling. Some commenters argue that Sora's shutdown was inevitable due to its poor performance and high costs, suggesting it was not widely used by serious AI video creators. Others express surprise at the sudden decision, noting the app's previous prominence.
- echox1000 highlights that Sora was a financial drain due to its high compute costs and poor performance, suggesting that its shutdown was inevitable. The commenter expresses surprise that the project was maintained for as long as it was, indicating that its results were subpar compared to expectations.
- bronfmanhigh points out that Sora was not competitive in the AI video creation space, as no legitimate creators were using it. This suggests that Sora lagged significantly behind other tools in terms of functionality and adoption, which may have contributed to its shutdown.
- KnightAirant criticizes the lack of open-sourcing Sora, implying that the "Open" in OpenAI is misleading. The comment reflects a sentiment that the project was short-lived, lasting less than a year, and questions the transparency and accessibility of AI projects from major companies.
-
No more Sora ..? (Activity: 1061): The image is a tweet from the official Sora account announcing the discontinuation of the Sora app. The tweet expresses gratitude to the community and acknowledges potential disappointment, while promising further updates on timelines for the app and API, and information on preserving users' work. This suggests a significant shift for users relying on Sora, potentially impacting workflows that depend on its services. Comments reflect a sentiment that local solutions are more reliable, as centralized services like Sora can be discontinued. There's also a call for open-sourcing the app, reflecting a desire for community-driven development and control.
- PwanaZana highlights the challenge of running large AI models locally due to hardware constraints, emphasizing the need for smaller, efficient models that can operate on less powerful machines. This reflects a broader trend towards optimizing AI for local deployment, balancing performance with accessibility.
- Sudden-Complaint7037 points out a growing skepticism among investors regarding the profitability of AI, suggesting a shift in the industry as companies reconsider their investments. This indicates a potential reevaluation of business models in AI, focusing on sustainable and profitable strategies.
-
Sora is officially shutting down. (Activity: 2831): The image is a screenshot of an announcement from the Sora app's official account on X.com, stating that the app is shutting down. The message thanks users for their contributions and mentions that further details about the shutdown timeline for the app and its API will be provided soon. This indicates a significant change for users and developers who relied on Sora's services. Comments reflect skepticism about the app's impact and user base, with some users expressing surprise at the app's longevity given its financial challenges.
2. Claude Code Features and Issues
-
Claude Code now has auto mode (Activity: 962): Claude Code has introduced an "auto mode" feature that automates permission decisions for file writes and bash commands, replacing the need for manual approval or the use of --dangerously-skip-permissions. This mode employs a classifier to evaluate each tool call for potentially destructive actions, allowing safe actions to proceed automatically while blocking risky ones. This feature is currently available as a research preview on the Team plan, with broader access for Enterprise and API users forthcoming. More details can be found here. There is a significant user concern regarding reduced usage limits, with reports of session limits being reached much faster than before, despite no official communication from Anthropic. Users are expressing frustration over the lack of transparency and communication regarding these changes.
- Users are experiencing significant issues with usage limits on Claude Code, with reports of session limits being reached much faster than before. A user on the Max 5x plan noted that they used 50% of their weekly limit in a single day, suggesting a possible change in policy or a bug. The lack of communication from Anthropic is causing frustration among users who rely on the service for their work.
- The new auto mode in Claude Code employs a classifier-before-execution approach to enhance safety by defaulting to isolation methods like containers or VMs. However, there are concerns about how well the classifier handles ambiguous commands, such as differentiating between rm -rf in a temporary directory versus a project root. Users suggest that an auto mode that provides explanations for blocked actions would be more beneficial than silent fallbacks.
- There is a call for Anthropic to address rate limit issues before focusing on new features like auto mode. Users are concerned that the current rate limits could severely restrict the use of new functionalities, as evidenced by recent experiences where users hit their limits much faster than expected.
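The classifier-before-execution pattern described in these posts can be sketched in a few lines. Everything below is hypothetical illustration (the toy classifier, the labels, the gate API), not Anthropic's implementation:

```python
from typing import Callable, Tuple

def make_gate(classifier: Callable[[str], str]) -> Callable[[str], Tuple[bool, str]]:
    """Wrap a classifier into a tool-call gate: 'safe' actions run
    automatically, anything else is blocked with an explanation rather
    than failing silently (the behavior commenters asked for).
    """
    def gate(command: str) -> Tuple[bool, str]:
        label = classifier(command)
        if label == "safe":
            return True, f"auto-approved: {command}"
        return False, f"blocked ({label}): {command} -- ask the user"
    return gate

def toy_classifier(command: str) -> str:
    """Toy stand-in: flags a few obviously destructive shell patterns."""
    destructive = ("rm -rf /", "mkfs", "dd if=")
    return "risky" if any(p in command for p in destructive) else "safe"

gate = make_gate(toy_classifier)
```

Note that a string-level classifier like this still faces the ambiguity the comments raise: `rm -rf` against a temp directory and against a project root can look identical without filesystem context.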
-
Saying "hey" cost me 22% of my usage limits (Activity: 883): The Reddit post highlights a significant issue with Claude Code where revisiting inactive sessions leads to a substantial increase in usage limits, reportedly up to 22% for a simple message. This is attributed to the system's caching mechanism, where each message resends the entire conversation context, including system prompts and conversation history, to the API. The cache, which has a TTL of 5 minutes on Pro and 1 hour on Max plans, expires when sessions are left open overnight, causing a full cache write on resumption, which is 1.25x more expensive than regular input. Additionally, the usage tracking uses 5-hour rolling windows, potentially causing accumulated context from old sessions to be charged against new windows, leading to unexpected usage spikes. A GitHub issue also notes increased usage for the same workloads since March 23rd, with no official response from Anthropic yet. Commenters suggest that the issue is known and worsening, with some attributing it to Claude's retry mechanism during system issues. The recommended workaround is to start fresh sessions or use /clear and /compact commands to manage conversation history and avoid excessive token consumption.
- Fearless_Secret_5989 explains that Claude Code's architecture involves resending the entire conversation context with each message, which includes system prompts, tool definitions, and conversation history. This can lead to high token usage, especially when session caches expire (5 minutes on Pro, 1 hour on Max plans), causing a full cache write that is 1.25x more expensive than regular input. A GitHub trace showed 92% of tokens in a resumed session were cache reads, consuming 192K tokens for minimal output.
- Fearless_Secret_5989 also highlights a rate limit window boundary issue where Claude Code uses 5-hour rolling windows for usage tracking. If a session started in one window resumes in another, the accumulated context from the old session can be charged against the new window, leading to high usage spikes. Users have reported up to 60% usage consumed instantly due to this rollover, with no new work done.
- Fearless_Secret_5989 mentions a potential bug or backend change affecting Max plan users since March 23rd, where workloads that previously consumed 20-30% of a window now take 80-100%. Users on Max 5x and Max 20x plans report hitting limits rapidly, with one user going from 21% to 100% on a single prompt. Anthropic has not officially responded, leaving the cause unclear.
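The cache economics above can be made concrete with a quick sketch; the per-token price and the cache-read discount below are illustrative assumptions, and only the 1.25x write multiplier comes from the post:

```python
# Back-of-the-envelope sketch of the cache-expiry economics described above.
# Prices and the cache-read discount are illustrative assumptions, not
# Anthropic's published rates; only the 1.25x write multiplier is from the post.
INPUT_PRICE = 3.00 / 1_000_000            # $/token for regular input (assumed)
CACHE_WRITE_PRICE = INPUT_PRICE * 1.25    # cache writes: 1.25x regular input
CACHE_READ_PRICE = INPUT_PRICE * 0.10     # cache reads: heavily discounted (assumed 0.1x)

def resume_cost(context_tokens, cache_expired):
    """Cost of one message that re-submits the whole conversation context."""
    if cache_expired:
        # TTL elapsed (5 min on Pro, 1 h on Max): full cache re-write
        return context_tokens * CACHE_WRITE_PRICE
    return context_tokens * CACHE_READ_PRICE  # warm cache: mostly cheap reads

ctx = 192_000  # accumulated context tokens, as in the GitHub trace
print(f"warm cache:    ${resume_cost(ctx, cache_expired=False):.3f}")
print(f"expired cache: ${resume_cost(ctx, cache_expired=True):.3f}")
```

Under these assumptions, resuming with an expired cache is an order of magnitude more expensive than the same message against a warm cache, which is why a single "hey" after an overnight gap can burn a visible chunk of a usage window.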
- **Claude Code Limits Were Silently Reduced and It's MUCH Worse** (Activity: 1229): Users of Claude Code are reporting a significant and unannounced reduction in usage limits, with some describing it as a "hundredfold" decrease. The change has been particularly noticeable for users working on simple projects in PHP and JavaScript, who now hit limits much faster than before. The lack of transparency from the developers has caused frustration, as users feel uninformed about the changes and how to adapt to them. Some users speculate that the reduction might be a bug, while others suggest it could be a strategic move to disguise a quota reduction: a temporary increase followed by a drastic cut could obscure a permanent reduction, leaving users confused about the actual limits.
- -becausereasons- highlights a significant reduction in Claude Code's limits, suggesting it might be a bug given the drastic nature of the change, described as a "hundredfold" decrease. This points to a potential issue in the system that needs addressing.
- zirouk presents a theory on how companies might obscure quota reductions by manipulating user perceptions. They suggest a strategy where a temporary increase is followed by a significant reduction, then a partial restoration, effectively achieving a net reduction without users realizing the full extent of the change.
- Dry-Magician1415 criticizes the lack of transparency in LLM usage limits, comparing it to more quantifiable industries like telecoms. They argue that without clear quantification and auditing, companies can arbitrarily adjust limits, leading to user dissatisfaction and mistrust.
- **Claude Code can now /dream** (Activity: 2731): Claude Code's new feature, Auto Dream, addresses the memory bloat caused by the Auto Memory feature. Auto Dream mimics human REM sleep by reviewing past session transcripts, identifying relevant information, and pruning stale or contradictory memories. It consolidates this information into organized files, replacing vague references with actual dates. The process runs in the background, triggered after 24 hours and 5 sessions since the last consolidation, and operates read-only on project code while modifying only memory files. The approach is likened to a garbage collector and defragmenter for AI memory, improving memory management beyond simply expanding context windows. Some commenters humorously suggest additional features like "/acid" for handling hallucinations and "/shit" for cleanup. Another commenter notes the lack of an official announcement from Anthropic, pointing to a YouTube explanation by Ray Amjad.
- AutoDream is a new feature for Claude Code that acts as a "sleep cycle" for its memory system, addressing the memory bloat introduced by the Auto Memory feature. AutoDream operates in four phases: Orient, Gather signal, Consolidate, and Prune & index. It consolidates memories by scanning existing memory, identifying drifted memories, merging new information, and removing contradictions, much like human REM sleep. The process only modifies memory files, not the actual codebase, ensuring safety.
- The AutoDream feature is designed to optimize Claude Code's memory management by periodically consolidating and organizing stored information. It runs only after 24+ hours and 5+ sessions since the last consolidation, ensuring it doesn't interfere with ongoing work. The process involves scanning memory directories, identifying outdated or contradictory information, and updating memory files to maintain a concise and accurate index, akin to a garbage collector for AI memory.
- The AutoDream system prompt is available on GitHub in the Piebald-AI/claude-code-system-prompts repository, specifically in the file agent-prompt-dream-memory-consolidation.md. The feature is accessible in Claude Code via the /memory command, giving users a tool to manage AI memory effectively and addressing the context window problem by acting as a defragmenter for AI memory.
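The trigger conditions and the consolidate-and-prune pass described above can be sketched in a few lines; the 24-hour/5-session thresholds match the recap, but the file layout and function names below are hypothetical, not Claude Code's actual implementation:

```python
# Minimal sketch of a "dream"-style memory consolidation pass over a directory
# of markdown memory files. Thresholds follow the recap (24+ hours, 5+
# sessions); the file layout and names are hypothetical illustrations.
import time
from pathlib import Path

MIN_AGE_HOURS = 24   # only consolidate after 24+ hours...
MIN_SESSIONS = 5     # ...and 5+ sessions since the last pass

def should_dream(last_consolidation_ts, sessions_since):
    """Gate the background pass so it never interferes with active work."""
    age_hours = (time.time() - last_consolidation_ts) / 3600
    return age_hours >= MIN_AGE_HOURS and sessions_since >= MIN_SESSIONS

def consolidate(memory_dir):
    """Merge memory files into one index, pruning exact duplicate facts.
    Only memory files are touched; project code is never modified."""
    seen, kept = set(), []
    for path in sorted(Path(memory_dir).glob("*.md")):
        for line in path.read_text().splitlines():
            fact = line.strip()
            if fact and fact not in seen:   # prune duplicates, keep first sighting
                seen.add(fact)
                kept.append(fact)
    (Path(memory_dir) / "MEMORY_INDEX.md").write_text("\n".join(kept))
    return len(kept)
```

A real pass would also resolve contradictions and rewrite vague references into dated facts, which needs a model call; this sketch only shows the mechanical dedupe-and-index step.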
- **Claude can now control your mouse and keyboard. I tested it for a day, here's what actually works** (Activity: 184): Claude's new Computer Use feature allows it to control a Mac's mouse and keyboard, performing tasks like file management, spreadsheet data entry, and browser form filling. It operates by taking screenshots to understand the screen context, but it requires the user to step away because it takes over the entire machine. The feature is currently in a research preview for Pro/Max plans and shows 80% reliability on simple tasks and 50% on complex ones. However, it struggles with tasks requiring speed, captchas, 2FA, and complex interactions. Its potential lies in automating tasks while the user is away, as demonstrated by combining it with remote phone commands via Dispatch. More details can be found in the full breakdown. Commenters express skepticism about the security and reliability of Claude's control over a computer, with concerns about captchas and the potential for misuse. There is also a humorous comparison to human-powered "AI" farms, highlighting doubts about the technology's autonomy.
- A user mentioned using Claude to automate testing in their app development workflow. They plan to push a new build and have Claude test the changes, provide feedback, and fix any issues it finds. This highlights the potential for AI to streamline software development by automating repetitive tasks and improving efficiency.
- There is a concern about security and privacy, as one commenter humorously suggests the possibility of a random person gaining control of their PC. This reflects broader apprehensions about AI systems with control over hardware, emphasizing the need for robust security measures to prevent unauthorized access.
- Another commenter humorously notes that Claude cannot bypass CAPTCHAs, which are designed to differentiate humans from bots. This limitation underscores the challenges AI faces in tasks requiring human-like perception and decision-making, despite advancements in other areas.
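Under the hood this is a perceive-decide-act loop: screenshot, ask the model for the next action, execute it, repeat. A minimal abstract sketch, where `capture`, `model`, and `execute` are injected stand-ins for the real screenshot, API, and input-device calls (hypothetical names, not Anthropic's actual API):

```python
# Abstract sketch of a computer-use control loop. The three callables are
# stand-ins: capture() returns a screenshot, model() returns the next action
# as a dict, execute() drives the mouse/keyboard. None of this is the real API.
def computer_use_loop(capture, model, execute, max_steps=50):
    history = []
    for _ in range(max_steps):
        screenshot = capture()               # e.g. PNG bytes of the screen
        action = model(screenshot, history)  # {"type": "click", ...} or {"type": "done"}
        if action["type"] == "done":
            return history
        execute(action)                      # move mouse / type keys
        history.append(action)
    return history  # safety cap: never run more than max_steps actions
```

The `max_steps` cap is the kind of guardrail the security-minded commenters are implicitly asking for: an agent driving real input devices needs a hard stop, not just trust in the model saying "done".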
3. AI Model Releases and Benchmarks
- **ARC AGI 3 is up! Just dropped minutes ago** (Activity: 1198): The image depicts the ARC-AGI-3 Leaderboard, which plots AI models' performance scores against their operational costs. The models shown, including Gemini 3.1 Pro (Preview), Anthropic Opus 4.6 (Max), and Grok 4.20 (Beta Reasoning), sit toward the lower end of the graph, indicating low performance scores despite varying costs. The visualization highlights the current state of AI models relative to AGI, with the ARC Prize marked as a benchmark. Commenters express skepticism about progress toward AGI, noting the low score percentages despite significant financial investment; one comment highlights the gap between perceived progress toward AGI and actual performance metrics, suggesting that claims of having reached AGI are premature.
- A key point of discussion is benchmark saturation, with a specific focus on models achieving only a 0.2% improvement on ARC AGI 3 despite significant investment ($10K). This raises questions about diminishing returns on benchmarks and whether AI models are merely optimizing for these tests without genuine improvements in generalization.
- The mention of GPT-5.4 (High) as a reference point in the benchmark highlights the competitive landscape among top AI models. The comparison suggests that newer models evaluated on ARC AGI 3 may not significantly outperform existing models like GPT-5.4, indicating a potential plateau in performance gains.
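Per the scoring debate recapped in the Twitter section above, ARC-AGI-3's headline numbers are efficiency-based: agents are compared against the second-best human action count, extra steps are penalized, and superhuman efficiency is capped. A hedged sketch of such a rule; the exact formula is an assumption, not the official scoring protocol:

```python
# Illustrative efficiency-capped scoring rule of the kind debated for
# ARC-AGI-3. The official protocol may differ; this only shows the shape:
# reference = second-best human, extra agent steps penalized, cap at 1.0.
def efficiency_score(agent_steps, human_step_counts, solved):
    if not solved or agent_steps <= 0:
        return 0.0                              # unsolved tasks score zero
    reference = sorted(human_step_counts)[1]    # second-best human action count
    return min(1.0, reference / agent_steps)    # cap: no credit beyond human efficiency
```

Under a rule like this, an agent that solves a task in twice the reference step count still only scores 0.5, which is why critics argue a plain completion metric would look far less bleak than the headline "<1%".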
- **TheInformation reporting OAI finished pretraining new very strong model "Spud", Altman notes things moving faster than many expected** (Activity: 931): OpenAI has reportedly completed pretraining a new model named "Spud", which is anticipated to be very strong. The news comes as Sam Altman shifts his focus from OpenAI's safety and security teams to scaling operations, indicating a strategic reallocation of resources. Additionally, OpenAI is shutting down the Sora video app, suggesting a prioritization of AI model development over other projects. The community is speculating about potential improvements to OpenAI's pretrained models, which have previously been criticized despite the company's strong reinforcement learning capabilities. Some commenters suggest the "Spud" announcement might be a strategic narrative to overshadow the Sora shutdown; others highlight the significance of improving OpenAI's pretrained models, which have been considered weaker than its reinforcement learning strengths.
- Dylan Patel has commented that OpenAI is known for having the best reinforcement learning (RL) capabilities in the industry, but historically its pretrained models have not been as strong. If OpenAI has indeed improved its pretrained models with the new "Spud" model, it could represent a significant advancement in its AI capabilities.
- A user noted the rapid pace of AI development, referencing the quick succession of updates from Codex 5.3/Opus 4.6 to 5.4, which brought substantial improvements in coding agents and computer use. The introduction of a new pretrained model, "Spud", within weeks of these updates highlights the accelerating pace of AI advancements, causing both excitement and nervousness among those working closely in the field.
- The discussion touches on the broader implications of AI advancements, with some expressing concern over the rapid development cycle. The quick release of new models and updates, such as the transition from Codex 5.3/Opus 4.6 to 5.4, and now "Spud", suggests a steepening curve of technological progress that is both fascinating and unsettling for professionals in the AI space.
- **DeepSeek had a moment, Kimi just had an entire week** (Activity: 182): Moonshot AI's model, Kimi, introduced a novel concept called "Attention Residuals" in a paper on arXiv, proposing a significant change to the architecture of modern LLMs. The approach allows each layer to selectively reference previous layers with learned, input-dependent weights, achieving performance equivalent to 1.25x more compute with less than 2% inference overhead. The innovation has drawn attention from key figures like Elon Musk and Andrej Karpathy, suggesting a potential paradigm shift in deep learning. Additionally, Cursor was found using Kimi's model under the guise of their own, and MiniMax was caught copying Kimi's code, indicating Kimi's growing influence and potential undervaluation in the AI landscape. Some commenters argue that Kimi, while innovative, is not as impactful as DeepSeek's engram, which is considered more sophisticated. Others believe Kimi excels specifically at context handling, suggesting its strengths may be niche rather than broad.
- BriguePalhaco mentions that Kimi is based on DeepSeek and identifies Qwen as its only serious competitor, suggesting a competitive landscape in which Kimi and Qwen are the prominent players.
- Alternative_You3585 highlights that DeepSeek's engram is significantly more sophisticated than Kimi's, implying that DeepSeek may have a more advanced architecture or algorithmic approach that sets it apart in technical capabilities.
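The "each layer selectively references previous layers with input-dependent weights" idea can be caricatured with scalar toy math: before each layer runs, all earlier outputs are mixed through softmax gates that depend on the current activation. This is a sketch of the idea as summarized in the recap, not the paper's actual architecture:

```python
# Scalar toy sketch of "attention residuals" as described above: each layer
# consumes a gated mix of ALL previous layer outputs, with gates that depend
# on the current activation. Not the paper's architecture; illustration only.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def layer(x, weight):
    """Stand-in for a transformer block (here just a squashing nonlinearity)."""
    return math.tanh(weight * x)

def forward(x, layer_weights, gate_weights):
    outputs = [x]
    for i, w in enumerate(layer_weights):
        # Input-dependent gates over previous outputs (assumed parameterization:
        # one learned scalar per referenced layer, scaled by the last activation).
        gates = softmax([g * outputs[-1] for g in gate_weights[i][: len(outputs)]])
        mixed = sum(g * o for g, o in zip(gates, outputs))
        outputs.append(layer(mixed, w))
    return outputs[-1]
```

The contrast with a plain residual stream is that the mixing weights here are computed from the input rather than fixed, which is the property the recap credits with the "1.25x effective compute" gain.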
- **daVinci-MagiHuman: This new open-source video model beats LTX 2.3** (Activity: 1127): The daVinci-MagiHuman is a new open-source audio-video model with 15 billion parameters, developed by GAIR. It claims to outperform the LTX 2.3 model in terms of speed and performance. The model is available on Hugging Face and GitHub. Its full size is approximately 65GB, and it is designed to run efficiently on hardware like the 4070ti GPU, although there are concerns that demo scenes with minimal movement may not fully demonstrate its capabilities. There is a debate about the validity of benchmarks used to claim model superiority, particularly when they rely on still frames or low-motion scenes. Additionally, there is interest in the model's practical application, such as redoing complex video projects like Game of Thrones.
- MorganTheFated criticizes the use of still frames or scenes with minimal movement as benchmarks for video models, arguing that they do not accurately represent a model's performance. This highlights the need for more dynamic and varied testing scenarios to truly evaluate a model's capabilities.
- intLeon discusses the hardware requirements for running the daVinci-MagiHuman model, noting its full size of 65GB and questioning whether a 4070ti with 12GB of VRAM can handle it. They compare it to the fp8 distilled LTX2.3, which takes 5 minutes to process 15 seconds of video at 1024x640 resolution, indicating the computational intensity of these models.
- One commenter points out the elephant in the room: the daVinci-MagiHuman model's physical consistency is reportedly worse than LTX2.3's, particularly in rendering hands, as observed in samples on its GitHub page. This suggests that while the model may excel in some areas, it struggles with maintaining realistic physical details.
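The fits-on-a-4070ti question in these comments comes down to bytes per parameter. A floor estimate for a 15B-parameter model's weights at common precisions; runtime overhead (activations, video latents, framework buffers) is ignored, so real requirements are strictly higher:

```python
# Back-of-the-envelope VRAM floor for a 15B-parameter model at common
# per-parameter widths. Ignores activations/latents, so these are minimums.
PARAMS = 15e9
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "fp8": 1, "int4": 0.5}

def weight_gb(dtype):
    """GiB needed just to hold the weights at the given precision."""
    return PARAMS * BYTES_PER_PARAM[dtype] / 1024**3

for dtype in BYTES_PER_PARAM:
    fits = weight_gb(dtype) <= 12  # a 4070ti-class card has 12 GB of VRAM
    print(f"{dtype:>9}: {weight_gb(dtype):5.1f} GiB  fits-in-12GB={fits}")
```

This matches the post's numbers: ~56 GiB of fp32 weights is consistent with the quoted ~65GB download, and even fp8 weights alone exceed 12 GB, which is why distilled or offloaded variants are what actually run on cards like the 4070ti.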
AI Discords
Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.