a quiet day.

AI News for 3/28/2026-3/30/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!


AI Twitter Recap

Claude Code Computer Use, Codex Interop, and the Coding-Agent Harness Race

  • Claude Code gets computer use: Anthropic added computer use inside Claude Code, letting the agent open apps, click through UIs, and test what it built directly from the CLI in research preview for Pro/Max users. The practical significance is closed-loop verification: code → run → inspect UI → fix → re-test, which several engineers called the missing piece for reliable app iteration, especially compared with open-ended desktop agents (Claude announcement, @Yuchenj_UW on the “eyes” unlock, @omarsar0).
  • Cross-agent composition is becoming standard: OpenAI shipped a Codex plugin for Claude Code that can trigger reviews, adversarial reviews, and “rescue” flows from inside Anthropic’s toolchain, using a ChatGPT subscription rather than custom glue code. This is notable less as a plugin novelty and more as a signal that coding stacks are becoming composable harnesses rather than monolithic products (plugin by @dkundel, usage thread by @reach_vb, open-source note). Separately, OpenAI shared that late-night Codex tasks run longer, with jobs started around 11pm being 60% more likely to run 3+ hours, which fits the emerging pattern of delegating refactors and planning to background agents (OpenAI Devs).
  • Harness quality is now visibly a first-order variable: Theo argued that Opus scores ~20% higher in Cursor than in Claude Code, and more broadly that closed-source harnesses make it hard for the community to diagnose or fix regressions (performance gap claim, closed-source critique). That theme repeated across the feed: model capability deltas are narrowing, while tooling, prompt/runtime orchestration, and review loops still create large practical differences.
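The closed loop described above (code → run → inspect → fix → re-test) can be sketched as a minimal harness: run a candidate program with its test in a subprocess and iterate until the test passes. Everything here (`fix_loop`, the candidate list) is a hypothetical stand-in for the model-driven pieces, not any vendor's API:

```python
import subprocess
import sys

def run_and_check(code: str, test: str) -> bool:
    """Run candidate code plus its test in a fresh interpreter; pass == exit 0."""
    proc = subprocess.run(
        [sys.executable, "-c", code + "\n" + test],
        capture_output=True, text=True, timeout=30,
    )
    return proc.returncode == 0

def fix_loop(candidates, test, max_iters=5):
    """Iterate code -> run -> check -> retry over successive proposals.
    `candidates` is a hypothetical stand-in for a model generating fixes."""
    for code in candidates[:max_iters]:
        if run_and_check(code, test):
            return code
    return None

buggy = "def inc(x): return x"        # first proposal: off-by-one
fixed = "def inc(x): return x + 1"    # second proposal: passes
assert fix_loop([buggy, fixed], "assert inc(1) == 2") == fixed
```

The "computer use" unlock adds a UI-inspection step to the same loop; the structure (propose, execute, verify, retry) is unchanged.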

Hermes Agent’s Rapid Rise, Multi-Agent Profiles, and the Open Harness Ecosystem

  • Hermes has become the week’s breakout open agent stack: Nous shipped a major Hermes Agent update that drove a wave of migrations from OpenClaw/OpenClaw-like setups, with users emphasizing better compaction, less bloat, stronger adaptability, and faster shipping cadence (Nous release, Teknium’s multi-agent profiles, community migration examples, another). The new multi-agent profiles give each bot its own memory, skills, histories, and gateway connections, moving Hermes from “personal assistant” toward a reusable agent OS abstraction.
  • An ecosystem is forming around traces, remote control, and self-improvement: Several projects extend Hermes beyond core inference. @jayfarei’s opentraces.ai provides a CLI/schema/review flow for sanitizing and publishing agent traces to Hugging Face for analytics, evals, SFT, and RL. @kaiostephens uploaded ~4,000 GLM-5 Hermes traces to HF. @IcarusHermes described an integration where agents log their own decisions, export data, fine-tune smaller successors on their history, and switch over to cheaper models. @winglian’s ARC adds remote browser-based monitoring/control with E2E encryption.
  • Open vs proprietary agent infra is being actively contested: @ClementDelangue explicitly argued that open-source agent tools should default to open-source models, both for privacy and durability. In parallel, vendors are attacking known pain points: @fchollet highlighted PokeeClaw as a more secure OpenClaw-style assistant with sandboxing, approvals, RBAC, and audit trails; Z AI launched AutoClaw, a local OpenClaw runtime with no API key required and optional GLM-5-Turbo.
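The "agent OS" abstraction above (each bot gets its own memory, skills, histories, and gateway connections) can be sketched as a registry of isolated profiles; the field names here are illustrative guesses, not Hermes' actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    # Hypothetical per-agent profile; fields are illustrative, not Hermes' schema.
    name: str
    skills: list[str] = field(default_factory=list)
    gateways: list[str] = field(default_factory=list)   # e.g. chat/API endpoints
    memory: list[str] = field(default_factory=list)     # isolated per-agent history

    def remember(self, event: str) -> None:
        self.memory.append(event)

class ProfileRegistry:
    """Keeps agents isolated: no shared memory or skills between profiles."""
    def __init__(self):
        self._profiles = {}

    def create(self, name: str, **kwargs) -> AgentProfile:
        profile = AgentProfile(name=name, **kwargs)
        self._profiles[name] = profile
        return profile

    def get(self, name: str) -> AgentProfile:
        return self._profiles[name]

registry = ProfileRegistry()
coder = registry.create("coder", skills=["python"], gateways=["cli"])
support = registry.create("support", skills=["email"], gateways=["imap"])
coder.remember("fixed issue #42")
assert support.memory == []   # histories do not leak across profiles
```

The isolation is the point: a "personal assistant" has one implicit profile, while an agent OS makes the profile an explicit, reusable object.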

Qwen3.5-Omni, GLM-5-Turbo/AutoClaw, and the Push Toward Local/Agentic Specialization

  • Qwen3.5-Omni is a major multimodal release: Alibaba introduced Qwen3.5-Omni, with native text/image/audio/video understanding, script-level captioning, built-in web search and function calling, and a standout “audio-visual vibe coding” demo where the model builds websites/games from spoken visual instructions. Reported capabilities include support for 10h audio / 400s of 720p video, 113 speech-recognition languages, and 36 spoken languages; Alibaba claims it outperforms Gemini 3.1 Pro in audio and matches its AV understanding in some settings (launch thread, demo thread, additional demo). A useful caveat from @kimmonismus: “omni” here is about interpreting multimodal inputs, not arbitrary multimodal generation.
  • Z AI continues to tune for agentic workloads: Artificial Analysis evaluated GLM-5-Turbo, Z AI’s proprietary agent-optimized variant. It scored 47 on the AA Intelligence Index, slightly behind open-weight GLM-5 (Reasoning) at 50, but posted 1503 on GDPval-AA, ahead of GLM-5’s 1408, supporting the claim that the model is tuned for real-world agent workflows rather than broad benchmark maximalism.
  • Specialized open models are increasingly the deployment pattern: Several tweets converged on the same thesis: companies will increasingly own and specialize open models on proprietary data rather than rent general-purpose APIs indefinitely (@oneill_c, @ClementDelangue). Supporting evidence ranged from a Qwen3.5-27B model distilled from Claude 4.6 Opus trending on HF for weeks and reportedly fitting on 16GB in 4-bit (Unsloth, @Hesamation) to growing enthusiasm for local runtimes like llama.cpp and MLX.
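The "27B fits on 16GB at 4-bit" claim above checks out on a napkin (weights only; KV cache, activations, and per-group quantization scales add overhead):

```python
def quantized_weight_gb(params: float, bits_per_weight: float) -> float:
    """Raw weight footprint in GB (1 GB = 1e9 bytes); excludes KV cache,
    activations, and quantization-scale overhead."""
    return params * bits_per_weight / 8 / 1e9

fp16 = quantized_weight_gb(27e9, 16)   # 54 GB: far beyond a 16GB machine
q4   = quantized_weight_gb(27e9, 4)    # 13.5 GB: weights alone fit in 16GB

assert q4 < 16 < fp16
```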

Local Inference and Systems: llama.cpp at 100k, Flash-MoE on MacBooks, and Web/Serving Toolchains

  • Local AI had a symbolic milestone with llama.cpp hitting 100k GitHub stars: @ggerganov’s reflection framed 2026 as potentially the breakout year for local agentic workflows, arguing that useful automation doesn’t require frontier-scale hosted models and that the right portable runtime stack matters more than absolute scale. The post also emphasized the importance of cross-hardware, non-vendor-locked infra.
  • Flash-MoE on Apple Silicon drew strong attention: A widely shared post claimed Qwen3.5-397B could run on a 48GB MacBook Pro at 4.4 tok/s using a pure C + Metal engine that streams weights from SSD and only loads the active experts, reportedly using ~5.5GB RAM during inference (summary thread). Related work includes anemll-flash-mlx, which focuses on optimizing only the MoE path on top of MLX, and AI Toolkit’s new Apple Silicon support.
  • Web and serving stacks also moved: Transformers.js v4 added a WebGPU backend across browser/Node/Bun/Deno with major perf gains and 200+ architectures. vLLM-Omni v0.18.0 shipped 324 commits, production TTS/omni serving, unified quantization, diffusion runtime refactors, and a dozen-plus new models. On the speech side, Artificial Analysis covered Cohere Transcribe: a 2B conformer encoder-decoder, Apache 2.0, trained on 14 languages, hitting 4.7% AA-WER and roughly 60x real-time transcription speed.
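The Flash-MoE approach above reduces to: keep expert weights out of RAM and page in only the experts the router actually selects. A toy NumPy sketch of that pattern, with a dict standing in for SSD storage and an LRU cache for resident experts (sizes and routing are illustrative, not the actual engine):

```python
import numpy as np
from collections import OrderedDict

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K, CACHE_CAP = 64, 32, 2, 4

# Stand-in for expert weights living on SSD (the real engine streams from disk).
disk = {e: rng.standard_normal((D, D)).astype(np.float32) for e in range(N_EXPERTS)}
router_w = rng.standard_normal((D, N_EXPERTS)).astype(np.float32)
cache = OrderedDict()   # experts currently resident in RAM

def load_expert(e: int) -> np.ndarray:
    """LRU cache over expert weights: only active experts stay resident."""
    if e in cache:
        cache.move_to_end(e)
    else:
        if len(cache) >= CACHE_CAP:
            cache.popitem(last=False)       # evict least-recently-used expert
        cache[e] = disk[e]                  # "read from SSD"
    return cache[e]

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w
    top = np.argsort(logits)[-TOP_K:]       # route to the K highest-scoring experts
    g = np.exp(logits[top] - logits[top].max())
    g /= g.sum()                            # softmax gates over the selected experts
    return sum(w * (load_expert(e) @ x) for w, e in zip(g, top))

for _ in range(8):                          # several tokens exercise eviction
    y = moe_forward(rng.standard_normal(D).astype(np.float32))
assert len(cache) <= CACHE_CAP              # RAM holds at most CACHE_CAP experts
```

Because only TOP_K of N_EXPERTS matrices are touched per token, resident memory scales with the cache cap rather than total parameter count, which is how a 397B MoE can report single-digit-GB RAM use.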

Agent Research: Natural-Language Harnesses, Meta-Harness, Async SWE Agents, and Long-Context via Filesystems

  • Harness engineering is becoming a research field of its own: A Tsinghua/Shenzhen paper on natural-language agent harnesses proposed letting an LLM execute orchestration logic from an SOP rather than hard-coded harness rules, a direction that multiple practitioners found mind-bending but plausible as context budgets rise (@rronak_ summary). Meta pushed the idea further with Meta-Harness, a method that optimizes the harness end-to-end over code, traces, and scores rather than just the base model; claims include #1 among Haiku agents on TerminalBench-2 and strong gains in text classification and transfer (@yoonholeee, explainer by @LiorOnAI).
  • Async/multi-agent SWE design got stronger empirical backing: The CAID paper from CMU argues for centralized asynchronous isolated delegation using manager agents, dependency graphs, isolated git worktrees, self-verification, and merges. Reported gains were +26.7 absolute on PaperBench and +14.3 on Commit0 versus single-agent baselines, suggesting that concurrency and isolation beat simply giving one agent more iterations (@omarsar0 summary).
  • Coding agents as long-context processors is one of the more interesting reframings: A paper highlighted by @dair_ai treats huge corpora as directory trees and lets off-the-shelf coding agents navigate them with shell commands and Python, rather than stuffing text into context windows or relying purely on retrieval. Reported results include 88.5% on BrowseComp-Plus (750M tokens) vs 80% previous best, and operation up to 3T tokens.
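The reframing above is simple to demo without any agent framework: write the corpus out as a directory tree and give the model a search tool instead of the raw text. A pure-Python stand-in for the shell commands (grep, ls) a coding agent would run:

```python
import os
import tempfile

# Toy corpus written out as a directory tree instead of stuffed into a prompt.
corpus = {
    "papers/attention.txt": "Scaled dot-product attention over queries and keys.",
    "papers/moe.txt": "Mixture-of-experts routes tokens to sparse experts.",
    "notes/todo.txt": "Review the KV-cache quantization results.",
}
root = tempfile.mkdtemp()
for rel, text in corpus.items():
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

def search_tool(query: str, root: str) -> list[tuple[str, str]]:
    """What the agent calls instead of reading everything: walk the tree and
    return (relative path, matching line) hits, grep-style."""
    hits = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path) as f:
                for line in f:
                    if query.lower() in line.lower():
                        hits.append((os.path.relpath(path, root), line.strip()))
    return hits

hits = search_tool("experts", root)
assert len(hits) == 1 and hits[0][0].endswith("moe.txt")
```

The context window then only has to hold tool calls and the few hits that come back, which is why this scales to corpora (750M to 3T tokens) that no window or embedding index handles comfortably.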

Training, Optimization, Evaluation, and Production Case Studies

  • Muon got a meaningful systems/math optimization: Gram Newton-Schulz is a drop-in replacement for Muon’s Newton-Schulz step that works on the smaller symmetric XXᵀ Gram matrix rather than the large rectangular matrix, reportedly making Muon up to 2x faster while preserving validation perplexity within 0.01. The work drew praise from @tri_dao as the kind of cross-disciplinary linear algebra + fast-kernel result that actually matters.
  • Two practical implementation details stood out: Ross Wightman flagged a subtle but important PyTorch trunc_normal_ misuse pattern in LLM training code: default a/b are absolute values, not standard deviations, so many codebases effectively aren’t truncating at all; he also noted numerical oddities later fixed in nightlies. At the application layer, Shopify’s DSPy case study was notable for economics: one slide highlighted a reduction from $5.5M to $73K/year by decomposing business logic, modeling intent with DSPy, and switching to a smaller optimized model while maintaining performance (follow-up).
  • New evals/benchmarks continued to expose gaps: World Reasoning Arena targets hypothetical/world-model reasoning and reports a substantial gap to humans. Tau Bench’s new banking domain adds a realistic 698-doc support environment where best models still only solve about 25% of tasks. Meanwhile, a Stanford-led paper highlighted by @Zulfikar_Ramzan found sycophantic AI can increase users’ certainty while reducing willingness to repair relationships, underscoring that “helpfulness” metrics can obscure socially harmful behavior.
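The Gram-side trick in the Muon item can be made concrete. Each Newton-Schulz step is X ← aX + (bA + cA²)X with A = XXᵀ, so the whole iteration can be carried as a small symmetric matrix M with X_k = M·X₀, touching the full-size matrix only once at the end. A minimal NumPy sketch of that reading (using Muon's published polynomial coefficients; an illustration, not the paper's actual kernel):

```python
import numpy as np

A_, B_, C_ = 3.4445, -4.7750, 2.0315   # Muon's Newton-Schulz coefficients

def ns_direct(G, steps=5):
    """Standard iteration: every step multiplies back into the full n x m matrix."""
    X = G / np.linalg.norm(G)
    for _ in range(steps):
        A = X @ X.T
        X = A_ * X + (B_ * A + C_ * A @ A) @ X
    return X

def ns_gram(G, steps=5):
    """Gram-side iteration: X_k = M_k @ X_0 with M_k a polynomial in S = X_0 X_0ᵀ,
    so every step stays on the small symmetric n x n side (assumes n <= m)."""
    X0 = G / np.linalg.norm(G)
    S = X0 @ X0.T                          # small symmetric Gram matrix
    M = np.eye(S.shape[0])
    for _ in range(steps):
        A = M @ S @ M                      # equals X_k @ X_kᵀ
        M = A_ * M + (B_ * A + C_ * A @ A) @ M
    return M @ X0                          # single full-size matmul at the end

G = np.random.default_rng(0).standard_normal((16, 256))
assert np.allclose(ns_direct(G), ns_gram(G), atol=1e-6)
```

With a 16×256 gradient, the direct version pays O(n²m) per step while the Gram version stays at O(n³) until the final `M @ X0`, which is where a "up to 2x faster, same result" claim would plausibly come from.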
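The trunc_normal_ pitfall is easy to reproduce. In `torch.nn.init.trunc_normal_(tensor, mean=0., std=1., a=-2., b=2.)`, `a` and `b` are absolute cutoffs, not multiples of `std`, so `trunc_normal_(w, std=0.02)` truncates at ±100σ, i.e. effectively never. A NumPy rejection-sampling stand-in with the same signature makes the failure visible:

```python
import numpy as np

rng = np.random.default_rng(0)

def trunc_normal(shape, mean=0.0, std=1.0, a=-2.0, b=2.0):
    """Mirrors the PyTorch signature: a and b are ABSOLUTE bounds, not std
    multiples. Simple rejection resampling stands in for the real sampler."""
    out = rng.normal(mean, std, size=shape)
    bad = (out < a) | (out > b)
    while bad.any():
        out[bad] = rng.normal(mean, std, size=bad.sum())
        bad = (out < a) | (out > b)
    return out

# Common LLM init with the DEFAULT a/b: bounds of +/-2 are +/-100 sigma here.
w = trunc_normal((100_000,), std=0.02)
frac_beyond_2sigma = np.mean(np.abs(w) > 2 * 0.02)
assert 0.03 < frac_beyond_2sigma < 0.06   # ~4.6%: the tails were never truncated

# The intended behavior requires scaling the bounds by std explicitly.
w_fixed = trunc_normal((100_000,), std=0.02, a=-2 * 0.02, b=2 * 0.02)
assert np.abs(w_fixed).max() <= 2 * 0.02
```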

Top tweets (by engagement)

  • Claude Code computer use: Anthropic’s release was the biggest technical product launch in the set, and likely the most consequential for day-to-day coding-agent UX (announcement).
  • Claude Code hidden features: @bcherny’s thread drew massive engagement, reflecting how quickly expert users are now optimizing around coding-agent workflows rather than raw model prompts.
  • Hermes Agent update: The broad community response to Nous’s major Hermes release suggests open agent harnesses have reached a new adoption phase.
  • Qwen3.5-Omni launch: Alibaba’s multimodal release was one of the day’s biggest model announcements and especially notable for its practical demos around audio/video-driven app creation (launch).
  • llama.cpp at 100k stars: @ggerganov’s milestone post captured the local-first mood of the week: increasingly capable open models plus increasingly capable local runtimes.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Qwen Model Developments and Applications

  • Qwen 3.6 spotted! (Activity: 568): The image showcases a preview of “Qwen 3.6 Plus,” a forthcoming model in the Qwen vision-language series, set to release on March 30, 2026. The model is notable for its 1,000,000-token context window, a significant leap in input capacity over previous iterations. The preview also notes that prompt and completion data are collected to improve the model, indicating a focus on iterative learning. Commenters speculate that Qwen 3.6 might address issues like the “overthinking problem” seen in version 3.5, and express excitement about its potential to reach state-of-the-art (SOTA) performance, especially with the 397B model. There is also curiosity about whether a Coder update is imminent.

    • The mention of a ‘1 million context’ by ambient_temp_xeno suggests a significant increase in the model’s ability to handle larger inputs, which could enhance its performance in tasks requiring extensive context retention. This is a notable improvement over previous versions, potentially allowing for more complex and nuanced interactions.
    • Long_comment_san highlights a specific issue with the ‘1.5 presence penalty’ in the current model, suggesting that it negatively impacts the model’s performance in role-playing scenarios. This penalty might be causing the model to overly penalize repeated topics or ideas, which could hinder creative or narrative tasks.
    • ForsookComparison speculates that the 397B model is close to achieving state-of-the-art (SOTA) performance, indicating that while the model has a large parameter count, it may still require fine-tuning to optimize its capabilities fully. This reflects ongoing efforts to balance model size with practical performance improvements.
  • Semantic video search using local Qwen3-VL embedding, no API, no transcription (Activity: 275): The post discusses the use of Qwen3-VL-Embedding for semantic video search, enabling direct embedding of raw video into a vector space for natural language querying without transcription or frame captioning. The 8B model operates locally on Apple Silicon and CUDA, requiring approximately 18GB RAM, while the 2B model needs around 6GB. A CLI tool, SentrySearch, was developed to index and search video footage using ChromaDB, initially based on Gemini’s API but now supporting a local Qwen backend. This approach allows for efficient local video search, addressing a common need for local processing capabilities. Commenters appreciate the innovative use of multimodal AI for solving practical issues, with interest in local video search capabilities. There is curiosity about hosting the Qwen3-VL model locally, as some users experience performance issues or high VRAM usage.

    • neeeser inquires about hosting the Qwen-3VL embedding model locally, noting challenges with performance and resource usage. They mention that attempts to run the model are slow even on high-end GPUs like the 4090 and consume a lot of VRAM, highlighting the need for efficient deployment strategies for such models.
    • Octopotree asks whether the system processes videos in real-time during queries or if it pre-processes them. This distinction is crucial for understanding the system’s architecture and performance, as real-time processing could be resource-intensive, whereas pre-processing might allow for faster query responses.
    • The discussion touches on the use of multimodal AI for video search, which involves integrating different types of data (e.g., visual and textual) to enhance search capabilities. This approach can potentially solve complex search problems without relying on traditional methods like transcription, offering a more direct and efficient solution.
  • Meet CODEC: the open-source framework that finally makes “Hey computer, do this” actually work. Screen reading. Voice calls. Multi-agent research. 36 skills. Runs entirely on your machine. (Activity: 175): CODEC is an open-source framework designed to enable comprehensive voice and text control over a computer, running entirely on local hardware without external API calls. It integrates multiple AI models, including Qwen 3.5 35B for reasoning, Whisper for speech recognition, and Kokoro for voice synthesis, all operating on a single Mac Studio. The framework includes seven systems, such as CODEC Core for voice activation and app control, CODEC Dictate for speech-to-text, and CODEC Chat for multi-agent research and document handling. It replaces several external tools with local implementations, emphasizing privacy and autonomy, and is built to be extensible with a focus on accessibility, particularly for users with dyslexia. The project is available on GitHub and is MIT licensed. Commenters are enthusiastic about the potential of running sophisticated AI models like Qwen 3.5 35B locally, highlighting the framework’s ability to leverage mid-range hardware effectively. There is interest in adapting CODEC for different setups, such as Linux, indicating a demand for cross-platform compatibility.

    • bernieth highlights the potential of running advanced models like Qwen 3.5 35b locally, emphasizing the importance of a well-implemented framework to harness these capabilities effectively. This underscores the growing feasibility of deploying sophisticated AI solutions on mid-range hardware without relying on cloud services.
    • super1701 discusses integrating CODEC with Home Assistant (HA) for enhanced functionality, such as using Frigate for security and daily task automation. This points to the versatility of CODEC in smart home environments, allowing for seamless interaction between AI and IoT devices.
    • Aggravating_Fun_7692 raises a concern about the naming similarity between CODEC and Codex, which could lead to confusion. This highlights the importance of distinct branding in the AI space to avoid misunderstandings, especially when dealing with open-source projects.
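Mechanically, the semantic video search pipeline above reduces to embedding clips and queries into one vector space and running nearest-neighbour search. A dependency-free sketch in which a word-hash embedder stands in for Qwen3-VL-Embedding and a brute-force index stands in for ChromaDB:

```python
import re
import zlib
import numpy as np

DIM = 128

def embed(text: str) -> np.ndarray:
    """Stand-in embedder: each word hashes to a fixed random vector and the text
    embeds as their normalized sum, so word overlap yields high cosine similarity.
    The real system embeds raw video (and text queries) with Qwen3-VL-Embedding."""
    vec = np.zeros(DIM)
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        vec += np.random.default_rng(zlib.crc32(tok.encode())).standard_normal(DIM)
    return vec / np.linalg.norm(vec)

# Index the footage: one vector per clip, as ChromaDB would store them.
clips = ["cat_on_couch.mp4", "delivery_truck_arrives.mp4", "night_rain_street.mp4"]
index = np.stack([embed(c) for c in clips])

def search(query: str, k: int = 1) -> list[str]:
    scores = index @ embed(query)              # cosine similarity (unit vectors)
    return [clips[i] for i in np.argsort(scores)[::-1][:k]]

assert search("a cat sleeping on the couch")[0] == "cat_on_couch.mp4"
```

Pre-computing the index is what makes queries cheap, which answers the real-time-vs-preprocessing question raised in the comments: the expensive embedding pass happens once at index time.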

3. Technical Discussions on AI Model Performance

  • Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion (Activity: 686): Jianyang Gao, the first author of the RaBitQ papers, addresses confusion surrounding the relationship between TurboQuant and RaBitQ in the context of local inference and KV-cache compression. Gao highlights three main concerns: (1) TurboQuant’s incomplete description of RaBitQ, omitting the critical Johnson-Lindenstrauss transformation; (2) unsupported theoretical claims by TurboQuant, which contradict RaBitQ’s established asymptotic optimality; and (3) misleading empirical comparisons, where RaBitQ was tested under less favorable conditions than TurboQuant. Gao urges public clarification to rectify these issues, especially given the ongoing promotion of TurboQuant and its upcoming presentation at ICLR 2026. OpenReview thread. Commenters emphasize the severity of the empirical comparison issue, noting that inequitable experimental setups should not pass peer review. They also express sympathy for the RaBitQ authors, acknowledging the challenges of addressing publication inaccuracies and the unexpected attention TurboQuant has received.

    • The developer behind the open-source llama.cpp TurboQuant implementation shared detailed performance metrics from community testing. The implementation was tested across various hardware, including Apple Silicon, NVIDIA, and AMD, showing that the asymmetric q8_0-K + turbo4-V configuration is nearly lossless with a +0.0-0.2% perplexity increase across six model families. Additionally, a significant 4.57x KV memory compression was achieved, allowing an 8GB MacBook Air to handle 4000+ tokens, and a 16GB RTX 5070 Ti to manage 131K context tokens. Notably, a CUDA implementation on Blackwell unified memory achieved faster decoding speeds than uncompressed data (63.5 vs 50.1 tok/s).
    • The discussion highlights a critical issue with symmetric turbo quantization on Qwen Q4_K_M, which results in catastrophic performance with a perplexity of 3,400+. However, using asymmetric q8_0-K + turbo-V quantization rescues performance to baseline levels. This issue is attributed to K precision dominating through softmax amplification, and the findings were confirmed on both Metal and CUDA by multiple independent testers. The underlying technique involves rotation and Lloyd-Max scalar quantization, with ongoing debate about the rightful attribution of the method between TurboQuant, RaBitQ, and prior Hadamard transform work.
    • A commenter criticized TurboQuant as “snake oil,” arguing that existing compression techniques like Q8 and Q4, along with Hadamard transforms, have been effectively used for years. This suggests skepticism about TurboQuant’s novelty and effectiveness compared to established methods.
  • In the recent kv rotation PR it was found that the existing q8 kv quants tank performance on AIME25, but can be recovered mostly with rotation (Activity: 393): The image from the GitHub comment highlights a performance evaluation of the AIME25 model using different KV quantization types, specifically focusing on the impact of rotation on performance. The table in the image shows that the Q8_0 KV type without rotation scores 31.7%, but with rotation, it improves to 37.1%. Similarly, the Q4_0 type without rotation scores 0%, but with rotation, it improves to 21.7%. This suggests that rotation can significantly recover performance in certain quantization configurations, which is particularly relevant for users of the Q8 quantization method. Commenters express surprise at the poor performance of the regular Q8_0 KV cache and note the potential benefits of turboquant/rabitq. There is also anticipation for the release of llama-eval, which is expected to enhance convenience.

    • The recent benchmarks highlight a significant performance drop when using Q8_0 kv quantization on the AIME25 model, with a score of 31.7% compared to 37.9% for F16. However, applying rotation to Q8_0 recovers most of the lost performance, bringing the score up to 37.1%. This suggests that rotation can be a crucial factor in optimizing quantized models, particularly for maintaining performance levels close to those of higher precision formats like F16.
    • The data indicates that the Q8_0 kv cache without rotation performs worse than even Q5_1 and Q4_0 with rotation. Specifically, Q5_1 with rotation achieves a score of 32.5%, and Q4_0 with rotation jumps from 2.0% to 21.7%. This demonstrates the potential of rotation to significantly enhance the performance of lower precision quantizations, making them more viable for practical applications.
    • The discussion around turboquant/rabitq suggests that these techniques could offer substantial improvements in quantization performance. Despite skepticism, the evidence from the benchmarks supports the idea that advanced quantization methods, such as those involving rotation, can mitigate the performance degradation typically associated with lower precision kv caches.
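The mechanism behind these recoveries is that per-tensor quantization error is dominated by outlier channels, and a random rotation (Hadamard-style in the PR) spreads that energy across dimensions before quantizing, shrinking the quantization step. A minimal NumPy illustration of the effect on a toy key vector (absmax int4, random orthogonal rotation; not llama.cpp's actual kernel):

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_absmax(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric absmax quantization: one scale per vector, like a naive KV quant."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

# Toy key vector with a few outlier channels, as attention keys often have.
k = rng.standard_normal(256)
k[:4] *= 50.0

# Random orthogonal rotation (the PR uses a Hadamard-style transform instead).
Q, _ = np.linalg.qr(rng.standard_normal((256, 256)))

err_plain = np.mean((k - quantize_absmax(k)) ** 2)          # outliers set the scale
err_rot = np.mean((k - Q.T @ quantize_absmax(Q @ k)) ** 2)  # rotate, quantize, undo

assert err_rot < err_plain   # spreading outliers shrinks the quantization step
```

Since the rotation is orthogonal it is exactly invertible, so this buys accuracy without changing what the attention computation sees, which is consistent with Q4_0 jumping from near-zero to usable scores with rotation enabled.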

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Anthropic’s Claude Mythos and AI Model Developments

  • Anthropic is testing ‘Mythos’ its ‘most powerful AI model ever developed’ | Fortune (Activity: 2028): Anthropic is testing a new AI model named ‘Claude Mythos,’ described as their ‘most powerful AI model ever developed.’ This model is part of a new tier called ‘Capybara,’ which surpasses the existing Opus line. The leaked draft materials, exposed due to a CMS misconfiguration, highlight significant improvements in reasoning, coding, and cybersecurity tasks, marking it as a ‘step change’ in capability. The company is cautious about its rollout due to potential misuse risks, focusing initial access on organizations capable of enhancing cybersecurity defenses. The comments reflect a mix of sarcasm and technical interest, with some users expressing skepticism about the utility of testing less powerful models, while others highlight the significance of the model’s advancements over previous iterations.

    • RedRock727 highlights that Anthropic’s new model, referred to as ‘Claude Mythos,’ is reportedly a significant advancement over previous models, with improvements in reasoning, coding, and cybersecurity tasks. The model is part of a new tier called ‘Capybara,’ which is positioned above the current Opus line, indicating a strategic move to enhance AI capabilities. The development follows a data leak incident due to misconfigured CMS assets, which Anthropic attributed to human error.
    • exordin26 elaborates on the new tier of AI models named ‘Capybara,’ which is described as larger and more intelligent than the previous Opus models. This suggests that ‘Capybara’ and ‘Mythos’ might refer to the same underlying model, indicating a significant upgrade in Anthropic’s AI offerings. The focus on a new tier underscores Anthropic’s commitment to advancing AI technology and addressing potential misuse risks, particularly in cybersecurity.
    • The discussion around the leaked draft emphasizes Anthropic’s cautious approach to rolling out ‘Mythos,’ especially given its enhanced cyber capabilities. The company is initially limiting access to organizations capable of bolstering defenses, reflecting concerns about potential misuse. This strategic rollout is part of Anthropic’s broader efforts to ensure safety and security in deploying advanced AI models.
  • Exclusive: Anthropic acknowledges testing new AI model representing ‘step change’ in capabilities, after accidental data leak reveals its existence (Activity: 1261): Anthropic is reportedly testing a new AI model that represents a significant advancement in capabilities compared to its previous releases. This information emerged following an accidental data leak. The model is currently being tested with early access customers, suggesting it may soon be available more broadly. The leak has sparked interest and speculation about the model’s potential impact and improvements over prior versions. Some commenters express skepticism, likening the announcement to typical marketing hype, while others suggest that leaks can serve as effective marketing strategies.

    • The discussion highlights a potential security concern, as the leak of Anthropic’s new AI model coincides with the model’s purported ability to compromise cybersecurity. This raises questions about the robustness of Anthropic’s own security measures, especially given the model’s advanced capabilities.
    • The naming convention for Anthropic’s models is humorously critiqued, noting a shift from elegant musical terms like ‘Opus’ and ‘Sonnet’ to more whimsical names like ‘Capybara’. This could reflect a change in branding strategy or an attempt to differentiate the new model in a crowded market.
    • There is skepticism about the ‘accidental’ nature of the data leak, with some suggesting it might be a strategic marketing move. The leak included a full interview and prepared quotes, which could indicate a controlled release to generate buzz and interest in the new model.

2. OpenAI’s Challenges and Cancellations

  • OpenAI is in big trouble (Activity: 2616): The image is a screenshot from an article in The Atlantic titled “OpenAI Is Doing Everything … Poorly,” which critiques OpenAI’s recent strategic decisions and project cancellations. The article highlights several initiatives that OpenAI has either shelved or cancelled, such as the Sora video generator and the Stargate project, and notes delays in promised hardware. These moves are interpreted as signs of trouble for OpenAI, as they face competition from other AI companies like Anthropic and Google’s Gemini. The article suggests that OpenAI’s focus is shifting towards more profitable enterprise solutions amidst a compute shortage, rather than consumer-facing projects. Commenters argue that OpenAI’s decisions reflect a strategic pivot towards enterprise solutions due to a compute shortage, rather than signs of trouble. They note that projects like Sora were financially unsustainable, costing $15 million a day, and that focusing on enterprise is a more viable business strategy.

    • triclavian highlights the strategic shift by OpenAI towards prioritizing enterprise clients due to a global compute shortage. The decision to cut less profitable services like AI video generation is seen as a move to optimize resources for more lucrative enterprise applications, suggesting a focus on sustainable business practices.
    • ripestmango points out the financial burden of maintaining free services like Sora, which reportedly cost $15 million daily. The commenter supports the decision to discontinue such services, arguing that they contributed to excessive, low-value AI content, and suggests reallocating resources to more impactful projects.
    • cfeichtner13 argues that video and image generation are not profitable and consume significant computational resources. They note that similar technologies from China outperform OpenAI’s offerings, and suggest that focusing on enterprise solutions and robotics is a more viable path forward, especially given the challenges in expanding data center capacity.
  • Is this poor execution or just a company at work trying things (Activity: 713): The image is a meme-style critique of OpenAI’s recent business decisions, highlighting several projects like the Sora video generator and Stargate project that were launched and then canceled or delayed. The tweet by Katie Miller and the headline from The Atlantic suggest that these actions might reflect poor execution rather than strategic experimentation. The comments discuss the challenges OpenAI faces in finding a scalable and profitable business model, noting that the company is still in a startup phase despite its large user base. Commenters suggest that OpenAI’s actions might be driven by the need to find profitability and a sustainable business model, with some viewing the company’s current state as typical of a startup still searching for a viable path forward.

    • handbrake2k highlights a common startup challenge faced by OpenAI: achieving a scalable and profitable business model after gaining a large user base. This situation is ironic given that OpenAI’s approach might have been critiqued by Y-Combinator, known for advising startups on sustainable growth strategies.
    • edjez criticizes the focus on consumer video entertainment, suggesting that maintaining GPU resources for this purpose by 2026 is impractical. This implies a need for OpenAI to realign its resources towards more sustainable and profitable ventures.
    • Acedia_spark suggests that OpenAI’s rush to capture market share may have led to perceived incompetence. The pivot to enterprise solutions, while potentially strategic, appears reactionary amidst broader operational challenges, likened to ‘trying to stop the Titanic mid-sink.’
  • OpenAI halts “Adult Mode” as advisors, investors, and employees raise red flags (Activity: 654): OpenAI has paused its ‘Adult Mode’ chatbot development due to concerns from employees, investors, and its advisory board about the societal impact of sexual AI content. A critical issue was the age verification system, which incorrectly identified minors as adults in 12% of cases, raising significant ethical and safety concerns. OpenAI is now shifting focus towards productivity tools and a ‘super app’ based on ChatGPT. More details can be found here. Commenters express skepticism about the narrative of AI as a ‘sexy suicide coach’ and criticize OpenAI’s potential alignment with conservative values, suggesting a shift towards military applications if public use is restricted.

    • A user points out that other language models like Gemini and Grok already support adult content, questioning why OpenAI’s decision to halt ‘Adult Mode’ is seen as a red flag. This suggests a potential inconsistency in industry standards or public perception regarding AI content moderation.
    • Another comment highlights the irony in OpenAI’s decision, suggesting that if the company continues to cater to conservative viewpoints, it might pivot towards military contracts instead of public use. This reflects a broader debate on the ethical and societal implications of AI deployment, particularly in balancing moral values with technological capabilities.

3. Claude Usage Issues and Subscription Complaints

  • Update on Session Limits (Activity: 2467): Anthropic has adjusted the 5-hour session limits for their Claude AI service during peak hours (weekdays, 5am–11am PT / 1pm–7pm GMT) for free, pro, and max subscriptions. While weekly limits remain unchanged, users will exhaust their session limits faster during these times. This change affects approximately 7% of users, particularly those in pro tiers, and is aimed at managing increased demand. Users running token-intensive tasks are advised to schedule them during off-peak hours to maximize session usage. Commenters criticize the lack of transparency from Anthropic, suggesting the change was implemented quietly and expressing frustration over reduced peak limits. They emphasize the importance of open communication, especially when handling scaling challenges.

    • shyney highlights that the session limits were not a bug but an intentional change by Anthropic, suggesting it was done quietly to avoid user backlash. This points to a strategic decision in managing system resources without upfront communication, which can impact user trust and transparency.
    • Wise-Reflection-7400 notes a shift in resource allocation, where the previously offered 2x off-peak bonus has been counterbalanced by reduced peak limits. This reflects a common strategy in resource management where benefits are adjusted to manage demand and system load effectively.
    • This-Shape2193 criticizes the lack of transparency in communication regarding the session limits, emphasizing that users would have been understanding of scaling challenges if communicated openly. The comment underscores the importance of effective consumer outreach and PR in maintaining user trust, especially during significant operational changes.
  • This isn’t right (Activity: 888): The post highlights concerns about Claude AI’s usage transparency and session limits, particularly for Pro tier users. The user reports that simple interactions, such as saying “Hello” and asking for the weather, consumed 7% of their usage quota, which they find excessive. The user also criticizes the customer service for being unhelpful, as it relies on a chatbot that reiterates policy without resolving issues. Commenters express dissatisfaction with the service, with one user noting that they hit a session limit after only two messages, questioning if this is normal. Another user mentions canceling their subscription due to the lack of transparency and perceived decline in service quality.

    • Users are reporting significant limitations with the Claude AI Pro subscription, where even minimal usage like editing two Word documents or making simple layout changes in a book quickly exhausts the session limits. This has led to dissatisfaction and cancellations, as users feel the service does not match the expectations set by the subscription model.
    • There is a notable lack of transparency regarding the usage limits of Claude AI’s Pro subscription. Users are expressing frustration over the rapid depletion of their usage quota, which is not clearly communicated at the time of purchase, leading to a perception of reduced service quality and value.
    • Some users are comparing Claude AI unfavorably to competitors like Gemini, citing a decline in service quality and transparency as reasons for switching. The sentiment is that the current limitations and lack of clear communication are driving users away, despite previous loyalty to the platform.
  • Subscribed yesterday to Pro and I’m already hit by limits. Is this a scam? (Activity: 900): A user subscribed to Claude Pro for $20/month to use as a coding assistant but encountered usage limits after only two hours of work on a WordPress plugin. The user expressed dissatisfaction with the service, noting that they were not working with large files or complex tasks, and decided to cancel the subscription, citing issues with the refund process. This raises concerns about the practicality of the Pro plan for developers, especially given the expectations set by Sonnet 3.5/Opus. Several users reported similar issues with the Claude Pro subscription, noting unexpected usage limits after minimal interaction, such as editing two Word documents or typical prompts. This suggests a recent change in usage policies or limits, leading to dissatisfaction and decisions not to renew subscriptions.

    • Users are reporting unexpected changes in usage limits for the Pro subscription, with some experiencing a significant increase in usage percentage after typical prompts. One user noted they reached 50% usage quickly, suggesting a potential alteration in the service’s usage policy or calculation method.
    • A user who upgraded to the Max plan, which costs approximately $100, reported hitting their usage limit within just three hours of active use. This is a stark contrast to their previous experience, indicating a possible change in how usage is tracked or enforced.
    • There is concern among users that these new limitations could drive them away from Claude to alternative AI services. The sentiment is that if these issues are not addressed, it could lead to a decline in user retention, similar to past shifts away from ChatGPT to other platforms.
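The off-peak advice in the session-limits thread above is easy to automate. A minimal sketch, assuming the stated peak window of 5am–11am PT and that the hour you pass in is already PT-local; the window check is the only real logic here, and any actual job you'd gate behind it (a Claude Code run, a batch refactor) is up to you:

```shell
#!/bin/sh
# is_peak HOUR -> prints "peak" or "off-peak" for the stated weekday
# peak window (5am-11am PT). Assumes HOUR is a PT-local hour (0-23);
# get it with `date +%H` on a machine whose TZ is America/Los_Angeles.
is_peak() {
  if [ "$1" -ge 5 ] && [ "$1" -lt 11 ]; then
    echo "peak"
  else
    echo "off-peak"
  fi
}

is_peak 6    # -> peak
is_peak 23   # -> off-peak
```

A cron entry or a wrapper script can call this before kicking off token-heavy work and simply defer when it prints `peak`.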

AI Discords

Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.