a quiet day.
AI News for 4/21/2026-4/22/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
Open Models: Qwen3.6-27B, OpenAI Privacy Filter, and Xiaomi MiMo-V2.5
-
Qwen3.6-27B lands as a serious local/open coding model: @Alibaba_Qwen released Qwen3.6-27B, a dense, Apache 2.0 model with thinking + non-thinking modes and a unified multimodal checkpoint. Alibaba claims it beats the much larger Qwen3.5-397B-A17B on major coding evals, including SWE-bench Verified 77.2 vs 76.2, SWE-bench Pro 53.5 vs 50.9, Terminal-Bench 2.0 59.3 vs 52.5, and SkillsBench 48.2 vs 30.0. It also supports native vision-language reasoning over images and video. The ecosystem moved immediately: vLLM shipped day-0 support, Unsloth published 18GB-RAM local GGUFs, ggml added llama.cpp usage, and Ollama added a packaged release. Early user reports from @KyleHessling1 and @simonw were notably strong for local frontend/design and image tasks.
-
OpenAI quietly open-sources a practical privacy model: Multiple observers flagged OpenAI’s new Privacy Filter, a lightweight Apache 2.0 open model for PII detection and masking. According to @altryne, @eliebakouch, and @mervenoyann, it is a 1.5B total / 50M active MoE token-classification model with a 128k context window, intended for cheap redaction over very large corpora and logs. This is a more operationally interesting release than a generic “small open model”: it targets a concrete infra problem in enterprise/agent pipelines where on-device or low-cost preprocessing matters.
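Operationally, a redaction pass like this comes down to mapping model-predicted PII spans back onto the text and replacing them with typed placeholders. Below is a minimal sketch of just that masking step; the span data is hand-written stand-in output, not from the actual Privacy Filter model, and the function name is ours.

```python
# Minimal sketch of the masking step in a PII-redaction pipeline.
# Assumes a token-classification model has already produced character-level
# spans; the spans below are hand-written stand-ins, not real model output.

def mask_pii(text: str, spans: list[dict]) -> str:
    """Replace each detected span with a typed placeholder like [EMAIL].

    `spans` entries carry `start`, `end` (character offsets) and `label`.
    Spans are applied right-to-left so earlier offsets stay valid.
    """
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        placeholder = f"[{span['label']}]"
        text = text[: span["start"]] + placeholder + text[span["end"] :]
    return text

log_line = "User jane@example.com called from 555-0199."
detected = [
    {"start": 5, "end": 21, "label": "EMAIL"},
    {"start": 34, "end": 42, "label": "PHONE"},
]
print(mask_pii(log_line, detected))  # User [EMAIL] called from [PHONE].
```

Applying spans right-to-left is the key detail: replacing left-to-right would shift every later offset as placeholders change the string length.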
-
Xiaomi pushes agentic open models upward: @XiaomiMiMo announced MiMo-V2.5-Pro and MiMo-V2.5. Xiaomi positions V2.5-Pro as a major jump in software engineering and long-horizon agents, citing SWE-bench Pro 57.2, Claw-Eval 63.8, and τ3-Bench 72.9, with claims of 1,000+ autonomous tool calls. The non-Pro model adds native omnimodality and a 1M-token context window. Arena quickly listed MiMo-V2.5 in Text/Vision/Code evaluation, and Hermes/Nous integration followed via @Teknium.
Google Cloud Next: TPU v8, Gemini Enterprise Agent Platform, and Workspace Intelligence
-
Google’s infra announcements were substantial, not cosmetic: @Google and @sundarpichai introduced 8th-gen TPUs with a split design: TPU 8t for training and TPU 8i for inference. Google says 8t delivers nearly 3x compute per pod vs Ironwood, while 8i connects 1,152 TPUs per pod for low-latency inference and high-throughput multi-agent workloads. Commentary from @scaling01 highlighted an additional claim: Google can now scale to a million TPUs in a single cluster with TPU8t. The productization signal matters as much as the raw hardware: Google is clearly aligning chips, models, agent tooling, and enterprise control planes into one vertically integrated offering.
-
Enterprise agents became a first-class Google product surface: @GoogleDeepMind and @Google launched Gemini Enterprise Agent Platform, framed as the evolution of Vertex AI into a platform for building, governing, and optimizing agents at scale. It includes Agent Studio, access to 200+ models via Model Garden, and support for Google’s current stack including Gemini 3.1 Pro, Gemini 3.1 Flash Image, Lyria 3, and Gemma 4. Related launches included Workspace Intelligence GA as a semantic layer over docs/sheets/meetings/mail, Gemini Enterprise inbox/canvas/reusable skills, Agentic Data Cloud, security agents with Wiz integration, and Gemini Embedding 2 GA, a unified embedding model across text, image, video, audio, and documents.
Agents, Harnesses, Traces, and Team Workflows
-
The “agent harness” abstraction is hardening across vendors: OpenAI introduced workspace agents in ChatGPT, shared Codex-powered agents for teams that can operate across docs, email, chat, code, and external systems, including Slack-based workflows and scheduled/background tasks. Google made a parallel enterprise move with Gemini Enterprise Agent Platform, while Cursor added Slack invocation for task kick-off and streaming updates. The pattern is converging: cloud-hosted agents, shared team context, approvals, and long-running execution rather than single-user chat.
-
Developer ergonomics around harness/model independence improved: VS Code/Copilot rolled out bring-your-own-key/model support across plans and business/enterprise, enabling providers like Anthropic, Gemini, OpenAI, OpenRouter, Azure, Ollama, and local backends. This is strategically important because, as @omarsar0 noted, most models still seem overfit to their own agent harnesses. Cognition’s Russell Kaplan made the complementary business case: enterprise buyers want model flexibility and infrastructure that spans the full SDLC, not attachment to one lab.
-
Traces/evals/self-improvement are becoming the core agent data primitive: The strongest thread here came from LangChain-adjacent discussion. @Vtrivedy10 argued that traces capture agent errors and inefficiencies, and that compute should be pointed at understanding traces to generate better evals, skills, and environments; a longer follow-up expanded this into a concrete loop involving trace mining, skills, context engineering, subagents, and online evals. @ClementDelangue pushed for open traces as the missing data substrate for open agent training, while @gneubig promoted ADP / Agent Data Protocol standardization. LangChain also teased a stronger testing/evaluation product direction via @hwchase17.
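The trace-mining loop described above can be made concrete with a toy sketch: filter logged agent traces for failure signals and turn each failure into a candidate eval case. The trace schema here (`task`/`steps`/`error` fields) is invented for illustration; real systems and proposals like ADP define their own.

```python
# Toy sketch of mining agent traces into eval cases. The trace schema
# (task/steps/error fields) is invented for illustration only.

def mine_eval_cases(traces: list[dict]) -> list[dict]:
    """Turn failed traces into eval cases: replay the same task and
    assert the previously observed errors do not recur."""
    cases = []
    for trace in traces:
        failed_steps = [s for s in trace["steps"] if s.get("error")]
        if failed_steps:
            cases.append({
                "task": trace["task"],
                "must_not_repeat": [s["error"] for s in failed_steps],
            })
    return cases

traces = [
    {"task": "rename config key", "steps": [{"tool": "edit", "error": None},
                                            {"tool": "test", "error": "KeyError"}]},
    {"task": "bump version", "steps": [{"tool": "edit", "error": None}]},
]
print(mine_eval_cases(traces))
# [{'task': 'rename config key', 'must_not_repeat': ['KeyError']}]
```

The point of the sketch is the direction of data flow: traces are the raw material, and evals (plus skills and environments) are derived artifacts, which is exactly the loop the thread argues compute should be pointed at.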
Post-Training, RL, and Inference Systems
-
Perplexity and others shared more of the post-training playbook: @perplexity_ai published details on a search-augmented SFT + RL pipeline that improves factuality, citation quality, instruction following, and efficiency; they say Qwen-based systems can match or beat GPT-family models on factuality at lower cost. @AravSrinivas added that Perplexity now runs a post-trained Qwen-derived model in production that unifies tool routing and summarization and is already serving a significant share of traffic. On the research side, @michaelyli__ introduced Neural Garbage Collection, using RL to jointly learn reasoning and KV-cache retention/eviction without proxy objectives; @sirbayes reported a Bayesian linguistic-belief forecasting agent matching human superforecasters on ForecastBench.
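To make the KV-cache side of this concrete: eviction policies decide which cached entries to drop when the cache is full. Below is a toy score-based eviction sketch; the per-entry scores are hand-assigned stand-ins, whereas Neural Garbage Collection as described learns retention jointly with reasoning via RL.

```python
# Toy sketch of score-based KV-cache eviction. The per-entry scores here
# are hand-assigned stand-ins; in the work described above, retention
# scores are learned jointly with reasoning via RL.

def evict_to_capacity(cache: dict, scores: dict, capacity: int) -> dict:
    """Keep only the `capacity` highest-scoring entries."""
    keep = sorted(cache, key=lambda k: scores[k], reverse=True)[:capacity]
    return {k: cache[k] for k in keep}

kv_cache = {"tok0": "kv0", "tok1": "kv1", "tok2": "kv2", "tok3": "kv3"}
retention_scores = {"tok0": 0.9, "tok1": 0.1, "tok2": 0.7, "tok3": 0.3}
print(sorted(evict_to_capacity(kv_cache, retention_scores, 2)))  # ['tok0', 'tok2']
```

The interesting claim in the paper is precisely that these scores need not come from a hand-designed proxy (recency, attention mass, etc.) but can be optimized end-to-end.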
-
The “minimal editing” problem in coding models got a useful benchmark treatment: @nrehiew_ presented work on Over-Editing, where coding models fix bugs by rewriting too much code. The study constructs minimally corrupted problems and measures excess edits with patch-distance and added Cognitive Complexity; it finds GPT-5.4 over-edits the most while Opus 4.6 over-edits the least, and that RL outperforms SFT, DPO, and rejection sampling for learning a generalizable minimal-editing style without catastrophic forgetting. This is one of the more practical post-training/eval contributions in the set because it targets a failure mode engineers actually complain about in production code review.
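The excess-edit idea can be approximated with stdlib tooling: diff the model's patch against a known-minimal fix and count the lines changed beyond it. This is a rough sketch of the intuition, not the paper's exact patch-distance metric.

```python
# Rough sketch of an "excess edits" measure: lines a model changed
# beyond what a known-minimal fix changes. Not the paper's exact metric.
import difflib

def changed_lines(before: str, after: str) -> int:
    """Count added/removed lines between two file versions."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return sum(1 for line in diff
               if line[:1] in "+-" and line[:3] not in ("+++", "---"))

original = "def add(a, b):\n    return a - b\n"
minimal_fix = "def add(a, b):\n    return a + b\n"
# A model that also adds type hints and a comment while fixing the bug:
model_patch = "def add(a: int, b: int) -> int:\n    # add two numbers\n    return a + b\n"

excess = changed_lines(original, model_patch) - changed_lines(original, minimal_fix)
print(excess)  # 3 extra changed lines beyond the minimal fix
```

Here the minimal fix touches one line (2 diff lines), while the over-eager patch rewrites the signature and adds a comment, so it accrues excess edits even though it also fixes the bug.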
-
Inference efficiency work remained highly active: @cohere integrated production W4A8 inference into vLLM, reporting up to 58% faster TTFT and 45% faster TPOT vs W4A16 on Hopper; the details include per-channel FP8 scale quantization and CUTLASS LUT dequantization. @WentaoGuo7 reported SonicMoE throughput gains on Blackwell—54% / 35% higher fwd/bwd TFLOPS than DeepGEMM baseline—while maintaining dense-equivalent activation memory for equal active params. @baseten introduced RadixMLP for shared-prefix elimination in reranking, with 1.4–1.6x realistic speedups.
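Shared-prefix elimination of the kind RadixMLP targets can be illustrated with a toy memoization: when many rerank requests share the same document prefix, compute the prefix representation once and reuse it per query. The `encode` function here is a trivial stand-in for real prefill compute, and the whole example is our illustration, not Baseten's implementation.

```python
# Toy illustration of shared-prefix reuse in reranking: expensive prefix
# work is done once per distinct prefix instead of once per query.
# `encode` is a trivial stand-in for real model prefill.

calls = {"count": 0}
_prefix_cache: dict[str, str] = {}

def encode(text: str) -> str:
    calls["count"] += 1           # track how often "prefill" runs
    return text.upper()           # stand-in for expensive computation

def rerank_score(prefix: str, query: str) -> int:
    if prefix not in _prefix_cache:
        _prefix_cache[prefix] = encode(prefix)   # shared work, done once
    rep = _prefix_cache[prefix] + encode(query)  # per-query work
    return len(rep)

doc = "shared document prefix"
for q in ["q1", "q2", "q3"]:
    rerank_score(doc, q)
print(calls["count"])  # 4: one prefix encode + three query encodes
```

With N queries against one document, prefix work drops from N computations to 1, which is where the reported 1.4-1.6x realistic speedups would come from in workloads dominated by long shared documents.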
Top tweets (by engagement)
- OpenAI workspace agents: @OpenAI launched shared, Codex-powered workspace agents for Business/Enterprise/Edu/Teachers.
- Qwen3.6-27B release: @Alibaba_Qwen announced the new open 27B dense model with strong coding claims and Apache 2.0 licensing.
- Google TPU v8: @sundarpichai previewed TPU 8t / 8i, with training/inference specialization.
- Flipbook / model-streamed UI: @zan2434 showed a prototype where the screen is rendered as pixels directly from a model rather than traditional UI stacks.
- OpenAI Privacy Filter: @scaling01 and others highlighted OpenAI’s new open-source PII detection/redaction model on Hugging Face.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Qwen 3.6 Model Releases and Benchmarks
-
Qwen 3.6 27B is out (Activity: 2576): Qwen 3.6 27B, a new language model, has been released on Hugging Face. The model features 27 billion parameters and is designed to improve on previous iterations with stronger benchmark performance. A quantized version, Qwen3.6-27B-FP8, is also available, allowing more efficient deployment in environments with limited computational resources. The release includes detailed benchmark results showcasing its capabilities across various tasks. The community is expressing excitement, with users highlighting the significance of the performance improvements and the availability of a quantized version for broader accessibility.
- Namra_7 shared a benchmark image for Qwen 3.6 27B, which likely includes performance metrics such as inference speed or accuracy, although the comment itself does not describe the specifics.
- challis88ocarina mentioned a quantized version of Qwen 3.6 27B available on Hugging Face, specifically in FP8 format. Quantization can significantly reduce the model size and improve inference speed, making it more efficient for deployment without a substantial loss in accuracy. The link provided leads to the Hugging Face model repository for further exploration.
- Eyelbee posted another image link, which might contain additional visual data or performance metrics related to Qwen 3.6 27B. However, the comment does not provide specific insights or details about the content of the image.
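The size-vs-precision trade-off behind the FP8 release can be illustrated with a toy symmetric int8 round-trip: weights are stored at lower precision and dequantized at use, at the cost of a small rounding error bounded by the scale. This is a generic illustration of quantization, not the FP8 scheme Qwen actually ships.

```python
# Toy symmetric int8 quantization round-trip, illustrating the generic
# size-vs-precision trade-off. This is NOT the FP8 scheme the release
# uses; it's the simplest possible illustration of the idea.

def quantize(weights: list[float]) -> tuple[list[int], float]:
    """Map floats onto the int8 range [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

weights = [0.05, -0.31, 0.27, 0.002]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(all(-127 <= x <= 127 for x in q), max_err < scale)  # True True
```

Each stored value shrinks from a 32-bit float to an 8-bit integer (plus one shared scale), roughly a 4x size reduction, while the per-weight error stays below half a quantization step.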
-
Qwen3.6-27B released! (Activity: 895): Qwen3.6-27B is a newly released dense, open-source model that excels in coding tasks, outperforming its predecessor, Qwen3.5-397B-A17B, on major coding benchmarks. It features strong reasoning capabilities across both text and multimodal tasks and offers flexibility with ‘thinking’ and ‘non-thinking’ modes. The model is released under the Apache 2.0 license, making it fully open-source and accessible for community use. More details can be found on their blog, GitHub, and Hugging Face. The comments reflect excitement and admiration for the Qwen team, with users expressing eagerness to utilize the model on their hardware and suggesting the team’s contributions are monument-worthy.
- ResearchCrafty1804 highlights the impressive performance of Qwen3.6-27B, noting that despite having only 27 billion parameters, it surpasses the much larger Qwen3.5-397B-A17B model on several coding benchmarks. Specifically, it achieves scores of 77.2 on SWE-bench Verified, 53.5 on SWE-bench Pro, 59.3 on Terminal-Bench 2.0, and 48.2 on SkillsBench, outperforming the larger model by significant margins in each case.
- bwjxjelsbd comments on the competitive landscape, expressing satisfaction that Alibaba is advancing with Qwen models after META’s perceived setbacks. The commenter hopes for continued competition and transparency, suggesting that META should open-source their Muse family models to maintain a healthy competitive environment.
-
Qwen3.6-35B becomes competitive with cloud models when paired with the right agent (Activity: 848): The post discusses a significant benchmark improvement for Qwen3.6-35B when paired with the little-coder agent, achieving a 78.7% success rate on the Polyglot benchmark and placing it in the top 10. The result highlights the impact of appropriate scaffolds, suggesting that local models may underperform due to harness mismatches. The author plans further tests on Terminal Bench and GAIA for research capabilities; full details and benchmarks are available on GitHub and Substack. Commenters express surprise at the gains from scaffold changes, questioning the validity of benchmarks that don't control for such factors, and show interest in pi.dev for its extensibility in harnessing models.
- DependentBat5432 highlights a significant performance improvement in Qwen3.6-35B when changing the scaffold, noting a jump from 19% to 78%. This raises concerns about the validity of benchmark comparisons that do not control for such variables, since scaffold choice can dramatically affect model performance.
- Willing-Toe1942 reports that Qwen3.6, when used with pi-coding agents, performs almost twice as well as opencode. This comparison involved tasks like modifying HTML code and searching online resources for documentation, indicating that the choice of agent can significantly enhance the model's effectiveness in practical coding scenarios.
- kaeptnphlop mentions the strong performance of Qwen-Coder-Next when paired with GitHub Copilot in VS Code, suggesting potential for further exploration with other tools like little-coder. This implies that integrating Qwen models with popular coding environments can leverage their strengths effectively.
-
Qwen3.6-27B released! (Activity: 368): The image is a performance comparison chart highlighting the capabilities of the newly released Qwen3.6-27B model across various benchmarks. It shows that Qwen3.6-27B outperforms its predecessor, Qwen3.5-27B, and other models like Gemma4-31B in categories such as Terminal-Bench 2.0 and SWE-bench Pro, indicating significant improvements in coding, reasoning, and real-world task performance. The chart visually emphasizes the model’s superior scores, suggesting advancements in its architecture or training methodologies. One commenter expresses anticipation for the release of a larger model, Qwen122b, while another discusses potential issues with the model’s ‘thinking’ process, indicating a need for optimization in certain use cases. A link to the model on Hugging Face is also shared, suggesting community interest in exploring and utilizing the model.
- MrWeirdoFace mentions an issue with the Qwen3.6-27B model, specifically when using the ‘unsloth Q5 quant’ version, where the model tends to get ‘lost in thought cycles’. This suggests a potential problem with the model’s inference process, possibly related to its quantization or optimization settings, which might need adjustment to improve performance.
- andreabarbato notes that the Qwen3.6-27B model in ‘q4’ quantization provides good output quality but also suffers from getting ‘lost in crazy loops’. This indicates a recurring issue with the model’s reasoning or decision-making processes, which could be a result of the quantization method affecting the model’s stability or coherence during inference.
- DjsantiX inquires about fitting the Qwen3.6-27B model into a ‘5060 ti 16gb’ GPU, highlighting a common challenge of deploying large models on consumer-grade hardware. This reflects the ongoing need for efficient model optimization and quantization techniques to enable the use of large-scale models on limited-resource environments.
2. Gemma 4 Model Capabilities and Comparisons
-
An actual example of “If you dont run it, you dont own it” and Gemma 4 beats both Chat GPT and Gemini Chat (Activity: 355): The post discusses the performance of various AI models in translating a Chinese novel, highlighting issues of model degradation and censorship. Initially, GPT OSS 120B and Qwen 3 Max were used, but both failed due to name mixing and censorship, respectively. Chat GPT 4o initially performed well but degraded with updates, leading to a 20% failure rate in translations. Surprisingly, Gemma 4 31B outperformed both Gemini Chat and GPT 5.3, providing natural and accurate translations. The results were confirmed by testing multiple models, where Gemma 4 consistently delivered superior performance, even compared to Google’s Gemini. Commenters noted that Gemma 4 has been widely praised for its language abilities, with some users initially underestimating it compared to Qwen 3.5. The model’s availability for free has been appreciated, and it is seen as a significant advancement for creative writing and role-playing communities. External benchmarks also support these findings, highlighting Gemma 4’s capabilities.
- Uncle___Marty highlights the distinct language capabilities of Gemma 4, noting that while initially it seemed inferior to Qwen 3.5, both models excel in different areas. This suggests a specialization in tasks, with Gemma 4 potentially outperforming in certain linguistic tasks. The comment underscores the accessibility of these advanced models, emphasizing the generosity of the Gemma team and Alibaba in providing them for free.
- Potential-Gold5298 references benchmark comparisons from dubesor.de and foodtruckbench.com, indicating that Gemma 4 is a significant advancement for the RP community, which had been reliant on older models like Mistral Nemo and Mistral Small. This suggests that Gemma 4 offers superior performance in creative writing and role-playing applications, filling a gap left by older models.
- Sevenos praises Gemma 4’s proficiency as a German chatbot, noting its ability to structure responses with minimal language errors. This indicates a high level of linguistic accuracy and usability in non-English languages, which is a significant achievement for AI models. The comment also hints at the potential for a larger version, suggesting that current performance is already competitive with Gemini.
-
Gemma 4 Vision (Activity: 409): The post discusses the configuration of the Gemma 4 Vision model, focusing on its vision budget settings. Google's default configuration sets the vision budget at 280 tokens, corresponding to approximately 645K pixels, which is considered insufficient for detailed OCR tasks. Users can adjust this in llama.cpp by setting --image-min-tokens and --image-max-tokens to higher values, such as 560 and 2240 respectively, to improve image detail recognition. The adjustment significantly increases VRAM usage, from 63 GB to 77 GB at a 4096 batch size. The post also notes that Gemma 4 outperforms models like Qwen 3.5, Qwen 3.6, and GLM OCR on vision tasks when properly configured. A commenter asks about the minimum token settings for smaller models, questioning whether the 40-token minimum applies only to larger models with c500m vision encoders; another user requests detailed configuration options for llama.cpp and vLLM, indicating a need for more comprehensive setup guidance.
- Temporary-Mix8022 discusses working with vision encoders in smaller models, mentioning a parameter size of c150m and using 70 tokens as a minimum. They ask whether 40 tokens is the actual minimum, or whether that applies only to larger models with c500m vision encoders, highlighting the importance of understanding token limits for optimal performance.
- stddealer shares their experience using --image-min-tokens 1024 --image-max-tokens 1536 with Gemma 4's vision, a habit carried over from Qwen 3.5. This configuration choice led to confusion about the perceived underperformance of Gemma 4's vision capabilities, suggesting that token settings significantly impact output quality.
- eposnix points out a limitation in LM Studio for vision tasks, noting that it does not expose the variables necessary to configure vision models effectively. This lack of configurability is a barrier for users who need to adjust parameters for specific vision tasks, indicating a potential area for improvement in the software.
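The token-to-pixel relationship in the post can be turned into a quick budget calculator, assuming (as the "280 tokens ≈ 645K pixels" figure implies) that pixel budget scales roughly linearly with the token budget. The ratio below is derived from the post's numbers, not from any official Gemma 4 formula.

```python
# Quick budget calculator derived from the post's figures: the default
# 280-token vision budget ~= 645K pixels, assuming (simplistically) that
# pixel budget scales linearly with tokens. Not an official formula.

PIXELS_PER_TOKEN = 645_000 / 280  # ~2304 pixels per token, per the post

def pixel_budget(image_tokens: int) -> int:
    return round(image_tokens * PIXELS_PER_TOKEN)

for tokens in (280, 560, 2240):
    print(tokens, "tokens ->", pixel_budget(tokens), "pixels")
```

Under this assumption, the suggested --image-max-tokens of 2240 corresponds to a budget of roughly 5.2M pixels, which is why the VRAM cost rises so sharply when the limits are raised.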
3. Ultimate Lists of Open Source Models
-
Ultimate List: Best Open Models for Coding, Chat, Vision, Audio & More (Activity: 313): The post provides a comprehensive list of the best open-source AI models across various domains, including audio generation, image generation, image-to-video, image-to-text, and text generation. Notable models include Qwen3-TTS for text-to-speech, VoxCPM2 for voice cloning, ACE-Step 1.5 for music generation, and GLM-5.1 for text generation. Each model is highlighted for its specific strengths, such as Qwen3-TTS for quality and speed balance, VibeVoice Realtime for real-time applications, and GLM-5.1 for agentic engineering and long-horizon coding tasks. The list includes links to repositories and emphasizes models’ unique capabilities, such as LTX-2.3 for 4K video generation and GLM-OCR for OCR speed and accuracy. The comments reflect skepticism about the reliability and factual basis of the list, with one user sarcastically suggesting that random chance could yield similar results. Another comment simply mentions ‘omnivoice,’ possibly indicating interest or skepticism about the audio models.
- SatoshiNotMe highlights the omission of specific Speech-to-Text (STT) and Text-to-Speech (TTS) models from the list, mentioning PocketTTS from KyutAI and Parakeet V3 for STT. Both are noted as regular, reliable choices in their respective domains.
- ecompanda discusses the rapid evolution of AI models, noting that any "best models" list becomes outdated quickly due to frequent updates and new releases. They mention that Qwen 3.6 Plus has recently reshuffled the coding leaderboard, much as Gemma 4 did, highlighting the difficulty of keeping such a list current without frequent updates.
-
Ultimate List: Best Open Source Models for Coding, Chat, Vision, Audio & More (Activity: 252): The post provides a comprehensive list of the best open-source AI models across various domains, including audio generation, image generation, and text generation. Notable models include Qwen3-TTS for text-to-speech with a balance of quality and speed, VoxCPM2 for high-quality voice cloning, and ACE-Step 1.5 for music generation. In image generation, FLUX.1 [schnell] is highlighted for its speed and quality on consumer GPUs, while Stable Diffusion 3.5 Large is noted for its versatility in fine-tuning and editing. For text generation, GLM-5.1 by Zhipu AI is a flagship model with a 744B MoE architecture, excelling in long-horizon coding tasks. The list also includes models for image-to-video and image-to-text generation, such as LTX-2.3 for 4K video generation and GLM-OCR for OCR tasks. Comments suggest a need for better formatting of the list for clarity. There is also a debate on the effectiveness of Qwen TTS for longer audio generation, with some users preferring Kokoro for certain tasks.
- Adrian_Galilea raises a technical point about the performance of the Qwen TTS model, questioning its effectiveness for audio longer than a minute. They suggest that Kokoro might be a better alternative, implying potential limitations in Qwen TTS’s handling of longer audio sequences.
- decentralize999 references an external resource, Artificial Analysis, which provides up-to-date leaderboards for model performance. They also mention Qwen3.6-35B as one of the top models currently, highlighting its significance in the field.
- oguza inquires about the inclusion of Flux.2 dev and Klein, suggesting interest in these models’ capabilities or performance. This indicates a potential gap in the original list regarding these specific models.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Claude Code Feature Changes and User Reactions
-
PSA: Claude Pro no longer lists Claude Code as an included feature (Activity: 4239): Anthropic's pricing page no longer lists Claude Code as an included feature of the Pro plan. The support article, now titled "Using Claude Code with your Max plan," indicates a shift in availability, suggesting that Claude Code is now exclusive to the Max plan. The article was updated recently to reflect this change, although cached results still show the previous inclusion in the Pro plan. The comments reflect dissatisfaction, with users expressing frustration and considering unsubscribing over the removal.
-
Anthropic response to Claude Code change (Activity: 1975): Anthropic is conducting a test affecting ~2% of new prosumer signups, focused on changes to subscription plans driven by evolving usage patterns of the Claude Code feature. Initially, the Max plan was designed for heavy chat usage, but with the integration of Claude Code, Cowork, and long-running async agents, user engagement has increased significantly, leading to adjustments like weekly caps and tighter limits during peak times. The test aims to explore options for maintaining service quality, with assurances that existing subscribers will be notified of any changes well in advance. Amol Avasare announced this on X, highlighting the shift of Claude Code from Pro to Max, which has increased costs for users. Commenters express skepticism about the transparency and communication of the test, with some perceiving it as a negative change for users; concerns include the randomness of access to Claude Code for new signups and the perception of the test as a "gacha game."
- A user highlights that only 2% of new prosumer signups in the test have access to Claude Code, yet the documentation has already been updated to reflect the change. This raises concerns about transparency and communication, since users cannot tell whether they will have access to the feature upon signing up.
- Another commenter questions the logic behind the test, suggesting that the randomness of access to Claude Code for new Pro users resembles a "gacha game" mechanic. This implies a lack of predictability and fairness in how features are distributed among users, which could affect user trust and satisfaction.
- A user speculates on the purpose of the test, humorously suggesting that it might be to observe user reactions when they find out they don’t have access to a feature they expected. This points to potential issues in user experience and expectation management, as well as the importance of clear communication from Anthropic.
-
Does Claude’s $20 Plan No Longer Include Claude Code? (Activity: 1477): The image is a pricing table for Claude’s subscription plans, showing that the ‘Claude Code’ feature is not included in the $20 Pro plan, but is available in the Max 5x and Max 20x plans. This has caused confusion among users, as some recall ‘Claude Code’ being part of the Pro plan previously. The discrepancy between the information on Claude.com and Claude.ai adds to the confusion, suggesting a recent change or inconsistency in the feature offerings. Users are concerned about the impact on hobbyist programming and are considering alternatives like ChatGPT and Codex. Users express frustration over the removal of ‘Claude Code’ from the Pro plan, feeling it limits personal use and may push them towards other services. The inconsistency between different Claude websites adds to the dissatisfaction.
- There is confusion regarding the availability of Claude Code in the Pro plan, with some users reporting recent access while others note discrepancies between information on Claude.com and Claude.ai. This suggests potential inconsistencies in communication or implementation of plan features.
- A user provided a link to a support article that initially suggested Claude Code was available for both Pro and Max plans, but now redirects to a page indicating it’s only available for the Max plan. This change implies a possible shift in service offerings, though it’s unclear if this is intentional or an error.
- The uncertainty around Claude Code’s availability in the Pro plan is causing concern among users who rely on it for hobbyist programming. The potential removal could push users towards alternatives like ChatGPT and Codex, highlighting the importance of clear communication from service providers regarding feature availability.
-
Sama is on 🔥🔥 (Activity: 1164): The image is a meme-like screenshot of a Twitter exchange involving Sam Altman and discussions about Anthropic’s decision to remove Claude Code from the Pro plan, requiring users to upgrade to Max for access. This decision has sparked controversy, as highlighted by Amol Avasare’s clarification that this change affects new signups, not existing subscribers. The exchange includes a dismissive response from Sam Altman, ‘ok boomer,’ which has attracted significant attention. The post and comments reflect dissatisfaction with Anthropic’s A/B testing practices, which some users find unethical, and critique Sam Altman’s public persona. Commenters express strong disapproval of Anthropic’s decision-making, particularly the ethics of their A/B testing strategy, and criticize Sam Altman’s response as unprofessional and indicative of broader issues with his public image.
- SilasTalbot raises concerns about the ethics of A/B testing, particularly when 1 in 50 users receive less functionality without being informed. This practice can be seen as unethical, especially if it involves removing access to key features, as mechapaul also highlights. Such tests can negatively impact user trust and satisfaction.
- gloobit criticizes the decision to remove a key feature as part of a test, suggesting that it is unrealistic to expect users to upgrade to a $200/month plan immediately. This points to potential misjudgments in product strategy and user experience management, which could lead to customer dissatisfaction and churn.
-
Head of Growth at Anthropic regarding Claude Code removal from Pro (Activity: 2197): The image and accompanying discussion highlight a strategic shift by Anthropic in their subscription model, specifically affecting the availability of Claude Code. The company is transitioning the feature from the Pro plan to the more expensive Max plan, which costs at least $100 per month. The change is part of a limited test impacting about 2% of new subscribers, while existing Pro and Max users remain unaffected. The move is seen as a response to resource constraints, particularly compute availability, which is a significant issue for AI companies, and has sparked debate about pricing strategies and resource allocation in the industry. Commenters express concern over increasing costs and resource limitations in AI services, with some suggesting that Anthropic's decision reflects broader industry challenges in managing compute; there is also criticism of the pricing strategy, with calls for a more affordable tier between Pro and Max.
- samwise970 highlights that Anthropic's decision to remove Claude Code from the Pro tier is likely due to a shortage of computational resources. They argue that if Anthropic had sufficient compute, the marginal cost of inference would be minimal, suggesting that the company is trying to manage limited resources by increasing prices.
- RemarkableGuidance44 discusses the broader issue of resource constraints in AI, noting that several companies, including GitHub Co-Pilot and OpenAI, are facing similar challenges. They mention that Anthropic’s token usage costs have increased, which reduces the value of subscriptions, and suggest that the recent performance improvements are merely fixes for existing issues rather than genuine enhancements.
- band-of-horses questions the usage patterns of Claude, suggesting that it is primarily used for coding rather than general chat. They note that users interested in general knowledge tend to prefer other AI models like Gemini and ChatGPT, indicating a potential niche market for Claude focused on coding applications.
-
We’re saved! Claude Code is back in the Pro plan! (Activity: 586): The image is a pricing plan comparison for a service called Claude, highlighting that “Claude Code” is now included in the Pro plan. This suggests a change or update in the service offerings, where previously “Claude Code” might not have been available in the Pro plan. The table also lists other features like “Chat on web, iOS, Android and Desktop” and “Claude Cowork,” indicating a tiered service structure with varying feature availability. The return of “Claude Code” to the Pro plan is met with relief or excitement, as indicated by the title and the red-circled checkmark in the image. Commenters express skepticism about the longevity of this change, with some suggesting it might be part of A/B testing. There is also a discussion about the value and limitations of the $20 plan, with some users indicating that they occasionally hit usage limits even on higher-tier plans.
- A user speculates that the $20 Claude Code plan might be restrictive, especially for those who hit usage limits even on the $100 plan. This suggests that the lower-tier plan may not provide sufficient resources for heavy users, potentially leading to frequent limitations on usage.
- Another user predicts a potential price increase for the Claude Pro plan or the introduction of a new Pro+ subscription tier at $50. This reflects a common strategy in subscription services where companies adjust pricing or introduce new tiers to balance demand and resource allocation.
- There is a concern that the company might reduce usage limits for the Pro plan without notice. This could be a strategy to manage costs or encourage users to upgrade to higher tiers, reflecting a common practice in subscription-based models to optimize revenue.
-
Claude Code no longer listed as a feature for Claude Pro (Activity: 2784): Claude Code has been removed from the feature list for the Claude Pro plan on the official website’s comparison chart. This change suggests a shift in the feature offerings for the Pro plan, potentially impacting users who relied on Claude Code for development purposes. The pricing page for Claude by Anthropic outlines various subscription plans, each with distinct features and usage limits, but now without Claude Code for Pro users. For more details, refer to the Claude pricing page. Some users express dissatisfaction with the removal of Claude Code, citing the high cost of $100/month as unjustifiable for hobby projects. Others suggest switching to alternatives like Codex.
- A user expressed dissatisfaction with the removal of Claude Code from the Claude Pro feature set, highlighting the cost of $100/month as unjustifiable for personal projects. This indicates a potential shift in user base towards alternatives like Codex, which may offer similar functionalities at a more competitive price point.
- Another user shared a screenshot confirming the removal of Claude Code from the feature list, suggesting that this change is indeed official. This visual evidence supports the claim that Claude Code is no longer part of the Claude Pro offering, which could impact users relying on this feature for coding tasks.
- A user mentioned their regret in paying for a year in advance for Claude Pro, having previously paid monthly for over two years. They indicated a willingness to request a refund if Claude Code ceases to function, reflecting concerns about the value proposition of the service without this feature.
-
Claude Code removed from Anthropic’s Pro plan (Activity: 990): The image depicts a comparison chart for different subscription plans of a service called Claude, highlighting that the ‘Claude Code’ feature has been removed from the Pro plan and is now only available in the higher-tier Max 5x and Max 20x plans. This change has not been officially announced by Anthropic, but was discovered through a Hacker News post and discussed in the r/ClaudeCode subreddit. The removal of this feature from the Pro plan suggests a strategic shift, possibly to encourage users to upgrade to more expensive plans. Additionally, a tweet suggests this change might have been a test, adding to the uncertainty around the decision. Commenters express concern about the lack of communication from Anthropic and the potential impact on users who paid for the Pro plan expecting the ‘Claude Code’ feature. There is also a sentiment that this move might push users towards competitors like Codex.
2. GPT-Image-2 and ChatGPT Image Model Developments
-
Gpt image 2 has the biggest jump in quality ever recorded (Activity: 1395): The image showcases a leaderboard from the ‘Text-to-Image Arena,’ highlighting the performance of various AI models at generating images from text prompts. The standout model, ‘gpt-image-2’ by OpenAI, achieves a score of 1512, a significant leap over competitors like Google and Microsoft AI. The score is based on over 4.8 million votes, indicating broad consensus on its superior performance. The leaderboard is current as of April 19, 2026, underscoring the model’s cutting-edge text rendering and photorealism. Commenters express surprise at the model’s capabilities, particularly in text rendering and photorealism, comparing it to the ‘o1 reasoning model of AI images.’ There is also discussion of different model versions, such as ‘medium’ and ‘instant,’ and speculation about a ‘high’ version in the API.
- FateOfMuffins highlights that the new model offers different quality levels, such as ‘medium’ and ‘instant’, suggesting a tiered approach to image generation. Users can trade speed against quality, with a potential ‘high’ option via the API, indicating a flexible architecture that caters to varied needs.
- Thatunkownuser2465 and GoodDayToCome discuss the model’s advancements in text rendering and photorealism, noting its ability to create detailed and accurate infographics. They emphasize that previous models couldn’t match this level of detail, suggesting significant improvements in both the model’s understanding of layout and its ability to maintain stylistic coherence across complex images.
- Kinu4U mentions the use of ‘extended thinking’ in prompts, which may refer to a more sophisticated processing technique that allows the model to generate hyper-realistic images based on user preferences. This could indicate an advancement in how the model interprets and executes creative tasks, potentially leading to more personalized and high-quality outputs.
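The post does not say how the Arena computes its scores, but leaderboards of this kind are typically driven by Elo-style ratings aggregated over pairwise votes. A minimal sketch of one Elo update; the ratings and K-factor below are illustrative, not the Arena's actual parameters:

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 16.0):
    """Return both ratings after one pairwise vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta

# A win by the favorite moves both ratings only slightly.
new_a, new_b = elo_update(1512.0, 1480.0, a_won=True)
```

Because each vote shifts both ratings by the same small amount in opposite directions, a score like 1512 only stabilizes after a very large vote count, which is why the 4.8 million votes matter.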
-
GPT-Image-2 now reviews its own output and iterates until it is satisfied with the correctness of its output. (Activity: 658): The image titled “The Great Counting Adventure” is a whimsical map generated by GPT-Image-2, showcasing its new capability to self-review and iterate on its outputs until achieving satisfactory correctness. This process took approximately 11 minutes, indicating a significant computational cost due to multiple internal iterations aimed at improving design clarity and accuracy. This feature, while enhancing output quality, raises concerns about its practicality in workflows requiring rapid iterations, such as UI mocks or storyboards, due to time and cost constraints. Commenters express concern over the practicality of the self-review loop, noting that the 11-minute generation time per image could be prohibitive for workflows needing quick iterations. There is interest in whether the iteration count will be adjustable to balance quality and efficiency.
- Worried-Squirrel2023 highlights a significant concern regarding the processing time and cost of GPT-Image-2’s self-review loop, noting that it takes ‘11 minutes per image’ and involves ‘5-10 internal iterations’. This could make it impractical for workflows requiring rapid iteration, such as UI mocks or storyboards, though it might be suitable for high-quality ‘hero shots’. The commenter suggests the possibility of a user-controlled ‘iteration count’ to manage these factors.
- Jaxraged comments on the aesthetic aspect of GPT-Image-2, noting that it retains a ‘sepia filter look’. This suggests that despite the technical advancements in self-review and iteration, the model’s output still maintains a certain stylistic consistency, which may or may not be desirable depending on the use case.
- TopTippityTop points out a specific issue with GPT-Image-2’s output accuracy, mentioning that it failed to correctly render the numbers ‘15 and 39’. This highlights a potential limitation in the model’s ability to accurately generate detailed numerical information, which could be critical for applications requiring precise data representation.
-
GPT Image 2 is amazing! (Activity: 794): The image described in the post is non-technical and appears to be a meme or a casual depiction of a streaming setup, emphasizing a cozy and relaxed atmosphere with elements like a neon sign and gaming chair. The comments do not provide any technical insights or discussions related to the image, focusing instead on humorous or casual remarks about the content. The comments reflect a humorous take on the image, with one user joking about its potential as a ‘goonerbait generator’ and another remarking on the progress made, likely in reference to streaming setups or technology.
-
Introducing ChatGPT Images 2.0 (Activity: 929): OpenAI has released ChatGPT Images 2.0, which significantly enhances image generation capabilities by improving precision and control. This version introduces support for multilingual text rendering and offers a range of visual styles, such as editorial, surreal, and photorealistic imagery, demonstrating its versatility in content creation. The update aims to provide more nuanced and diverse image outputs, catering to a broader range of user needs. For further details, refer to the OpenAI announcement. Users are experimenting with the new capabilities, noting both the system’s limitations in generating certain types of content and its impressive ability to create complex, realistic designs, such as a practical mobile suit. The discussions highlight the balance between creative freedom and content moderation in AI-generated imagery.
- Zandrio raises a critical point about the strategic release and subsequent throttling of AI models. Companies often release powerful models initially to generate hype and user engagement, but may later reduce capabilities to manage operational costs. This pattern suggests the importance of evaluating model performance and capabilities over time, particularly through benchmarks conducted 6 months post-release to assess any degradation or throttling effects.
- birdomike expresses interest in comparing ChatGPT Images 2.0 against other models like Nano Banana Pro and NB2. This highlights the competitive landscape in AI image generation, where performance metrics and feature comparisons are crucial for understanding relative strengths and weaknesses. Such comparisons often involve detailed benchmarks and real-world application tests to determine practical utility and efficiency.
-
Wow, GPT Image 2 is superb! (Activity: 56): The post discusses the release of GPT Image 2, highlighting its impressive capabilities. However, the technical details, such as model architecture, training data, or specific benchmarks, are not provided in the post. The image linked in the comments suggests a user interface, but no further technical insights are available from the image itself. One comment humorously suggests a reluctance to engage with complex user interfaces, indicating a potential user experience issue with the tool’s design.
-
GPT IMAGE 2 is superb (Activity: 563): The image is a creative output generated by GPT IMAGE 2, showcasing its ability to produce a fashion-editorial style collage based on a detailed prompt. The prompt specifies a freeform arrangement of eight distinct summer outfits on a consistent model, emphasizing the model’s height and maintaining visual scale across all figures. The image demonstrates the model’s capability to adhere to complex layout instructions, such as arranging figures in a balanced two-row layout and adding handwritten labels for clothing items, without using grids or borders. This highlights the model’s potential in generating visually appealing and contextually accurate fashion content.
- The comment by ‘flatacthe’ highlights the improved text rendering capabilities of GPT Image 2, noting that it handles text much better than previous versions. The user points out that specifying the style in prompts can enhance consistency across multiple figures, suggesting that smart prompting plays a significant role in achieving high-quality outputs.
3. Google TPU 8th Generation and AI Studio Limitations
-
Google introduces TPU 8t and TPU 8i (Activity: 550): The image provides a detailed comparison between Google’s Ironwood (2025) and the newly announced TPU 8i (2026), highlighting significant advancements in hardware specifications. The TPU 8i features a larger pod size, increased FP8 EFLOPS per pod, enhanced total HBM capacity per pod, and improved bidirectional scale-up bandwidth, indicating substantial performance improvements over its predecessor. These enhancements are part of Google’s strategy to advance supercomputing capabilities with the TPU 8i, which is custom-engineered for efficiency and scalability in the next generation of computing. Commenters note the impressive specifications of the TPU 8i, suggesting it poses a competitive challenge to NVIDIA as hyperscalers develop their own silicon solutions. The numbers are perceived as ‘insane,’ indicating a significant leap in performance.
- Worried-Squirrel2023 highlights a significant shift in the AI hardware landscape, noting that NVIDIA faces increased competition as major cloud providers develop their own silicon solutions. This trend suggests a diversification in AI hardware sources, potentially impacting NVIDIA’s market dominance.
- WhyLifeIs4 shares a link to a technical deep dive on Google’s new TPU models, which could provide detailed insights into their architecture, performance metrics, and potential use cases, offering valuable information for those interested in the technical specifics of these new processors.
-
Google’s 8th Generation TPU Released What is your take on this? (Activity: 85): Google’s 8th-generation TPU, labeled “TPU 8t,” is highlighted for its remarkable computational power, boasting 121 exaflops and native FP4 compute capabilities. This suggests a significant leap in processing power that should benefit machine learning and AI workloads. The image shows the hardware’s design: a green circuit board with multiple components and heat sinks, indicating a focus on efficient thermal management and high-performance computing. One comment humorously suggests that while many may not fully understand the technical details, they will still have opinions on it; another highlights a common issue in tech hardware: the mismatch between supply and demand.
- The 8th-generation TPU is designed to improve the performance of quantized models, as indicated by its focus on FP4 computation. This points to a significant efficiency gain for models optimized through quantization, a technique that reduces computational load and memory use to speed up machine learning models.
- The release of Google’s 8th Generation TPU highlights the ongoing issue of supply meeting demand in the tech industry. Despite advancements in hardware capabilities, there remains a challenge in ensuring that these high-performance components are readily available to meet the needs of developers and researchers.
- Google’s new TPU generation addresses the company’s previous compute constraints, which were unexpected by some industry observers. This development is likely to alleviate some of the computational bottlenecks that Google has faced, potentially accelerating their AI and machine learning projects.
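For context on what native FP4 buys: 4-bit floats in the common E2M1 layout can represent only sixteen values, so tensors are scaled into that grid and rounded, trading precision for memory and bandwidth. A rough simulation, assuming the standard E2M1 value set and a simple per-tensor max scaling (real hardware pipelines use finer block-wise scaling):

```python
# Positive representable magnitudes of the FP4 E2M1 format.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GRID = sorted({s * v for v in E2M1 for s in (1.0, -1.0)})

def quantize_fp4(x: float, scale: float = 1.0) -> float:
    """Round x/scale to the nearest representable E2M1 value, then rescale."""
    t = x / scale
    return min(GRID, key=lambda v: abs(v - t)) * scale

def quantize_tensor(xs: list[float]) -> list[float]:
    """Per-tensor scaling: map the largest magnitude onto the E2M1 max (6.0)."""
    scale = max(abs(x) for x in xs) / 6.0 or 1.0
    return [quantize_fp4(x, scale) for x in xs]
```

The coarse grid is why FP4 is paired with careful scaling: values near the top of the range land on widely spaced points (4.0 vs 6.0), so the choice of scale dominates accuracy.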
-
Google AI Studio Madness (Activity: 102): The post criticizes Google AI Studio’s quota limitations, particularly for the 3.1 Pro model, which reportedly exhausts its quota after just 15 messages even with grounding turned off. The user calls the service’s advertised 6,250 prompts a day misleading, leading to their decision to cancel the subscription. Comments note that the quota appears to be identical across the Pro, Ultra, and Free tiers, limiting users to 10-15 prompts, and criticize the 1 million token context window for failing to maintain context beyond 10 prompts.
- vladislavkochergin01 highlights a significant limitation in Google AI Studio’s current offering, noting that the quota for Pro, Ultra, and Free users is now identical, allowing only 10-15 prompts. This change could affect users who rely on higher-tier plans for heavier usage, hurting productivity and workflow.
- PsyckoSama points out a technical limitation regarding Google AI Studio’s context size of 1 million tokens: despite the seemingly large capacity, the system struggles to maintain context beyond 10 prompts, suggesting inefficiencies in memory management or prompt handling that could hinder complex tasks.
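Quota exhaustion like this typically reaches API clients as rate-limit errors, and the standard client-side mitigation is retrying with exponential backoff plus jitter. A generic sketch, not tied to any specific Google SDK; `call_api` and `RateLimitError` are placeholders:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 / quota-exhausted error."""

def call_with_backoff(call_api, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a quota-limited call, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Backoff of 1s, 2s, 4s, ... plus up to 1s of random jitter,
            # so many clients don't retry in lockstep.
            time.sleep(base_delay * 2 ** attempt + random.random())
```

Backoff only smooths transient bursts; it cannot work around a hard daily cap like the 10-15 prompt limit users report.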
-
Gemini 3.1 Pro limits in AI Studio are now exactly the same for Pro and Free users (Activity: 109): Google’s Gemini 3.1 Pro in AI Studio now applies rate limits identical to the Free tier, restricting users after 8-12 prompts. The change has caused confusion and frustration among users who expected higher limits with the Pro version. Some report that the issue is intermittent, suggesting bugs or an inconsistent rollout. Users express dissatisfaction with Google’s handling of the rate limits, with some noting that the issue affects both Gemini 2.5 and 3.1. The sentiment is that the Pro tier should offer more value, and the current situation is seen as a failure to meet expectations.
AI Discords
Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.