a quiet day.

AI News for 5/20/2026-5/21/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

AI Twitter Recap

Model, Benchmark, and Research Updates: RAEv2, Gated DeltaNet-2, Data Filtering, and Open Math

RAEv2 and representation-first tokenization: Several researchers highlighted RAEv2 as a meaningful follow-on to Representation Autoencoders for unified vision understanding and generation. @1jaskiratsingh says the update yields >10x faster convergence, better reconstruction, and better generation, with tests extending to text-to-image and world models. A Chinese summary from @recatm usefully extracts the three main findings: summing the last K encoder layers instead of only the final layer improves both reconstruction and generation without added inference cost; RAE and REPA are complementary across semantics vs. spatial structure; and REPA can be reformulated as an internal self-guidance mechanism, avoiding extra weak-model guidance passes. @sainingxie also points to new evaluation views beyond FID, arguing there is still underexplored headroom in representation-powered pixel decoders.
Alternatives to standard attention and tokenizer assumptions: NVIDIA’s Gated DeltaNet-2 decouples erase and write operations in linear attention with channel-wise gates, outperforming KDA and Mamba-3 at 1.3B parameters on language modeling and commonsense reasoning, with notable long-context retrieval gains on RULER; @rasbt called it one of the more interesting hybrid-attention directions. On tokenization, @NousResearch released a controlled study of why subword tokenization helps, simulating seven hypothesized benefits inside a 1.7B byte-level pipeline; only three of seven interventions moved validation loss at that scale. Separately, @tatsu_hashimoto reported a surprising scaling result on DCLM: with enough compute, the best data filter may be no filter, with projections suggesting the crossover for internet-scale pools lands around 1e30 FLOPs; downstream evals appear noisy but directionally consistent (follow-up).
Mechanistic interpretability and geometry: @GoodfireAI argues the dominant “models think in curved manifolds, SAEs use straight-line features” critique is only partly right. Their proposed fix is to cluster SAE features by joint firing patterns, recovering geometry through feature groups rather than isolated atoms (thread continuation, post). This is a useful update to the current SAE discourse: not a rejection of sparse features, but a warning that interpretation should move from single features to structured ensembles.
Math as an AI research domain: The biggest scientific discussion centered on OpenAI’s reported result on an Erdős unit-distance problem. @markchen90 framed it as evidence that mathematics is currently the domain most amenable to AI-assisted research breakthroughs, while @wtgowers noted that if the reported low human interaction level holds, the result is genuinely interesting. The discourse was immediately shaped by skepticism and benchmark/gameability concerns, with @memecrashes joking that the result was “outdated not even 3 hours later by a human,” and @cloneofsimo pointing out the predictable “goalpost moving” around what counts as legitimate AI mathematics. The interesting technical meta-point is that math continues to function as a relatively legible frontier for AI co-research because outputs can be checked, debated, and extended.
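The layer-summing finding in the RAEv2 summary above can be illustrated with a small sketch. It assumes the encoder exposes one feature map per layer with identical shapes; the function name, toy dimensions, and the plain unweighted sum are illustrative assumptions, not details confirmed by the RAEv2 release.

```python
import numpy as np

def sum_last_k_layers(hidden_states, k):
    """Sum the final k per-layer feature maps into one representation.

    hidden_states: list of arrays, one per encoder layer, each shaped
    (num_tokens, dim). This layout is a hypothetical one for illustration.
    """
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    return np.sum(np.stack(hidden_states[-k:], axis=0), axis=0)

# Toy stand-in for a frozen encoder: 12 layers, 4 tokens, 8 channels.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 8)) for _ in range(12)]

final_only = sum_last_k_layers(layers, k=1)  # same as using the last layer alone
last_four = sum_last_k_layers(layers, k=4)   # summed latent with identical shape
print(final_only.shape, last_four.shape)
```

Because the summed latent has the same shape as the final layer alone, the decoder input size is unchanged, which is consistent with the claim of no added inference cost.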

Agents, Harnesses, and Developer Tooling: Codex, Gemini, Devin, and Agent Infrastructure

Harnesses are still a major source of capability gains: @lvwerra released physics-intern, a science-problem harness that boosts models like Gemini 3.1 Pro from 17.7 to 31.4, surpassing GPT 5.5 Pro in that setup. The notable nuance is that GPT 5.5 Pro itself did not benefit from the harness, suggesting model-specific absorption of scaffolding tricks. In the same spirit, @KLieret made mini-swe-agent runnable on ProgramBench, explicitly aiming to improve harness innovation around software engineering agents.
Agent design patterns are maturing from “single agent first” to explicit subagent orchestration: @cwolferesearch gives a practical synthesis: start with single-agent systems, and only move to manager/sub-agent or decentralized multi-agent topologies when tool sprawl or prompt bloat becomes unmanageable. That advice lines up with more operational observations from users of subagents: @andrew_locke describes Cognition’s sub-Devin workflow as a step change, compressing what previously looked like 2+ engineer-weeks into a couple of hours.
Codex shipped a substantial product layer on top of the model: OpenAI’s “Codex Thursday” updates matter less as standalone features than as signs of where coding agents are going. @OpenAIDevs launched Appshots, which capture both screenshot and text from Mac app windows for richer working context; they also added team plugin sharing (link) and more detailed org analytics (link). The more important systems shift is remote computer use: @OpenAIDevs says Codex can now securely use apps on your Mac from your phone even when the Mac is locked. This is a strong signal that the agent product surface is moving from chat IDEs to persistent cross-device operator workflows.
Gemini’s agent/tool story is broadening quickly: @OfficialLoganK highlighted that Gemini 3.5 Flash ranks #1 on APEX-Agents-AA, outperforming larger models. On the applied side, @_philschmid shows a GitHub issue triage agent built with a single Gemini API call and no orchestration framework, while @skalskip92 demonstrates Gemini 3.5 Flash replacing a custom vision pipeline for lane/car reasoning with one multimodal API call. Google also expanded action surfaces: Daily Brief (announcement) and connected-app actions with OpenTable, Canva, and Instacart (announcement) are essentially consumer-facing agent workflows.
Developer infra is converging around retrieval, streaming, sandboxes, and security boundaries: Weaviate shipped a built-in MCP server inside the database so coding agents can ingest a repo and use hybrid BM25 + vector retrieval without extra processes (announcement). LangChain introduced both a sandbox Auth Proxy for controlling agent-world boundaries (announcement) and a new typed streaming protocol for rendering tools, subagents, media, and interrupts as first-class projections rather than token streams (overview). vLLM’s Elastic Expert Parallelism is also notable systems work: @vllm_project describes live resizing of MoE DP/EP topology without full restarts, using direct GPU-to-GPU transfers over NVLink/RDMA—important not just for scaling but for future fault-tolerant serving.
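Hybrid BM25 + vector retrieval needs some rule for merging two ranked lists. One widely used option is reciprocal rank fusion (RRF), sketched below; this is a generic illustration with made-up file names and the conventional constant k=60, not a description of how Weaviate’s MCP server fuses results.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a keyword (BM25) and a vector retriever.
bm25_ranking = ["auth.py", "README.md", "proxy.py"]
vector_ranking = ["proxy.py", "auth.py", "stream.py"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Rank-based fusion sidesteps the problem that BM25 scores and vector similarities live on different scales, at the cost of discarding score magnitudes.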

Infrastructure, Compute, and AI Business Signals: Modal, Turbopuffer, Hark, and the Compute Race

The infra layer had one of its clearest “this is where the money is” days: @Sirupsen said turbopuffer crossed $100M run-rate in March, just 19 months after $1M, while being profitable and raising < $1M. The company’s positioning is straightforward and timely: frontier teams know “the magic happens with AI when it draws in just the right context,” which turns a lot of product differentiation into a search/retrieval problem (follow-up). That aligns with broader sentiment from @swyx that “boring” AI infrastructure, not only glamorous frontier research, is where wealth creation is accruing.
Modal raised big and continues to look like a core AI cloud winner: @bernhardsson announced a $355M Series C at a $4.65B valuation. Investors and users emphasized the same thesis: rebuilding the cloud stack for AI workloads from the ground up, with strong performance and developer experience (Redpoint, user endorsement). This sits alongside other signals that agent-native compute is emerging as its own category; @latentspacepod summarized Daytona’s pitch around 60ms sandboxes, 50K startups in 75 seconds, and RL/evals workloads now representing roughly half of usage.
Compute remains the strategic bottleneck, and the market appears tiered: @AymericRoucher sketched a useful compute taxonomy: US leaders (OpenAI, Anthropic, Google, with Meta/xAI joining) in the multi-gigawatt class; Chinese giants scaling from hundreds of MW toward multi-GW, increasingly on domestic stacks; and European contenders such as Mistral at around 90 MW today aiming for 1 GW by 2029. The exact numbers are debatable, but the framing is consistent with @EpochAIResearch, which notes that even if OpenAI kicked off the recent compute buildout, frontier labs still use far less than all global compute capacity, leaving open the question of how much further the buildout can accelerate. Component economics also continue to shift toward memory: @EpochAIResearch reports HBM grew from 52% to 63% of total AI chip component spending from Q1 2024 to Q4 2025.
Capital is flowing to interface/hardware bets as well as infra: @adcock_brett announced Hark raised $700M at a $6B valuation, aimed at GPU infrastructure, future model development, hardware, and multimodal/personal intelligence products. The details are sparse beyond hiring areas—foundation models, infra, speech, computer-use agents, hardware—but the size of the raise shows investor appetite for vertically integrated AI-device bets. Hark also reported a 200-hour uninterrupted autonomous run for F.03 (announcement), though without enough technical detail yet to evaluate the underlying robotics stack.
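As a back-of-the-envelope check on the turbopuffer figures above, going from $1M to $100M run-rate in 19 months implies a compounded monthly growth rate. The helper below is a hypothetical illustration and assumes smooth compounding, which real revenue rarely follows.

```python
def implied_monthly_growth(start, end, months):
    """Constant monthly growth rate that takes `start` to `end` in `months`."""
    return (end / start) ** (1.0 / months) - 1.0

# Figures from the item above: $1M -> $100M run-rate over 19 months.
rate = implied_monthly_growth(1e6, 100e6, 19)
print(f"implied growth: {rate:.1%} per month")
```

Under that assumption the result works out to roughly 27% per month.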

Multimodal, Video, Biology, and Robotics: Runway, Carbon, Earth Models, and Open Humanoids

Video editing and generation are getting more compositional: Runway launched Aleph 2.0 and the new Edit Studio, letting users edit a single frame and propagate that edit through the rest of the video (Runway, product lead). This is a practical productization of the “reference-guided edit propagation” problem that multimodal builders care about. Separately, Alibaba researchers’ MIGA was flagged by @HuggingPapers as a train-free method for infinite-frame video generation with a two-stage alignment mechanism for temporal consistency. On the open-source avatar side, Meituan released LongCat-Video-Avatar 1.5 with Whisper-Large replacing Wav2Vec2, 8-step inference, long-video identity consistency, and broader stylized-domain generalization (announcement).
Foundation models for biology and Earth observation continue to become more usable: Hugging Face Bio’s Carbon DNA model family got follow-on demos and infra validation. @LoubnaBenAllal1 highlighted applications in sequence design, variant effect prediction, and learned representations, while @Shekswess showed Carbon-500M, 3B, and 8B compiling and running on a single Trainium2 trn2.3xlarge with NxD Inference on day one. For geospatial modeling, @cgeorgiaw reported OlmoEarth v1.1 is 3x cheaper/faster by changing the tokenization of multi-resolution Sentinel-2 inputs into 3x fewer tokens, exploiting the quadratic compute savings.
Open robotics is getting more buildable: Hugging Face’s LeRobot Humanoid drew attention as a genuinely full-stack open release rather than a showcase demo. @robotsdigest and @lukas_m_ziegler both emphasize the same package: roughly $2.5k, 3D-printed, complete hardware/CAD, calibration/runtime, simulation, identification tools, and training pipelines. The key point is not just affordability; it’s repairability and iteration speed for real robot learning workflows.
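The “quadratic compute savings” point in the OlmoEarth item can be made concrete with a rough cost proxy. The sketch below assumes a standard transformer layer whose linear projections scale with n·d² and whose attention scores scale with n²·d; the coefficients, token counts, and width are illustrative assumptions, not OlmoEarth’s actual configuration.

```python
def layer_cost(num_tokens, dim, linear_coeff=12, attn_coeff=2):
    """Rough per-layer cost proxy, split into its two scaling regimes.

    Returns (linear_term, attention_term). The coefficients are
    placeholders; only the scaling with num_tokens matters here.
    """
    linear = linear_coeff * num_tokens * dim ** 2
    attention = attn_coeff * num_tokens ** 2 * dim
    return linear, attention

# Hypothetical sequence lengths before and after a 3x token reduction.
lin_before, attn_before = layer_cost(num_tokens=3000, dim=768)
lin_after, attn_after = layer_cost(num_tokens=1000, dim=768)

print("linear term shrinks by", lin_before / lin_after)
print("attention term shrinks by", attn_before / attn_after)
```

Cutting tokens 3x shrinks the linear term 3x and the attention term 9x, so the overall saving depends on which term dominates at a given sequence length.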

Top tweets (by engagement)

OpenAI / Codex product expansion: Codex can securely use apps on your Mac from your phone, even when the Mac is locked, plus Appshots for richer app context.
Infrastructure winners: turbopuffer at $100M run-rate, profitable, < $1M raised; Modal raises $355M Series C at $4.65B; Hark raises $700M at $6B.
Research discussions with broad technical resonance: OpenAI’s Erdős-related math result discussion; RAEv2 release; “no filter” scaling result for LM data curation.
Agent capability trendlines: Gemini 3.5 Flash tops APEX-Agents-AA; Gemma 4 E4B driving an iOS simulator on-device via Argent; Devin for Windows.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Qwen 3.7 Max Benchmarks and 27B Watch

Qwen will release another 27B with high probability (Activity: 1613): The image is a screenshot of an X/Twitter exchange where xiong-hui / Barry Chen says he is “waiting for the exact roadmap” but believes Qwen will likely release another 27B model, noting that producing another 27B is “not hard for them now.” In the context of the title and linked post, this is not an official announcement but a roadmap hint/rumor around a possible successor to the perceived “miracle model” Qwen 3.6 27B. Commenters mostly discuss deployment practicality: some users with 16GB VRAM prefer a 35B MoE / A3B-style model because it can be more accessible via hybrid CPU/GPU inference than a dense 27B at high quantization. Others speculate about larger MoE variants like a hypothetical Qwen 3.7 122B-A10B.
- Several commenters focused on VRAM-constrained local inference, arguing that a 27B dense model is difficult to run at a “decent quant” on 16GB GPUs, while a hypothetical Qwen 35B MoE / A3B-style model could remain accessible via lower active parameter count or hybrid CPU/GPU inference. The discussion frames Qwen’s previous small-active-parameter MoE designs as important for users with basic gaming laptops or limited VRAM.
- One user requested larger dense Qwen models in the 50B–80B range, noting that the current 27B is already fast enough with MTP that they would trade inference speed for more parameters and potentially higher capability. Another floated a speculative Qwen 3.7 122B-A10B MoE-style target, suggesting interest in large total-parameter models with relatively low active parameters per token.
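The VRAM argument in these comments comes down to simple arithmetic on parameter count and bits per weight. The sketch below assumes 4.5 bits per weight as a stand-in for a “decent quant” of a dense 27B model, reuses the 4.19 bpw figure reported elsewhere in this issue for the 35B-A3B model, and treats “A3B” as roughly 3B active parameters per token; it ignores KV cache, activations, and runtime overhead.

```python
GIB = 2 ** 30

def weight_memory_gib(num_params, bits_per_weight):
    """Approximate size of the quantized weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / GIB

dense_27b = weight_memory_gib(27e9, 4.5)        # assumed quant level
moe_35b_total = weight_memory_gib(35e9, 4.19)   # all experts together
moe_35b_active = weight_memory_gib(3e9, 4.19)   # weights touched per token (assumed ~3B)

print(f"27B dense weights:      {dense_27b:.1f} GiB")
print(f"35B MoE total weights:  {moe_35b_total:.1f} GiB")
print(f"35B MoE active weights: {moe_35b_active:.1f} GiB")
```

On these assumptions the dense 27B weights alone take about 14 GiB, leaving little of a 16GB card for context, while the MoE exceeds VRAM in total but touches only a small slice of its weights per token, which is what makes a CPU/GPU split tolerable.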
Qwen3.7 Max scored by Artificial Analysis, 27B/35B waiting room (Activity: 614): Qwen3.7 Max has appeared on Artificial Analysis rankings in 5th place, reportedly roughly tied with GPT-5.4 xhigh and slightly ahead of Gemini 3.5 Flash. The post highlights that Qwen3.6 27B is 6 points behind its Max counterpart, raising expectations that upcoming Qwen3.7 27B/35B variants could land near the larger Max model’s performance. Commenters are mainly waiting for open-weight releases and view Qwen’s competitiveness with frontier labs as promising, though there is frustration that the Max model is not open source. One technical concern is whether Qwen has fixed its reported “overthinking” behavior.
- Commenters are waiting to see whether Qwen3.7 produces open-weight 27B/35B variants, but one technical speculation is that there may be no separate 27B release: Qwen 3.7 could be a private 390B MoE-style model with A30B active parameters, analogous to a larger closed deployment rather than a small open checkpoint.
- Several comments focus on whether Qwen3.7 Max is an actual architectural upgrade over Qwen 3.5/3.6 or primarily another finetune. The technical interest is whether Alibaba improved the underlying model design or simply extracted more benchmark performance from the existing architecture.
- One recurring concern is whether the Qwen team fixed the model’s “overthinking” behavior—likely referring to excessive reasoning verbosity or unnecessary chain-of-thought-style deliberation that can hurt latency, cost, and user experience despite improving some benchmark scores.
Waiting for Qwen 3.7 open weight… The new King has arrived… (Activity: 577): The image is a benchmark marketing grid for Qwen3.7-Max, linked to the Qwen3.7 blog, showing it leading many tasks such as Terminal-Bench 2.0, SWE-bench Pro, MCP-Atlas, HLE, Apex, IFBench, and SuperGPQA versus Qwen3.6-Plus, DS-V4-Pro Max, GLM-5.1, Kimi K2.6, and Claude Opus-4.6 Max. The technical significance is that the chart positions Qwen3.7-Max as a frontier closed/API model competitor to Opus-class systems, while commenters are specifically hoping for an open-weight MoE release such as 3.7-122B-A17B with 512k context or a 397B A17B variant in low-bit formats like MXFP4/NVFP4. Commenters are skeptical that Qwen3.7-Max itself will be released open-weight, noting that “Qwen has never open-weighted the Max series.” Others are enthusiastic about a possible large open MoE release, framing it as potentially “Opus at home” for users with high-end multi-GPU setups.
- Several commenters caution that the rumored model is likely a Qwen Max-class release, and historically Qwen has not released Max-series models as open weights. One user specifically warns not to extrapolate Max benchmark results to smaller open models such as a hypothetical 27B, since the capability gap could be substantial.
- Hardware-focused speculation centered on a possible Qwen 3.7-122B-A17B with MTP, MXFP4 quantization, and 512k context, which commenters suggest could be attractive for local inference on AMD Strix Halo-class systems. Another commenter hopes for a 397B-A17B release, noting that the prior Qwen 3.5 NVFP4 variant reportedly fits on 4x RTX 6000 Pro GPUs with enough memory headroom for roughly 10 concurrent sessions at 200k tokens.
- There is skepticism that Alibaba/Qwen will release their strongest local models because doing so may undercut hosted-model monetization. One commenter references Qwen’s April shift away from “disruption” toward frontier-model competition and monetization, implying that highly capable open-weight releases may become less likely even if benchmark results look strong.

2. Qwen 3.6 35B MTP Quantization Performance

110 tok/s with 12GB VRAM on Qwen3.6 35B A3B and ik_llama.cpp (Activity: 455): The post benchmarks Qwen3.6-35B-A3B-MTP using byteshape’s IQ4_XS 4.19 bpw GGUF on an RTX 4070 Super 12GB + Ryzen 7 9700X with 131072 context, q8_0 KV cache, MTP draft-max=3, and draft-p-min=0.75. Switching from llama.cpp to ik_llama.cpp increased the reported mean from 89.76 tok/s to 110.24 tok/s (+23%), despite a lower aggregate MTP accept rate in the updated results (0.9393 → 0.8749), suggesting backend/offload efficiency rather than acceptance rate alone. The author notes that using the GPU headless/secondary maximizes usable VRAM, and recommends --fit --fit-margin 1664 for ik_llama.cpp, increasing to 1792/2048 on OOM. Commenters asked for the exact llama.cpp command and noted that several MTP-related llama.cpp PRs had landed recently, so results may be version-sensitive. A technical workaround for CachyOS/KDE Wayland users without an iGPU was shared: launch Plasma with software rendering via LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=llvmpipe, reducing idle VRAM from >1024 MB to about 126 MB at the cost of slow/disabled compositor effects.
- A CachyOS/KDE Wayland user shared a VRAM-saving workaround for single-GPU systems: create a custom SDDM session that launches Plasma with LIBGL_ALWAYS_SOFTWARE=1, GALLIUM_DRIVER=llvmpipe, and KWIN_COMPOSE=Q, forcing KDE compositor rendering onto the CPU. They report idle VRAM dropping from >1024 MB on normal KDE Wayland to ~126 MB in the CPU-rendered session, freeing nearly 1 GB of VRAM for model inference at the cost of very slow/disabled animations.
- Several commenters focused on the benchmark methodology, asking for the exact llama.cpp command and noting that MTP-related PRs had landed in upstream llama.cpp within the previous 24 hours, which could materially affect comparisons. One technical hypothesis was that ik_llama.cpp achieves the reported speedup via a much higher speculative/MTP acceptance rate: never below 0.790 in ik_llama.cpp versus as low as 0.477 in llama.cpp, prompting questions about whether the settings were equivalent.
- There was technical interest in the memory/quality tradeoff of IQ4_XS, described as likely the lowest-memory Q4 quantization option for this setup. A commenter asked how much intelligence degradation it causes and requested the final VRAM/RAM split, which is especially relevant for running Qwen3.6 35B A3B on only 12 GB VRAM.
Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs (Activity: 364): The image is a technical benchmark plot, not a meme: an RTX 4090 performance-vs-quality bubble scatter comparing ByteShape Qwen 3.6 35B GGUF NTP/MTP quantizations against Unsloth, Bartowski, Mudler, and AesSedai by average TPS, accuracy, and BPW. In the context of the post, it illustrates the main finding that for NTP, “pick the largest quant that fits” can be competitive, while MTP can improve GPU generation throughput by roughly 20–40% but increases memory pressure and is not recommended for CPU use. Comments were mostly positive and practical: one CPU-hybrid user confirmed seeing severe MTP slowdowns, aligning with ByteShape’s CPU findings, and asked whether higher-quality Q6 GGUF releases are planned.
- A CPU-hybrid user reported “incredible slowdowns” when using MTP with Qwen3.6-35B, matching the post’s findings that MTP may regress on mixed CPU/GPU setups. They also asked whether Q6 GGUF quantizations would be released, noting they avoid going below Q6 for this model.
- One commenter questioned the methodology around NTP, assuming it refers to llama.cpp’s --spec-type ngram-mod, and noted that mainline llama.cpp can apparently run ngram speculative decoding and MTP simultaneously via --spec-type ngram-mod,draft-mtp. They suggested the comparison may not be a strict NTP-vs-MTP either/or, citing parameters like --spec-ngram-mod-n-match 24, --spec-ngram-mod-n-min 12, --spec-ngram-mod-n-max 48, and --spec-draft-n-max 3.
- A benchmark of Qwen3.6-35B-A3B-IQ4_XS-4.19bpw MTP on an RTX 4070 Super 12GB using ik_llama.cpp reported 110.24 tok/s average, about 20 tok/s faster than Qwen3.6-35B-A3B-UD-IQ4_XS MTP. The run used mtp-bench.py with aggregate_accept_rate=0.8749, total_predicted=1592, total_draft=1127, and total_draft_accepted=986; the commenter highlighted --fit, --fit-margin 1664, --multi-token-prediction, --draft-p-min 0.75, and --draft-max 3 as key tuning knobs.
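The headline numbers in these benchmark posts can be cross-checked from the raw counters. The sketch below recomputes the aggregate accept rate from total_draft and total_draft_accepted and the relative speedup from the two mean tok/s figures; the function names are hypothetical, and the formulas assume the accept rate is simply accepted drafts divided by drafted tokens.

```python
def accept_rate(accepted, drafted):
    """Fraction of drafted (speculated) tokens that the model accepted."""
    return accepted / drafted

def relative_speedup(new_tps, base_tps):
    """Throughput gain of new_tps over base_tps, as a fraction."""
    return new_tps / base_tps - 1.0

# Counters and throughput figures quoted in the posts above.
rate = accept_rate(accepted=986, drafted=1127)
gain = relative_speedup(new_tps=110.24, base_tps=89.76)

print(f"accept rate {rate:.4f}, speedup {gain:+.1%}")
```

Both results line up with the reported 0.8749 aggregate accept rate and the roughly 23% gain from switching backends.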

3. Open-Weight Release and Takedown Tensions

Heretic has been served a legal notice by Meta, Inc. (Activity: 2124): The Heretic Free Software Project says it received an email legal notice from a provider representing Meta Platforms, Inc. and has removed model-weight repositories containing derivatives of Meta’s Llama models. The post frames the takedown as compliance while announcing infrastructure diversification via an official Codeberg mirror at codeberg.org/p-e-w/heretic and planned “technological measures” to preserve access to Heretic-created models without relying on a single hosting provider. Commenters largely criticized Meta’s enforcement as hypocritical given allegations around copyrighted training data, and mocked the post’s jab that Llama trails 168 models from 23 competitors on LM Arena. The discussion is mostly political/legal reaction rather than technical debate.
- Commenters highlighted the quoted LM Arena framing that Meta’s Llama family is outside the very top tier, described as “trailing only 168 other models from 23 competitors” among the top 200 language models. The technical takeaway is that the legal dispute over naming is being contrasted with perceived stagnation in Meta’s model releases and leaderboard competitiveness.
Re. what ever happened to Cohere’s Command-A series of models? (Activity: 669): Cohere announced Command A+, its first Mixture-of-Experts (MoE) open-weights model, positioned as an efficient successor/continuation of the Command series with emphasis on low latency and responsiveness rather than only top-line benchmark dominance; details are in Cohere’s launch post. The model is released under Apache 2.0, with Cohere claiming substantial quantization work enabling practical deployment on 1–2 GPUs, aimed at agentic/enterprise workloads and smaller developer teams. Commenters were broadly positive, citing the original Command R+ as unusually strong for its time—especially for creative work and enterprise-style planning—and welcomed Cohere’s return as beneficial model-ecosystem diversity. The main technical ask from the community was immediate availability of GGUF quantized builds for local inference.
- A commenter questioned the release’s competitiveness due to the lack of standard benchmark results and missing comparisons against current same-size-class models, specifically naming MiniMax M2.7 and Mimo V2.5 as perceived SOTA baselines. They noted that relying on a referenced Artificial Analysis benchmark image (https://preview.redd.it/vjex3axl8d2h1.png?width=1224&format=png&auto=webp&s=08e9c90188bf9b42d4f049991624b4e180cf566d) may not be enough to drive adoption if quality is not clearly competitive.
- Several users asked about deployment accessibility, including whether GGUF quantized builds would be available and whether Cohere plans to release smaller Command-family models comparable to the older command-r7b that can run on consumer GPUs. The technical concern is local inference viability rather than API-only or enterprise-scale deployment.
- One commenter highlighted the original Command R+ as unusually strong for its time in creative workflows and enterprise resource-planning tasks, implying that users are evaluating the new Command-A line against that prior model’s practical long-context/enterprise utility rather than only general chatbot benchmarks.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Claude Code Workflows and Anthropic Training

I’m a software engineer with a decade of experience. I vibe code all of my side projects from my phone using Claude Code and don’t read any of the code. It’s so fun. Here are the rules I follow: (Activity: 1900): The post proposes a risk-managed “vibe coding” workflow for using Claude Code on side projects without directly reading generated code: start in plan mode, iteratively inspect/clarify the plan, keep tasks small enough to mentally model, require agent-generated test cases, commit to git after each completed plan, back up databases before agent DB access, and use browser/E2E tooling such as Chrome DevTools MCP for live validation. For complex changes, the author suggests parallel review agents for plan critique, security review, and test audit, then switching to auto mode only after the plan/test/rollback structure is in place. Top comments generally endorse the workflow as a comparatively sane agentic-coding pattern, especially the rule that “if the plan is too big to fit in your head, it’s too big.” Commenters recommend making the process repeatable with the superpowers skillset and keeping agent scope narrowly bounded: one change, one expected test, and one rollback point, including explicit non-goals in the prompt.
- Several commenters emphasized constraining agent work into small, verifiable scopes: “one change, one expected test, one rollback point,” with prompts explicitly stating what the agent should not touch. The technical rationale was that smaller plans reduce debugging complexity and make failures easier to isolate when using Claude Code or similar coding agents.
- One commenter recommended turning the workflow into a repeatable scripted process using the superpowers skillset, a GitHub project intended to provide reusable agent workflows/skills. This was framed as a way to make “vibe coding” less ad hoc once projects move beyond single-prompt generation into iterative development.
Anthropic officially launched 13+ FREE AI courses with certificates (Including Agentic AI and Claude Code!) (Activity: 1585): Anthropic has a free official training catalog (accessible via anthropic.com/learn / Anthropic Skilljar) with certificates, covering MCP / agentic AI, Claude Code, Claude API usage, and enterprise deployment paths for Amazon Bedrock and Google Cloud Vertex AI. The technical highlights called out are the Model Context Protocol (MCP) courses, including advanced material on STDIO and StreamableHTTP transports, plus Claude Code workflows such as codebase editing, test execution, and “Plan Mode.” A related free CodeSignal partnership track, Developing Claude Agents, reportedly provides Python/TypeScript agent-building labs and certificates. Commenters largely confirmed the courses are legitimate Anthropic-provided material, with one noting the Skilljar link is surfaced from Anthropic’s own learn page. A user who completed 10/15 courses specifically recommended the MCP and advanced MCP modules as “worth the squeeze.”
- A commenter who completed 10/15 courses specifically highlighted the MCP and MCP Advanced Topics courses as the most technically valuable, citing coverage of STDIO and StreamableHTTP transport protocols as especially worthwhile for developers working with Claude/tool integrations.
- Another commenter verified the courses are legitimate Anthropic training material, noting the Skilljar course links originate from Anthropic’s official learning portal at anthropic.com/learn.
Claude is telling users to go to sleep mid-session and nobody, including Anthropic, seems to fully understand why it keeps doing it (Activity: 1360): Claude has reportedly been interrupting users mid-session with sleep/rest recommendations; the cited article says explanations like wellbeing nudging or compute-saving throttling are unlikely because Claude allegedly lacks session-usage context. Anthropic had not responded to Fortune, but Anthropic staffer Sam McAllister described the behavior on X as a “Bit of a character tic” and said they are “aware of this and hoping to fix it in future models.” Comment discussion is mostly speculative: users debate whether the behavior is an emergent persona/safety-tuning artifact versus an intentional product feature, while the article frames it as an unresolved model-behavior bug rather than policy.
- A quoted excerpt argues the sleep prompts are unlikely to be an intentional wellbeing or compute-throttling feature because Claude is not given context about a user’s usage duration. Anthropic staffer Sam McAllister reportedly described the behavior on X as a “Bit of a character tic” and said they are “aware of this and hoping to fix it in future models,” implying it is treated as a model-behavior/alignment artifact rather than a product-level session-management policy.

2. AI’s Workforce and Infrastructure Backlash

It’s 2026, and we are yet to see an anti-almond farm protest. (Activity: 2679): The image is a contextual line chart arguing that CONUS almond farms consume vastly more water than data centers, with almonds rising from roughly 550 to nearly 1,600 billion gallons/year from 1999–2026 while data centers remain near the x-axis with only modest growth. In the context of the title (“It’s 2026, and we are yet to see an anti-almond farm protest”), the chart is less a technical benchmark than a critique of public attention around AI/data-center water use. Comments push back that anti-almond criticism already exists, especially in California water-policy debates and documentaries, and one commenter adds that golf courses may consume multiple times more water than data centers.
- Several commenters frame almond farming as part of a broader California water-allocation debate, noting that almond orchards are frequently blamed during recurring drought and water-shortage controversies. The technical comparison being raised is not just “almonds vs. data centers,” but agriculture versus other large water users such as dairy, golf courses, and compute infrastructure.
- One commenter argues that U.S. golf courses consume multiple times more water than data centers, suggesting that public criticism of AI/data-center water use may be disproportionate relative to other recreational or agricultural uses. Another points out that anti-almond criticism already exists in documentaries and California environmental discourse, especially around irrigation demand and drought resilience.
Mark Zuckerberg’s Meta kicks off major bloodbath with 8,000 layoffs (about 10% of its workforce) as AI roils tech giant (Activity: 1533): The post claims Meta is conducting a global layoff of roughly 8,000 employees (~10% of workforce) in three waves, with notifications sent by email at 4 a.m. local time and Singapore employees reportedly first. The framing ties the cuts to AI-driven restructuring, while commenters question Meta’s AI capex needs—e.g., “what requires $200B for AI at Meta?”—and broader headcount efficiency at a company still employing tens of thousands. Top comments dispute the term “roils,” arguing layoffs are not disruption but an intended benefit of AI adoption and something companies may increasingly present positively to investors. Others view recurring Meta layoffs as routine and question why the company needs such a large workforce at all.
- Commenters questioned whether the layoffs are actually AI-driven versus a correction from the ZIRP-era hiring surge: one noted Meta headcount remains above 2020 levels, suggesting the cuts may reflect post-overhiring normalization rather than direct automation impact.
- A technical/strategic concern raised was Meta’s reported $200B AI spend: commenters asked what infrastructure or product roadmap could justify that scale, implicitly pointing to massive compute, data-center, and model-training capex rather than ordinary software staffing needs.
- Several comments framed AI adoption as an ongoing operating-model shift, with one predicting recurring 10–20% annual headcount reductions across large organizations as AI tooling substitutes for portions of white-collar and engineering labor.

AI Discords

Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.
