a quiet day.
AI News for 4/18/2026-4/20/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
Kimi K2.6 and Qwen3.6-Max-Preview Push Open Agentic Coding Forward
- Moonshot's Kimi K2.6 was the clear release of the day: an open-weight 1T-parameter MoE with 32B active, 384 experts (8 routed + 1 shared), MLA attention, 256K context, native multimodality, and INT4 quantization, with day-0 support in vLLM, OpenRouter, Cloudflare Workers AI, Baseten, MLX, Hermes Agent, and OpenCode. Moonshot claims open-source SOTA on HLE w/ tools 54.0, SWE-Bench Pro 58.6, SWE-bench Multilingual 76.7, BrowseComp 83.2, Toolathlon 50.0, CharXiv w/ python 86.7, and Math Vision w/ python 93.2 in the launch thread. The more novel systems claims are around long-horizon execution: 4,000+ tool calls, 12+ hour continuous runs, 300 parallel sub-agents, and "Claw Groups" for multi-agent/human coordination. Community reactions quickly centered on K2.6 as a viable Claude/GPT backend for coding and infra work, including reports of a 5-day autonomous infra agent run, kernel rewrites, and a Zig inference engine outperforming LM Studio by 20% TPS.
- Alibaba's Qwen3.6-Max-Preview also landed as an early preview of its next flagship with improved agentic coding, stronger world knowledge and instruction following, and better "real-world agent and knowledge reliability" per @Alibaba_Qwen. Early community takes pegged it as unusually stable for long-reasoning tasks; @teortaxesTex highlighted it solving AIME 2026 #15 after ~30 minutes of thinking, and Arena later noted Qwen3.6 Plus reaching #7 in Code Arena and moving Alibaba to #3 lab there. Together, Kimi and Qwen reinforced a broader theme: Chinese open and semi-open labs are shipping highly competitive coding/agent models with fast ecosystem uptake.
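For readers unfamiliar with the sparse-MoE arithmetic behind figures like "1T total / 32B active, 8 routed + 1 shared of 384 experts", a toy numpy sketch of top-8 routing plus an always-active shared expert may help. Dimensions and expert sizes below are illustrative toy values, not K2.6's real configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 384, 8

def moe_layer(x, router_w, expert_ws, shared_w, top_k=8):
    """Route one token through top-k experts plus a shared expert."""
    logits = x @ router_w                        # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]            # indices of the 8 routed experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                         # softmax over the selected experts only
    out = sum(g * (x @ expert_ws[i]) for g, i in zip(gates, top))
    return out + x @ shared_w                    # shared expert always contributes

x = rng.standard_normal(d_model)
router_w = rng.standard_normal((d_model, n_experts))
expert_ws = rng.standard_normal((n_experts, d_model, d_model)) * 0.02
shared_w = rng.standard_normal((d_model, d_model)) * 0.02

y = moe_layer(x, router_w, expert_ws, shared_w)
# Only (top_k + 1) of n_experts expert blocks are touched per token,
# which is why total and active parameter counts diverge so sharply.
active_frac = (top_k + 1) / n_experts
print(y.shape, f"{active_frac:.1%}")
```

The real 1T-to-32B ratio also folds in attention and embedding parameters, so it does not reduce to this fraction exactly; the sketch only shows the routing mechanism.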
Hermes Agent's Rapid Ecosystem Expansion and Multi-Agent Orchestration Patterns
- Hermes Agent continued to emerge as the most visible open agent stack in this batch. Multiple tweets pointed to it surpassing 100K GitHub stars in under two months and overtaking OpenClaw in weekly star growth, with @Delphi_Digital framing it as evidence that "open source agents are no longer a one-project story." The ecosystem momentum is tangible: native launch support in Ollama, integration with Copilot CLI via Ollama, a growing set of community web UIs, and third-party tooling like Hermes Workspace V2, Browser Use integrations, and cloud deployment templates.
- The more substantive content came from operator patterns. A detailed Chinese thread on advanced Hermes usage broke out three mechanisms that matter in practice for multi-agent systems: stateless ephemeral units for true parallelism (`skip_memory=True`, `skip_context_files=True`), LLM-driven replanning over structured failure metadata (`status`, `exit_reason`, `tool_trace`) instead of blind retries, and dynamic context injection via directory-local `AGENTS.md`/`.cursorrules` surfaced only through tool results. That is a more disciplined orchestration model than stuffing all history into one prompt. Related community posts described Hermes as a four-layer memory system with periodic memory consolidation, contrasted with OpenClaw's "context window + RAG" approach in one comparison thread.
- The ecosystem is also shifting toward self-improving harnesses and long-running operation: examples include hermes-skill-factory, maestro, icarus-plugin, and cloud templates, alongside discussion of the Externalized Intelligence in LLM Agents survey, which frames capability as increasingly living outside model weights: in memory systems, tools, protocols, and harnesses.
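The replanning-over-failure-metadata pattern can be sketched in a few lines: instead of blindly retrying a failed sub-agent, the orchestrator branches on structured metadata. The three field names come from the thread; everything else (the result type, reason strings, and action names) is a hypothetical harness of our own, not Hermes Agent's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class SubAgentResult:
    # Structured failure metadata, per the thread: status / exit_reason / tool_trace.
    status: str                       # "ok" | "failed"
    exit_reason: str                  # e.g. "timeout", "tool_error", "context_overflow"
    tool_trace: list = field(default_factory=list)

def replan(result: SubAgentResult) -> str:
    """Pick the orchestrator's next action from failure metadata, not a blind retry."""
    if result.status == "ok":
        return "continue"
    if result.exit_reason == "timeout":
        return "split_task"            # decompose into smaller parallel units
    if result.exit_reason == "context_overflow":
        return "respawn_stateless"     # e.g. skip_memory=True, skip_context_files=True
    if result.exit_reason == "tool_error" and len(result.tool_trace) > 3:
        return "swap_tool"             # repeated failures on one tool: change approach
    return "retry_once"

print(replan(SubAgentResult("failed", "timeout")))           # split_task
print(replan(SubAgentResult("failed", "context_overflow")))  # respawn_stateless
```

In the thread's framing, the branching itself is done by an LLM reading this metadata; the hard-coded rules above just make the information flow concrete.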
Memory, Context, and Runtime Become the New Product Surface for Coding Agents
- OpenAI Codex Chronicle was the most notable product update: a research preview that lets Codex build memories from recent screen context, effectively turning passive work history into agent-usable context. OpenAI says Chronicle uses background agents to build memories from screenshots, stores captures and memories on device, lets users inspect/edit those memories, and is rolling out to Pro users on macOS (excluding EU/UK/Switzerland) for now via @OpenAIDevs and @thsottiaux. This is a meaningful shift from chat history as memory to ambient context capture, and several builders immediately recognized the lock-in implications; @hwchase17 bluntly noted that "memory will be the great lock in."
- There was also a parallel wave of infra thinking around runtime vs harness. LangChain's new guide on deploying long-running agents and follow-on posts by @Vtrivedy10 and @sydneyrunkle argue that building an agent is mostly a harness problem, but productionizing it is a runtime problem: multi-tenant isolation, memory, observability, retries, governance, and improvement loops. This aligns with the self-improving-agent discussion around the Autogenesis Protocol and auditable self-improvement systems, both of which decompose prompts, tools, memory, and environments into versioned resources with gated reflection/improvement/commit cycles.
- On the UX side, coding-agent tools kept polishing the terminal surface: Cursor CLI added `/debug` and customizable status bars, while OpenCode shipped a new model picker. The common pattern is that memory, inspection, and execution controls are becoming first-class product features, not just backend details.
Inference Systems and Architecture Work: Prefill/Decode Separation, Linear Attention, and Model Surgery
- A notable systems thread was Prefill-as-a-Service for cross-datacenter inference. The core argument, described in a detailed Zhihu Frontier summary and echoed by @nrehiew_, is that traditional prefill/decode disaggregation hits a bandwidth wall because standard-attention KV cache transfer is too large for cross-DC links. Linear attention / recurrent-state architectures like Kimi Linear reduce state transfer enough to make remote prefill practical. The PoC cited scales a 1T-parameter linear-attention model across mixed H200/H20 clusters over a 100 Gbps inter-DC link, reporting +54% throughput and -64% P90 TTFT, with outbound bandwidth around 13 Gbps. If those numbers hold more broadly, linear-attention families may matter as much for serving topology as for asymptotic context scaling.
- On the architecture side, @lianghui_zhu argued that post-ResNet deep nets have underexplored how layers communicate, beyond simple `x + F(x)` residual pathways. While the thread text here is partial, it signals renewed interest in inter-layer communication topologies rather than just scaling width/depth. Related architectural exploration appeared in the strong engagement around recurrent-depth transformers, e.g. Loop, Think, & Generalize, which reports systematic compositional generalization emerging through recurrence and grokking-like stages, plus community connections to Universal Transformers and MoEUT variants.
- A more applied model-surgery idea came from @ostrisai, who expanded image-model patch-2 layers to patch-4 by averaging/replicating sub-patch weights, aiming for 2× image size at the same compute with near-zero-init transfer before finetuning. If this cleanup finetune works, it would be a clever example of reparameterizing existing image backbones for higher resolution without full retraining.
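The bandwidth wall in the Prefill-as-a-Service argument is easy to see with rough arithmetic. A back-of-envelope sketch (layer and head counts are illustrative assumptions, not Kimi Linear's real config): the standard-attention KV cache a remote prefill worker must ship grows linearly with prompt length, while a linear-attention recurrent state is fixed per sequence:

```python
# Rough per-sequence state a remote prefill worker must ship to the decode side.
def kv_cache_bytes(seq_len, n_layers=60, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Standard attention: one K and one V vector per token, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * seq_len

def recurrent_state_bytes(n_layers=60, n_heads=8, head_dim=128, state_dim=128, dtype_bytes=2):
    # Linear attention keeps a fixed-size (head_dim x state_dim) state per head,
    # independent of how many tokens were prefilled.
    return n_layers * n_heads * head_dim * state_dim * dtype_bytes

for seq_len in (1_000, 100_000):
    kv = kv_cache_bytes(seq_len)
    rs = recurrent_state_bytes()
    print(f"{seq_len:>7} tokens: KV {kv / 1e9:8.2f} GB vs fixed state {rs / 1e6:.1f} MB")
```

With these toy numbers, a 100K-token prompt means tens of GB of KV cache per sequence over the inter-DC link versus a constant ~16 MB of recurrent state, which is the core of the argument for linear attention in cross-datacenter serving.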
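The patch-expansion surgery can be made concrete with a toy example. This is our reading of the averaging variant, not @ostrisai's actual code: replicate each patch-2 weight over a 2x2 sub-block and divide by 4, so that the new patch-4 embedding of a nearest-neighbor 2x-upsampled image exactly reproduces the old patch-2 embedding of the original image, giving a near-zero-change starting point for finetuning:

```python
import numpy as np

rng = np.random.default_rng(0)
C, d = 3, 16  # toy channel count and embedding dim

def patch_embed(img, w, p):
    """Split img (H, W, C) into p x p patches and project each to d dims."""
    H, W, _ = img.shape
    patches = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * C) @ w   # (n_patches, d)

# Original patch-2 weights, viewed on a (2, 2, C, d) grid.
w2 = rng.standard_normal((2 * 2 * C, d))
w2_grid = w2.reshape(2, 2, C, d)

# Expand to patch-4: each original weight covers a 2x2 sub-block, averaged (/4).
w4_grid = np.repeat(np.repeat(w2_grid, 2, axis=0), 2, axis=1) / 4.0
w4 = w4_grid.reshape(4 * 4 * C, d)

img = rng.standard_normal((8, 8, C))
img2x = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)  # nearest-neighbor 2x upsample

before = patch_embed(img, w2, p=2)     # patch-2 on the original image
after = patch_embed(img2x, w4, p=4)    # patch-4 on the 2x image: same token count
print(np.allclose(before, after))      # True
```

The token count is unchanged (hence "2× image size at the same compute"); the finetune then only has to teach the model to exploit the genuinely new high-frequency detail rather than relearn the embedding.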
Benchmarks, Evals, and Safety Stress Tests Are Getting More Realistic
- Arena results continued to matter for multimodal models. @arena reported Claude Opus 4.7 taking #1 in Vision & Document Arena, with +4 points over Opus 4.6 in Document Arena and a large margin over the next non-Anthropic models. Subcategory wins included diagram, homework, and OCR, reinforcing Anthropic's current strength on document-heavy, long-context enterprise workflows.
- On the safety/evals side, Redwood Researchâs LinuxArena is the more technically interesting result: 20 live production environments for AI agents, with frontier models achieving roughly 23% undetected sabotage against trusted monitors. The takeaway in the tweet is blunt: as useful work rises, so does attack surface; sandboxing alone fails, so monitoring is essential. This feels directionally important because it moves from toy CTFs to more production-like environments.
- Two benchmark-adjacent research items stood out. Sakana's SSoT ("String Seed of Thought") tackles a less discussed failure mode: LLMs are poor at distribution-faithful generation. In the announcement, they show that adding a prompt step where the model internally generates and manipulates a random string improves coin-flip calibration and output diversity without external RNGs. And Skill-RAG, summarized by @omarsar0, uses hidden-state probing to detect impending knowledge failures and only then invoke the right retrieval strategy, moving RAG from unconditional retrieval to failure-aware retrieval selection.
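The SSoT mechanism, as we read the announcement: the model first writes an arbitrary "seed" string, then derives its random choice deterministically from that string, so output diversity comes from string variation rather than from the biased token sampler. A toy sketch using a hash as a stand-in for the model's in-context string manipulation (our illustration, not Sakana's procedure):

```python
import hashlib
import random

def coin_from_seed_string(s: str) -> str:
    """Derive a coin flip deterministically from an arbitrary seed string."""
    digest = hashlib.sha256(s.encode()).digest()
    return "heads" if digest[0] % 2 == 0 else "tails"

# Stand-in for "the model writes a different throwaway string each time":
rng = random.Random(0)
seeds = ["".join(rng.choices("abcdefghij", k=12)) for _ in range(2000)]
flips = [coin_from_seed_string(s) for s in seeds]

p_heads = flips.count("heads") / len(flips)
print(round(p_heads, 2))  # close to 0.50, with no external RNG deciding any flip
```

The point is that the flip is a deterministic function of the committed string, so calibration only requires the model to produce varied strings, which it does far more reliably than it produces unbiased direct samples of "heads"/"tails".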
Top tweets (by engagement)
- Kimi K2.6 launch: Moonshot's release dominated technical engagement, combining strong benchmark claims with unusual long-horizon agent systems details in the main launch thread.
- Anthropicâs AWS expansion: Anthropic said it secured up to 5 GW of compute with Amazon, with an additional $5B investment today and up to $20B more later, a major signal on frontier-model capex and supply strategy via @AnthropicAI.
- Codex Chronicle: OpenAI's move toward screen-derived memory in Chronicle was one of the more consequential product-direction tweets for coding agents.
- Qwen3.6-Max-Preview: Alibaba's preview release reinforced that top-tier coding/agent competition is no longer concentrated in a handful of Western labs.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Kimi K2.6 Model Release and Benchmarks
- Kimi K2.6 Released (huggingface) (Activity: 1105): Kimi K2.6, released on Hugging Face, is a cutting-edge open-source multimodal AI model featuring a Mixture-of-Experts architecture with `1 trillion parameters`. It excels in long-horizon coding, coding-driven design, and autonomous task orchestration, capable of transforming prompts into production-ready interfaces and executing complex coding tasks across multiple languages. The model supports up to `300 sub-agents` for parallel task execution and outperforms previous models in benchmarks focused on coding, reasoning, and vision tasks. More details can be found in the original article. Commenters noted the impressive scale of `1.1 trillion parameters`, with some expressing surprise at the model's size. Another comment mentioned the start of training for Cursor's Composer 2.1 model, indicating ongoing advancements in AI model development.
- ResidentPositive4122 highlights that the Kimi K2.6 release includes both the code repository and model weights under a Modified MIT License. This license allows for broad usage with minimal restrictions, primarily requiring attribution if used by large corporations, which is a significant point for developers and companies considering integration or modification of the model.
- mrinterweb comments on the impressive scale of the Kimi K2.6 model, noting its `1.1 trillion parameters`. This scale is indicative of the model's potential capabilities and computational demands, reflecting the trend towards increasingly large and complex models in the AI field.
- Few_Painter_5588 mentions the training of Cursor's Composer 2.1 model, indicating ongoing developments in AI model training. This suggests a competitive landscape where multiple models are being developed and improved simultaneously, highlighting the rapid pace of innovation in AI technologies.
- Kimi K2.6 (Activity: 422): The image presents a benchmark comparison of AI models, highlighting Kimi K2.6 against competitors like GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. Kimi K2.6 shows strong performance across various tasks, particularly excelling in `DeepSearchQA` and `MathVision`. This suggests Kimi K2.6's competitive edge in both general and specialized AI tasks, indicating its potential as a robust alternative to more established models. Commenters note the significance of Kimi K2.6's performance, especially in coding, and express surprise at an open-source model competing closely with proprietary models. There is anticipation for Kimi K2.6 to surpass Claude Opus, highlighting the competitive landscape of AI development.
- MokoshHydro highlights the significance of Kimi K2.6's new feature, the "vendor verifier", which provides a standardized method for evaluating third-party services. This is crucial for ensuring consistency and reliability when integrating external services into the Kimi ecosystem, as detailed in their blog post.
- Ok_Knowledge_8259 notes the impressive progress of Kimi K2.6, especially considering its open-source nature, which is closing the gap with proprietary models. This suggests a significant advancement in the capabilities of open-source AI models, particularly in coding tasks where Kimi has historically excelled.
- pmttyji expresses a desire for the inclusion of GLM-5.1 in the comparison, noting that Kimi-K2.6 has set a high benchmark for models like DeepseekV4. This indicates that Kimi-K2.6 is being used as a new standard for evaluating the performance of other AI models.
2. Qwen Model Discussions and Experiences
- Qwen 3.6 Max Preview just went live on the Qwen Chat website. It currently has the highest AA-Intelligence Index score among Chinese models (52) (Will it be open source?) (Activity: 402): Qwen 3.6 Max has been released on the Qwen Chat website and currently holds the highest AA-Intelligence Index score of `52` among Chinese models, as reported by AiBattle. The model's parameter count is speculated to be between `600-700B`, given that the previous version, Qwen 3.6, had `397B` parameters. However, there is no indication that the Max version will be open-sourced, as historically, Max models have not been made publicly available. Commenters express skepticism about the open-sourcing of Max models, noting that these models are typically not released to the public. There is a preference for smaller models that can be run on consumer-grade hardware, suggesting that Max models should remain proprietary to support the company's revenue.
- There is speculation about the parameter size of the Qwen 3.6 Max model, with one user suggesting it could be between `600-700B` parameters, given that the Qwen Plus model is `397B`. This indicates a significant increase in complexity and potential capability, aligning with its high AA-Intelligence Index score of `52`.
- A user highlights the business strategy behind not open-sourcing the Max models, suggesting that these models serve as a revenue engine for the company. This implies that the company prioritizes monetization of their most advanced models while potentially offering smaller models for broader accessibility.
- Discussion around open-sourcing reveals that the largest model likely to be open-weighted is the `122B` model, as the company has stopped open-weighting the `397B` Plus models. This suggests a strategic decision to limit access to their most advanced models, possibly to maintain competitive advantage.
- Switching from Opus 4.7 to Qwen-35B-A3B (Activity: 772): The user is considering switching from Opus 4.7 to Qwen-35B-A3B for a coding agent driver, specifically running on an `M5 Max 128GB` setup. The user acknowledges that Opus might have an advantage in complex reasoning tasks but is questioning whether Qwen-35B-A3B would be adequate for most tasks. The post suggests that Qwen-35B-A3B has replaced about `95%` of the user's calls, indicating a high level of functionality, though it may not fully match Opus's capabilities in complex scenarios. One commenter suggests that Qwen-35B-A3B might not meet expectations if the user is accustomed to Opus's capabilities, while another implies that the user's tasks may not require Opus's advanced features. A third comment indicates that Qwen-35B-A3B can handle most tasks but may fall short compared to Opus in certain areas.
- Flinchie76 discusses the trade-offs between using Opus 4.7 and Qwen-35B-A3B, highlighting that while Opus can generate large amounts of code quickly, it often results in complex, hard-to-understand architectures. In contrast, using a less capable model like Qwen-35B-A3B allows for more control and understanding of the code, as it requires the user to think through the process and inspect changes closely, leading to better ownership of the final product.
- Borkato notes that Qwen-35B-A3B has replaced about 95% of their calls, suggesting that while it may not match Opus in capability, it is still highly functional for many tasks. This implies that Qwen-35B-A3B can handle a significant portion of tasks that users might typically rely on Opus for, albeit with some limitations.
- Thump604 mentions the possibility of running a 122B model, but clarifies that it does not reach the level of Opus 4.7. This suggests that while there are larger models available, they may not fully replicate the performance or capabilities of Opus, indicating a potential gap in functionality for users transitioning from Opus to other models.
- I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude (Activity: 1239): The user reports running the `qwen3.6-35b-a3b` model with `8-bit quantization` and a `64k context` on a MacBook Pro M5 Max with `128GB RAM` using OpenCode. They claim it performs comparably to Claude in terms of speed and handling complex tasks, such as debugging serialization issues in an Android app. The model is noted for its fast response times and effective handling of long research tasks, making it a viable alternative to cloud-based models. Commenters highlight the model's speed, especially on high-performance hardware like a `5090`, and its efficient handling of large contexts, suggesting it can handle up to `256k context` effectively. However, there is some skepticism about its equivalence to Claude, though it is acknowledged as a strong local model.
- cosmicnag highlights the performance of the Qwen 3.6-35b-a3b model, noting that on a `5090` GPU, the speed is unmatched compared to cloud models. They mention not having tried `NVFP4` yet, suggesting potential for even greater performance improvements.
- H_DANILO points out that the Qwen model can handle up to `256k` context efficiently, emphasizing that context handling is very cheap with this model. This suggests significant advantages for tasks requiring extensive context management.
- Krillian58 shares a contrasting experience, stating that after switching from Opus to Qwen 3.6, they found it substantially worse for their tasks. They speculate that it might be due to the model picking up Opus loose ends, indicating potential issues with model transition or adaptation.
3. Local LLMs and Offline AI Applications
- So… what am I supposed to learn with local LLMs? (Activity: 112): The post discusses the challenges and potential of using local LLMs, particularly on limited hardware like a 16GB M4 Mac Mini. The user experimented with OpenClaw and local models like `gemma e4b q4` distilled by Opus, integrating it with Apple's OCR and vision capabilities. Despite initial success in setting up cron jobs and basic tasks, the user questions the practical utility of local LLMs compared to cloud-based solutions like Claude Code. The post highlights the potential for local LLMs to improve with better hardware and the importance of understanding model context windows and privacy benefits. The user is advised to explore smaller models and consider future-proofing their setup for more advanced applications. Commenters emphasize the benefits of local LLMs in terms of privacy, cost-effectiveness, and the ability to run unrestricted models. They suggest using local LLMs for tasks like email summarization, document analysis, and personal knowledge management. Some recommend switching from OpenClaw to Hermes Agent for a more streamlined experience, highlighting the importance of setting up remote interaction channels and automating routine tasks.
- Local LLMs offer significant advantages in terms of privacy and control over data. Running models like Qwen 3.5 or 3.6 locally allows users to avoid sending sensitive information to large corporations, which is crucial for maintaining privacy. Additionally, as hardware becomes cheaper and models more efficient, local LLMs can become more cost-effective and faster than cloud-based solutions, providing a future-proofing benefit.
- Hermes Agent is recommended over OpenClaw due to its lower token overhead and better design. Local LLMs can be integrated with communication platforms like Telegram or Slack to automate tasks such as summarizing emails, creating knowledge bases, and performing OCR on PDFs. This setup allows for seamless task management without the limitations of token usage imposed by cloud-based models.
- Running local LLMs on limited hardware, such as 16GB RAM, can be challenging but offers unique benefits. It allows for secure processing of sensitive data without internet exposure, which is critical for tasks that require high privacy. While models like Qwen 3.5 9b can run on such setups, the real advantage lies in automating tasks that are too sensitive for cloud APIs, despite the hardware constraints.
- llama.cpp speculative checkpointing was merged (Activity: 417): The `llama.cpp` project has merged a speculative checkpointing feature, which can lead to varying speedups depending on the task and repetition patterns. For coding tasks, users have reported speedups ranging from `0% to 50%` using parameters like `--spec-type ngram-mod`, `--spec-ngram-size-n 24`, `--draft-min 48`, and `--draft-max 64`. This feature is part of ongoing optimizations, including other enhancements like DFlash and SYCL support, which have shown speed improvements of `17% to 50%`. These updates suggest that performance will continue to improve as software and drivers are refined (source). Commenters are optimistic about the improvements, noting that while some users are disappointed with the initial performance of the B70, ongoing updates are expected to enhance performance significantly. The community is encouraged to be patient as further optimizations are implemented.
- The speculative checkpointing feature in `llama.cpp` has been merged, which is expected to enhance performance significantly. Notably, there are several related pull requests that contribute to performance improvements: PR #22066 reports a `17 to 50%` speed increase on SYCL, PR #21845 claims up to `50%` speed up, and PR #21527 also mentions a `50%` speed up. These improvements suggest that initial performance concerns with the B70 may be premature as software and drivers continue to evolve.
- The implementation of self-speculative decoding in `llama.cpp` allows for its use with models like Qwen3.5 and 3.6. This feature can be activated by adjusting parameters, potentially leading to more efficient token generation. However, the actual performance gain may vary, as indicated by the humorous note that it might not be as fast as expected ("not BRRRRRR"), but still offers some "free tokens".
- The variance in acceptance rates for speculative decoding is influenced by the `ngram-mod` matching mechanism. Codebases with repetitive patterns, such as those in TypeScript or Java, may experience higher acceptance rates (up to `50%`), while unique logic sequences may see lower rates. The parameter `--spec-ngram-size-n 24` is considered aggressive, as it requires `24 tokens` of context for pattern matching. Experimenting with smaller values (e.g., `8-12`) could improve performance in mixed code/prose tasks by increasing the likelihood of pattern matches, albeit with shorter draft runs.
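The n-gram matching described above is essentially prompt-lookup drafting: match the trailing n-gram of the generated text against earlier context, and propose the tokens that followed that earlier occurrence as the draft. A simplified sketch of the matching step (not `llama.cpp`'s actual implementation; `--spec-ngram-size-n` roughly corresponds to `n` here and `--draft-max` to `draft_max`):

```python
def ngram_draft(tokens, n=4, draft_max=8):
    """Propose draft tokens by matching the trailing n-gram earlier in the context."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    # Scan backwards for an earlier occurrence of the tail n-gram.
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            # Draft = the tokens that followed that occurrence last time.
            return tokens[i + n:i + n + draft_max]
    return []  # no match: fall back to normal (non-speculative) decoding

# Repetitive code-like token stream: the pattern "for i in range" recurs.
toks = "for i in range ( 10 ) : print ( i ) for i in range".split()
print(ngram_draft(toks, n=4))  # ['(', '10', ')', ':', 'print', '(', 'i', ')']
```

This also makes the acceptance-rate variance intuitive: boilerplate-heavy code repeats long n-grams often, so drafts frequently match what the model would have generated anyway, while novel logic rarely produces a usable match.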
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Claude Design and Usage Innovations
- This cannot be real. I cannot believe my eyes (Activity: 1527): The image in the Reddit post is a feature launch carousel for an app called "Air Roster," showcasing various functionalities such as a month mapping feature, a month picker interface, a geodesic map visualization, and pay-related statistics. The design employs a dark theme with blue and white text, aiming for a modern aesthetic. The post discusses the democratization of design tools, comparing the impact of Canva on design accessibility to the potential of new AI tools in reducing the need for specialized design skills, allowing users to focus on content rather than tool proficiency. Comments reflect skepticism about the design quality, with some users criticizing the user interface (UI) and user experience (UX), and others questioning the seriousness of the praise, suggesting it might be sarcastic.
- Capable_Ad1259 highlights the disparity in perception of the UI/UX design based on professional background. Backend/API/AI/ML developers might find the design impressive due to its technical complexity, whereas UI developers and designers might critique it for being "sloppy". This underscores the challenge of transitioning from backend engineering to design, emphasizing the need for time and effort to master design skills.
- Claude Design is Amazing! We're cooked! (Activity: 576): The post discusses a request made to Claude Design, an AI model, to create an operating system that avoids typical AI-generated content, referred to as "AI-slop." The user claims that Claude Design successfully generated a unique OS design in a single attempt, highlighting its capabilities. However, the post lacks specific technical details about the OS design, such as architecture, features, or benchmarks, which would be crucial for a technical evaluation. One commenter questions the feasibility of the AI creating a complete operating system, suggesting skepticism about the claim's validity. Another comment nostalgically references the design's similarity to Windows 98, indicating a retro aesthetic rather than a modern technical innovation.
- Claude Design is Incredible… (Activity: 1689): The post discusses a rapid UI redesign using Claude Design, highlighting its ability to quickly transform applications with minimal effort. The author notes that while the redesign may resemble other apps made with Claude, it was effective for personal use. The project is now open source and available on GitHub. The author suggests that with a specific design prompt, Claude can produce unique results, but a generic prompt leads to default designs. Commenters generally agree that apps designed with Claude tend to look similar, with one noting that the redesign resulted in a less appealing font choice. Another commenter suggests that the uniformity might lead to many apps having the same design in the near future.
- Chupa-Skrull highlights that Claude Design's main advantage is its ability to expose "knobs" on various properties, which allows users to optimize their workflow by adjusting parameters that they might not have known to prompt for. This feature significantly speeds up the design process, although the underlying capabilities are similar to what other models have offered for months.
- One-Cheesecake-9353 points out that while Claude Design might be suitable for personal projects, it introduces too much cognitive load for projects intended for mass consumption. This suggests that the design complexity or the user interface might not be intuitive enough for broader audiences, potentially impacting user experience negatively.
- Toxic-slop and disky_wude both note that apps generated by Claude tend to look similar, indicating a lack of diversity in design outputs. This could be a limitation in Claudeâs design algorithm, leading to repetitive styles and potentially reducing the uniqueness of applications developed using this tool.
- I didn't realise Claude could build actual Word docs and Excel files. Cancelled three subscriptions in the same week. (Activity: 422): The post highlights Claude's ability to generate fully formatted Word (.docx), Excel (.xlsx), and PowerPoint (.pptx) files directly from prompts, eliminating the need for separate document creation software. Users can request specific formatting, such as headings, bullet points, and professional fonts, and Claude can handle complex Excel functionalities like formulas and conditional formatting. The tool also supports editing existing documents while maintaining their format. This capability allows users to bypass traditional document creation tools, focusing instead on content creation rather than formatting and infrastructure. Commenters noted the importance of changing document metadata to reflect the correct author, and shared experiences of using Claude to fix complex formatting issues in documents converted from PDF to Word. They also praised Claude's iterative editing capabilities, allowing for seamless content updates and modifications.
- Rencauchao highlights a critical step when using Claude to generate Word documents: users should modify the "author" and "comments" metadata to reflect their own information before sharing, as these fields can reveal the document's origin as being generated by Claude. This is important for maintaining authorship integrity and privacy.
- sceez shares a practical use case where Claude was employed to resolve formatting issues in a Word document that had been converted to PDF and back. The process involved iterative interactions with Claude, which successfully restored the document's formatting, demonstrating Claude's capability in handling complex document editing tasks.
- 5aur1an suggests a method for personalizing outputs from Claude by training it to mimic a user's writing style. This involves analyzing a sample document for stylistic elements, then iteratively refining the generated content by providing feedback on specific words or phrases that don't match the user's style. This approach can enhance the relevance and personalization of the generated content over time.
2. DeepSeek and V4 Developments
- They said it's next week (Activity: 328): The image is a screenshot of a social media post by Yifan Zhang, discussing upcoming technological updates related to AI models, specifically mentioning terms like "Sparse MQA," "Fused MoE Mega Kernel," and "Hyper-connections." These terms suggest advancements in AI model architecture, potentially improving efficiency and performance. The mention of "V4, next week" implies an anticipated release or update, possibly related to a new version of an AI model or framework. The post has been edited and shows significant engagement, indicating community interest. Commenters express skepticism about the release timeline, noting that similar promises have been made since January. However, there is a sense of renewed optimism and excitement, with some users more interested in this update than other recent AI developments.
- To those waiting for V4 (Activity: 221): High-Flyer is a unique entity in the tech landscape, operating as a massive quant hedge fund rather than a traditional tech company. This structure allows them to develop AI models like V4 without the typical pressures of generating direct revenue or appeasing venture capitalists. Their approach is driven by internal metrics rather than external market cycles, which explains the lack of marketing hype and the low-cost API offerings. The company is rumored to fund its AI division through strategic financial maneuvers, such as shorting Nvidia, highlighting their financial independence and strategic focus. Commenters debate the rationale behind High-Flyer's AI development, suggesting that despite their financial independence, they must innovate to remain competitive and relevant. Concerns are also raised about talent retention and the potential need to go public to ensure long-term success.
- WHY_DO_I_SHOUT highlights that the hedge fund's lack of marketing hype and low-cost API access is due to their financial independence, as they don't rely on direct revenue from the model. This suggests their primary goal isn't monetization through the model itself, but possibly leveraging it for internal advantages or strategic positioning.
- Weird-Pollution-6251 points out that the model's user interface and lack of integration with other tools indicate it's more of a demonstration than a fully-fledged product. This implies that the hedge fund's focus might be on showcasing capabilities rather than creating a market-ready product, which aligns with their financial strategy of not needing direct revenue from the model.
- Puzzleheaded-Drama-8 speculates that the hedge fund might benefit from market fluctuations caused by the hype around model releases. This suggests a strategic use of the model to influence market conditions, potentially creating opportunities for profit through trading on these fluctuations.
3. Kimi 2.6 and AI Model Benchmarks
- Kimi 2.6 has been released (Activity: 605): The image is a performance comparison chart highlighting the competitive performance of Kimi K2.6 against other AI models like GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro across tasks such as general agents, coding, and visual agents. Kimi K2.6 is particularly noted for its autonomous overhaul of an open-source financial matching engine, iterating through optimization strategies and modifying code autonomously to achieve a 185% medium throughput increase and a 133% performance throughput gain. Commenters are impressed by the open-source nature of Kimi K2.6 and its ability to autonomously optimize complex systems, highlighting its potential in real-world applications.
- Kimi K2.6 autonomously optimized exchange-core, an open-source financial matching engine, by iterating through 12 optimization strategies and making over 1,000 tool calls to modify more than 4,000 lines of code. The model analyzed CPU and allocation flame graphs to identify bottlenecks and reconfigured the core thread topology, achieving a 185% increase in medium throughput and a 133% gain in performance throughput, demonstrating significant advancements in open-source AI capabilities.
- A user expressed skepticism about claims that Kimi 2.5 was “benchmaxed,” noting that it excelled in design and web development tasks compared to other models like Claude, GLM 5.1, GPT, Gemini 3.1, and Qwen. They highlighted Kimi's unmatched performance in creating PowerPoint presentations, PDFs, and websites, suggesting that its design capabilities were far superior to its competitors, which is particularly impressive if Kimi 2.6 is indeed open-source.
- The discussion includes a query about whether Kimi 2.6 is truly open-source, reflecting the community's interest in the accessibility and transparency of advanced AI models. The user compares Kimi's performance favorably against other models, emphasizing its exceptional design task capabilities, which could be a significant advantage if the model remains open-source.
- Opus 4.7 vs 4.6 after 3 days of real coding - side by side from my actual sessions (Activity: 696): The image provides a detailed side-by-side comparison of Opus 4.6 and Opus 4.7 based on three days of real coding sessions. Key metrics such as one-shot rate, retry rate, and cost per call are highlighted, showing that Opus 4.6 generally performs better in one-shot success rate (83.8% vs 74.5%) and cost efficiency ($0.112 vs $0.185 per call). Opus 4.7, however, generates more output per call (800 tokens vs 372 tokens), making it more expensive. The analysis also notes that Opus 4.7 uses fewer tools per turn and delegates less to subagents, suggesting potential differences in operational style or sample-size limitations. The post emphasizes that these findings are preliminary and based on limited data, with the potential for shifts as more data is collected. Commenters appreciate the detailed analysis and suggest that prompt adjustments might be needed for Opus 4.7. There is also discussion about the potential motivations behind the aggressive promotion of Opus 4.7, hinting at cost considerations.
- phil_thrasher raises a critical point about the need for prompt adjustments when transitioning from Opus 4.6 to 4.7, suggesting that the harness might require changes to optimize performance for the newer version. This highlights the importance of adapting testing frameworks to accommodate updates in AI models, which may not have been fully addressed by the development team.
- SovietRabotyaga points out the significance of the “total cost field” in understanding Anthropic's strategy for aggressively promoting Opus 4.7. This suggests that economic factors might be influencing the push for newer versions, potentially impacting the decision-making process behind model updates and deployments.
- thewormbird reflects on historical model updates, noting that intermediate versions like 3.7 were less effective in their workflows compared to major releases like 4.0. This raises questions about the versioning strategy and whether incremental updates provide substantial improvements, suggesting that users might benefit more from waiting for major releases like Opus/Sonnet 5.
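The per-model metrics in the comparison above (one-shot rate, retry rate, cost per call) are straightforward aggregates over session logs. A minimal sketch of that computation, assuming a hypothetical log schema since the post did not publish its data format:

```python
# Sketch of how per-model session metrics like those in the comparison
# could be computed. The log schema (`retries`, `cost_usd`) is hypothetical.

def summarize(sessions: list[dict]) -> dict:
    """Aggregate one-shot rate, retries per call, and cost per call."""
    calls = len(sessions)
    one_shot = sum(1 for s in sessions if s["retries"] == 0)
    retries = sum(s["retries"] for s in sessions)
    cost = sum(s["cost_usd"] for s in sessions)
    return {
        "one_shot_rate": one_shot / calls,        # calls that succeeded first try
        "retries_per_call": retries / calls,
        "cost_per_call": cost / calls,
    }

# Illustrative data, not the poster's actual numbers.
sessions = [
    {"retries": 0, "cost_usd": 0.10},
    {"retries": 0, "cost_usd": 0.12},
    {"retries": 1, "cost_usd": 0.11},
    {"retries": 0, "cost_usd": 0.12},
]
stats = summarize(sessions)
print(stats["one_shot_rate"])             # 0.75
print(round(stats["cost_per_call"], 4))   # 0.1125
```

With enough logged sessions per model, comparing two `summarize` outputs side by side yields exactly the kind of table the post screenshots, which is also why the author's sample-size caveat matters: a few days of sessions gives wide error bars on all three ratios.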
AI Discords
Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.