All tags
Model: "holo-3.1"
Microsoft Build: MAI-Thinking-1 and MAI Family models, Surface RTX Spark Dev Box, and OpenClaw in Windows
mai-thinking-1 mai-code-1-flash holo-3.1 qwen-35b sonnet-4.6 claude-code codex microsoft openrouter fal baseten hcompany_ai teksedge nous-research teknim cognition windsurf perplexity-ai mixture-of-experts context-windows benchmarking reinforcement-learning prompt-optimization agentic-ai local-inference model-family-expansion model-reporting agent-native-devices software-development model-optimization hybrid-inference desktop-agents model-quantization mustafasuleyman eliebakouch hannahajishirzi asadovsky bj2rn lateinteraction lakshyaaagrawal theturingpost kimmonismus yusuf_i_mehdi pierceboggan lukehoban nielsrogge russelljkaplan
Microsoft introduced MAI-Thinking-1, a 35B parameter MoE model with 256K context, achieving 97% on AIME 2025 and outperforming Sonnet 4.6 in human preference tests. The broader 7-model MAI family spans reasoning, code, image, speech, and voice, with third-party availability on OpenRouter, fal, and Baseten. The detailed 109-page technical report revealed insights on scaling, MFU, RL/post-training, and data curation, highlighting no third-party distillation and advanced prompt optimization techniques. Microsoft emphasized agent-native devices and local inference with projects like Project Solara / Scout and the Surface RTX Spark Dev Box, alongside software innovations such as the Copilot desktop app and MAI-Code-1-Flash integration. Meanwhile, local-first computer-use agents like Holo 3.1 (Qwen-based, 0.8B to 35B parameters) support laptops and small workstations with optimized formats and strong benchmark results. Desktop shells for agents, including Hermes Desktop, Devin Desktop, and agent-neutral approaches compatible with Devin, Claude Code, and Codex, are proliferating, with hybrid local/cloud execution becoming the default architecture as seen in Perplexity Computer's hybrid agentic inference.