Company: "xiaomi"
not much happened today
gpt-5.5 gpt-5.4 opus-4.7 mimo-v2.5-pro mimo-v2.5 kimi-k2.6 codex copilot openai microsoft google amazon github xiaomi openai-devs vllm_project kimi-moonshot model-distribution cloud-computing benchmarking usage-based-billing model-orchestration open-source large-context-models agent-scaling coding model-training fp8 attention-mechanisms multi-agent-systems sama scaling01 kimmonismus ajassy simonw htihle arena gdb hangsiin eliebakouch _luofuli teortaxestex
OpenAI loosens its Azure exclusivity, allowing distribution across Google TPU, AWS Trainium, and Bedrock, with commitments through 2032 and revenue share through 2030. GPT-5.5 shows improved benchmarks but is not uniformly dominant, ranking variably across coding, document, math, and vision tasks. GitHub Copilot shifts to usage-based billing starting June 1, reflecting increased runtime costs. OpenAI open-sourced Symphony, an orchestration layer for issue tracking and Codex agents. Xiaomi released MiMo-V2.5 and MiMo-V2.5-Pro, large-context models with context windows of up to 1M tokens, trained on trillions of tokens and emphasizing complex agent and omni-modal capabilities. Kimi K2.6 leads OpenRouter's leaderboard and is noted for coding and long-horizon agent capabilities with large-scale sub-agent coordination.
DeepSeek v4
deepseek-v4 deepseek-v4-pro deepseek-v4-flash kimi-k2.6 glm-5.1 xiaomi-mimo-v2.5-pro gpt-5.5 gpt-5.5-pro deepseek nvidia openai lambdaapi togethercompute xiaomi long-context mixture-of-experts model-quantization memory-optimization hardware-model-co-design inference-speed agent-integration token-efficiency model-deployment open-weights reasoning hallucination-detection scaling01 ben_burtenshaw artificialanlys
The DeepSeek-V4 technical release features a 1.6T-parameter MoE with 49B active parameters and a 1M-token context, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the #2 open-weights reasoning model behind Kimi K2.6 but has a high hallucination rate and higher serving costs. Hardware-model co-design is emphasized, with NVIDIA Blackwell Ultra delivering 150+ TPS/user and support for FP4 and FP8 quantization enabling deployment on single nodes. Among open Chinese models, it is positioned competitively with GLM-5.1 and Xiaomi MiMo V2.5 Pro. Meanwhile, OpenAI launched GPT-5.5 and GPT-5.5 Pro APIs with a 1M context window, focusing on improved long-running workflows and token efficiency; the models were quickly integrated into tools like GitHub Copilot and Cursor, highlighting rapid distribution and agent integration. One quoted claim: "GPT-5.5 handles complex, tool-heavy, ambiguous workflows with fewer retries."
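As a rough back-of-the-envelope check on the parameter counts and quantization formats mentioned above, the sketch below estimates weight-only memory for a 1.6T-parameter model at several precisions, along with the share of parameters active per token. The bytes-per-parameter values (2 for BF16, 1 for FP8, 0.5 for FP4) are assumptions, the function names are illustrative and do not come from DeepSeek, and KV cache, activations, and quantization scale factors are ignored.

```python
# Back-of-the-envelope sizing for a sparse MoE model (weights only).
# Assumptions (not from the release): 2 bytes/param for BF16, 1 for FP8,
# 0.5 for FP4; KV cache, activations, and scale factors are ignored.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_memory_tb(total_params: float, precision: str) -> float:
    """Return approximate weight memory in terabytes (1 TB = 1e12 bytes)."""
    return total_params * BYTES_PER_PARAM[precision] / 1e12


def active_fraction(active_params: float, total_params: float) -> float:
    """Return the share of parameters used per token in an MoE forward pass."""
    return active_params / total_params


TOTAL = 1.6e12  # 1.6T total parameters (figure quoted in the summary)
ACTIVE = 49e9   # 49B active parameters (figure quoted in the summary)

if __name__ == "__main__":
    for precision in ("bf16", "fp8", "fp4"):
        tb = weight_memory_tb(TOTAL, precision)
        print(f"{precision}: ~{tb:.1f} TB of weights")
    print(f"active share per token: {active_fraction(ACTIVE, TOTAL):.1%}")
```

Under these assumptions, halving the bytes per parameter halves the weight footprint. Whether the result fits on a single node depends on that node's aggregate accelerator memory, which the summary does not specify.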
not much happened today
qwen3.6-27b qwen3.5-397b-a17b privacy-filter mimo-v2.5-pro mimo-v2.5 gemini-3.1-pro gemini-3.1-flash-image alibaba openai xiaomi google google-deepmind vllm_project unsloth ggml ollama arena nous-research open-models multimodality vision tokenization pii-detection privacy enterprise-ai agentic-ai benchmarking long-context model-deployment hardware-optimization model-integration software-engineering alibaba_qwen clementdelangue altryne eliebakouch mervenoyann xiaomimo sundarpichai scaling01
Alibaba released Qwen3.6-27B, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench. It supports native vision-language reasoning over images and video, with immediate ecosystem support from vLLM, Unsloth, ggml, and Ollama. OpenAI open-sourced Privacy Filter, a practical model for PII detection and masking: a 1.5B-parameter token-classification model with a 128k context window aimed at enterprise redaction tasks. Xiaomi announced the MiMo-V2.5-Pro and MiMo-V2.5 models, emphasizing software engineering advances, long-horizon agents, and large context windows (up to 1M tokens), with strong benchmark results and integrations with Hermes and Nous. At Google Cloud Next, Google and Google DeepMind unveiled 8th-gen TPUs (TPU 8t for training and TPU 8i for inference) with claims of scaling to a million TPUs in a cluster, and launched the Gemini Enterprise Agent Platform, which evolves Vertex AI with Agent Studio and access to 200+ models including Gemini 3.1 Pro and Gemini 3.1 Flash Image. This marks a significant vertical integration of hardware, models, and enterprise tooling.
MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
minimax-m2.7 sonnet-4.6 glm-5 mimo-v2-pro mamba-3 qwen-3.5 kimi-k2.5 gpt-5.4-mini minimax xiaomi artificial-analysis ollama trae yupp openrouter vercel zo opencode kilocode cartesia self-evolving-agents reasoning cost-efficiency token-efficiency hybrid-architecture harness-engineering agent-harnesses skills memory-optimization architecture feedback-loops api inference execution-environment
MiniMax M2.7 is the headline model release, described as a "self-evolving agent" with strong performance metrics including 56.22% on SWE-Pro, 57.0% on Terminal Bench 2, and parity with Sonnet 4.6. It features recursive self-improvement in skills, memory, and architecture. Artificial Analysis places M2.7 on the cost/performance frontier with an Intelligence Index score of 50, matching GLM-5 (Reasoning) but at a fraction of the cost. Distribution is available via platforms like Ollama cloud and OpenRouter. Xiaomi’s MiMo-V2-Pro is noted as a serious Chinese API-only reasoning model with a score of 49 on the Intelligence Index and favorable token efficiency. Cartesia’s Mamba-3 is highlighted as an SSM optimized for inference-heavy use, with early reactions focusing on hybrid transformer architectures like Qwen3.5 and Kimi Linear. The report emphasizes a shift from prompting to harness engineering, where the execution environment and agent harnesses, including skills and MCP, are becoming key differentiators in AI system design. This includes discussions on tools, repo legibility, constraints, and feedback loops, with mentions of DSPy and GPT-5.4 mini as important components in this evolving landscape.
not much happened today
glm-4.7 mimo-v2-flash z-image-turbo kling-2.6-motion-control zhipu-ai xiaomi google langchain huggingface openrouter artificial-analysis vllm-project coding complex-reasoning tool-use mixture-of-experts cost-efficiency open-weight-models text-to-image video-models memory-persistence agent-frameworks interactive-user-interfaces model-deployment mervenoyann eliebakouch omarsar0 osanseviero dair_ai
Zhipu AI's GLM-4.7 release marks a significant improvement in coding, complex reasoning, and tool use, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. Xiaomi's MiMo-V2-Flash is highlighted as a practical, cost-efficient mixture-of-experts model optimized for deployment. In the open-weight text-to-image competition, Z-Image Turbo leads with 6B parameters under an Apache-2.0 license. Video model advances focus on control and long-form consistency, exemplified by Kling 2.6 Motion Control and research like MemFlow's adaptive memory retrieval. In agent frameworks, Google's A2UI protocol introduces agent-driven UI generation, while studies reveal that mixing multiple agent frameworks is common, with challenges in logic, termination, and tool interaction. LangChain emphasizes persistent memory patterns for production agents.
OpenAI GPT Image-1.5 claims to beat Nano Banana Pro, #1 across all Arenas, but completely fails Vibe Checks
gpt-image-1.5 nano-banana-pro mimo-v2-flash deepseek-v3.2 openai gemini xiaomi lmsys deepseek openrouter image-generation instruction-following benchmarking model-efficiency long-context multi-token-prediction hybrid-attention model-optimization inference-speed agentic-workflows model-architecture model-quantization fuli_luo eliebakouch
OpenAI released its new image model GPT Image 1.5, featuring precise image editing, better instruction following, improved text and markdown rendering, and up to 4× faster generation. Despite topping multiple leaderboards like LMArena (1277), Design Arena (1344), and AA Arena (1272), user feedback from Twitter, Reddit, and Discord communities is largely negative compared to Nano Banana Pro by Gemini. Xiaomi introduced MiMo-V2-Flash, a 309B MoE model optimized for inference efficiency with a 256K context window, achieving state-of-the-art scores on SWE-Bench. The model uses Hybrid Sliding Window Attention and multi-token prediction, offering significant speedups and efficiency improvements. The timing of OpenAI's launch amid competition from Gemini and Nano Banana Pro affects user sentiment, highlighting challenges in benchmarking relevance.
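The summary does not state MiMo-V2-Flash's window size or how its sliding-window and full-attention layers are interleaved. As a generic illustration of sliding-window attention only, and not Xiaomi's implementation, the sketch below builds a causal mask in which each token attends to itself and at most the `window - 1` tokens before it, then compares the number of attended pairs with full causal attention. The function names and the values `seq_len=6`, `window=3` are hypothetical.

```python
# Generic sliding-window causal attention mask (illustrative only).
# The window size and sequence length here are hypothetical examples.


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Position i sees itself and up to (window - 1) preceding positions.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


def full_causal_pairs(seq_len: int) -> int:
    """Number of (query, key) pairs under ordinary causal attention."""
    return seq_len * (seq_len + 1) // 2


if __name__ == "__main__":
    mask = sliding_window_mask(seq_len=6, window=3)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print("windowed pairs:", attended_pairs(mask))
    print("full causal pairs:", full_causal_pairs(6))
```

With a fixed window, the number of attended pairs grows linearly with sequence length rather than quadratically, which is the general motivation for mixing local layers with a smaller number of full-attention layers in long-context models.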
not much happened today
deepseek-r1-0528 o3 gemini-2.5-pro claude-opus-4 deepseek_ai openai gemini meta-ai-fair anthropic x-ai ollama hugging-face alibaba bytedance xiaomi reasoning reinforcement-learning benchmarking quantization local-inference model-evaluation open-weights transparency post-training agentic-benchmarks long-context hallucination-detection teortaxestex wenfeng danielhanchen awnihannun reach_vb abacaj
The DeepSeek R1-0528 release brings major improvements in reasoning, hallucination reduction, JSON output, and function calling, matching or surpassing closed models like OpenAI o3 and Gemini 2.5 Pro on benchmarks such as Artificial Analysis Intelligence Index, LiveBench, and GPQA Diamond. The model ranks #2 globally in open-weights intelligence, surpassing Meta AI, Anthropic, and xAI. Open weights and technical transparency have fueled rapid adoption across platforms like Ollama and Hugging Face. Chinese AI labs including DeepSeek, Alibaba, ByteDance, and Xiaomi now match or surpass US labs in model releases and intelligence, driven by open-weights strategies. Reinforcement learning post-training is critical for intelligence gains, mirroring trends seen at OpenAI. Optimized quantization techniques (1-bit, 4-bit) and local inference enable efficient experimentation on consumer hardware. New benchmarks like LisanBench test knowledge, planning, memory, and long-context reasoning, with OpenAI o3 and Claude Opus 4 leading. Discussions highlight concerns about benchmark contamination and overemphasis on RL-tuned gains.
not much happened today
phi-4 phi-4-mini-reasoning qwen3-235b qwen3-moe-235b qwen3-moe-30b qwen3-dense-32b qwen3-dense-14b qwen3-dense-8b qwen3-dense-4b qwen3-dense-0.6b qwen2.5-omni-3b deepseek-prover-v2 llama llama-guard-4 prompt-guard-2 mimo-7b microsoft anthropic cursor alibaba togethercompute deepseek meta-ai-fair xiaomi openrouterai cohere reasoning model-fine-tuning model-evaluation benchmarking model-popularity open-source math model-scaling model-filtering jailbreak-prevention cline reach_vb vipulved akhaliq omarsar0 zhs05232838 huajian_xin mervenoyann karpathy random_walker sarahookr blancheminerva clefourrier
Microsoft released Phi-4-reasoning, a fine-tuned 14B reasoning model that is slightly behind QwQ and limited by data-transparency and token-efficiency issues. Anthropic introduced remote MCP server support and a 45-minute Research mode in Claude. Cursor published a model popularity list. Alibaba launched Qwen3-235B and other Qwen3 variants, highlighting budget-friendly coding and reasoning capabilities, with availability on the Together AI API. Microsoft also released Phi-4-Mini-Reasoning, with reported benchmark results on AIME 2025 and OmniMath. DeepSeek announced DeepSeek-Prover V2 with state-of-the-art math problem solving, scaling to 671B parameters. Meta AI's Llama models hit 1.2 billion downloads, with new Llama Guard 4 and Prompt Guard 2 for input/output filtering and jailbreak prevention. Xiaomi released the open-source reasoning model MiMo-7B, trained on 25 trillion tokens. Discussions on AI model evaluation highlighted issues with the LMArena leaderboard, data access biases favoring proprietary models, and challenges in maintaining fair benchmarking, with suggestions for alternatives like OpenRouterAI rankings. Noted concerns included "LMArena slop and biased" and "61.3% of all data going to proprietary model providers."