Model: "minimax-m2"
not much happened today
qwen3-max-thinking minimax-m2 claude-3-sonnet llamaindex-light chronos-2 openai aws microsoft nvidia gpu_mode vllm alibaba arena llamaindex amazon anthropic gradio compute-deals gpu-optimization kernel-optimization local-serving reasoning long-context benchmarks long-term-memory time-series-forecasting agent-frameworks oauth-integration developer-tools sama gdb andrewcurran_ a1zhang m_sirovatka omarsar0 _philschmid
OpenAI and AWS announced a strategic partnership involving a $38B compute deal to deploy hundreds of thousands of NVIDIA GB200 and GB300 chips, while Microsoft secured a license to ship NVIDIA GPUs to the UAE with a planned $7.9B datacenter investment. NVIDIA and GPU_MODE launched a 3-month NVFP4 kernel-optimization competition on Blackwell B200s, with prizes including DGX Spark and RTX 50XX GPUs. vLLM gained traction for local LLM serving, exemplified by PewDiePie's adoption. Alibaba previewed the Qwen3-Max-Thinking model, reporting 100% on the AIME 2025 and HMMT benchmarks and signaling advances in reasoning with tool use. The MIT-licensed MiniMax-M2 230B MoE model topped the Arena WebDev leaderboard, tying with Claude Sonnet 4.5 Thinking 32k. Critiques emerged on OSWorld benchmark stability and task validity. LlamaIndex's LIGHT framework demonstrated significant improvements in long-term memory tasks over raw-context and RAG baselines, with gains up to +160.6% in summarization at 10M tokens. Amazon introduced Chronos-2, a time-series foundation model for zero-shot forecasting. The MCP ecosystem expanded with new tools like mcp2py OAuth integration and a Gemini Docs MCP server, alongside a build sprint by Anthropic and Gradio offering substantial credits and prizes. The quip "OSWorld doesn’t really exist—different prompt sets = incomparable scores" captures the benchmarking challenges.
not much happened today
kimi-linear kimi-delta-attention minimax-m2 looped-llms aardvark-gpt-5 moonshot-ai minimax bytedance princeton mila openai cursor cognition hkust long-context attention-mechanisms agentic-ai tool-use adaptive-compute coding-agents performance-optimization memory-optimization reinforcement-learning model-architecture kimi_moonshot scaling01 uniartisan omarsar0 aicodeking songlinyang4 iscienceluvr nrehiew_ gdb embeddedsec auchenberg simonw
Moonshot AI released Kimi Linear (KDA) with day-0 infrastructure and strong long-context metrics, achieving up to 75% KV cache reduction and 6x decoding throughput. MiniMax M2 pivoted to full attention for multi-hop reasoning, maintaining strong agentic coding performance with 200k context and ~100 TPS. ByteDance, Princeton, and Mila introduced Looped LLMs showing efficiency gains comparable to larger transformers. OpenAI's Aardvark (GPT-5) entered private beta as an agentic security researcher for scalable vulnerability discovery. Cursor launched faster cloud coding agents, though transparency concerns arose regarding base-model provenance. Cognition released a public beta for a desktop/mobile tool-use agent named Devin. The community discussed advanced attention mechanisms and adaptive compute techniques.
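Kimi Linear's exact delta-attention rule is not spelled out above, but the reason linear-attention variants cut the KV cache is mechanical: standard attention must store every past key/value pair (cache grows with sequence length), while a linear-attention recurrence folds them into a fixed-size state. A minimal NumPy sketch under a generic feature map (the `phi` kernel here is an illustrative assumption, not Kimi's):

```python
import numpy as np

d = 4  # head dimension (toy size)

def phi(x):
    # Simple positive feature map; real linear-attention kernels vary.
    return np.maximum(x, 0.0) + 1e-6

# Instead of a KV cache that grows with sequence length, keep a fixed
# d x d state S and a d-dim normalizer z.
S = np.zeros((d, d))   # running sum of phi(k) v^T
z = np.zeros(d)        # running sum of phi(k)

rng = np.random.default_rng(0)
outputs = []
for t in range(16):                 # stream 16 tokens
    k, v, q = rng.normal(size=(3, d))
    S += np.outer(phi(k), v)        # state update: O(d^2), independent of t
    z += phi(k)
    o = phi(q) @ S / (phi(q) @ z)   # attention output read from the fixed state
    outputs.append(o)

print(S.shape)  # state stays (4, 4) however long the stream gets
```

Per-token memory is constant here, which is the intuition behind headline numbers like "75% KV cache reduction"; the actual savings depend on how many layers use the linear variant versus full attention.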
MiniMax M2 230BA10B — 8% of Claude Sonnet's price, ~2x faster, new SOTA open model
minimax-m2 hailuo-ai huggingface baseten vllm modelscope openrouter cline sparse-moe model-benchmarking model-architecture instruction-following tool-use api-pricing model-deployment performance-evaluation full-attention qk-norm gqa rope reach_vb artificialanlys akhaliq eliebakouch grad62304977 yifan_zhang_ zpysky1125
MiniMax M2, an open-weight sparse MoE model by Hailuo AI, launches with 230B total parameters and 10B active parameters per token, offering strong performance near frontier closed models and ranking #5 overall on the Artificial Analysis Intelligence Index v3.0. It supports coding and agent tasks, is licensed under MIT, and is available via API at competitive pricing. The architecture uses full attention, QK-Norm, GQA, partial RoPE, and sigmoid routing, with day-0 support in vLLM and deployment on platforms like Hugging Face and Baseten. Despite verbose outputs and the absence of a technical report, it marks a significant win for open models.
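The "10B active out of 230B total" split comes from sparse expert routing: each token runs only a few experts, and the sigmoid router gives each expert an independent gate in (0, 1) rather than a softmax distribution that must sum to 1. A hedged sketch (sizes, top-k, and the toy linear "experts" are illustrative, not MiniMax M2's real configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2   # toy sizes, not the production config

W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # stand-in FFNs

def moe_forward(x):
    # Sigmoid routing: each expert's gate is scored independently in (0, 1),
    # unlike softmax routing where gates compete for probability mass.
    gates = 1.0 / (1.0 + np.exp(-(x @ W_gate)))
    chosen = np.argsort(gates)[-top_k:]        # activate only the top-k experts
    y = np.zeros_like(x)
    for e in chosen:
        y += gates[e] * (x @ experts[e])       # weighted sum of active experts
    return y, chosen

y, chosen = moe_forward(rng.normal(size=d))
print(len(chosen))  # only top_k of n_experts run: active params << total params
```

Only the chosen experts' weights are touched per token, which is why a 230B-parameter model can price and decode like a much smaller dense one.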
not much happened today
nemotron-nano-2 gpt-oss-120b qwen3 llama-3 minimax-m2 glm-4.6-air gemini-2.5-flash gpt-5.1-mini tahoe-x1 vllm_project nvidia mistral-ai baseten huggingface thinking-machines deeplearningai pytorch arena yupp-ai zhipu-ai scaling01 stanford transformer-architecture model-optimization inference distributed-training multi-gpu-support performance-optimization agents observability model-evaluation reinforcement-learning model-provenance statistical-testing foundation-models cancer-biology model-fine-tuning swyx dvilasuero _lewtun clementdelangue zephyr_z9 skylermiao7 teortaxestex nalidoust
vLLM announced support for NVIDIA Nemotron Nano 2, featuring a hybrid Transformer–Mamba design and tunable "thinking budget" enabling up to 6× faster token generation. Mistral AI Studio launched a production platform for agents with deep observability. Baseten reported high throughput (650 TPS) for GPT-OSS 120B on NVIDIA hardware. Hugging Face InspectAI added inference provider integration for cross-provider evaluation. Thinking Machines Tinker abstracts distributed fine-tuning for open-weight LLMs like Qwen3 and Llama 3. In China, MiniMax M2 shows competitive performance with top models and is optimized for agents and coding, while Zhipu GLM-4.6-Air focuses on reliability and scaling for coding tasks. Rumors suggest Gemini 2.5 Flash may be a >500B parameter MoE model, and a possible GPT-5.1 mini reference appeared. Outside LLMs, Tahoe-x1 (3B) foundation model achieved SOTA in cancer cell biology benchmarks. Research from Stanford introduces a method to detect model provenance via training-order "palimpsest" with strong statistical guarantees.
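Nemotron Nano 2's tunable "thinking budget" is not documented above; the idea is a decode-time cap on reasoning tokens, after which the model is forced to close its reasoning span and answer. A minimal sketch, with a hypothetical `step` callable standing in for the model (this is an assumption about the mechanism, not vLLM's or NVIDIA's actual API):

```python
def generate(step, budget=32, max_tokens=128):
    """Decode loop with a tunable thinking budget (illustrative sketch).

    `step` is a hypothetical callable returning the next token string.
    Tokens emitted between <think> and </think> count against `budget`;
    once it is spent we inject </think> to force the final answer.
    """
    out, thinking, spent = [], False, 0
    for _ in range(max_tokens):
        tok = step(out)
        if tok == "<think>":
            thinking = True
        elif tok == "</think>":
            thinking = False
        elif thinking:
            spent += 1
            if spent >= budget:      # budget exhausted: cut reasoning short
                out.append(tok)
                tok = "</think>"
                thinking = False
        out.append(tok)
        if tok == "<eos>":
            break
    return out

# Toy "model" that would ruminate forever without a budget, then answers.
def toy_step(out):
    if "</think>" in out:
        return "<eos>" if out[-1] == "42" else "42"
    return "<think>" if not out else "hmm"

toks = generate(toy_step, budget=5)
print(toks.count("hmm"))  # → 5: reasoning truncated exactly at the budget
```

A smaller budget trades answer quality for latency, which is how a single model exposes the "up to 6× faster token generation" knob without retraining.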