All tags
Topic: "sparse-moe"
MiniMax M2 230BA10B — 8% of Claude Sonnet's price, ~2x faster, new SOTA open model
minimax-m2 hailuo-ai huggingface baseten vllm modelscope openrouter cline sparse-moe model-benchmarking model-architecture instruction-following tool-use api-pricing model-deployment performance-evaluation full-attention qk-norm gqa rope reach_vb artificialanlys akhaliq eliebakouch grad62304977 yifan_zhang_ zpysky1125
MiniMax M2, an open-weight sparse MoE model by Hailuo AI, launches with ≈200–230B parameters and 10B active parameters, offering strong performance near frontier closed models and ranking #5 overall on the Artificial Analysis Intelligence Index v3.0. It supports coding and agent tasks, is licensed under MIT, and is available via API at competitive pricing. The architecture uses full attention, QK-Norm, GQA, partial RoPE, and sigmoid routing, with day-0 support in vLLM and deployment on platforms like Hugging Face and Baseten. Despite verbosity and no tech report, it marks a significant win for open models.
Qwen 1.5 Released
qwen-1.5 mistral-7b sparsetral-16x7b-v2 bagel-7b-v0.4 deepseek-math-7b-instruct deepseek qwen mistral-ai hugging-face meta-ai-fair quantization token-context multilinguality retrieval-augmented-generation agent-planning code-generation sparse-moe model-merging fine-tuning direct-preference-optimization character-generation ascii-art kanji-generation vr retinal-resolution light-field-passthrough frozen-networks normalization-layers
Chinese AI models Yi, Deepseek, and Qwen are gaining attention for strong performance, with Qwen 1.5 offering up to 32k token context and compatibility with Hugging Face transformers and quantized models. The TheBloke Discord discussed topics like quantization of a 70B LLM, the introduction of the Sparse MoE model Sparsetral based on Mistral, debates on merging vs fine-tuning, and Direct Preference Optimization (DPO) for character generation. The Nous Research AI Discord covered challenges in Japanese Kanji generation, AI scams on social media, and Meta's VR headset prototypes showcased at SIGGRAPH 2023. Discussions also included fine-tuning frozen networks and new models like bagel-7b-v0.4, DeepSeek-Math-7b-instruct, and Sparsetral-16x7B-v2.