All tags
Topic: "open-weights"
Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params
kimi-k2 kimi-k2-1t deepseek-v3 grok-4 devstral-2507 gpt-4.1 sonnet-4 moonshot-ai alibaba tencent deepseek x-ai mistral-ai weights-biases hugging-face mixture-of-experts model-training model-optimization optimizer benchmarking long-context model-performance open-weights model-release yuchenj_uw andrew_n_carr scaling01 novita_labs teknium1 aravsrinivas mparakhin simonw
Moonshot AI has released Kimi K2, a 1 trillion parameter Mixture-of-Experts model trained on 15.5 trillion tokens using the new MuonClip optimizer, achieving state-of-the-art results on benchmarks like SWE-Bench Verified (65.8%) and TAU2 (58.4%). This model is competitive with GPT-4.1 and Sonnet 4 on non-thinking tasks and is available under an MIT license. Meanwhile, xAI announced Grok-4, noted for its "LEAST censored frontier model" status and strong long-context performance but criticized for rushed post-training. Mistral AI updated its Devstral 2507 models with improved performance and cost efficiency. The community is excited about the potential of the MuonClip optimizer, which may surpass the long-standing AdamW optimizer in machine learning.
not much happened today
deepseek-r1-0528 o3 gemini-2.5-pro claude-opus-4 deepseek_ai openai gemini meta-ai-fair anthropic x-ai ollama hugging-face alibaba bytedance xiaomi reasoning reinforcement-learning benchmarking quantization local-inference model-evaluation open-weights transparency post-training agentic-benchmarks long-context hallucination-detection teortaxestex wenfeng danielhanchen awnihannun reach_vb abacaj
DeepSeek R1-0528 release brings major improvements in reasoning, hallucination reduction, JSON output, and function calling, matching or surpassing closed models like OpenAI o3 and Gemini 2.5 Pro on benchmarks such as Artificial Analysis Intelligence Index, LiveBench, and GPQA Diamond. The model ranks #2 globally in open weights intelligence, surpassing Meta AI, Anthropic, and xAI. Open weights and technical transparency have fueled rapid adoption across platforms like Ollama and Hugging Face. Chinese AI labs including DeepSeek, Alibaba, ByteDance, and Xiaomi now match or surpass US labs in model releases and intelligence, driven by open weights strategies. Reinforcement learning post-training is critical for intelligence gains, mirroring trends seen at OpenAI. Optimized quantization techniques (1-bit, 4-bit) and local inference enable efficient experimentation on consumer hardware. New benchmarks like LisanBench test knowledge, planning, memory, and long-context reasoning, with OpenAI o3 and Claude Opus 4 leading. Discussions highlight concerns about benchmark contamination and overemphasis on RL-tuned gains.
DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release
deepseek-r1-0528 gemini-2.5-pro qwen-3-8b qwen-3-235b deepseek-ai anthropic meta-ai-fair nvidia alibaba google-deepmind reinforcement-learning benchmarking model-performance open-weights reasoning quantization post-training model-comparison artificialanlys scaling01 cline reach_vb zizhpan andrewyng teortaxestex teknim1 lateinteraction abacaj cognitivecompai awnihannun
DeepSeek R1-0528 marks a significant upgrade, closing the gap with proprietary models like Gemini 2.5 Pro and surpassing benchmarks from Anthropic, Meta, NVIDIA, and Alibaba. This Chinese open-weights model leads in several AI benchmarks, driven by reinforcement learning post-training rather than architecture changes, and demonstrates increased reasoning token usage (23K tokens per question). The China-US AI race intensifies as Chinese labs accelerate innovation through transparency and open research culture. Key benchmarks include AIME 2024, LiveCodeBench, and GPQA Diamond.