Person: "kimi_moonshot"
not much happened today
kimi-k2-thinking kimi-k3 gelato-30b-a3b omnilingual-wav2vec-2.0 moonshot-ai meta-ai-fair togethercompute qwen attention-mechanisms quantization fine-tuning model-optimization agentic-ai speech-recognition multilingual-models gui-manipulation image-editing dataset-release yuchenj_uw scaling01 code_star omarsar0 kimi_moonshot anas_awadalla akhaliq minchoi
Moonshot AI's Kimi K2 Thinking AMA revealed a hybrid attention stack using KDA + NoPE MLA that outperforms full MLA + RoPE, with the Muon optimizer scaling to ~1T parameters and native INT4 QAT for cost-efficient inference. K2 Thinking ranks highly on the LisanBench and LM Arena Text leaderboards, offering low-cost INT4 serving and strong performance in Math, Coding, and Creative Writing. It supports heavy agentic tool use with up to 300 tool requests per run, and Moonshot recommends the official API for reliable long-trace inference. Meta AI released the Omnilingual ASR suite covering 1600+ languages, including 500 underserved ones, plus a 7B wav2vec 2.0 model and an ASR corpus. Additionally, the Gelato-30B-A3B computer-grounding model for GUI manipulation agents outperforms larger VLMs, aiming at immediate gains for agents. Qwen's image-edit LoRAs and light-restoration app were also highlighted.
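For reference, a minimal sketch of what an agentic call to K2 Thinking through the official API might look like, assuming an OpenAI-compatible chat-completions interface; the base URL, model identifier, and the `web_search` tool below are illustrative placeholders rather than values confirmed in the AMA.

```python
# Minimal sketch of a tool-enabled call to Kimi K2 Thinking via the official API.
# Assumes an OpenAI-compatible endpoint; base_url, model name, and the tool schema
# are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MOONSHOT_API_KEY",        # placeholder
    base_url="https://api.moonshot.ai/v1",  # assumed OpenAI-compatible endpoint
)

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",               # hypothetical tool for illustration
        "description": "Search the web and return snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kimi-k2-thinking",               # assumed model identifier
    messages=[{"role": "user", "content": "Summarize today's open-weights releases."}],
    tools=tools,
)
print(resp.choices[0].message)
```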
Terminal-Bench 2.0 and Harbor
kimi-k2-thinking moonshot-ai anthropic hugging-face ollama slime-framework benchmarking agentic-ai quantization model-optimization inference model-deployment moe context-windows cost-efficiency clementdelangue dbreunig awnihannun crystalsssup kimi_moonshot
Terminal-Bench has fixed task issues and launched version 2.0 with cloud container support via the Harbor framework, with results reported for models such as Claude 4.5 and Kimi K2 Thinking. Moonshot AI's Kimi K2 Thinking is a 1-trillion-parameter MoE reasoning model with ~32B active parameters, running natively in INT4 quantization with a 256K context window. It leads open-weights benchmarks with an Artificial Analysis Intelligence Index score of 67 and strong agentic performance, and runs efficiently on consumer Apple silicon, including a 2× M3 Ultra setup. The model is broadly available on Hugging Face and Ollama Cloud and is integrated into frameworks like slime. Serving bottlenecks were traced to network bandwidth rather than GPU limits, highlighting infrastructure considerations for LLM deployment.
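A back-of-the-envelope sketch of the memory math behind those deployment claims; all figures are rough estimates, not measured numbers.

```python
# Rough sketch of why a 1T-parameter MoE with ~32B active parameters in native INT4
# is serveable on 2x M3 Ultra (512 GB unified memory each). Estimates only.

INT4_BYTES = 0.5          # 4 bits per weight
TOTAL_PARAMS = 1.0e12     # ~1T total parameters
ACTIVE_PARAMS = 32e9      # ~32B parameters active per token

weight_mem_gb = TOTAL_PARAMS * INT4_BYTES / 1e9
active_read_gb = ACTIVE_PARAMS * INT4_BYTES / 1e9

print(f"INT4 weight footprint: ~{weight_mem_gb:.0f} GB (vs ~{TOTAL_PARAMS * 2 / 1e9:.0f} GB in BF16)")
print(f"Weights touched per decoded token: ~{active_read_gb:.0f} GB")

# ~500 GB of INT4 weights fits across two 512 GB machines with headroom for the
# 256K-token KV cache, and only ~16 GB of expert weights are read per token, which
# is why decoding tends to be memory-bandwidth- and, in multi-node serving,
# network-bound rather than GPU-compute-bound.
```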
not much happened today
kimi-linear kimi-delta-attention minimax-m2 looped-llms aardvark-gpt-5 moonshot-ai minimax bytedance princeton mila openai cursor cognition hkust long-context attention-mechanisms agentic-ai tool-use adaptive-compute coding-agents performance-optimization memory-optimization reinforcement-learning model-architecture kimi_moonshot scaling01 uniartisan omarsar0 aicodeking songlinyang4 iscienceluvr nrehiew_ gdb embeddedsec auchenberg simonw
Moonshot AI released Kimi Linear (KDA) with day-0 infrastructure support and strong long-context metrics, achieving up to 75% KV cache reduction and 6x decoding throughput. MiniMax M2 pivoted to full attention for multi-hop reasoning, maintaining strong agentic coding performance with a 200K context and ~100 TPS. ByteDance, Princeton, and Mila introduced Looped LLMs, showing that looping smaller models yields efficiency comparable to larger transformers. OpenAI's Aardvark, a GPT-5-based agentic security researcher for scalable vulnerability discovery, entered private beta. Cursor launched faster cloud coding agents, though transparency concerns arose regarding base-model provenance. Cognition released a public beta of Devin as a desktop/mobile tool-use agent. The community also discussed advanced attention mechanisms and adaptive-compute techniques.
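A quick sketch of the KV-cache arithmetic behind the headline reduction, assuming a 3:1 ratio of linear-attention to full-attention layers; the layer count, head dimensions, and sequence length below are illustrative, not Kimi Linear's published configuration.

```python
# Where an "up to 75% KV cache reduction" can come from in a hybrid stack: if 3 of
# every 4 layers use a linear-attention variant (constant-size recurrent state
# instead of a per-token KV cache), only 1/4 of the layers still pay the
# full-attention KV cost. Config numbers are illustrative.

def kv_cache_gb(layers, kv_heads, head_dim, seq_len, bytes_per=2):
    # K and V tensors per layer, per token, in the given precision
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per / 1e9

layers, kv_heads, head_dim, seq_len = 64, 8, 128, 1_000_000

full = kv_cache_gb(layers, kv_heads, head_dim, seq_len)
hybrid = kv_cache_gb(layers // 4, kv_heads, head_dim, seq_len)  # only 1/4 full-attention layers

print(f"full attention KV cache:   ~{full:.0f} GB at {seq_len:,} tokens")
print(f"3:1 hybrid (KDA) KV cache: ~{hybrid:.0f} GB ({1 - hybrid / full:.0%} smaller)")
```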
Oracle jumps +36% in a day after winning $300B OpenAI contract
qwen3-235b qwen3-4b qwen2.5-7b vllm oracle openai microsoft moonshot-ai vllm-project thinking-machines-lab meta reinforcement-learning model-weight-updates deterministic-inference benchmarking long-context model-optimization cuda distributed-training kimi_moonshot arankomatsuzaki qgallouedec cHHillee woosuk_k stasbekman
Oracle's OCI division reported +359% growth in revenue bookings to $455B, with cloud revenue guidance of $144B by 2030, driven significantly by a large deal with OpenAI amid tensions with Microsoft. On AI infrastructure, Moonshot AI released Kimi's checkpoint-engine, enabling rapid weight updates for 1T-parameter models across thousands of GPUs with vLLM integration. RLFactory, a plug-and-play reinforcement learning framework for tool-using agents, was introduced, with smaller models reportedly outperforming larger ones. TRL v0.23 added context parallelism for long-context training. Thinking Machines Lab published research on deterministic inference pipelines, making vLLM inference deterministic for Qwen models. Meta launched BackendBench, a PyTorch benchmarking tool.
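For context, a minimal sketch of deterministic-leaning generation with vLLM on a Qwen model; it only pins down the sampling side, and the model name is an example. Full run-to-run, batch-invariant determinism comes from the kernel-level changes described in the Thinking Machines research, not from these settings alone.

```python
# Minimal sketch: greedy decoding with a fixed seed in vLLM removes sampling
# randomness, but bitwise-identical outputs across different batch compositions
# additionally require batch-invariant kernels as in the Thinking Machines work.
# Model name is an example.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
params = SamplingParams(temperature=0.0, seed=0, max_tokens=128)

outputs = llm.generate(["Explain context parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```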