not much happened today
nemotron-nano-2 gpt-oss-120b qwen3 llama-3 minimax-m2 glm-4.6-air gemini-2.5-flash gpt-5.1-mini tahoe-x1 vllm_project nvidia mistral-ai baseten huggingface thinking-machines deeplearningai pytorch arena yupp-ai zhipu-ai scaling01 stanford transformer-architecture model-optimization inference distributed-training multi-gpu-support performance-optimization agents observability model-evaluation reinforcement-learning model-provenance statistical-testing foundation-models cancer-biology model-fine-tuning swyx dvilasuero _lewtun clementdelangue zephyr_z9 skylermiao7 teortaxestex nalidoust
vLLM announced support for NVIDIA Nemotron Nano 2, a hybrid Transformer–Mamba model with a tunable "thinking budget" that enables up to 6× faster token generation. Mistral AI Studio launched a production platform for agents with deep observability. Baseten reported 650 tokens/s throughput for GPT-OSS 120B on NVIDIA hardware. Hugging Face InspectAI added inference-provider integration for cross-provider evaluation. Thinking Machines' Tinker abstracts away distributed fine-tuning for open-weight LLMs such as Qwen3 and Llama 3. In China, MiniMax M2 shows performance competitive with top models and is optimized for agents and coding, while Zhipu's GLM-4.6-Air focuses on reliability and scaling for coding tasks. Rumors suggest Gemini 2.5 Flash may be a >500B-parameter MoE model, and a possible GPT-5.1 mini reference surfaced. Outside LLMs, the Tahoe-x1 (3B) foundation model achieved SOTA on cancer cell biology benchmarks. Research from Stanford introduces a method for detecting model provenance via a training-order "palimpsest," with strong statistical guarantees.
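For readers who want to try the model locally, here is a minimal sketch of serving a Nemotron Nano 2 checkpoint with vLLM's offline Python API. The Hugging Face model ID and sampling settings are assumptions for illustration, and the model-specific "thinking budget" control is not shown.

```python
# Minimal sketch: offline generation with vLLM.
# The model ID below is an assumption; substitute the actual
# Nemotron Nano 2 checkpoint you intend to use.
from vllm import LLM, SamplingParams

llm = LLM(
    model="nvidia/NVIDIA-Nemotron-Nano-9B-v2",  # assumed repo ID
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.6, max_tokens=512)

outputs = llm.generate(
    ["Explain hybrid Transformer-Mamba architectures in two sentences."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```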
not much happened today
qwen3-coder-480b-a35b-instruct kimi-k2 alibaba openrouterai togethercompute vllm_project unslothai white-house code-generation benchmarking model-integration context-windows open-source national-security infrastructure ai-policy fchollet clementdelangue scaling01 aravsrinivas rasbt gregkamradt yuchenj_uw
Alibaba released Qwen3-Coder-480B-A35B-Instruct, an open agentic code model with 480B parameters and a 256K context length, praised for rapid development and strong coding performance. A claimed 41.8% on ARC-AGI-1 drew skepticism from François Chollet and others over reproducibility. The model was quickly integrated into ecosystems such as vLLM, Unsloth Dynamic GGUFs, and OpenRouter. The White House unveiled a new AI Action Plan emphasizing Innovation, Infrastructure, and International Diplomacy, linking AI leadership to national security and prioritizing compute access for the Department of Defense. The plan sparked debate over open vs. closed-source AI, with Clement Delangue calling for the US to embrace open science to maintain its AI competitiveness.
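To experiment with Qwen3-Coder through OpenRouter, the sketch below uses the standard OpenAI-compatible client pointed at OpenRouter's endpoint. The model slug and environment variable name are assumptions; check OpenRouter's model listing for the exact identifier.

```python
# Minimal sketch: querying Qwen3-Coder via OpenRouter's OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var name
)

resp = client.chat.completions.create(
    model="qwen/qwen3-coder",  # assumed model slug
    messages=[
        {"role": "user", "content": "Write a Python function that parses a CSV header line."}
    ],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```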