Topic: "inference-infrastructure"
not much happened today
vllm-0.12.0 gemma3n qwen3-omni qwen3-vl gpt-5.1-codex-max gemini-3-pro runway-gen-4.5 kling-video-2.6 vllm nvidia huggingface langchain-ai together-ai meta-ai-fair sonarsource openrouter runway gemini arena gpu-programming quantization multimodality agent-platforms reinforcement-learning static-analysis reasoning inference-infrastructure model-optimization economics audio video-generation jeremyphoward mervenoyann sydneyrunkle swyx maximelabonne
vLLM 0.12.0 introduces DeepSeek support, GPU Model Runner V2, and quantization improvements, built on PyTorch 2.9.0 and CUDA 12.9. NVIDIA launches CUDA Tile IR and cuTile Python for advanced GPU tensor operations targeting Blackwell GPUs. Hugging Face releases the Transformers v5 RC with an any-to-any multimodal pipeline supporting models like Gemma3n and Qwen3-Omni. Among agent platforms, LangChain adds content moderation and cost tracking, Together AI and Meta AI collaborate on RL for long-horizon workflows, and SonarSource integrates static analysis into AI codegen. Economic insights from OpenRouter highlight coding as a key AI application, with reasoning models surpassing 50% of usage and the market bifurcating between premium and open models. In multimodality, Kling Video 2.6 debuts native audio capabilities, while Runway Gen-4.5, Qwen3-TTS, and Gemini 3 Pro also ship advances.