All tags
Model: "gemini-live"
not much happened today
gemini-robotics-1.5 gemini-live embeddinggemma veo-3 gemini-2.5-flash code-world-model-32b qwen3-coder-30b vllm-v1 mlx-lm flashattention-4 google meta-ai-fair perplexity-ai baseten spatial-reasoning temporal-reasoning agentic-ai code-semantics code-execution-traces coding-infrastructure runtime-optimization batch-inference embedding-latency api model-optimization model-performance osanseviero _anniexie rmstein scaling01 giffmana cline redhat_ai awnihannun charles_irl bernhardsson akshat_b aravsrinivas
Google released a dense September update including Gemini Robotics 1.5 with enhanced spatial/temporal reasoning, Gemini Live, EmbeddingGemma, and Veo 3 GA powering creative workflows. They also introduced agentic features like restaurant-reservation agents and reduced pricing for Gemini 2.5 Flash. Meta AI unveiled the open-weight Code World Model (CWM) 32B, excelling in code semantics and math benchmarks, with innovations in training code models via execution traces. Local-first coding setups highlight Qwen3-Coder-30B running efficiently on consumer GPUs, paired with tools like Cline and LM Studio. Runtime improvements include vLLM v1 supporting hybrid models and mlx-lm adding batch inference on Apple silicon. In infrastructure, FlashAttention 4 was reverse-engineered revealing a ~20% speedup from architectural optimizations. Perplexity AI advances its independent web index and browsing API with upcoming feed refreshes. Embedding latency improvements were achieved by Superhuman using Baseten.