Model: "mlx-lm"
not much happened today
gemini-robotics-1.5 gemini-live embeddinggemma veo-3 gemini-2.5-flash code-world-model-32b qwen3-coder-30b vllm-v1 mlx-lm flashattention-4 google meta-ai-fair perplexity-ai baseten spatial-reasoning temporal-reasoning agentic-ai code-semantics code-execution-traces coding-infrastructure runtime-optimization batch-inference embedding-latency api model-optimization model-performance osanseviero _anniexie rmstein scaling01 giffmana cline redhat_ai awnihannun charles_irl bernhardsson akshat_b aravsrinivas
Google released a dense September update that included Gemini Robotics 1.5 with enhanced spatial and temporal reasoning, Gemini Live, EmbeddingGemma, and Veo 3 reaching general availability (GA) to power creative workflows. It also introduced agentic features such as restaurant-reservation agents and reduced pricing for Gemini 2.5 Flash. Meta AI unveiled the open-weight Code World Model (CWM) 32B, which excels in code semantics and math benchmarks and brings innovations in training code models via execution traces. Local-first coding setups highlight Qwen3-Coder-30B running efficiently on consumer GPUs, paired with tools like Cline and LM Studio. Runtime improvements include vLLM v1 supporting hybrid models and mlx-lm adding batch inference on Apple silicon. In infrastructure, FlashAttention 4 was reverse-engineered, revealing a ~20% speedup from architectural optimizations. Perplexity AI is advancing its independent web index and browsing API, with feed refreshes upcoming. Superhuman improved embedding latency using Baseten.
Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
claude-3.5-haiku llama-3-1 llama-3-2 mlx-lm tencent anthropic meta-ai-fair togethercompute llamaindex mixture-of-experts synthetic-data model-scaling model-architecture model-optimization kv-cache-quantization react fine-tuning scaling-laws model-efficiency model-deployment multimodality
Tencent released a notable MoE model with more than 300B parameters, pretrained on 7T tokens, including 1.5T tokens of synthetic data generated via Evol-Instruct. The model introduces novel techniques such as "recycle routing" and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and the model avoids China-sensitive queries. Meanwhile, Anthropic launched Claude 3.5 Haiku, now available on multiple platforms; it was praised for intelligence and speed but criticized for a 10x price increase. Meta opened Llama AI to the U.S. defense sector, and a Llama Impact Hackathon offers a $15K prize for projects using Llama 3.1 and Llama 3.2 Vision. LlamaIndex released a React chat UI component with Tailwind CSS and LLM backend integrations. The mlx-lm library improves text generation speed and efficiency with KV cache quantization.