All tags: model "embeddinggemma"
not much happened today
gemini-robotics-1.5 gemini-live embeddinggemma veo-3 gemini-2.5-flash code-world-model-32b qwen3-coder-30b vllm-v1 mlx-lm flashattention-4 google meta-ai-fair perplexity-ai baseten spatial-reasoning temporal-reasoning agentic-ai code-semantics code-execution-traces coding-infrastructure runtime-optimization batch-inference embedding-latency api model-optimization model-performance osanseviero _anniexie rmstein scaling01 giffmana cline redhat_ai awnihannun charles_irl bernhardsson akshat_b aravsrinivas
Google released a dense September update including Gemini Robotics 1.5 with enhanced spatial/temporal reasoning, Gemini Live, EmbeddingGemma, and Veo 3 GA powering creative workflows. They also introduced agentic features such as restaurant-reservation agents and reduced pricing for Gemini 2.5 Flash. Meta AI unveiled the open-weight Code World Model (CWM) 32B, which excels on code-semantics and math benchmarks and introduces innovations in training code models on execution traces. Local-first coding setups highlight Qwen3-Coder-30B running efficiently on consumer GPUs, paired with tools like Cline and LM Studio. Runtime improvements include vLLM v1 supporting hybrid models and mlx-lm adding batch inference on Apple silicon. In infrastructure, FlashAttention 4 was reverse-engineered, revealing a ~20% speedup from architectural optimizations. Perplexity AI continues to advance its independent web index and browsing API, with feed refreshes upcoming. Superhuman achieved embedding-latency improvements using Baseten.
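The local-first setup described above typically exposes Qwen3-Coder-30B through an OpenAI-compatible server (LM Studio and vLLM both offer one). A minimal sketch of building such a request follows; the port (`1234`), base URL, and model id `qwen3-coder-30b` are assumptions that depend on your local configuration:

```python
import json
import urllib.request

# Assumed local endpoint and model id; adjust to match your LM Studio
# or vLLM configuration.
BASE_URL = "http://localhost:1234/v1"
MODEL_ID = "qwen3-coder-30b"


def build_chat_request(prompt: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-style chat-completion payload for a local server."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,
    }


def send(payload: dict) -> dict:
    """POST the payload to the local /v1/chat/completions route."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # send() requires a server actually running locally; the payload
    # builder alone is what this sketch demonstrates.
    payload = build_chat_request("Write a Python function that reverses a string.")
    print(json.dumps(payload, indent=2))
```

Because the endpoint speaks the OpenAI wire format, the same payload works unchanged whether the backend is LM Studio on a laptop GPU or vLLM on a server.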
not much happened today
embeddinggemma qwen-2.5-coder minicpm-v-4.5 gpt-4o gemini-2.0-pro google-deepmind hugging-face jina-ai lighton microsoft stanford openai ollama weaviate langchain llamaindex embeddings retrieval-augmented-generation quantization multilingual-models on-device-ai semantic-search contrastive-learning dataset-release vision multimodality video-generation text-to-speech optimizer-benchmarking training-recipes model-compression video-token-compression fine-tuning osanseviero _philschmid tomaarsen ollama weaviate_io lusxvr andimarafioti thibaudfrere _akhaliq clementdelangue gordonwetzstein konstmish wen_kaiyue percyliang
Google DeepMind released EmbeddingGemma (308M), a small multilingual embedding model optimized for on-device retrieval-augmented generation and semantic search, supporting over 100 languages and running efficiently with quantization and EdgeTPU latency under 15ms. Jina AI introduced new code-focused embedding models (0.5B/1.5B) with GGUF quantization, achieving state-of-the-art retrieval across multiple languages and tasks. LightOn demonstrated large-scale retrieval training without distillation, using contrastive training on billions of passages. Hugging Face released the FineVision dataset with 17.3M images and 9.5B answer tokens for vision-language model training, showing significant benchmark improvements. The MiniCPM-V 4.5 (8B) multimodal model reported surpassing GPT-4o and Gemini-2.0 Pro on OpenCompass benchmarks with innovative video token compression. Microsoft's VibeVoice TTS and Stanford's Mixture-of-Contexts video generation were also featured. Additionally, a Stanford study benchmarked optimizers including Muon, Soap, Mars, and Sophia, finding diminishing speedups over AdamW at larger scales but advantages at smaller scales. The new ChatGPT branching feature was noted for its simplicity and popularity. "Everyone's a decacorn now."
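The on-device semantic search that EmbeddingGemma targets reduces, at query time, to ranking stored document embeddings by cosine similarity against a query embedding. A minimal stdlib-only sketch of that retrieval step is below; the four-dimensional vectors are toy stand-ins for what a real embedding model would produce:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: list[float],
                    doc_vecs: list[list[float]],
                    top_k: int = 3) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs, most similar documents first."""
    scored = sorted(
        ((cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)),
        reverse=True,
    )
    return [(i, s) for s, i in scored[:top_k]]


# Toy embeddings standing in for real model output.
docs = [
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1, 0.0],
]
query = [1.0, 0.0, 0.0, 0.0]
print(semantic_search(query, docs))  # doc 0 ranks first, then doc 2
```

In an on-device deployment, the document vectors would be computed once, quantized, and cached, so each query costs only one model forward pass plus this ranking loop.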