Topic: "contrastive-learning"
not much happened today
Tags: embeddinggemma, qwen-2.5-coder, minicpm-v-4.5, gpt-4o, gemini-2.0-pro, google-deepmind, hugging-face, jina-ai, lighton, microsoft, stanford, openai, ollama, weaviate, langchain, llamaindex, embeddings, retrieval-augmented-generation, quantization, multilingual-models, on-device-ai, semantic-search, contrastive-learning, dataset-release, vision, multimodality, video-generation, text-to-speech, optimizer-benchmarking, training-recipes, model-compression, video-token-compression, fine-tuning, osanseviero, _philschmid, tomaarsen, ollama, weaviate_io, lusxvr, andimarafioti, thibaudfrere, _akhaliq, clementdelangue, gordonwetzstein, konstmish, wen_kaiyue, percyliang
Google DeepMind released EmbeddingGemma (308M), a small multilingual embedding model optimized for on-device retrieval-augmented generation and semantic search; it supports over 100 languages, runs efficiently under quantization, and reports sub-15ms latency on EdgeTPU. Jina AI introduced new code-focused embedding models (0.5B/1.5B) with GGUF quantization, achieving state-of-the-art retrieval across multiple languages and tasks. LightOn demonstrated large-scale retrieval training without distillation, using contrastive training on billions of passages. Hugging Face released the FineVision dataset, with 17.3M images and 9.5B answer tokens for vision-language model training, showing significant benchmark improvements. The MiniCPM-V 4.5 (8B) multimodal model was reported to surpass GPT-4o and Gemini-2.0 Pro on OpenCompass benchmarks, aided by an innovative video token compression scheme. Microsoft’s VibeVoice TTS and Stanford’s Mixture-of-Contexts video generation were also featured. Additionally, a Stanford study benchmarked optimizers such as Muon, Soap, Mars, and Sophia, finding diminishing speedups over AdamW at larger scales but clear advantages at smaller scales. The new ChatGPT branching feature was noted for its simplicity and popularity. "Everyone's a decacorn now."
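The retrieval results above (LightOn's distillation-free training, EmbeddingGemma's semantic search) rest on standard contrastive objectives. As a rough, model-agnostic illustration rather than any lab's actual recipe, the sketch below computes an in-batch InfoNCE loss over query/passage embeddings; the random tensors are placeholders for real encoder outputs, and the temperature value is a common default, not a published hyperparameter.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """In-batch contrastive (InfoNCE) loss: the positive passage for each query sits
    at the same batch index; every other passage in the batch acts as a negative."""
    q = F.normalize(query_emb, dim=-1)    # (B, D) unit-normalized query embeddings
    p = F.normalize(passage_emb, dim=-1)  # (B, D) unit-normalized passage embeddings
    logits = q @ p.T / temperature        # (B, B) scaled cosine-similarity logits
    targets = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: random embeddings stand in for the outputs of a trained encoder.
queries, passages = torch.randn(8, 256), torch.randn(8, 256)
print(info_nce_loss(queries, passages))
```

Scaling this to billions of passages is mostly an engineering problem of batch construction and negative mining; the loss itself stays this simple.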
Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence
Tags: gemini-1.5-pro, gpt-4-turbo, llama-3, orca-2.5-7b, functionary-v2.4, cosxl, google, openai, meta-ai-fair, hugging-face, cohere, million-token-context-window, audio-processing, file-api, text-embedding, function-calling, reasoning, direct-nash-optimization, contrastive-learning, code-interpreter, diffusion-models, neural-odes, inference-speed, multilingual-dataset, image-editing, no-code-development
At Google Cloud Next, Gemini 1.5 Pro was released with a million-token context window, available in 180+ countries, featuring audio understanding for up to 9.5 hours, a new File API for nearly unlimited free uploads, and the Gecko-1b-256/768 embedding model. GPT-4 Turbo with Vision became generally available in the API with a major update improving reasoning capabilities. Meta Platforms plans to launch smaller versions of Llama 3 next week. The Orca 2.5 7B model, trained with Direct Nash Optimization, outperforms older GPT-4 versions on AlpacaEval. New releases include Functionary-V2.4, with enhanced function calling and code interpretation, and the CosXL models for image editing. Research highlights include continuous U-Nets for diffusion models achieving up to 80% faster inference, and a massive multilingual dataset with ~5.6 trillion word tokens. Creative applications include a no-code touch-screen game built with Gemini 1.5 and AI-generated novel trailers.
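Both digests center on embedding models for retrieval and semantic search. As a minimal, model-agnostic sketch (not tied to Gecko, EmbeddingGemma, or any specific API), ranking a corpus by cosine similarity against a query embedding looks like the following; the 768-dimensional random vectors are placeholders for whatever embedding model produced them.

```python
import numpy as np

def rank_passages(query_vec: np.ndarray, passage_vecs: np.ndarray, top_k: int = 3):
    """Return the indices and scores of the top_k passages by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                       # cosine similarity of each passage to the query
    order = np.argsort(-scores)[:top_k]  # highest-scoring passages first
    return order, scores[order]

# Toy usage: random vectors stand in for embeddings from a model like Gecko.
rng = np.random.default_rng(0)
query = rng.normal(size=768)
corpus = rng.normal(size=(100, 768))
idx, scores = rank_passages(query, corpus)
print(idx, scores)
```

In practice the brute-force matrix product is replaced by an approximate nearest-neighbor index (e.g. in a vector database such as Weaviate), but the scoring function is the same.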