Model: "gemini-exp-1206"
not much happened today
nemotron-h nvidia-eagle-2.5 gpt-4o qwen2.5-vl-72b gemini-2.5-flash gemini-2.0-pro gemini-exp-1206 gemma-3 qwen2.5-32b deepseek-r1-zero-32b uni3c seedream-3.0 adobe-dragon kimina-prover qwen2.5-72b bitnet-b1.58-2b4t nvidia deepseek hugging-face alibaba bytedance adobe transformers model-optimization multimodality long-context reinforcement-learning torch-compile image-generation diffusion-models distributional-rewards model-efficiency model-training native-quantization sampling-techniques philschmid arankomatsuzaki osanseviero iScienceLuvr akhaliq
- Nemotron-H: NVIDIA's hybrid Mamba-Transformer model family with up to 3x faster inference, in 8B, 56B, and compressed 47B variants.
- NVIDIA Eagle 2.5: a frontier VLM for long-context multimodal learning, matching GPT-4o and Qwen2.5-VL-72B on long-video understanding.
- Gemini 2.5 Flash: improved dynamic thinking and cost-performance, outperforming previous Gemini versions.
- Gemma 3: now supports torch.compile, for about 60% faster inference on consumer GPUs.
- SRPO: using Qwen2.5-32B, surpasses DeepSeek-R1-Zero-32B on benchmarks with reinforcement learning alone.
- Uni3C: Alibaba's unified 3D-enhanced camera and human motion controls for video generation.
- Seedream 3.0: ByteDance's bilingual image generation model with high-resolution outputs up to 2K.
- Adobe DRAGON: optimizes diffusion generative models with distributional rewards.
- Kimina-Prover Preview: an LLM trained with reinforcement learning from Qwen2.5-72B, achieving 80.7% pass@8192 on miniF2F.
- BitNet b1.58 2B4T: a native 1-bit LLM with 2B parameters trained on 4 trillion tokens, matching full-precision LLM performance with better efficiency.
- Antidistillation sampling: counters unwanted model distillation by modifying the reasoning traces frontier models emit.
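The torch.compile speedup reported for Gemma 3 uses PyTorch's standard compilation API, which works on any `nn.Module`. A minimal sketch with a toy module standing in for the model (loading Gemma 3 itself via Hugging Face transformers and compiling it follows the same pattern):

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer block; a real Gemma 3 checkpoint would be
# loaded via transformers and compiled the same way.
model = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
model.eval()

# torch.compile captures the forward pass as a graph; backend="eager" replays
# the captured graph without codegen (portable for this sketch). Dropping the
# backend argument selects the default inductor backend, which fuses kernels
# and is where the reported inference speedups come from.
compiled = torch.compile(model, backend="eager")

x = torch.randn(2, 16)
with torch.no_grad():
    out = compiled(x)
print(out.shape)
```

The first call pays a one-time compilation cost; subsequent calls with the same input shapes reuse the compiled graph, which is why the speedup shows up on repeated inference rather than single invocations.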
Google wakes up: Gemini 2.0 et al
gemini-2.0-flash gemini-1.5-pro gemini-exp-1206 claude-3.5-sonnet opus google-deepmind openai apple multimodality agent-development multilinguality benchmarking model-releases demis-hassabis sundar-pichai paige-bailey bindureddy
Google DeepMind launched Gemini 2.0 Flash, a new multimodal model outperforming Gemini 1.5 Pro and o1-preview, featuring vision and voice APIs, multilingual capabilities, and native tool use. It powers new AI agents such as Project Astra and Project Mariner, with Project Mariner achieving a state-of-the-art 83.5% on the WebVoyager benchmark. OpenAI announced ChatGPT integration with Apple devices, enabling Siri access and visual intelligence features. Claude 3.5 Sonnet is reportedly a distilled version of Opus. The response at NeurIPS 2024 was overwhelmingly positive, signaling a strong comeback for Google in AI innovation.
Meta Llama 3.3: 405B/Nova Pro performance at 70B price
llama-3-70b llama-3.3-70b gpt-4o gemini-exp-1206 meta-ai-fair openai google-deepmind hugging-face llamacloud reinforcement-learning fine-tuning model-performance document-processing pricing-models alignment online-rl sama steven-heidel aidan_mclau lmarena_ai oriolvinyalsml jerryjliu0
Meta AI released Llama 3.3 70B, matching the performance of the 405B model with improved efficiency thanks to "a new alignment process and progress in online RL techniques". OpenAI announced Reinforcement Fine-Tuning (RFT) for building expert models from limited data, with alpha access for researchers and enterprises. Google DeepMind's Gemini-Exp-1206 leads benchmarks, tying with GPT-4o on coding performance. LlamaCloud enhanced its document processing with table extraction and analytics. Discussion of OpenAI's pricing plans continues in the community.