Model: "veo"
not much happened today
o1-preview o1-mini qwen-2.5 gpt-4o deepseek-v2.5 gpt-4-turbo-2024-04-09 grin llama-3-1-405b veo kat openai qwen deepseek-ai microsoft kyutai-labs perplexity-ai together-ai meta-ai-fair google-deepmind hugging-face google anthropic benchmarking math coding instruction-following model-merging model-expressiveness moe voice voice-models generative-video competition open-source model-deployment ai-agents hyung-won-chung noam-brown bindureddy akhaliq karpathy aravsrinivas fchollet cwolferesearch philschmid labenz ylecun
OpenAI's o1-preview and o1-mini models lead benchmarks in Math, Hard Prompts, and Coding. The Qwen 2.5 72B model performs close to GPT-4o. DeepSeek-V2.5 tops Chinese LLMs, rivaling GPT-4-Turbo-2024-04-09. Microsoft's GRIN MoE achieves competitive results with only 6.6B active parameters. The Moshi voice model from Kyutai Labs runs locally on Apple Silicon Macs. The Perplexity app introduces a voice mode with push-to-talk. LlamaCoder by Together.ai uses Llama 3.1 405B for app generation. Google DeepMind's Veo is a new generative video model for YouTube Shorts. The 2024 ARC-AGI competition increases its prize money and plans a university tour. A survey on model merging covers 50+ papers for LLM alignment. The Kolmogorov–Arnold Transformer (KAT) paper proposes replacing MLP layers with KAN layers for better expressiveness. Hugging Face Hub integrates with Google Cloud Vertex AI Model Garden for easier open-source model deployment. Agent.ai is introduced as a professional network for AI agents. "Touching grass is all you need."
Not much happened today
gpt-4o gemini-1.5-pro gemini-1.5-flash imagen-3 veo reka-core qwen-1.5-110b openai google-deepmind anthropic rekailabs alibaba salesforce multimodality long-context model-releases reinforcement-learning model-benchmarking text-to-image video-generation ai-assistants ilya-sutskever jakub-pachocki mike-krieger sama
Ilya Sutskever steps down as Chief Scientist at OpenAI after nearly a decade, with Jakub Pachocki named as his successor. Google DeepMind announces Gemini 1.5 Pro, with a 2 million token context window, and Gemini 1.5 Flash, both with improved multimodal capabilities, alongside demos of the Project Astra AI assistant, the Imagen 3 text-to-image model, and the Veo generative video model. GPT-4o tops the VHELM leaderboard and outperforms competitors on LMSYS Chatbot Arena. The Reka Core multimodal model with 128K context and Alibaba's open-source Qwen1.5-110B model are released. Salesforce shares an online RLHF recipe.
Google I/O in 60 seconds
gemini-1.5-pro gemini-flash gemini-ultra gemini-pro gemini-nano gemma-2 llama-3-70b paligemma imagen-3 veo google google-deepmind youtube tokenization model-performance fine-tuning vision multimodality model-release model-training model-optimization ai-integration image-generation watermarking hardware-optimization voice video-understanding
Google announced updates to the Gemini model family, including Gemini 1.5 Pro with 2 million token support and the new Gemini Flash model, optimized for speed, with 1 million token capacity. The Gemini suite now includes Ultra, Pro, Flash, and Nano models, with Gemini Nano integrated into Chrome 126. Additional Gemini features include Gemini Gems (similar to custom GPTs), Gemini Live for voice conversations, and Project Astra, a live video understanding assistant. The Gemma model family was updated with Gemma 2 at 27B parameters, offering near-Llama-3-70B performance at less than half the size, plus PaliGemma, an open vision-language model inspired by PaLI-3. Other launches include DeepMind's Veo, Imagen 3 for photorealistic image generation, and a Music AI Sandbox collaboration with YouTube. SynthID watermarking now extends to text, images, audio, and video. Trillium was revealed as the codename for TPUv6. Google also integrated AI across its product suite, including Workspace, Email, Docs, Sheets, Photos, Search, and Lens. "The world awaits Apple's answer."