Model: "veo-2"
SOTA Video Gen: Veo 2 and Kling 2 are GA for developers
veo-2 gemini gpt-4.1 gpt-4o gpt-4.5-preview gpt-4.1-mini gpt-4.1-nano google openai video-generation api coding instruction-following context-window performance benchmarks model-deprecation kevinweil stevenheidel aidan_clark_
Google's Veo 2 video generation model is now available in the Gemini API at 35 cents per second of generated video, a significant step toward accessible video generation. Meanwhile, China's Kling 2 model launched with pricing of around $2 for a 10-second clip and a minimum subscription of $700 per month for 3 months, generating excitement despite some skill challenges. OpenAI announced the GPT-4.1 family (GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano), highlighting improvements in coding and instruction following along with a 1 million token context window. The GPT-4.1 models are 26% cheaper than GPT-4o and will replace the GPT-4.5 Preview API version by July 14. Performance benchmarks show GPT-4.1 achieving 54-55% on SWE-bench Verified and a 60% improvement over GPT-4o in some internal tests, though some critiques note that it underperforms in coding tasks compared with alternatives such as DeepSeekV3 and models on OpenRouter. The release is API-only, with a prompting guide provided for developers.
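As a rough illustration of the pricing quoted above, the sketch below compares the cost of a 10-second clip on each service. It uses only the figures from the summary; the function and constant names are hypothetical, and the Kling 2 per-clip price is treated as exactly $2 and as scaling linearly with duration, which is a simplifying assumption rather than a published rate card.

```python
# Figures are taken from the summary above; amounts are in US cents
# so the arithmetic stays in integers.
VEO2_CENTS_PER_SECOND = 35        # Gemini API price per second of generated video
KLING2_CENTS_PER_10S_CLIP = 200   # "around $2" per 10-second clip (approximate)
KLING2_MIN_MONTHLY_USD = 700      # minimum subscription per month
KLING2_MIN_MONTHS = 3             # minimum subscription length


def veo2_cost_cents(seconds: int) -> int:
    """Cost of generating `seconds` of video with Veo 2 at the quoted rate."""
    return VEO2_CENTS_PER_SECOND * seconds


def kling2_cost_cents(seconds: int) -> int:
    """Cost of `seconds` of Kling 2 video, assuming linear scaling (an assumption)."""
    return KLING2_CENTS_PER_10S_CLIP * seconds // 10


def kling2_min_commitment_usd() -> int:
    """Minimum total subscription spend implied by the quoted terms."""
    return KLING2_MIN_MONTHLY_USD * KLING2_MIN_MONTHS


if __name__ == "__main__":
    clip_seconds = 10
    print(f"Veo 2, {clip_seconds}s clip: ${veo2_cost_cents(clip_seconds) / 100:.2f}")
    print(f"Kling 2, {clip_seconds}s clip: ${kling2_cost_cents(clip_seconds) / 100:.2f}")
    print(f"Kling 2 minimum commitment: ${kling2_min_commitment_usd()}")
```

On these numbers a 10-second Veo 2 clip costs more per clip than the quoted Kling 2 price, while Kling 2 carries a minimum subscription commitment that the quoted per-second Veo 2 price does not mention.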
Meta Apollo - Video Understanding up to 1 hour, SOTA Open Weights
apollo-1b apollo-3b apollo-7b veo-2 imagen-3 llama-3-70b llama-3b command-r7b llama-1b llama-8b chatgpt meta-ai-fair hugging-face google-deepmind openai figure-ai klarna cohere notion video-understanding scaling-consistency benchmarking temporal-ocr egocentric-perception spatial-perception reasoning video-generation physics-simulation voice-features map-integration language-expansion test-time-compute-scaling humanoid-robots ai-integration search-optimization self-recognition self-preference-bias akhaliq _lewtun clementdelangue adcock_brett rohanpaul_ai swyx shaneguML
Meta released Apollo, a new family of state-of-the-art video-language models available in 1B, 3B, and 7B sizes, featuring "Scaling Consistency" for efficient scaling and introducing ApolloBench, which speeds up video understanding evaluation by 41× across five temporal perception categories. Google DeepMind launched Veo 2, a 4K video generation model with improved physics and camera control, alongside an enhanced Imagen 3 image model. OpenAI globally rolled out ChatGPT search with advanced voice and map features and discussed a potential $2,000/month "ChatGPT Max" tier. Research highlights include achieving Llama 70B performance using Llama 3B via test-time compute scaling and expanding Command R7B language support from 10 to 23 languages. Industry updates include Figure AI delivering humanoid robots commercially and Klarna reducing its workforce through AI. Notion integrated Cohere Rerank for better search. Studies reveal that LLMs can recognize their own writing style and show self-preference bias. Discussions note video processing progress outpacing text due to better signal-per-compute and data evaluation.