Topic: "physics-simulation"
Genesis: Generative Physics Engine for Robotics (o1-2024-12-17)
o1 gemini-2.0-pro openai google carnegie-mellon-university universal-physics-engine robotics-simulation physics-simulation photo-realistic-rendering generative-data simulation-platform open-source function-calling vision performance-benchmarks sdk realtime-api zhou-xian aidan_mclau sundar-pichai
Genesis is a newly announced universal physics engine developed by a large-scale collaboration led by CMU PhD student Zhou Xian. It integrates multiple state-of-the-art physics solvers to simulate diverse materials and physical phenomena. Aimed at robotics applications, it offers lightweight, ultra-fast simulation, photo-realistic rendering, and generative data capabilities. The engine is open source and is designed for robotics simulation rather than only video generation. Separately, OpenAI released the o1 model in its API with advanced features such as function calling and vision support; the model shows strong math and coding performance. Google teased updates on Gemini 2.0 Pro, accelerating deployment for advanced users.
Meta Apollo - Video Understanding up to 1 hour, SOTA Open Weights
apollo-1b apollo-3b apollo-7b veo-2 imagen-3 llama-3-70b llama-3b command-r7b llama-1b llama-8b chatgpt meta-ai-fair hugging-face google-deepmind openai figure-ai klarna cohere notion video-understanding scaling-consistency benchmarking temporal-ocr egocentric-perception spatial-perception reasoning video-generation physics-simulation voice-features map-integration language-expansion test-time-compute-scaling humanoid-robots ai-integration search-optimization self-recognition self-preference-bias akhaliq _lewtun clementdelangue adcock_brett rohanpaul_ai swyx shaneguML
Meta released Apollo, a new family of state-of-the-art video-language models available in 1B, 3B, and 7B sizes. The release features "Scaling Consistency" for efficient scaling and introduces ApolloBench, which speeds up video understanding evaluation by 41× across five temporal perception categories. Google DeepMind launched Veo 2, a 4K video generation model with improved physics and camera control, alongside an enhanced Imagen 3 image model. OpenAI rolled out ChatGPT search globally with advanced voice and map features and discussed a potential $2,000/month "ChatGPT Max" tier. Research highlights include reaching Llama 70B performance with Llama 3B via test-time compute scaling, and expanding Command R7B language support from 10 to 23 languages. Industry updates include Figure AI delivering humanoid robots commercially and Klarna reducing its workforce through AI. Notion integrated Cohere Rerank for better search. Studies show that LLMs can recognize their own writing style and exhibit self-preference bias. Discussions note that video processing progress is outpacing text, attributed to better signal-per-compute and data evaluation.