Topic: "humanoid-robots"
ModernBert: small new Retriever/Classifier workhorse, 8k context, 2T tokens,
modernbert gemini-2.0-flash-thinking o1 llama answerdotai lightonio hugging-face google-deepmind openai meta-ai-fair figure encoder-only-models long-context alternating-attention natural-language-understanding reasoning robotics-simulation physics-engine humanoid-robots model-performance model-releases jeremyphoward alec-radford philschmid drjimfan bindureddy
Answer.ai and LightOn released ModernBERT, an updated encoder-only model with an 8k-token context window, trained on 2 trillion tokens (including code) and available in 139M and 395M parameter sizes, with state-of-the-art performance on retrieval, NLU, and code tasks. It uses Alternating Attention, which mixes global and local attention layers. Gemini 2.0 Flash Thinking debuted at #1 in Chatbot Arena, and the o1 model scored top in reasoning benchmarks. Llama downloads surpassed 650 million, doubling in 3 months. OpenAI launched desktop app integrations with voice capabilities. Figure delivered its first humanoid robots commercially. Advances in robotics simulation were highlighted, including Genesis, a new physics engine claimed to run 430,000x faster than real time.
Genesis: Generative Physics Engine for Robotics (o1-mini version)
o1 o1-preview gpt-4o claude-3.5-sonnet gemini-2.0-pro llama-3-3b llama-3-70b openai google-deepmind meta-ai-fair hugging-face function-calling structured-outputs vision performance-benchmarks sdk webrtc reasoning math code-generation transformer-architecture model-training humanoid-robots search model-efficiency dataset-sharing aidan_mclau sundarpichai adcock_brett
OpenAI launched the o1 model in its API with function calling, structured outputs, vision support, and developer messages; it uses 60% fewer reasoning tokens than o1-preview. The model excels in math and code, with a 0.76 LiveBench Coding score that outperforms Sonnet 3.5. Beta SDKs for Go and Java were also released, along with WebRTC support and 60% lower prices. Google Gemini 2.0 Pro (Gemini Exp 1206) deployment accelerated, showing improved coding, math, and reasoning performance. Meta AI FAIR introduced research on training transformers directly on raw bytes using dynamic entropy-based patching. Commercial humanoid robots were successfully deployed by an industry player. Hugging Face researchers demonstrated that their 3B Llama model can outperform the 70B Llama model on MATH-500 accuracy using search techniques, highlighting the efficiency gains available to smaller models. Concerns about reproducibility and domain-specific limitations were noted.
Meta Apollo - Video Understanding up to 1 hour, SOTA Open Weights
apollo-1b apollo-3b apollo-7b veo-2 imagen-3 llama-3-70b llama-3b command-r7b llama-1b llama-8b chatgpt meta-ai-fair hugging-face google-deepmind openai figure-ai klarna cohere notion video-understanding scaling-consistency benchmarking temporal-ocr egocentric-perception spatial-perception reasoning video-generation physics-simulation voice-features map-integration language-expansion test-time-compute-scaling humanoid-robots ai-integration search-optimization self-recognition self-preference-bias akhaliq _lewtun clementdelangue adcock_brett rohanpaul_ai swyx shaneguML
Meta released Apollo, a new family of state-of-the-art video-language models available in 1B, 3B, and 7B sizes, featuring "Scaling Consistency" for efficient scaling and introducing ApolloBench, which speeds up video understanding evaluation by 41× across five temporal perception categories. Google DeepMind launched Veo 2, a 4K video generation model with improved physics and camera control, alongside an enhanced Imagen 3 image model. OpenAI globally rolled out ChatGPT search with advanced voice and map features and discussed a potential $2,000/month "ChatGPT Max" tier. Research highlights include matching Llama 70B performance with Llama 3B via test-time compute scaling and expanding Command R7B language support from 10 to 23 languages. Industry updates include Figure AI delivering humanoid robots commercially and Klarna reducing its workforce through AI. Notion integrated Cohere Rerank for better search. Studies show that LLMs can recognize their own writing style and exhibit self-preference bias. Discussions note that video processing progress is outpacing text because of better signal-per-compute and data evaluation.
Not much (in AI) happened this weekend
llama-3.1-8b llama-3.2 chatgpt movie-gen openai meta-ai-fair google-deepmind microsoft x-ai spacex harvard nvidia long-context feature-prediction-loss ai-agents privacy text-to-video text-to-image humanoid-robots gpu-deployment media-foundation-models ai-research-labs sam-altman yann-lecun rasbt bindureddy andrej-karpathy soumithchintala svpino adcock_brett rohanpaul_ai
OpenAI introduced an "edit this area" feature for image generation, praised by Sam Altman. Yann LeCun highlighted an NYU paper that improves pixel generation with a feature prediction loss using pre-trained visual encoders like DINOv2. Long-context LLMs such as llama-3.1-8b and llama-3.2 variants now support up to 131k tokens, offering alternatives to RAG systems. Bindu Reddy announced AI agents capable of building and deploying code from English instructions, suggesting that AI may replace SQL and could eventually affect Python. SpaceX's successful Starship rocket catch was celebrated by Andrej Karpathy and others, with Soumith Chintala praising SpaceX's efficient, low-bureaucracy research approach. Privacy concerns arose from Harvard students' AI glasses, I-XRAY, which can reveal personal information. Meta AI FAIR's Movie Gen model advances media foundation models with high-quality text-to-image and video generation, including synced audio. Humanoid robots like Ameca and Azi now engage in expressive conversations using ChatGPT. xAI deployed 100K Nvidia H100 GPUs in 19 days, with Nvidia CEO Jensen Huang commending Elon Musk. A comparison of leading AI research labs covered Meta FAIR, Google DeepMind, and Microsoft Research. Skepticism about LLM intelligence was voiced by svpino (Santiago Valdarrama), who emphasized limitations in novel problem-solving despite strong memorization.
not much happened this weekend
jamba-1.5 dream-machine-1.5 ideogram-v2 mistral-nemo-minitron-8b mistral-7b llama-3-8b nous-research cursor-ai gdm george-hotz agibot unitree eth-zurich disney uc-san-diego ai21-labs luma-labs ideogram nvidia mistral-ai meta-ai-fair distributed-ai optimizer inter-gpu-communication low-latency-training open-source humanoid-robots robotics physics-based-motion teleoperation multilingual-models long-context text-to-video text-to-image model-performance george-hotz adcock_brett aman
Nous Research announced DisTrO, a new optimizer that reduces inter-GPU communication by 1,000x to 10,000x, enabling efficient training on slow networks and offering an alternative to GDM's DiLoCo. Cursor AI drew viral attention through an 8-year-old user and announced a new fundraise, with co-host Aman returning to their podcast. George Hotz launched tinybox for sale. In robotics, AGIBOT revealed 5 new humanoid robots with open-source plans, and Unitree showcased its G1 humanoid robot, which is nearing mass production at $16,000. ETH Zurich and Disney developed an AI system for physics-based robot motion generation from text or images. UC San Diego released ACE, an open-source teleoperation system for controlling multiple robots. AI21 Labs unveiled Jamba 1.5, a multilingual model with 256k context length and permissive licensing. Luma Labs released Dream Machine 1.5 for improved text-to-video generation. Ideogram launched v2 of its text-to-image model with near-perfect text generation. Nvidia and Mistral released Mistral-NeMo-Minitron 8B, a small model that outperforms Mistral-7B and llama-3-8b on the Open LLM leaderboard.