All tags
Person: "demishassabis"
not much happened today
veo-3 deepseek-r1t2 deepseek-tng-r1t2-chimera o3-deep-research o4-mini-deep-research deepswe-agent safe-superintelligence-inc perplexity-ai meta-ai-fair midjourney sakana-ai cohere google-deepmind deepseek openai together-ai video-generation assembly-of-experts model-licenses api-pricing research-roles product-expansion corporate-leadership model-release team-expansion ilya_sutskever daniel_levy daniel_gross aravsrinivas zeyuanallenzhu nat_friedman davidsholz fp_champagne demishassabis reach_vb
Ilya Sutskever confirmed his role as CEO of Safe Superintelligence Inc. (SSI) with Daniel Levy as President, dismissing acquisition rumors and emphasizing their strong team and compute resources. Perplexity AI expanded its data integrations by adding Morningstar's financial research and hinted at new product features for Pro users. Meta AI FAIR clarified its research structure, distinguishing its small lab from larger model training groups, and welcomed Nat Friedman to enhance AI product development. Midjourney and Sakana AI announced hiring for research and applied engineering roles. Cohere expanded its presence in Montréal, receiving praise from Canadian officials. On the model front, Google DeepMind's Gemini Pro released the Veo 3 video generation model globally. DeepSeek launched the faster DeepSeek R1T2 model using an Assembly of Experts approach, available under an MIT license. Kling AI showcased cinematic video generation capabilities. OpenAI introduced a high-cost Deep Research API with pricing up to $30 per call. Together AI announced the release of the DeepSWE agent.
not much happened today
gemma-3n hunyuan-a13b flux-1-kontext-dev mercury fineweb2 qwen-vlo o3-mini o4-mini google-deepmind tencent black-forest-labs inception-ai qwen kyutai-labs openai langchain langgraph hugging-face ollama unslothai nvidia amd multimodality mixture-of-experts context-windows tool-use coding image-generation diffusion-models dataset-release multilinguality speech-to-text api prompt-engineering agent-frameworks open-source model-release demishassabis reach_vb tri_dao osanseviero simonw clementdelangue swyx hwchase17 sydneyrunkle
Google released Gemma 3n, a multimodal model for edge devices available in 2B and 4B parameter versions, with support across major frameworks like Transformers and Llama.cpp. Tencent open-sourced Hunyuan-A13B, a Mixture-of-Experts (MoE) model with 80B total parameters and a 256K context window, optimized for tool calling and coding. Black Forest Labs released FLUX.1 Kontext [dev], an open image AI model gaining rapid Hugging Face adoption. Inception AI Labs launched Mercury, the first commercial-scale diffusion LLM for chat. The FineWeb2 multilingual pre-training dataset paper was released, analyzing data quality impacts. The Qwen team released Qwen-VLo, a unified visual understanding and generation model. Kyutai Labs released a top-ranked open-source speech-to-text model running on Macs and iPhones. OpenAI introduced Deep Research API with o3/o4-mini models and open-sourced prompt rewriter methodology, integrated into LangChain and LangGraph. The open-source Gemini CLI gained over 30,000 GitHub stars as an AI terminal agent.
OpenAI releases Deep Research API (o3/o4-mini)
o3-deep-research o4-mini-deep-research gemma-3n flux-1-kontext-dev gpt-4o alphagenome openai google black-forest-labs deepmind sakana-ai higgsfield-ai huggingface ollama multimodality model-releases agentic-ai reinforcement-learning instruction-following model-architecture model-optimization image-generation biological-ai multi-agent-systems model-integration demishassabis hardmaru osanseviero clementdelangue
OpenAI has launched the Deep Research API featuring powerful models o3-deep-research and o4-mini-deep-research with native support for MCP, Search, and Code Interpreter, enabling advanced agent capabilities including multi-agent setups. Google released Gemma 3n, a multimodal model optimized for edge devices with only 3GB RAM, achieving a top score of 1300 on LMSys Arena, featuring the new MatFormer architecture and broad ecosystem integration. Black Forest Labs introduced FLUX.1 Kontext [dev], a 12B parameter rectified flow transformer for instruction-based image editing, comparable to GPT-4o. DeepMind unveiled AlphaGenome, an AI model capable of reading 1 million DNA bases for gene function prediction, marking a breakthrough in AI biology. Sakana AI presented Reinforcement-Learned Teachers (RLTs) to enhance LLM reasoning, achieving 86.1% on MiniF2F with efficient compute. Higgsfield AI released Higgsfield Soul, a high-aesthetic photo model with 50+ presets for fashion-grade realism. Additionally, Google launched the Gemini CLI, an open-source AI agent for terminal use with free Gemini 2.5 Pro requests.
Bartz v. Anthropic PBC — "Training use is Fair Use"
claude gemini-robotics-on-device anthropic replit delphi sequoia thinking-machines-lab disney universal midjourney google-deepmind fair-use copyright reinforcement-learning foundation-models robotics funding lawsuit digital-minds model-release andrea_bartz giffmana andrewcurran_ amasad swyx hwchase17 krandiash daraladje steph_palazzolo corbtt demishassabis
Anthropic won a significant fair use ruling allowing the training of Claude on copyrighted books, setting a precedent for AI training legality despite concerns over pirated data. Replit achieved a major milestone with $100M ARR, showing rapid growth. Delphi raised $16M Series A to scale digital minds, while Thinking Machines Lab focuses on reinforcement learning for business applications. Disney and Universal sued Midjourney over unauthorized use of copyrighted images. Google DeepMind released Gemini Robotics On-Device, a compact foundation model for robotics.
The Quiet Rise of Claude Code vs Codex
mistral-small-3.2 qwen3-0.6b llama-3-1b gemini-2.5-flash-lite gemini-app magenta-real-time apple-3b-on-device mistral-ai hugging-face google-deepmind apple artificial-analysis kuaishou instruction-following function-calling model-implementation memory-efficiency 2-bit-quantization music-generation video-models benchmarking api reach_vb guillaumelample qtnx_ shxf0072 rasbt demishassabis artificialanlys osanseviero
Claude Code is gaining mass adoption, inspiring derivative projects like OpenCode and ccusage, with discussions ongoing in AI communities. Mistral AI released Mistral Small 3.2, a 24B parameter model update improving instruction following and function calling, available on Hugging Face and supported by vLLM. Sebastian Raschka implemented Qwen3 0.6B from scratch, noting its deeper architecture and memory efficiency compared to Llama 3 1B. Google DeepMind showcased Gemini 2.5 Flash-Lite's UI code generation from visual context and added video upload support in the Gemini App. Apple's new 3B parameter on-device foundation model was benchmarked, showing slower speed but efficient memory use via 2-bit quantization, suitable for background tasks. Google DeepMind also released Magenta Real-time, an 800M parameter music generation model licensed under Apache 2.0, marking Google's 1000th model on Hugging Face. Kuaishou launched KLING 2.1, a new video model accessible via API.
Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview
gemini-2.5 gemini-2.5-flash-lite gemini-2.5-flash gemini-2.5-pro gemini-2.5-ultra kimi-dev-72b nanonets-ocr-s ii-medical-8b-1706 jan-nano deepseek-r1 minimax-m1 google moonshot-ai deepseek cognitivecompai kling-ai mixture-of-experts multimodality long-horizon-planning benchmarking coding-performance long-context ocr video-generation model-releases tulsee_doshi oriolvinyalsml demishassabis officiallogank _philschmid swyx sainingxie scaling01 gneubig clementdelangue mervenoyann
Gemini 2.5 models are now generally available, including the new Gemini 2.5 Flash-Lite, Flash, Pro, and Ultra variants, featuring sparse Mixture-of-Experts (MoE) transformers with native multimodal support. A detailed 30-page tech report highlights impressive long-horizon planning demonstrated by Gemini Plays Pokemon. The LiveCodeBench-Pro benchmark reveals frontier LLMs struggle with hard coding problems, while Moonshot AI open-sourced Kimi-Dev-72B, achieving state-of-the-art results on SWE-bench Verified. Smaller specialized models like Nanonets-OCR-s, II-Medical-8B-1706, and Jan-nano show competitive performance, emphasizing that bigger models are not always better. DeepSeek-r1 ties for #1 in WebDev Arena, and MiniMax-M1 sets new standards in long-context reasoning. Kling AI demonstrated video generation capabilities.
Google I/O: new Gemini native voice, Flash, DeepThink, AI Mode (DeepSearch+Mariner+Astra)
gemini-2.5-pro gemini-2.5 google google-deepmind ai-assistants reasoning generative-ai developer-tools ai-integration model-optimization ai-application model-updates ai-deployment model-performance demishassabis philschmid jack_w_rae
Google I/O 2024 showcased significant advancements with Gemini 2.5 Pro and Deep Think reasoning mode from google-deepmind, emphasizing AI-driven transformations and developer opportunities. GeminiApp aims to become a universal AI assistant on the path to AGI, with new features like AI Mode in Google Search expanding generative AI access. The event included multiple keynotes and updates on over a dozen models and 20+ AI products, highlighting Google's leadership in AI innovation. Influential voices like demishassabis and philschmid provided insights and recaps, while the launch of Jules as a competitor to Codex/Devin was noted.
ChatGPT Codex, OpenAI's first cloud SWE agent
codex-1 openai-o3 codex-mini gemma-3 blip3-o qwen-2.5 marigold-iid deepseek-v3 lightlab gemini-2.0 lumina-next openai runway salesforce qwen deepseek google google-deepmind j1 software-engineering parallel-processing multimodality diffusion-models depth-estimation scaling-laws reinforcement-learning fine-tuning model-performance multi-turn-conversation reasoning audio-processing sama kevinweil omarsar0 iscienceluvr akhaliq osanseviero c_valenzuelab mervenoyann arankomatsuzaki jasonwei demishassabis philschmid swyx teortaxestex jaseweston
OpenAI launched Codex, a cloud-based software engineering agent powered by codex-1 (an optimized version of OpenAI o3) available in research preview for Pro, Enterprise, and Team ChatGPT users, featuring parallel task execution like refactoring and bug fixing. The Codex CLI was enhanced with quick sign-in and a new low-latency model, codex-mini. Gemma 3 is highlighted as the best open model runnable on a single GPU. Runway released the Gen-4 References API for style transfer in generation. Salesforce introduced BLIP3-o, a unified multimodal model family using diffusion transformers for CLIP image features. The Qwen 2.5 models (1.5B and 3B versions) were integrated into the PocketPal app with various chat templates. Marigold IID, a new state-of-the-art open-source depth estimation model, was released.
In research, DeepSeek shared insights on scaling and hardware for DeepSeek-V3. Google unveiled LightLab, a diffusion-based light source control in images. Google DeepMind's AlphaEvolve uses Gemini 2.0 to discover new math and reduce costs without reinforcement learning. Omni-R1 studied audio's role in fine-tuning audio LLMs. Qwen proposed a parallel scaling law inspired by classifier-free guidance. Salesforce released Lumina-Next on the Qwen base, outperforming Janus-Pro. A study found LLM performance degrades in multi-turn conversations due to unreliability. J1 is incentivizing LLM-as-a-Judge thinking via reinforcement learning. A new Qwen study correlates question and strategy similarity to predict reasoning strategies.
AI Engineer World's Fair: Second Run, Twice The Fun
gemini-2.5-pro google-deepmind waymo tesla anthropic braintrust retrieval-augmentation graph-databases recommendation-systems software-engineering-agents agent-reliability reinforcement-learning voice image-generation video-generation infrastructure security evaluation ai-leadership enterprise-ai mcp tiny-teams product-management design-engineering robotics foundation-models coding web-development demishassabis
The 2025 AI Engineer World's Fair is expanding with 18 tracks covering topics like Retrieval + Search, GraphRAG, RecSys, SWE-Agents, Agent Reliability, Reasoning + RL, Voice AI, Generative Media, Infrastructure, Security, and Evals. New focuses include MCP, Tiny Teams, Product Management, Design Engineering, and Robotics and Autonomy featuring foundation models from Waymo, Tesla, and Google. The event highlights the growing importance of AI Architects and enterprise AI leadership. Additionally, Demis Hassabis announced the Gemini 2.5 Pro Preview 'I/O edition', which leads coding and web development benchmarks on LMArena.
Gemini 2.5 Pro Preview 05-06 (I/O edition) - the SOTA vision+coding model
gemini-2.5-pro claude-3.7-sonnet llama-nemotron qwen3 google-deepmind nvidia alibaba hugging-face multimodality coding reasoning model-release speech-recognition recommender-systems benchmarking demishassabis _philschmid lmarena_ai scaling01 fchollet
Gemini 2.5 Pro has been updated with enhanced multimodal image-to-code capabilities and dominates the WebDev Arena Leaderboard, surpassing Claude 3.7 Sonnet in coding and other tasks. Nvidia released the Llama-Nemotron model family on Hugging Face, noted for efficient reasoning and inference. Alibaba's Qwen3 models range from 0.6B to 235B parameters, including dense and MoE variants. KerasRS was released by Fran ois Chollet as a new recommender system library compatible with JAX, PyTorch, and TensorFlow, optimized for TPUs. These updates highlight advancements in coding, reasoning, and speech recognition models.