All tags
Person: "noam-shazeer"
Gemini 2.5 Pro + 4o Native Image Gen
gemini-2.5-pro gpt-4o google-deepmind openai lmarena_ai autoregressive-models multimodality reasoning coding instruction-following model-release leaderboards noam-shazeer allan-jabri gabe-goh
Gemini 2.5 Pro from Google DeepMind has become the new top AI model, surpassing Grok 3 by 40 points on the LMArena leaderboard, with contributions from Noam Shazeer integrating Flash Thinking techniques. It is available as a free, rate-limited experimental model via Google AI Studio and the Gemini App, and excels in reasoning, coding, STEM, multimodal tasks, and instruction following. Meanwhile, OpenAI released GPT-4o native image generation, an autoregressive image-generation model, with detailed insights shared by Allan Jabri and credits to Gabe Goh.
Project Stargate: $500b datacenter (1.7% of US GDP) and Gemini 2 Flash Thinking 2
gemini-2.0-flash deepseek-r1 qwen-32b openai softbank oracle arm microsoft nvidia huggingface deepseek-ai long-context quantization code-interpretation model-distillation open-source agi-research model-performance memory-optimization noam-shazeer liang-wenfeng
Project Stargate, a US "AI Manhattan project" led by OpenAI and SoftBank and supported by Oracle, Arm, Microsoft, and NVIDIA, was announced at a scale rivaling the original Manhattan Project, which cost roughly $35B inflation-adjusted. Despite Microsoft stepping back from its role as OpenAI's exclusive compute partner, the project is serious, if not immediately practical. Meanwhile, Noam Shazeer revealed a second major update to Gemini 2.0 Flash Thinking, with a 1M-token context window usable immediately, and AI Studio introduced a new code-interpreter feature. On Reddit, a DeepSeek R1 distillation into Qwen 32B was released for free on HuggingChat, sparking discussions on self-hosting, performance issues, and quantization techniques. DeepSeek's CEO Liang Wenfeng highlighted the company's focus on fundamental AGI research, its efficient MLA architecture, and its commitment to open-source development despite export restrictions, positioning DeepSeek as a potential alternative to closed-source AI trends.
Execuhires: Tempting The Wrath of Khan
gemini-1.5-pro gpt-4o claude-3.5 flux-1 llama-3-1-405b character.ai google adept amazon inflection microsoft stability-ai black-forest-labs schelling google-deepmind openai anthropic meta-ai-fair lmsys langchainai execuhire model-benchmarking multilinguality math coding text-to-image agent-ide open-source-models post-training data-driven-performance noam-shazeer mostafa-mostaque david-friedman rob-rombach alexandr-wang svpino rohanpaul_ai
Character.ai's $2.5B execuhire to Google is the latest in a string of such deals, following Adept's $429M execuhire to Amazon and Inflection's $650M execuhire to Microsoft. Despite strong user growth and content momentum, Character.ai CEO Noam Shazeer returns to Google, signaling shifting vibes in the AI industry. Google DeepMind's Gemini 1.5 Pro tops the Chatbot Arena benchmarks, outperforming GPT-4o and Claude 3.5 and excelling in multilingual, math, and coding tasks. The launch of Black Forest Labs' FLUX.1 text-to-image model and the LangGraph Studio agent IDE highlight ongoing innovation, while Llama 3.1 405B was released as the largest open-source model, fostering developer adoption and competition with closed models. The industry is focusing increasingly on post-training and data as key competitive factors, raising questions about acquisition practices and regulatory scrutiny.
Shazeer et al (2024): you are overpaying for inference >13x
claude-3.5-sonnet claude-3-opus character.ai anthropic memory-efficiency kv-cache attention-mechanisms stateful-caching int8-precision transformer-architecture scaling overfitting architecture noam-shazeer kevin-a-fischer sebastien-bubeck _aidan_clark_ andrej-karpathy
Noam Shazeer explains how Character.ai serves LLM inference traffic equivalent to 20% of Google Search query volume while cutting serving costs by a factor of 33 compared to late 2022, with leading commercial APIs costing at least 13.5X more. Key memory-efficiency techniques include using MQA instead of GQA to reduce KV cache size by 8X, hybrid attention horizons, cross-layer KV-sharing, stateful caching with a 95% cache hit rate, and native int8 precision with custom kernels. Separately, Anthropic released Claude 3.5 Sonnet, which outperforms Claude 3 Opus at twice the speed and one-fifth the cost, passing 64% of internal pull-request tests and introducing new features like Artifacts for real-time doc and code generation. Discussions on LLM architecture highlight the dominance of transformers, challenges in scaling and overfitting, and the importance of architecture work for progress.
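The 8X figure above is simple cache bookkeeping: KV cache size scales linearly with the number of KV heads, so going from 8 GQA groups to MQA's single shared KV head shrinks it 8X. A minimal sketch (model shapes are illustrative assumptions, not Character.ai's actual configuration):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int) -> int:
    """Total KV cache memory: K and V tensors of shape
    [seq_len, n_kv_heads, head_dim] cached at every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical model shape; int8 cache (1 byte/element) as in the post
layers, head_dim, seq = 32, 128, 8192
gqa = kv_cache_bytes(layers, n_kv_heads=8, head_dim=head_dim,
                     seq_len=seq, bytes_per_elem=1)  # 8 KV groups (GQA)
mqa = kv_cache_bytes(layers, n_kv_heads=1, head_dim=head_dim,
                     seq_len=seq, bytes_per_elem=1)  # 1 shared KV head (MQA)
print(gqa // mqa)  # 8
```

The other techniques compound the same way: hybrid attention horizons and cross-layer KV-sharing cut the effective `n_layers` term, and int8 halves `bytes_per_elem` relative to fp16.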