All tags
Person: "sundarpichai"
Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves Pytorch
kimi-k2-thinking gemini moonshot-ai google apple vllm_project arena baseten yupp_ai mixture-of-experts quantization int4 context-window agentic-ai benchmarking model-deployment inference-acceleration api performance-optimization eliebakouch nrehiew_ andrew_n_carr ofirpress artificialanlys sundarpichai akhaliq
Moonshot AI launched Kimi K2 Thinking, a 1 trillion parameter mixture-of-experts (MoE) model with 32 billion active experts, a 256K context window, and native INT4 quantization-aware training. It achieves state-of-the-art results on benchmarks like HLE (44.9%), BrowseComp (60.2%), and agentic tool use with 200-300 sequential tool calls. The model is deployed with vLLM support and OpenAI-compatible APIs, available on platforms like Arena, Baseten, and Yupp. Early user reports note some API instability under launch load. Meanwhile, Google announced the TPU v7 (Ironwood) with a 10× peak performance improvement over TPU v5p, aimed at training and agentic inference for models like Gemini. Apple added support for M5 Neural Accelerators in llama.cpp for inference acceleration.
not much happened today
trillium gemini-2.5-pro gemini-deepthink google huawei epoch-ai deutsche-telekom nvidia anthropic reka-ai weaviate deepmind energy-efficiency datacenters mcp context-engineering instruction-following embedding-models math-reasoning benchmarking code-execution sundarpichai yuchenj_uw teortaxestex epochairesearch scaling01 _avichawla rekaailabs anthropicai douwekiela omarsar0 nityeshaga goodside iscienceluvr lmthang
Google's Project Suncatcher prototypes scalable ML compute systems in orbit using solar energy with Trillium-generation TPUs surviving radiation, aiming for prototype satellites by 2027. China's 50% electricity subsidies for datacenters may offset chip efficiency gaps, with Huawei planning gigawatt-scale SuperPoDs for DeepSeek by 2027. Epoch launched an open data center tracking hub, and Deutsche Telekom and NVIDIA announced a $1.1B Munich facility with 10k GPUs. In agent stacks, MCP (Model-Compute-Platform) tools gain traction with implementations like LitServe, Claude Desktop, and Reka's MCP server for VS Code. Anthropic emphasizes efficient code execution with MCP. Context engineering shifts focus from prompt writing to model input prioritization, with reports and tools from Weaviate, Anthropic, and practitioners highlighting instruction-following rerankers and embedding approaches. DeepMind's IMO-Bench math reasoning suite shows Gemini DeepThink achieving high scores, with a ProofAutoGrader correlating strongly with human grading. Benchmarks and governance updates include new tasks and eval sharing in lighteval.
Claude Haiku 4.5
claude-3.5-sonnet claude-3-haiku claude-3-haiku-4.5 gpt-5 gpt-4.1 gemma-2.5 gemma o3 anthropic google yale artificial-analysis shanghai-ai-lab model-performance fine-tuning reasoning agent-evaluation memory-optimization model-efficiency open-models cost-efficiency foundation-models agentic-workflows swyx sundarpichai osanseviero clementdelangue deredleritt3r azizishekoofeh vikhyatk mirrokni pdrmnvd akhaliq sayashk gne
Anthropic released Claude Haiku 4.5, a model that is over 2x faster and 3x cheaper than Claude Sonnet 4.5, improving iteration speed and user experience significantly. Pricing comparisons highlight Haiku 4.5's competitive cost against models like GPT-5 and GLM-4.6. Google and Yale introduced the open-weight Cell2Sentence-Scale 27B (Gemma) model, which generated a novel, experimentally validated cancer hypothesis, with open-sourced weights for community use. Early evaluations show GPT-5 and o3 models outperform GPT-4.1 in agentic reasoning tasks, balancing cost and performance. Agent evaluation challenges and memory-based learning advances were also discussed, with contributions from Shanghai AI Lab and others. "Haiku 4.5 materially improves iteration speed and UX," and "Cell2Sentence-Scale yielded validated cancer hypothesis" were key highlights.
not much happened today
kling-2.5-turbo sora-2 gemini-2.5-flash granite-4.0 qwen-3 qwen-image-2509 qwen3-vl-235b openai google ibm alibaba kling_ai synthesia ollama huggingface arena artificialanalysis tinker scaling01 video-generation instruction-following physics-simulation image-generation model-architecture mixture-of-experts context-windows token-efficiency fine-tuning lora cpu-training model-benchmarking api workflow-automation artificialanlys kling_ai altryne teortaxestex fofrai tim_dettmers sundarpichai officiallogank andrew_n_carr googleaidevs clementdelangue wzhao_nlp alibaba_qwen scaling01 ollama
Kling 2.5 Turbo leads in text-to-video and image-to-video generation with competitive pricing. OpenAI Sora 2 shows strong instruction-following but has physics inconsistencies. Google Gemini 2.5 Flash "Nano Banana" image generation is now generally available with multi-image blending and flexible aspect ratios. IBM Granite 4.0 introduces a hybrid Mamba/Transformer architecture with large context windows and strong token efficiency, outperforming some peers on the Intelligence Index. Qwen models receive updates including fine-tuning API support and improved vision capabilities. Tinker offers a flexible fine-tuning API supporting LoRA sharing and CPU-only training loops. The ecosystem also sees updates like Synthesia 3.0 adding video agents.
nano-banana is Gemini‑2.5‑Flash‑Image, beating Flux Kontext by 170 Elo with SOTA Consistency, Editing, and Multi-Image Fusion
gemini-2.5-flash-image-preview hermes-4 nemotron-nano-9b-v2 internvl3.5 gpt-oss qwen3 deepseek-v3.1 google-deepmind nous-research nvidia openai ollama huggingface openrouter image-editing natural-language-processing multi-image-composition character-consistency reasoning hybrid-models context-windows model-steerability pretraining finetuning alignment vision vision-language api model-integration sundarpichai _philschmid lmarena_ai omarsar0 skirano yupp_ai xanderatallah officiallogank mervenoyann
Google DeepMind revealed Gemini-2.5-Flash-Image-Preview, a state-of-the-art image editing model excelling in character consistency, natural-language edits, and multi-image composition, dominating the Image Edit Arena with a ~170-180 Elo lead and over 2.5M votes. It is integrated into multiple platforms including Google AI Studio and third-party services. Nous Research released Hermes 4, an open-weight hybrid reasoning model focused on steerability and STEM benchmarks. NVIDIA launched Nemotron Nano 9B V2, a hybrid Mamba-Transformer with 128k context, top-performing under 10B parameters, and released a 6.6T-token pretraining subset. InternVL3.5 introduced 32 vision-language models based on OpenAI's gpt-oss and Qwen3 backbones. Ollama v0.11.7 added DeepSeek v3.1 support with hybrid thinking and Turbo mode preview.
Genesis: Generative Physics Engine for Robotics (o1-mini version)
o1 o1-preview gpt-4o claude-3.5-sonnet gemini-2.0-pro llama-3-3b llama-3-70b openai google-deepmind meta-ai-fair hugging-face function-calling structured-outputs vision performance-benchmarks sdk webrtc reasoning math code-generation transformer-architecture model-training humanoid-robots search model-efficiency dataset-sharing aidan_mclau sundarpichai adcock_brett
OpenAI launched the o1 model API featuring function calling, structured outputs, vision support, and developer messages, achieving 60% fewer reasoning tokens than its preview. The model excels in math and code with a 0.76 LiveBench Coding score, outperforming Sonnet 3.5. Beta SDKs for Go and Java and WebRTC support with 60% lower prices were also released. Google Gemini 2.0 Pro (Gemini Exp 1206) deployment accelerated, showing improved coding, math, and reasoning performance. Meta AI FAIR introduced research on training transformers directly on raw bytes using dynamic entropy-based patching. Commercial humanoid robots were successfully deployed by an industry player. Hugging Face researchers demonstrated that their 3B Llama model can outperform the 70B Llama model on MATH-500 accuracy using search techniques, highlighting efficiency gains with smaller models. Concerns about reproducibility and domain-specific limitations were noted.
OpenAI Sora Turbo and Sora.com
sora-turbo o1 claude-3.5-sonnet claude-3.5 gemini llama-3-3-euryale-v2.3 mistral-large behemoth endurance-v1.1 openai google nvidia hugging-face mistral-ai text-to-video-generation quantum-computing coding-capabilities transformers algorithmic-innovation storytelling roleplay model-parameter-tuning anti-monopoly-investigation sama sundarpichai bindureddy denny_zhou nrehiew_
OpenAI launched Sora Turbo, enabling text-to-video generation for ChatGPT Plus and Pro users with monthly generation limits and regional restrictions in Europe and the UK. Google announced a quantum computing breakthrough with the development of the Willow chip, potentially enabling commercial quantum applications. Discussions on O1 model performance highlighted its lag behind Claude 3.5 Sonnet and Gemini in coding tasks, with calls for algorithmic innovation beyond transformer scaling. The Llama 3.3 Euryale v2.3 model was praised for storytelling and roleplay capabilities, with users suggesting parameter tuning to reduce creative liberties and repetition. Alternatives like Mistral-Large, Behemoth, and Endurance v1.1 were also noted. Additionally, Nvidia faces an anti-monopoly investigation in China. Memes and humor around GPU issues and embargo mishaps were popular on social media.