Company: "geminiapp"
Google I/O 2026: Gemini 3.5 Flash, Omni, and Google’s Agent Stack
gemini-3.5-flash gemini-3.1-pro gemini-3.5 gemini-omni google google-deepmind geminiapp agentic-ai multimodality video-generation model-performance benchmarking context-windows model-optimization model-scaling instruction-following api model-efficiency cost-analysis philschmid jeffdean
At I/O, Google repositioned Gemini as both a consumer AI and a developer/agent platform, anchored by three key releases: Gemini 3.5 Flash for fast agentic and coding tasks, Gemini Omni for multimodal generation and editing (including video), and the expanded Antigravity 2.0 agent stack. Google reports processing over 3.2 quadrillion tokens per month, a 7x year-over-year increase, with 900M+ monthly Gemini users across 230+ countries and 70+ languages. Gemini 3.5 Flash offers a 1M-token context window, 65k max output tokens, 4 thinking levels, and "thought preservation" across turns; it outperforms Gemini 3.1 Pro on multiple benchmarks and runs up to 12x faster in Antigravity. Independent benchmarks put Gemini 3.5 Flash at 55 on the Intelligence Index, at a higher cost than previous versions. Gemini Omni Flash accepts text, image, video, and audio inputs for generative media tasks and is available now to paid users.
Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2
gemini-3.1-pro gemini-3-deep-think google google-deepmind geminiapp reasoning benchmarking agentic-ai cost-efficiency hallucination code-generation model-release developer-tools sundarpichai demishassabis jeffdean koraykv noamshazeer joshwoodward artificialanlys arena oriolvinyalsml scaling01
Google released Gemini 3.1 Pro as a developer preview integrated across the Gemini app, NotebookLM, Gemini API / AI Studio, and Vertex AI. The headline is a significant reasoning improvement, with ARC-AGI-2 at 77.1%, alongside strong coding and agentic-tool results such as SWE-Bench Verified at 80.6%. Independent evaluators such as Artificial Analysis and Arena confirmed top-tier performance and cost efficiency. Community reactions mixed excitement about practical gains with skepticism about benchmark targeting and concerns over rollout inconsistencies. The release is framed as the same core intelligence that powers Gemini 3 Deep Think, scaled for practical use, with notable mentions from leaders like @sundarpichai, @demishassabis, and @JeffDean.
new Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5
gemini-3-deep-think-v2 arc-agi-2 google-deepmind google geminiapp arcprize benchmarking reasoning test-time-adaptation fluid-intelligence scientific-computing engineering-workflows 3d-modeling cost-analysis demishassabis sundarpichai fchollet jeffdean oriolvinyalsml tulseedoshi
Google DeepMind is rolling out the upgraded Gemini 3 Deep Think V2 reasoning mode to Google AI Ultra subscribers and opening early access through Vertex AI / the Gemini API for select users. Key benchmark results include ARC-AGI-2 at 84.6%, Humanity’s Last Exam (HLE) at 48.4% without tools, and a Codeforces Elo of 3455, alongside Olympiad-level performance in physics and chemistry. The mode emphasizes practical scientific and engineering applications such as error detection in math papers, physical system modeling, semiconductor optimization, and a sketch-to-CAD/STL pipeline for 3D printing. ARC benchmark creator François Chollet highlights the benchmark's role in advancing test-time adaptation and fluid intelligence, and projects human-AI parity around 2030. The rollout is framed as a productized, compute-heavy test-time mode rather than a lab demo, with costs disclosed for ARC tasks.