Model: "grok-4.1"
Gemini 3 Pro — new GDM frontier model 6, Gemini 3 Deep Think, and Antigravity IDE
gemini-3-pro gemini-2.5 grok-4.1 sonnet-4.5 gpt-5.1 google google-deepmind multimodality agentic-ai benchmarking context-window model-performance instruction-following model-pricing api model-release reasoning model-evaluation sundarpichai _philschmid oriol_vinyals
Google launched Gemini 3 Pro, a state-of-the-art model with a 1M-token context window, multimodal reasoning, and strong agentic capabilities, priced significantly higher than Gemini 2.5. It leads major benchmarks, surpassing Grok 4.1 and competing closely with Sonnet 4.5 and GPT-5.1, though GPT-5.1 still excels at ultra-long summarization. Independent evaluations from Artificial Analysis, Vending Bench, ARC-AGI 2, Box, and PelicanBench validate Gemini 3 as a frontier LLM. Google also introduced Antigravity, an agentic IDE powered by Gemini 3 Pro and other models, featuring task orchestration and human-in-the-loop validation. The launch marks a strong return to the AI frontier for Google, with more models expected soon. "Google is very, very back in the business."
xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing
grok-4.1 gpt-5.1 claude-4.1-opus grok-4 gpt-5 grok-4.1-thinking gpt-5-pro claude-4.5-haiku xai openai google-deepmind sakana-ai anthropic microsoft mufg khosla nea lux-capital iqt model-performance creative-writing hallucination evaluation-datasets ensemble-models weather-forecasting funding efficiency anti-hallucination arc-agi model-scaling yanndubs gregkamradt philschmid willccbb
xAI launched Grok 4.1, which reached #1 on the LM Arena Text Leaderboard with an Elo score of 1483 and shows improvements in creative writing and anti-hallucination. OpenAI's GPT-5.1 "Thinking" demonstrates efficiency gains, with ~60% less "thinking" on easy queries and strong ARC-AGI performance. Google DeepMind released WeatherNext 2, an ensemble generative model that is 8× faster and more accurate for global weather forecasts and is integrated into multiple Google products. Sakana AI raised ¥20B ($135M) in Series B funding at a $2.63B valuation to focus on efficient AI for resource-constrained enterprise applications in Japan. New evaluations highlight tradeoffs between hallucination and knowledge accuracy across models, including Claude 4.1 Opus and other Anthropic models.