Topic: "hybrid-reasoning"
Grok 3 & 3-mini now available via API
grok-3 grok-3-mini gemini-2.5-flash o3 o4-mini llama-4-maverick gemma-3-27b openai llamaindex google-deepmind epochairesearch goodfireai mechanize agent-development agent-communication cli-tools reinforcement-learning model-evaluation quantization-aware-training model-compression training-compute hybrid-reasoning model-benchmarking
The Grok 3 API is now available, including a smaller version called Grok 3 mini, which offers competitive pricing and full reasoning traces. OpenAI released a practical guide for building AI agents, while LlamaIndex added support for the Agent2Agent protocol for multi-agent communication. Codex CLI is gaining traction with new features and faces competition from Aider and Claude Code. Google DeepMind launched Gemini 2.5 Flash, a hybrid reasoning model topping the Chatbot Arena leaderboard. OpenAI's o3 and o4-mini models show emergent behaviors from large-scale reinforcement learning. EpochAIResearch updated its methodology, removing Llama 4 Maverick from its list of high-FLOP models as the model's training compute estimate drops. GoodfireAI announced a $50M Series A for its Ember neural programming platform. Mechanize was founded to build virtual work environments and automation benchmarks. Google DeepMind's Quantization-Aware Training for Gemma 3 models reduces model size significantly, with open-source checkpoints available.
Claude 3.7 Sonnet
claude-3-7-sonnet claude-3 claude-code anthropic hybrid-reasoning extended-thinking coding-benchmarks agentic-ai prompt-caching streaming token-capacity tool-use
Anthropic launched Claude 3.7 Sonnet, its most intelligent model to date, featuring hybrid reasoning with two thinking modes: near-instant responses and extended step-by-step thinking. The release includes Claude Code, an agentic coding tool in limited preview, and supports up to 128k output tokens in beta. Claude 3.7 Sonnet performs well on coding benchmarks such as SWE-Bench Verified and Cognition's junior-dev eval, and introduces advanced features such as streaming thinking, prompt caching, and tool use. The model is also benchmarked on Pokebench, reflecting agentic capabilities similar to those in the Voyager paper. The launch is accompanied by extensive documentation, cookbooks, and prompting guides for extended thinking. Social media announcements highlighted it as "The first generally available hybrid reasoning model" and Claude Code as the "first coding tool from Anthropic".