All tags
Topic: "agent-development"
Grok 3 & 3-mini now API Available
grok-3 grok-3-mini gemini-2.5-flash o3 o4-mini llama-4-maverick gemma-3-27b openai llamaindex google-deepmind epochairesearch goodfireai mechanize agent-development agent-communication cli-tools reinforcement-learning model-evaluation quantization-aware-training model-compression training-compute hybrid-reasoning model-benchmarking
Grok 3 API is now available, including a smaller version called Grok 3 mini, which offers competitive pricing and full reasoning traces. OpenAI released a practical guide for building AI agents, while LlamaIndex supports the Agent2Agent protocol for multi-agent communication. Codex CLI is gaining traction with new features and competition from Aider and Claude Code. GoogleDeepMind launched Gemini 2.5 Flash, a hybrid reasoning model topping the Chatbot Arena leaderboard. OpenAI's o3 and o4-mini models show emergent behaviors from large-scale reinforcement learning. EpochAIResearch updated its methodology, removing Maverick from high FLOP models as Llama 4 Maverick training compute drops. GoodfireAI announced a $50M Series A for its Ember neural programming platform. Mechanize was founded to build virtual work environments and automation benchmarks. GoogleDeepMind's Quantisation Aware Training for Gemma 3 models reduces model size significantly, with open source checkpoints available.
Google wakes up: Gemini 2.0 et al
gemini-2.0-flash gemini-1.5-pro gemini-exp-1206 claude-3.5-sonnet opus google-deepmind openai apple multimodality agent-development multilinguality benchmarking model-releases demis-hassabis sundar-pichai paige-bailey bindureddy
Google DeepMind launched Gemini 2.0 Flash, a new multimodal model outperforming Gemini 1.5 Pro and o1-preview, featuring vision and voice APIs, multilingual capabilities, and native tool use. It powers new AI agents like Project Astra and Project Mariner, with Project Mariner achieving state-of-the-art 83.5% on the WebVoyager benchmark. OpenAI announced ChatGPT integration with Apple devices, enabling Siri access and visual intelligence features. Claude 3.5 Sonnet is noted as a distilled version of Opus. The AI community's response at NeurIPS 2024 has been overwhelmingly positive, signaling a strong comeback for Google in AI innovation. Key topics include multimodality, agent development, multilinguality, benchmarking, and model releases.