Company: "arena"
not much happened today
claude-3-sonnet claude-3-opus gpt-5-codex grok-4-fast qwen-3-next gemini-2.5-pro sora-2-pro ray-3 kling-2.5 veo-3 modernvbert anthropic x-ai google google-labs openai arena epoch-ai mit luma akhaliq coding-agents cybersecurity api model-taxonomy model-ranking video-generation benchmarking multi-modal-generation retrieval image-text-retrieval finbarrtimbers gauravisnotme justinlin610 billpeeb apples_jimmy akhaliq
Anthropic announces a new CTO. Frontier coding agents see updates: Claude Sonnet 4.5 shows strong cybersecurity results and a polished UX but trails GPT-5 Codex in coding capability. xAI's Grok Code Fast claims a higher edit success rate at lower cost. Google's Jules coding agent launches a programmable API with CI/CD integration. Qwen clarifies its model taxonomy and API tiers. Vision/LM Arena rankings show tight competition among Claude Sonnet 4.5, Claude Opus 4.1, Gemini 2.5 Pro, and OpenAI's latest models. In video generation, Sora 2 Pro leads App Store rankings with rapid iteration and a new creator ecosystem; early tests show it answers GPQA-style questions with 55% accuracy versus GPT-5's 72%. Video Arena adds new models such as Luma's Ray 3 and Kling 2.5 for benchmarking. Ovi, a Veo-3-like multimodal video+audio generation model, is released. On the retrieval side, ModernVBERT from MIT offers efficient image-text retrieval. Key quotes include "Claude Sonnet 4.5 is basically the same as Opus 4.1 for coding" and "Jules is a programmable team member."
not much happened today
kling-2.5-turbo sora-2 gemini-2.5-flash granite-4.0 qwen-3 qwen-image-2509 qwen3-vl-235b openai google ibm alibaba kling_ai synthesia ollama huggingface arena artificialanalysis tinker scaling01 video-generation instruction-following physics-simulation image-generation model-architecture mixture-of-experts context-windows token-efficiency fine-tuning lora cpu-training model-benchmarking api workflow-automation artificialanlys kling_ai altryne teortaxestex fofrai tim_dettmers sundarpichai officiallogank andrew_n_carr googleaidevs clementdelangue wzhao_nlp alibaba_qwen scaling01 ollama
Kling 2.5 Turbo leads in text-to-video and image-to-video generation at competitive pricing. OpenAI's Sora 2 shows strong instruction-following but exhibits physics inconsistencies. Google's Gemini 2.5 Flash image generation ("Nano Banana") is now generally available, with multi-image blending and flexible aspect ratios. IBM Granite 4.0 introduces a hybrid Mamba/Transformer architecture with large context windows and strong token efficiency, outperforming some peers on the Intelligence Index. Qwen models receive updates, including fine-tuning API support and improved vision capabilities. Tinker offers a flexible fine-tuning API that supports LoRA sharing and CPU-only training loops. Other ecosystem updates include Synthesia 3.0, which adds video agents.