All tags
Person: "bcherny"
Anthropic's Claude Opus 4.7
claude-opus-4.7 codex gpt-rosalind anthropic openai cursor replit perplexity-ai microsoft coding agentic-ai tokenization long-context benchmarking image-processing software-engineering computer-use plugin-integration multi-terminal-support ssh-access model-expansion bcherny kimmonismus scaling01 valsai artificialanlys natolambert nrehiew_
Anthropic launched Claude Opus 4.7, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved long-context handling with a new xhigh reasoning tier. Benchmarks show substantial gains, including SWE-bench Pro 64.3%, SWE-bench Verified 87.6%, and TerminalBench 69.4%, with top rankings on Vals Index and GDPval-AA. Technical changes include a new tokenizer and increased image input resolution to 3.75MP. Some long-context benchmarks showed mixed results, with a shift in focus from MRCR to Graphwalks. Adoption was rapid across tools like Cursor, VS Code, Replit Agent, and Perplexity. Meanwhile, OpenAI expanded Codex into a broader computer agent with Mac computer use, in-app browser, image generation/editing, 90+ plugins, multi-terminal support, SSH remote devbox access, and richer file previews. A new vertical life-sciences model, GPT-Rosalind, was also introduced.
not much happened today
opus-4.6 glm-5 anthropic ibm perplexity-ai llamaindex deepseek google-chrome persistent-memory agent-infrastructure cross-device-synchronization long-context sparse-attention inference-optimization computer-architecture task-completion systems-performance pamelafox tadasayy llama_index bromann dair_ai omarsar0 abxxai teknuim bcherny kimmonismus _catwu alexalbert__ realyushibai
MCP tools remain relevant for deterministic APIs despite ergonomic criticisms, with new web MCP support in Chrome v146 enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for agents, with IBM improving task completion rates and multi-agent memory framed as a computer architecture challenge. Agent UX is evolving towards always-on, cross-device operation, exemplified by Perplexity Computer on iOS and Claude Code session management. Anthropic released Opus 4.6 1M context as default with no extra long-context API charges, achieving 78.3% on MRCR v2 at 1M tokens. Sparse attention optimizations like IndexCache in DeepSeek Sparse Attention yield significant speedups on large models with minimal code changes.