All tags
Topic: "agent-infrastructure"
not much happened today
gpt-5.4 openai anthropic uber nous-research cursor_ai redisinc artificialanlys langchain-js agent-infrastructure mcp-protocol harnesses coding-agents evaluation-methodologies agent-ui-ux runtime-environments multi-axis-evaluation automation workflow-optimization open-agent-platforms provider-integration filesystem-checkpoints mattturck hwchase17 omarsar0 gergelyorosz htihle theprimeagen sydneyrunkle corbtt
Harnesses, agent infrastructure, and the MCP protocol are central themes, with emphasis on how sandboxes, filesystem access, skills, memory, and observability shape agent UI/UX and runtime environments. Despite jokes about MCP's demise, it remains vital in production, notably used internally by Uber and supported by Anthropic. The coding-agent stack is evolving, with CursorBench combining offline and online metrics to evaluate models on both intelligence and efficiency; GPT-5.4 leads in correctness and token efficiency. Agent-assisted development is splitting between automation-heavy workflows and "stay-in-the-loop" tooling, with OpenAI advancing Codex Automations featuring a worktree-vs.-branch choice and UI customization. The open agent platform Hermes Agent v0.2.0 introduces full MCP client support, an ACP server for editors, and expanded provider integrations, including OpenAI OAuth.
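For context on why MCP keeps showing up in these harness discussions: the protocol is built on JSON-RPC 2.0, and a client's lifecycle is essentially `initialize`, then `tools/list` to discover tools, then `tools/call` to invoke one. A minimal sketch of those message shapes follows; the `lookup_trip` tool, its arguments, and the client name are hypothetical, and the protocol-version string is an assumption.

```python
import json

def jsonrpc_request(req_id, method, params):
    """Build a JSON-RPC 2.0 request, the wire format MCP messages use."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

# An MCP client first negotiates capabilities with `initialize`...
init = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2025-06-18",  # assumption: plausible spec revision string
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},
})

# ...then discovers the server's tools and calls one by name with JSON arguments.
list_tools = jsonrpc_request(2, "tools/list", {})
call_tool = jsonrpc_request(3, "tools/call", {
    "name": "lookup_trip",              # hypothetical tool name
    "arguments": {"trip_id": "abc123"},
})

for msg in (init, list_tools, call_tool):
    print(json.dumps(msg))
```

In a real client these messages would be written to a server over stdio or HTTP; here they are only serialized to show the envelope an MCP harness has to produce and parse.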
not much happened today
nemotron-3-super gpt-oss-120b qwen3.5-122b-a10b nvidia perplexity replit base44 vllm llama.cpp ollama togethercompute baseten wandb langchain unsloth model-architecture model-optimization inference-speed kv-cache multi-token-prediction agent-infrastructure orchestration persistent-agents model-serving product-launches karpathy ctnzr bnjmn_marie artificialanlys
NVIDIA’s Nemotron 3 Super is a 120B-parameter / ~12B-active open model featuring a hybrid Mamba-Transformer / SSM Latent MoE architecture and a 1M context window, delivering up to 2.2x faster inference than GPT-OSS-120B in FP4 with strong throughput gains. It supports agentic workloads and is unusually open, with weights, data, and infrastructure details released. The model scored 36 on the AA Intelligence Index, outperforming GPT-OSS-120B but trailing Qwen3.5-122B-A10B. Community and infrastructure support from projects like vLLM, llama.cpp, Ollama, Together, Baseten, W&B Inference, LangChain, and Unsloth GGUFs was immediate. Key technical innovations include native multi-token prediction (MTP) and a significant KV-cache efficiency advantage.
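The KV-cache advantage of hybrid Mamba/SSM designs comes from the fact that only attention layers accumulate per-token key/value state; SSM layers carry fixed-size recurrent state regardless of sequence length. A back-of-envelope calculator makes the gap concrete; the layer counts and head dimensions below are hypothetical illustrations, not Nemotron 3 Super's actual configuration.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Total KV-cache size: 2x (keys and values) per attention layer,
    per KV head, per token, at the given element width (2 bytes = FP16)."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical dense baseline: 40 attention layers, 8 KV heads of dim 128, FP16
dense = kv_cache_bytes(40, 8, 128, seq_len=1_000_000)
# Hypothetical hybrid: only 8 attention layers; Mamba/SSM layers keep O(1) state
hybrid = kv_cache_bytes(8, 8, 128, seq_len=1_000_000)
print(f"dense:  {dense / 2**30:.1f} GiB")
print(f"hybrid: {hybrid / 2**30:.1f} GiB")
```

At a 1M-token context the cache scales linearly with both sequence length and attention-layer count, so cutting attention layers by 5x under these assumed shapes cuts cache memory by the same factor, which is the kind of headroom that makes long-context agentic serving practical.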
On the product side, a shift towards persistent agent runtimes and orchestration layers is highlighted, with Andrej Karpathy advocating for a "bigger IDE" concept where agents replace files as the unit of work, enabling legible, forkable agentic organizations with real-time control. New launches fitting this vision include Perplexity’s Personal Computer, an always-on local/cloud hybrid running on Mac mini, and Computer for Enterprise orchestrating 20 specialized models and 400+ apps. Replit Agent 4 offers a collaborative, canvas-like workflow with parallel agents, while Base44 Superagents provide integrated solutions for nontechnical users. The engineering focus is increasingly on the orchestration harness rather than just the model.
not much happened today
opus-4.6 glm-5 anthropic ibm perplexity-ai llamaindex deepseek google-chrome persistent-memory agent-infrastructure cross-device-synchronization long-context sparse-attention inference-optimization computer-architecture task-completion systems-performance pamelafox tadasayy llama_index bromann dair_ai omarsar0 abxxai teknium bcherny kimmonismus _catwu alexalbert__ realyushibai
MCP tools remain relevant for deterministic APIs despite ergonomic criticisms, with new web MCP support in Chrome v146 enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for agents, with IBM improving task-completion rates and multi-agent memory framed as a computer-architecture challenge. Agent UX is evolving toward always-on, cross-device operation, exemplified by Perplexity Computer on iOS and Claude Code session management. Anthropic made a 1M-token context window the default for Opus 4.6, with no extra long-context API charges, achieving 78.3% on MRCR v2 at 1M tokens. Sparse-attention optimizations like IndexCache in DeepSeek Sparse Attention yield significant speedups on large models with minimal code changes.
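The sparse-attention speedups mentioned above rest on a simple idea: instead of scoring a query against every cached key, select a small top-k subset and run softmax attention only over those. The NumPy sketch below is a toy single-query version of that selection step, not the actual DeepSeek Sparse Attention or IndexCache implementation; all shapes and the `k` value are illustrative.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=32):
    """Attend only to the top-k keys by score: a toy version of the
    query-to-key selection behind sparse-attention schemes."""
    scores = K @ q / np.sqrt(q.shape[-1])   # (seq_len,) scaled dot-product scores
    idx = np.argpartition(scores, -k)[-k:]  # indices of the k highest-scoring keys
    sel = scores[idx]
    w = np.exp(sel - sel.max())
    w /= w.sum()                            # softmax over the selected keys only
    return w @ V[idx]                       # weighted sum of the selected values

rng = np.random.default_rng(0)
seq_len, d = 1024, 64
q = rng.standard_normal(d)
K = rng.standard_normal((seq_len, d))
V = rng.standard_normal((seq_len, d))
out = topk_sparse_attention(q, K, V, k=32)
print(out.shape)  # (64,)
```

The compute saving is the point: full attention is O(seq_len) score-plus-value work per query, while the sparse path does O(seq_len) cheap scoring but only O(k) softmax and value aggregation, which is where the large-model speedups come from when the scoring itself is made lightweight.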