Person: "walden_yan"

fable-5 mythos anthropic model-performance trust data-retention benchmarking agentic-ai coding policy darioamodei natolambert martin_casado drfeifei antirez clementdelangue deanwball hlntnr _arohan_ dbahdanau gergelyorosz scaling01 dbreunig omarsar0 yacinemtb mchlhess jasonbotterill lvwerra lechmazur kimmonismus walden_yan hrishioa

Anthropic faced backlash for silently degrading AI research capabilities in its Fable/Mythos models without clear disclosure, raising concerns about trust, reproducibility, and enterprise data retention policies. Despite controversy, Fable 5 demonstrated strong benchmark performance, leading in agentic and coding tasks with high scores on Agent Arena, SimpleBench, CADGenBench, and PACT. Dario Amodei published a policy advocating stronger frontier AI oversight amid these tensions.

May 18

not much happened today

claude-code codex composer-2.5 langchain cognition anthropic openai microsoft cursor agent-automation agent-observability ci-cd prompt-caching remote-execution verification decomposition feedback-loops coding-agents model-efficiency instruction-following krishdpi walden_yan russelljkaplan fchollet gabriberton palashshah shannholmberg

Agent infrastructure is advancing, with LangSmith Engine providing CI/CD loops for agents and SmithDB enabling low-latency querying for observability. Cognition's Devin Auto-Triage offers persistent automation for bug triage with memory and subagent structures. Anthropic improves Claude Code for large codebases with prompt cache diagnostics and faster modes, while OpenAI enhances Codex workflows with remote execution and plugins. Microsoft released remote control for GitHub Copilot CLI and VS Code. The community emphasizes verification, decomposition, and feedback loops over prompt cleverness for coding agents. Cursor's Composer 2.5 is highlighted as a strong new coding model and praised for efficiency and collaboration improvements. Cursor also plans a larger model, trained with SpaceXAI using 10× more compute on Colossus 2 hardware.

Apr 10

not much happened today

glm-5.1 gemini-3.1 gpt-5.4 claude-3-sonnet haiku opus sonnet qwen-3.6-plus qwen3-coder-next-80b z-ai anthropic berkeley langchain alibaba openai model-performance agent-frameworks orchestration model-routing fine-tuning agent-harness model-selection workflow-automation zixuan_li akshay_pachaar harrison_chase walden_yan yuchen_jin sentdex

GLM-5.1 has reached #3 on Code Arena, surpassing Gemini 3.1 and GPT-5.4 and matching Claude Sonnet 4.6 in coding performance. Z.ai now holds the #1 open-model rank and sits close to the top overall. The advisor pattern, which pairs a cheap executor with an expensive advisor, is gaining traction, improving performance and efficiency in pairings such as Haiku + Opus and Sonnet + Opus. Alibaba's Qwen Code v0.14.x introduces orchestration features including remote control channels, cron tasks, and sub-agent model selection. Model routing is becoming a product-level concern due to specialization and spikiness in top models such as Opus and GPT-5.4. The Hermes Agent ecosystem shows strong momentum with a new workspace mobile app, FAST mode for OpenAI/GPT-5.4, and over 50k GitHub stars. Practitioners report Hermes as a reliable agent framework, with local Qwen3-Coder-Next 80B 4-bit replacing parts of workflows that previously relied on Claude Code. The harness layer is emerging as a key abstraction in agent frameworks.
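
The advisor pattern described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `Model` callable type, the `AdvisorLoop` class, and the `needs_advice` escalation check are illustrative names, not any vendor's API, and real systems may decide when to consult the advisor quite differently.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface (an assumption): prompt in, text out.
Model = Callable[[str], str]


@dataclass
class AdvisorLoop:
    executor: Model                      # cheap model that does the work
    advisor: Model                       # expensive model, consulted sparingly
    needs_advice: Callable[[str], bool]  # hypothetical escalation check on a draft
    advisor_calls: int = 0               # how often the expensive model was used

    def run(self, task: str) -> str:
        # First attempt uses only the cheap executor.
        draft = self.executor(task)
        if not self.needs_advice(draft):
            return draft
        # Escalate once: ask the advisor for short guidance, not a full answer.
        self.advisor_calls += 1
        advice = self.advisor(f"Task: {task}\nDraft: {draft}\nGive brief guidance.")
        # The executor retries with the guidance appended to the task.
        return self.executor(f"{task}\nAdvisor guidance: {advice}")
```

The point of the design is that the expensive model produces only a short piece of guidance while the cheap model produces all final output, so cost scales with how often `needs_advice` fires.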

Jun 25, 2025

Context Engineering: Much More than Prompts

gemini-code openai langchain cognition google-deepmind vercel cloudflare openrouter context-engineering retrieval-augmented-generation tools state-management history-management prompt-engineering software-layer chatgpt-connectors api-integration karpathy walden_yan tobi_lutke hwchase17 rlancemartin kwindla dex_horthy

Context Engineering emerges as a significant trend in AI, highlighted by experts like Andrej Karpathy, Walden Yan from Cognition, and Tobi Lutke. It involves managing an LLM's context window with the right mix of prompts, retrieval, tools, and state to optimize performance, going beyond traditional prompt engineering. LangChain and its tool LangGraph are noted for advancing this approach. Additionally, OpenAI has launched ChatGPT connectors for platforms like Google Drive, Dropbox, SharePoint, and Box, enhancing context integration for Pro users. Other notable news includes the launch of Vercel Sandbox, Cloudflare Containers, the leak and release of Gemini Code by Google DeepMind, and fundraising efforts by OpenRouter.
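
As a rough illustration of mixing prompts, retrieval, tools, and state in one context window, the sketch below packs sections under a budget. The function name `build_context`, the priority order (system prompt, tools, state, then retrieved text), and the use of word counts as a stand-in for tokens are all assumptions made for the example, not a method described by the people cited above.

```python
from typing import List


def build_context(
    system_prompt: str,
    retrieved: List[str],
    tool_specs: List[str],
    state: str,
    budget_words: int,
) -> str:
    """Pack context sections in a fixed priority order under a word budget.

    Word count is a crude proxy for tokens (an assumption for this sketch).
    """
    sections = (
        [("system", system_prompt)]
        + [("tool", spec) for spec in tool_specs]
        + [("state", state)]
        + [("retrieved", doc) for doc in retrieved]
    )
    parts: List[str] = []
    used = 0
    for label, text in sections:
        cost = len(text.split())
        # Skip any section that would overflow the budget; later, smaller
        # sections may still fit.
        if used + cost > budget_words:
            continue
        parts.append(f"[{label}] {text}")
        used += cost
    return "\n".join(parts)
```

With a budget of 7 words, for instance, a 4-word retrieved passage is dropped while a 1-word passage after it still fits, which shows why ordering and sizing of sections matter more than the wording of any single prompt.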

Jun 13, 2025

Cognition vs Anthropic: Don't Build Multi-Agents/How to Build Multi-Agents

claude cognition anthropic langchain huggingface microsoft llamaindex linkedin blackrock multi-agent-systems context-engineering agent-memory model-elicitation ai-evaluation deep-research-workflows framework-migration pydantic-schema walden_yan hwchase17 assaf_elovic sh_reya hamelhusain omarsar0 clefourrier jerryjliu0 akbirkhan

Within the last 24 hours, Cognition's Walden Yan advised "Don't Build Multi-Agents," while Anthropic shared its approach to building multi-agent systems, based on Claude's multi-agent research architecture. LangChain highlighted advances in context engineering and production AI agents used by LinkedIn and BlackRock. The community is actively debating multi-agent AI development. Additionally, Hugging Face announced that it is deprecating TensorFlow and Flax support in favor of PyTorch. Research on agent memory and model elicitation techniques from LlamaIndex and Anthropic was also discussed.