not much happened today
Hermes Agent is gaining attention as a leading open agent stack, with persistent memory, reusable skills, and a self-improvement loop. Its new Manim skill lets it generate math and technical animations, expanding what the agent can produce. The Hermes ecosystem is growing quickly: GUI tools, a WebUI, HUD updates, OAuth support, and new integrations. An open training-data movement for agents is also emerging, focused on sharing reusable behavioral data and harness traces. Meanwhile, Anthropic's Claude Code faces distribution and policy challenges: reports of restrictions and unreliability are hurting third-party coding agents and underscoring the difficult subscription economics of always-on agents. Key community sentiments include "Claude Code now errors if used to analyze Claude Code source" and "basically unusable".
OpenAI Codex App: death of the VSCode fork, multitasking worktrees, Skills Automations
codex openai agent-based-systems parallel-processing software-testing developer-workflows automation product-feedback-loop neurosymbolic-ai benchmarking sama reach_vb gdb skirano embirico ajambrosino thsottiaux nbaschez yuchenj_uw badlogicgames random_walker
OpenAI launched the Codex app on macOS as a dedicated, agent-native command center for coding. It runs multiple agents in parallel, uses built-in worktrees to isolate each agent's changes, packages reusable task bundles as skills, and supports scheduled automations. The app emphasizes developer workflows such as a Plan mode for upfront task decomposition, and it is drawing positive adoption signals from insiders including @sama. There is also movement toward standardizing skills folders across the ecosystem, an early convention in agent tooling, and Codex exemplifies a "self-improving" product feedback loop that combines humans and agents. In coding-agent practice, cited best practices include a "test-first" approach to bug fixes, a "conductor" model in which one developer manages 5-10 agents in parallel, and a neurosymbolic framing of why coding agents succeed: software is verifiable and comes with symbolic tooling. Benchmark skepticism remains about productivity studies that do not reflect agentic workflows.
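Codex's built-in worktrees presumably build on git's `worktree` feature, which lets several checkouts of one repository coexist on separate branches so parallel agents never clobber each other's working files. A minimal sketch of the underlying mechanism (repo, branch, and directory names are illustrative, not Codex's own):

```shell
# Illustrative setup: a scratch repository with one commit.
git init demo && cd demo
git -c user.name="demo" -c user.email="demo@example.com" \
    commit --allow-empty -m "initial commit"

# Give each parallel agent its own checkout on its own branch;
# edits in one worktree never touch the files of another.
git worktree add ../agent-a -b task-a
git worktree add ../agent-b -b task-b

# Lists the main checkout plus the agent-a and agent-b worktrees.
git worktree list

# When a task's branch is merged or abandoned, reclaim the checkout.
git worktree remove ../agent-a
```

Each worktree shares the same object store, so branches created by one agent are immediately visible to the others once committed.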
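The "test-first" practice mentioned above means reproducing a bug as a failing test before changing any code, so the agent (or human) has a verifiable target rather than a vague description. A toy illustration with a hypothetical off-by-one bug (the function and bug are invented for this sketch):

```python
def last_n(items, n):
    """Return the last n items of a list.

    Hypothetical bug report: the original returned items[-n:]
    unconditionally, which yields the WHOLE list when n == 0
    because items[-0:] == items[0:].
    """
    return items[-n:] if n > 0 else []


# Step 1 (written first, before the fix): a test encoding the bug report.
def test_last_n_zero():
    assert last_n([1, 2, 3], 0) == []


# Step 2: the guard above makes the test pass; a regression test
# confirms the normal behavior is preserved.
def test_last_n_normal():
    assert last_n([1, 2, 3], 2) == [2, 3]


test_last_n_zero()
test_last_n_normal()
```

The point of the ordering is that the fix is only accepted once the previously failing test goes green, which is exactly the kind of closed verification loop that makes coding a good fit for agents.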