Person: "boazbaraktcs"
not much happened today
mythos anthropic openai langchain nous-research cybersecurity sandboxing reinforcement-learning agent-architecture memory-management model-deployment software-security evaluation-methods kimmonismus paul_cal gneubig kentonvarda boazbaraktcs ylecun deanwball hwchase17 vtrivedy10 sarahcat21 aijoey
Anthropic's Mythos and OpenAI's upcoming restricted cyber-capable models are central to recent discussions, with debates over their security realism and evaluation methods. LangChain's Deep Agents deploy introduces a model-agnostic agent harness architecture with open memory, emphasizing open protocols and memory ownership. Sandboxes are gaining prominence as core infrastructure for reinforcement learning, with labs running up to 100K concurrent sandboxes and aiming for 1M. The Hermes Agent by Nous continues to gain traction with new integrations and features such as a web-based HUD and token cost tracking.
ChatGPT Agent: new o* model + unified Deep Research browser + Operator computer use + Code Interpreter terminal
o3 o4 gptnext openai reinforcement-learning benchmarking model-performance model-risk long-context model-deployment fine-tuning sama gdb kevinweil xikun_zhang_ keren_gu boazbaraktcs
OpenAI launched the ChatGPT Agent, a new advanced AI system capable of browsing the web, coding, analyzing data, and creating reports, marking a significant step towards human-like computer use. The agent, distinct from and superior to o3, is considered the first public exposure of what was internally called o4, now merged into GPTNext. It is trained with end-to-end reinforcement learning, can operate for extended periods (tested up to 2 hours), and is classified as "High" risk for biological misuse, with safeguards activated. Early benchmarks show mixed results: it excels on some tests, such as WebArena and BrowseComp, but underperforms on others, such as PaperBench. Key figures involved include Sam Altman, Greg Brockman, and Kevin Weil, with technical insights from xikun_zhang_ and risk commentary from keren_gu and boazbaraktcs. The launch sparked speculation that the agent was GPT-5, which was confirmed not to be the case.