Company: "figma"
not much happened today
molmo-2-4b molmo-2-8b hermes-agent-v0.4.0 anthropic figma github cursor_ai langchain nous-research ai2 genreasoning zhipu-ai huggingface agent-infrastructure multi-agent-systems orchestration computer-use tool-calling design-canvases open-agent-platforms reinforcement-learning-environments benchmarking rl-environments self-improvement api memory-optimization
Anthropic advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. Figma, GitHub, and Cursor launch design canvases with direct AI editing, showcasing tool-calling becoming product-native. Nous Research releases Hermes Agent v0.4.0 with 300+ PRs, adding OpenAI-compatible APIs and self-improving memory agents. Open agent ecosystems mature with AI2's MolmoWeb (4B and 8B models), GenReasoning's OpenReward platform offering 330+ RL environments and 4.5M+ tasks, and Zhipu's ZClawBench benchmark with 116 real-world agent tasks, highlighting progress toward standardized environment serving and benchmarkable agent tasks.
not much happened today
gpt-5.4 gpt-5.2 gemini-3.1-pro openai artificial-analysis gemini claude mit figma github benchmarking physics-reasoning agentic-coding hallucination-detection context-windows cost-efficiency agent-prompting scheduled-tasks loop-patterns ai-evaluation design-code-integration agent-orchestration open-source
OpenAI rolled out GPT-5.4, which ties Gemini 3.1 Pro Preview for #1 on the Artificial Analysis Intelligence Index with a score of 57 (up from 51 for GPT-5.2 xhigh). GPT-5.4 has a larger ~1.05M token context window and higher per-token prices ($2.50/$15 vs $1.75/$14 for GPT-5.2). It is strong in physics reasoning (CritPt) and agentic coding (TerminalBench Hard), but it has a higher hallucination rate and a ~28% higher benchmark run cost. The GPT-5.4 Pro variant shows a +10 point jump on CritPt, reaching 30%, but at an extreme output token cost of $180 / 1M tokens. Community benchmarks show GPT-5.4 excelling at agentic and coding tasks, with mixed feedback on its reasoning efficiency and literalness compared to Claude. OpenAI updated its agent prompting guidance for GPT-5.4 API users, emphasizing tool use, structured outputs, and verification loops. Claude Code added local scheduled tasks and loop patterns for agents. MCP is highlighted as connective tissue for AI evaluation and design-code round-trips: Truesight MCP enables AI evaluation in the style of unit testing, and the Figma MCP server supports bidirectional design-code integration. The open-source T3 Code launched as an agent orchestration coding app built on Codex CLI.
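As a rough illustration of what the price change means per request, the sketch below compares the two models on a single call. It assumes the quoted figures are USD per 1M input and output tokens, in that order; the summary does not state the unit explicitly. The workload (100k input tokens, 10k output tokens) and the names `PRICES` and `request_cost` are hypothetical.

```python
# Prices as (input, output) in USD per 1M tokens.
# ASSUMPTION: the unit and ordering are inferred, not stated in the summary.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.2": (1.75, 14.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the assumed price table."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 100k input tokens and 10k output tokens.
new_cost = request_cost("gpt-5.4", 100_000, 10_000)
old_cost = request_cost("gpt-5.2", 100_000, 10_000)
increase = new_cost / old_cost - 1

print(f"gpt-5.4: ${new_cost:.3f}  gpt-5.2: ${old_cost:.3f}  increase: {increase:.1%}")
```

Under these assumptions the request costs $0.400 on GPT-5.4 versus $0.315 on GPT-5.2, about 27% more. The increase depends on the input/output mix, and it is a different quantity from the ~28% benchmark run cost reported above, which also reflects how many tokens each model uses.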
not much happened today
Poolside raised $1B at a $12B valuation. Eric Zelikman raised $1B after leaving xAI. Weavy joined Figma. New research shows that FP16 precision reduces the training-inference mismatch in reinforcement-learning fine-tuning compared to BF16. Kimi AI introduced a hybrid KDA (Kimi Delta Attention) architecture that improves long-context throughput and RL stability, alongside a new Kimi CLI for coding with agent protocol support. OpenAI previewed Agent Mode in ChatGPT, which enables autonomous research and planning during browsing.
OpenAI Dev Day: Apps SDK, AgentKit, Codex GA, GPT‑5 Pro and Sora 2 APIs
gpt-5-pro gpt-realtime-mini-2025-10-06 gpt-audio-mini-2025-10-06 gpt-image-1-mini sora-2 sora-2-pro openai canva figma zillow coursera api model-release fine-tuning agentic-ai code-generation model-deployment pricing prompt-optimization software-development multimodality sama edwinarbus gdb dbreunig stevenheidel
OpenAI showcased major product launches at its DevDay, including the Apps SDK, AgentKit, and Codex, which is now generally available with an SDK and enterprise features. It introduced new models such as gpt-5-pro, gpt-realtime-mini-2025-10-06, gpt-audio-mini-2025-10-06, gpt-image-1-mini, and sora-2 with a pro variant. The Apps SDK enables embedding interactive apps inside ChatGPT, with partners like Canva, Figma, Zillow, and Coursera. AgentKit offers a full stack for building and deploying production agents, with tools like ChatKit and Guardrails. Codex supports speech and controller-driven coding and is credited with high internal shipping velocity. Pricing for GPT-5 Pro was revealed at $15 input and $120 output per million tokens. "OpenAI turned ChatGPT into an application platform" and "AgentKit built a working agent in under 8 minutes" were highlights.