Person: "levie"
not much happened today
gpt-5.5-cyber mythos fable glm-5.2 openai anthropic sakana-ai-labs vercel artificial-analysis cybersecurity closed-loop-patch-generation model-orchestration test-time-scaling agentic-ai model-selection infrastructure-adoption benchmarking cost-accounting sama blackhc shashj levie audreyt eliebakouch blancheminerva
OpenAI expanded its Daybreak program with the GPT-5.5-Cyber model, which focuses on closed-loop patch generation for cybersecurity, scanning over 30 million commits and covering major projects such as cURL and Python. The release sparked debate on policy and export controls and drew contrasts with Anthropic's restricted Mythos/Fable access. Sakana Fugu introduced an orchestration API that learns model selection and delegation across multiple models, but it faced criticism for opaque baselines and cost reporting. Meanwhile, GLM-5.2 is gaining attention as an open-weight model suited to agentic applications and infrastructure adoption. Two quotes capture the key technical insights: "The notable shift is from 'find bugs' to closed-loop patch generation with human review" and "test-time coordination can beat monolithic calls on long-horizon tasks."
Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched
kimi-k2-0905 qwen-3-max qwen-3 moonshot-ai alibaba huggingface together-ai groq lmsys openrouter llamaindex long-context agents coding tool-use model-evaluation instruction-following context-windows semantic-search discriminator-models swyx karpathy willdepue levie bebischof andrew_n_carr bigeagle_xd
Moonshot AI released Kimi K2-0905, an update to its open model that doubles the context length to 256k tokens, improves coding and tool calling, and integrates with agent scaffolds. Alibaba released Qwen 3 Max, a 1 trillion parameter model with agent-oriented behavior, available via Qwen Chat, the Alibaba Cloud API, and OpenRouter. The community highlighted China's dominance in open models and debated what counts as meaningful evaluation for code agents, emphasizing long-horizon and domain-specific evals. Influential voices such as @swyx and @karpathy discussed the importance of practical evals and of discriminator models for ranking outputs.