Model: "gpt-5.5"
not much happened today
gpt-5.5 gpt-image-2 gpt-5.5-pro gpt-5.5-instant gpt-realtime-2 gpt-5.5-cyber codex zaya1-74b-preview zaya1-vl-8b qwen3-omni openai zyphra amd deepseek vllm_project model-release model-training mixture-of-experts inference model-optimization sandboxing alignment cybersecurity agent-runtime throughput quantization telemetry real-time-detection reach_vb dhh gdb patience_cave ithilgore cryps1s sama deredleritt3r
OpenAI rapidly expanded the GPT-5.5 family with multiple variants including gpt-image-2, GPT-5.5 Pro, and GPT-5.5-Cyber, receiving positive feedback for efficiency and usability. Codex evolved into a long-running agent runtime with a new /goal mechanism, achieving 61% success on ARC-AGI-3 games after extensive testing. OpenAI also introduced cybersecurity-focused models like GPT-5.5-Cyber targeting enterprise and government sectors. Meanwhile, Zyphra released ZAYA1-74B-Preview, an open 74B-parameter mixture-of-experts model trained on AMD hardware and licensed under Apache 2.0, alongside the vision-language model ZAYA1-VL-8B. Inference infrastructure competition intensified with vLLM updates improving throughput and latency, including support for DeepSeek V4 and enhanced quantization/backends.
GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs
gpt-realtime-2 gpt-5.5 codex openai anthropic goodfireai scale-ai voice-models streaming-translation transcription benchmarking context-windows browser-automation cybersecurity interpretability neural-geometry manifolds ai-safety rlhf micahcarroll milesbrundage ryanpgreenblatt
OpenAI released GPT-Realtime-2, a voice model with GPT-5-class reasoning, tool use, interruption handling, and context windows of up to 128K tokens, achieving top scores on the Big Bench Audio and Conversational Dynamics benchmarks. OpenAI also launched a Chrome plugin for Codex enabling browser control and multitasking, and introduced GPT-5.5 with Trusted Access for Cyber for secure defensive workflows and red teaming. Anthropic introduced Natural Language Autoencoders for interpreting model activations as human-readable text, aiding interpretability and debugging, while Goodfire proposed a neural geometry research agenda focusing on manifolds as primitives for neural network behavior. Anthropic also announced The Anthropic Institute to advance AI safety and economic resilience research.
not much happened today
grok-4.3 deepseek-v4-pro kimi-k2.6 mimo-v2.5-pro gemini-3.1-pro claude-opus-4.7 gpt-5.5 deepskvit xai deepseek artificial-analysis andon-labs benchmarking cost-efficiency agentic-ai token-efficiency attention-mechanisms inference-speed multimodality spatial-reasoning model-architecture model-performance scaling01 teortaxestex omarsar0
xAI released Grok 4.3, improving cost/performance with a 53 Intelligence Index score, 4 points higher than Grok 4.20, and significant gains on GDPval-AA and τ²-Bench Telecom. However, accuracy tradeoffs raised reliability concerns. Community opinions are mixed, with some praising its token efficiency and others noting regressions and pricing concerns. DeepSeek V4 Pro is emerging as a leading open-weight coding/agent model, comparable to Codex and Claude Code, featuring a 1M context window and efficient attention mechanisms. Benchmarking shows open-weight models like Kimi K2.6, MiMo V2.5 Pro, and DeepSeek V4 Pro closing the gap with closed models such as Gemini 3.1 Pro Preview, Claude Opus 4.7, and GPT-5.5. DeepSeek's multimodal efforts focus on explicit spatial grounding with a novel "point while thinking" approach using DeepSeek-ViT and CSA compression.
not much happened today
gpt-5.5 claude-mythos-preview gpt-5.5-pro qwen3.6-27b hy3-preview grok-4.3 gemma-4-31b glm-5.1 deepseek-v4-flash openai anthropic x-ai tencent deepseek cybersecurity model-efficiency multimodality model-benchmarking agentic-ai model-cost-optimization context-windows model-performance open-weight-models software-integration security-updates sama scaling01 cryps1s polynoamial ajambrosino arix
OpenAI's GPT-5.5 achieves top-tier performance in long-horizon cyber tasks, matching or surpassing Claude Mythos Preview with a 71.4% pass rate and continuing to improve beyond 100M inference tokens. OpenAI also released an Advanced Account Security update for ChatGPT that enhances phishing resistance. The Codex update expands beyond coding to general computer tasks, improving speed by up to 42% and introducing role-based onboarding and app integrations. Economically, GPT-5.5 Pro shows a slight SOTA improvement on CritPt with ~60% lower cost and token use compared to GPT-5.4 Pro. In open-weight models, Qwen3.6 27B leads among models under 150B parameters with an Intelligence Index score of 46, featuring 262K context, native multimodal input, and efficient BF16 weights. Tencent's Hy3-preview (an MoE with 295B total and 21B active parameters) scores 42 on the Intelligence Index with strong scientific reasoning on CritPt. xAI's Grok 4.3 shows sharp improvements on agentic benchmarks with reduced cost.
not much happened today
gpt-5.5 gpt-5.4 opus-4.7 mimo-v2.5-pro mimo-v2.5 kimi-k2.6 codex copilot openai microsoft google amazon github xiaomi openai-devs vllm_project kimi-moonshot model-distribution cloud-computing benchmarking usage-based-billing model-orchestration open-source large-context-models agent-scaling coding model-training fp8 attention-mechanisms multi-agent-systems sama scaling01 kimmonismus ajassy simonw htihle arena gdb hangsiin eliebakouch _luofuli teortaxestex
OpenAI loosens its Azure exclusivity, allowing distribution across Google TPU, AWS Trainium, and Bedrock with commitments through 2032 and revenue share through 2030. GPT-5.5 shows improved benchmarks but is not uniformly dominant, ranking variably across coding, document, math, and vision tasks. GitHub's Copilot shifts to usage-based billing starting June 1, reflecting increased runtime costs. OpenAI open-sourced Symphony, an orchestration layer for issue tracking and Codex agents. Xiaomi released MiMo-V2.5 and MiMo-V2.5-Pro, large-context models with up to 1M tokens of context, trained on trillions of tokens and emphasizing complex agent and omni-modal capabilities. Kimi K2.6 leads OpenRouter's leaderboard, noted for coding and long-horizon agent capabilities with large-scale sub-agent coordination.
DeepSeek v4
deepseek-v4 deepseek-v4-pro deepseek-v4-flash kimi-k2.6 glm-5.1 xiaomi-mimo-v2.5-pro gpt-5.5 gpt-5.5-pro deepseek nvidia openai lambdaapi togethercompute xiaomi long-context mixture-of-experts model-quantization memory-optimization hardware-model-co-design inference-speed agent-integration token-efficiency model-deployment open-weights reasoning hallucination-detection scaling01 ben_burtenshaw artificialanlys
The DeepSeek-V4 technical release features a 1.6T-parameter MoE with 49B active parameters and a 1M-token context, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the #2 open-weights reasoning model behind Kimi K2.6 but has a high hallucination rate and higher serving costs. Hardware-model co-design is emphasized, with NVIDIA Blackwell Ultra delivering 150+ TPS/user and support for FP4 and FP8 quantization enabling deployment on single nodes. Among open Chinese models, it is positioned competitively with GLM-5.1 and Xiaomi MiMo V2.5 Pro. Meanwhile, OpenAI launched GPT-5.5 and GPT-5.5 Pro APIs with a 1M context window, focusing on improved long-running workflows and token efficiency. The models were quickly integrated into tools like GitHub Copilot and Cursor, highlighting rapid distribution and agent integration, with the framing that "GPT-5.5 handles complex, tool-heavy, ambiguous workflows with fewer retries."
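As a rough illustration of the figures in this summary, the sketch below computes the share of parameters active per token and a weights-only memory estimate at FP8 and FP4. It assumes 1 byte per parameter for FP8 and 0.5 bytes for FP4, and it ignores KV cache, activations, and quantization scale overhead, so real deployments will need more memory. The function names are hypothetical.

```python
# Back-of-the-envelope numbers for a 1.6T-parameter MoE with 49B active parameters.
# Assumptions (not from the source): FP8 = 1 byte/param, FP4 = 0.5 byte/param,
# weights only (no KV cache, activations, or scaling-factor overhead).

TOTAL_PARAMS = 1.6e12   # 1.6T total parameters, as reported
ACTIVE_PARAMS = 49e9    # 49B active parameters per token, as reported

BYTES_PER_PARAM = {"fp8": 1.0, "fp4": 0.5}  # assumed storage widths


def active_fraction(total: float, active: float) -> float:
    """Fraction of the model's parameters used for each token."""
    return active / total


def weight_footprint_gb(total: float, fmt: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes) for a given format."""
    return total * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.2%}")
    for fmt in ("fp8", "fp4"):
        print(f"{fmt} weights: ~{weight_footprint_gb(TOTAL_PARAMS, fmt):,.0f} GB")
```

Under these assumptions only about 3% of the parameters are active per token, while the full weights still have to fit in memory, which is why the quantization format matters for single-node deployment.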
GPT 5.5
gpt-5.5 gpt-5.4 gpt-5.5-pro openai scaling01 anthropic teknium agentic-ai token-efficiency tool-use self-checking coding long-horizon-planning model-pricing api-access model-safety software-integration sama reach_vb
OpenAI launched GPT-5.5 as its new flagship model for "real work and powering agents," immediately available in ChatGPT and Codex but with delayed API access due to enhanced safety requirements. The model features improved token efficiency and supports longer multi-step execution with tool use and self-checking. Pricing is set at $5/$30 per million input/output tokens for GPT-5.5 and $30/$180 for GPT-5.5 Pro, roughly double the cost of GPT-5.4. The release includes significant Codex upgrades such as browser control, document handling, and OS-wide dictation. Early reactions are mixed but generally positive, noting improvements in coding and long-horizon tasks, though some benchmarks show only incremental gains and hallucination issues persist. Third-party ecosystem support, such as Hermes Agent integration, appeared quickly.
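To make the quoted prices concrete, here is a minimal sketch of a per-request cost estimate. It assumes the two numbers in each pair are input and output prices in USD per million tokens; the token counts in the example are hypothetical, and the sketch ignores any caching or batch discounts.

```python
# Prices in USD per million tokens as quoted in the summary.
# Assumption: each pair is (input price, output price).
PRICES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, ignoring caching or batch discounts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent step: 200K tokens of context in, 20K tokens out.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 200_000, 20_000):.2f}")
```

For the hypothetical 200K-in, 20K-out request this gives $1.60 on GPT-5.5 and $9.60 on GPT-5.5 Pro, a 6x difference that follows directly from the listed prices.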