Model: "grok-4.3"
not much happened today
grok-4.3 deepseek-v4-pro kimi-k2.6 mimo-v2.5-pro gemini-3.1-pro claude-opus-4.7 gpt-5.5 deepskvit xai deepseek artificial-analysis andon-labs benchmarking cost-efficiency agentic-ai token-efficiency attention-mechanisms inference-speed multimodality spatial-reasoning model-architecture model-performance scaling01 teortaxestex omarsar0
xAI released Grok 4.3, which improves cost/performance and scores 53 on the Intelligence Index, 4 points higher than Grok 4.20, with significant gains on GDPval-AA and τ²-Bench Telecom. However, accuracy tradeoffs raised reliability concerns. Community opinion is mixed: some praise its token efficiency, while others point to regressions and pricing concerns. DeepSeek V4 Pro is emerging as a leading open-weight coding/agent model, comparable to Codex and Claude Code, with a 1M context window and efficient attention mechanisms. Benchmarks show open-weight models such as Kimi K2.6, MiMo V2.5 Pro, and DeepSeek V4 Pro closing the gap with closed models such as Gemini 3.1 Pro Preview, Claude Opus 4.7, and GPT-5.5. DeepSeek's multimodal work focuses on explicit spatial grounding, using a novel "point while thinking" approach built on DeepSeek-ViT and CSA compression.
not much happened today
gpt-5.5 claude-mythos-preview gpt-5.5-pro qwen3.6-27b hy3-preview grok-4.3 gemma-4-31b glm-5.1 deepseek-v4-flash openai anthropic x-ai tencent deepseek cybersecurity model-efficiency multimodality model-benchmarking agentic-ai model-cost-optimization context-windows model-performance open-weight-models software-integration security-updates sama scaling01 cryps1s polynoamial ajambrosino arix
OpenAI's GPT-5.5 achieves top-tier performance on long-horizon cyber tasks, matching or surpassing Claude Mythos Preview with a 71.4% pass rate and continuing to improve beyond 100M inference tokens. OpenAI also released an Advanced Account Security update for ChatGPT that strengthens phishing resistance. The Codex update expands beyond coding to general computer tasks, improves speed by up to 42%, and introduces role-based onboarding and app integrations. On cost, GPT-5.5 Pro posts a slight SOTA improvement on CritPt with ~60% lower cost and token use than GPT-5.4 Pro. Among open-weight models, Qwen3.6 27B leads the under-150B-parameter class with an Intelligence Index score of 46, offering 262K context, native multimodal input, and efficient BF16 weights. Tencent's Hy3-preview (a MoE with 295B total and 21B active parameters) scores 42 on the Intelligence Index, with strong scientific reasoning on CritPt. xAI's Grok 4.3 shows sharp improvements on agentic benchmarks at reduced cost.