Topic: "cybersecurity"
not much happened today
gpt-5.3-codex claude-opus-4.6 openai anthropic cursor_ai github microsoft builder-tooling cybersecurity api-access model-rollout agentic-ai long-context serving-economics throughput-latency token-efficiency workflow-design sama pierceboggan kylebrussell natolambert omarsar0 sam_altman
OpenAI launched GPT-5.3-Codex alongside a Super Bowl ad built around the line "You can just build things", signaling a product strategy that favors builder tooling over chat interfaces. The model is rolling out across Cursor, VS Code, and GitHub with phased API access, and OpenAI flags it as its first "high cybersecurity capability" model. Sam Altman reported over 1M Codex app downloads in the first week and strong weekly user growth. Meanwhile, Anthropic's Claude Opus 4.6 is regarded as a leading "agentic generalist" model, topping text and code leaderboards though noted for high token usage. Discussions of serving economics and "fast mode" behavior highlight practical deployment considerations. Additionally, Recursive Language Models (RLMs) introduce an approach that uses a second, programmatic context space to extend long-context capabilities.
not much happened today
claude-3-sonnet claude-3-opus gpt-5-codex grok-4-fast qwen-3-next gemini-2.5-pro sora-2-pro ray-3 kling-2.5 veo-3 modernvbert anthropic x-ai google google-labs openai arena epoch-ai mit luma akhaliq coding-agents cybersecurity api model-taxonomy model-ranking video-generation benchmarking multi-modal-generation retrieval image-text-retrieval finbarrtimbers gauravisnotme justinlin610 billpeeb apples_jimmy akhaliq
Anthropic announces a new CTO. Frontier coding agents see updates: Claude Sonnet 4.5 shows strong cybersecurity performance and polished UX but trails GPT-5 Codex in coding capability, while xAI's Grok Code Fast claims higher edit success at lower cost. Google's Jules coding agent launches a programmable API with CI/CD integration. Qwen clarifies its model taxonomy and API tiers. Vision/LM Arena rankings show tight competition among Claude Sonnet 4.5, Claude Opus 4.1, Gemini 2.5 Pro, and OpenAI's latest models. In video generation, Sora 2 Pro leads App Store rankings with rapid iteration and a new creator ecosystem; early tests show it answers GPQA-style questions at 55% accuracy versus GPT-5's 72%. Video Arena adds new models such as Luma's Ray 3 and Kling 2.5 for benchmarking. Ovi, a Veo-3-like multimodal video+audio generation model, is released. In retrieval, ModernVBERT from MIT offers efficient image-text retrieval. Notable quotes: "Claude Sonnet 4.5 is basically the same as Opus 4.1 for coding" and "Jules is a programmable team member".
Ideogram 2 + Berkeley Function Calling Leaderboard V2
llama-3-70b gpt-4 phi-3.5 functionary-llama-3-70b llama-3 ideogram midjourney berkeley openai hugging-face microsoft meta-ai-fair baseten kai claude functionary function-calling benchmarking image-generation model-optimization vision multimodality model-performance fine-tuning context-windows cybersecurity code-analysis ai-assisted-development
Ideogram returns with a new image generation model featuring color palette control, a fully controllable API, and an iOS app, and has reached a milestone of 1 billion images created. Meanwhile, Midjourney released a Web UI but still lacks an API. In function calling, the Berkeley Function Calling Leaderboard (BFCL) was updated to BFCL V2 • Live, adding 2,251 live, user-contributed function docs and queries to improve evaluation quality. GPT-4 leads the leaderboard, but the open-source Functionary Llama 3-70B finetune from Kai surpasses Claude. In model releases, Microsoft launched three Phi-3.5 models with impressive reasoning and context window capabilities, while Meta AI FAIR introduced UniBench, a unified benchmark suite covering over 50 vision-language model tasks. Baseten improved Llama 3 inference speed by up to 122% using Medusa. A new cybersecurity benchmark, Cyberbench, featuring 40 CTF tasks, was released. Additionally, Codegen was introduced as a tool for programmatic codebase analysis and AI-assisted development. "Multiple functions > parallel functions" was highlighted as a key insight in function calling.