All tags
Model: "claude-3.5-sonnet"
Anthropic releases Claude 4 Sonnet and Opus: Memory, Agent Capabilities, Claude Code, Redteam Drama
claude-4 claude-4-opus claude-4-sonnet claude-3.5-sonnet anthropic instruction-following token-accounting pricing-models sliding-window-attention inference-techniques open-sourcing model-accessibility agent-capabilities-api extended-context model-deployment
Anthropic has officially released Claude 4 with two variants: Claude Opus 4, a high-capability model for complex tasks priced at $15/$75 per million tokens, and Claude Sonnet 4, optimized for efficient everyday use. The release emphasizes instruction following and extended work sessions up to 7 hours. Community discussions highlight concerns about token pricing, token accounting transparency, and calls for open-sourcing Claude 3.5 Sonnet weights to support local model development. The news also covers Claude Code GA, new Agent Capabilities API, and various livestreams and reports detailing these updates. There is notable debate around sliding window attention and advanced inference techniques for local deployment.
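The quoted $15/$75-per-million-token Opus rates are easy to sanity-check with a few lines of arithmetic. Below is a minimal sketch of such a cost estimate; the rates come from the release above, while the helper name and example token counts are made up for illustration.

```python
OPUS_4_INPUT_PER_MTOK = 15.00   # USD per 1M input tokens (from the release)
OPUS_4_OUTPUT_PER_MTOK = 75.00  # USD per 1M output tokens (from the release)

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted Opus 4 rates."""
    return ((input_tokens / 1_000_000) * OPUS_4_INPUT_PER_MTOK
            + (output_tokens / 1_000_000) * OPUS_4_OUTPUT_PER_MTOK)

# A 10k-token prompt that produces a 2k-token reply:
print(f"${estimate_cost_usd(10_000, 2_000):.2f}")  # → $0.30
```

Numbers like these are part of why the community discussion turned to token accounting: output tokens cost 5x input tokens at these rates, so long generations dominate the bill.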
not much happened today
gemini-2.5-pro chatgpt deepseek-v3 qwen-2.5 claude-3.5-sonnet claude-3.7-sonnet google anthropic openai llama_index langchain runway deepseek math benchmarking chains-of-thought model-performance multi-agent-systems agent-frameworks media-generation long-horizon-planning code-generation rasbt danielhanchen hkproj
Gemini 2.5 Pro shows strengths and weaknesses, notably lacking LaTeX math rendering unlike ChatGPT, and scored 24.4% on the 2025 USAMO. DeepSeek V3 ranks 8th and 12th on recent leaderboards. Qwen 2.5 models have been integrated into the PocketPal app. Research from Anthropic reveals that chain-of-thought (CoT) reasoning is often unfaithful, especially on harder tasks, raising safety concerns. OpenAI's PaperBench benchmark shows AI agents struggle with long-horizon planning, with Claude 3.5 Sonnet achieving only 21.0% accuracy. The CodeAct framework generalizes ReAct to agents that write and execute code dynamically. LangChain explains multi-agent handoffs in LangGraph. Runway Gen-4 marks a new phase in media creation.
not much happened today
gpt-4.5 gpt-4 gpt-4o o1 claude-3.5-sonnet claude-3.7 claude-3-opus deepseek-v3 grok-3 openai anthropic perplexity-ai deepseek scaling01 model-performance humor emotional-intelligence model-comparison pricing context-windows model-size user-experience andrej-karpathy jeremyphoward abacaj stevenheidel yuchenj_uw aravsrinivas dylan522p random_walker
GPT-4.5 sparked mixed reactions on Twitter, with @karpathy noting that users preferred GPT-4 in a poll despite his personal preference for GPT-4.5's creativity and humor. Critics like @abacaj highlighted GPT-4.5's slowness and questioned its practical value and pricing relative to other models. Performance-wise, GPT-4.5 ranks above GPT-4o but below o1 and Claude 3.5 Sonnet; Claude 3.7 outperforms it on many tasks, though GPT-4.5 was praised for its humor and "vibes." Speculation about GPT-4.5's size suggests around 5 trillion parameters. Discussions also touched on pricing disparities, with Perplexity Deep Research at $20/month versus ChatGPT Pro at $200/month. The emotional intelligence and humor of models like Claude 3.7 were also noted.
lots of small launches
gpt-4o claude-3.7-sonnet claude-3.7 claude-3.5-sonnet deepseek-r1 deepseek-v3 grok-3 openai anthropic amazon cloudflare perplexity-ai deepseek-ai togethercompute elevenlabs elicitorg inceptionailabs mistral-ai voice model-releases cuda gpu-optimization inference open-source api model-performance token-efficiency context-windows cuda jit-compilation lmarena_ai alexalbert__ aravsrinivas reach_vb
GPT-4o Advanced Voice Preview is now available for free ChatGPT users with enhanced daily limits for Plus and Pro users. Claude 3.7 Sonnet has achieved the top rank in WebDev Arena with improved token efficiency. DeepSeek-R1 with 671B parameters benefits from the Together Inference platform optimizing NVIDIA Blackwell GPU usage, alongside the open-source DeepGEMM CUDA library delivering up to 2.7x speedups on Hopper GPUs. Perplexity launched a new Voice Mode and a Deep Research API. The upcoming Grok 3 API will support a 1M token context window. Several companies including Elicit, Amazon, Anthropic, Cloudflare, FLORA, Elevenlabs, and Inception Labs announced new funding rounds, product launches, and model releases.
not much happened today
zonos-v0.1 audiobox-aesthetics moshi sonar llama-3-70b gpt-4o-mini claude-3.5-haiku gpt-4o claude-3.5-sonnet deepseek-r1-distilled-qwen-1.5b reasonflux-32b o1-preview zyphra-ai meta-ai-fair kyutai-labs perplexity-ai cerebras uc-berkeley brilliant-labs google-deepmind text-to-speech speech-to-speech benchmarking model-performance reinforcement-learning math real-time-processing open-source cross-platform-integration multilinguality zero-shot-learning danhendrycks
Zyphra AI launched Zonos-v0.1, a leading open-weight text-to-speech model supporting multiple languages and zero-shot voice cloning. Meta FAIR released the open-source Audiobox Aesthetics model trained on 562 hours of audio data. Kyutai Labs introduced Moshi, a real-time speech-to-speech system with low latency. Perplexity AI announced the Sonar model based on Llama 3.3 70B, outperforming top models like GPT-4o and Claude 3.5 Sonnet at 1,200 tokens/second, powered by Cerebras infrastructure. UC Berkeley open-sourced a 1.5B model trained with reinforcement learning that beats o1-preview on math tasks. ReasonFlux-32B achieved 91.2% on the MATH benchmark, outperforming OpenAI o1-preview. CrossPoster, an AI agent for cross-platform posting, was released using LlamaIndex workflows. Brilliant Labs integrated the Google DeepMind Gemini Live API into smart glasses for real-time translation and object identification.
Mistral Small 3 24B and Tulu 3 405B
mistral-small-3 tulu-3-405b llama-3 tiny-swallow-1.5b qwen-2.5-max deepseek-v3 claude-3.5-sonnet gemini-1.5-pro gpt4o-mini llama-3-3-70b mistral-ai ai2 sakana-ai alibaba_qwen deepseek ollama llamaindex reinforcement-learning model-fine-tuning local-inference model-performance model-optimization on-device-ai instruction-following api training-data natural-language-processing clementdelangue dchaplot reach_vb
Mistral AI released Mistral Small 3, a 24B-parameter model optimized for local inference with low latency and 81% accuracy on MMLU, competing with Llama 3.3 70B, Qwen-2.5 32B, and GPT4o-mini. AI2 released Tülu 3 405B, a large finetune of Llama 3 trained with Reinforcement Learning from Verifiable Rewards (RLVR), competitive with DeepSeek v3. Sakana AI launched TinySwallow-1.5B, a Japanese language model trained with TAID for on-device use. Alibaba_Qwen released Qwen 2.5 Max, trained on 20 trillion tokens, with performance comparable to DeepSeek V3, Claude 3.5 Sonnet, and Gemini 1.5 Pro, and updated API pricing. These releases highlight advances in open models, efficient inference, and reinforcement learning techniques.
Titans: Learning to Memorize at Test Time
minimax-01 gpt-4o claude-3.5-sonnet internlm3-8b-instruct transformer2 google meta-ai-fair openai anthropic langchain long-context mixture-of-experts self-adaptive-models prompt-injection agent-authentication diffusion-models zero-trust-architecture continuous-adaptation vision agentic-systems omarsar0 hwchase17 abacaj hardmaru rez0__ bindureddy akhaliq saranormous
Google released a new paper on "Neural Memory" (Titans), integrating persistent memory directly into transformer architectures at test time and showing promising long-context utilization. MiniMax-01, highlighted by @omarsar0, features a 4-million-token context window with 456B parameters and 32 experts, outperforming GPT-4o and Claude 3.5 Sonnet. InternLM3-8B-Instruct is an open-source model trained on 4 trillion tokens with state-of-the-art results. Transformer² introduces self-adaptive LLMs that dynamically adjust weights for continuous adaptation. Advances in AI security highlight the need for agent authentication, prompt-injection defenses, and zero-trust architectures. Tools like Micro Diffusion enable budget-friendly diffusion model training, while LeagueGraph and Agent Recipes support open-source social media agents.
not much happened today
rstar-math o1-preview qwen2.5-plus qwen2.5-coder-32b-instruct phi-4 claude-3.5-sonnet openai anthropic alibaba microsoft cohere langchain weights-biases deepseek rakuten rbc amd johns-hopkins math process-reward-model mcts vision reasoning synthetic-data pretraining rag automation private-deployment multi-step-workflow open-source-dataset text-embeddings image-segmentation chain-of-thought multimodal-reasoning finetuning recursive-self-improvement collaborative-platforms ai-development partnerships cuda triton ai-efficiency ai-assisted-coding reach_vb rasbt akshaykagrawal arankomatsuzaki teortaxestex aidangomez andrewyng
rStar-Math surpasses OpenAI's o1-preview in math reasoning with 90.0% accuracy using a 7B LLM and MCTS with a Process Reward Model. Alibaba launches Qwen Chat featuring Qwen2.5-Plus and Qwen2.5-Coder-32B-Instruct models enhancing vision-language and reasoning. Microsoft releases Phi-4, trained on 40% synthetic data with improved pretraining. Cohere introduces North, a secure AI workspace integrating LLMs, RAG, and automation for private deployments. LangChain showcases a company research agent with multi-step workflows and open-source datasets. Transformers.js demos were released for text embeddings and image segmentation in JavaScript. Research highlights include Meta-CoT for enhanced chain-of-thought reasoning, DeepSeek V3 with recursive self-improvement, and collaborative AI development platforms. Industry partnerships include Rakuten with LangChain, North with RBC supporting 90,000 employees, and Agent Laboratory collaborating with AMD and Johns Hopkins. Technical discussions emphasize CUDA and Triton for AI efficiency and evolving AI-assisted coding stacks by Andrew Ng.
PRIME: Process Reinforcement through Implicit Rewards
claude-3.5-sonnet gpt-4o deepseek-v3 gemini-2.0 openai together-ai deepseek langchain lucidrains reinforcement-learning scaling-laws model-performance agent-architecture software-development compute-scaling multi-expert-models sama aidan_mclau omarsar0 akhaliq hwchase17 tom_doerr lmarena_ai cwolferesearch richardmcngo
Implicit Process Reward Models (PRIME) have been highlighted as a significant advancement in online reinforcement learning, trained on a 7B model with impressive results compared to GPT-4o. The approach builds on the importance of process reward models established by "Let's Verify Step by Step." Additionally, AI Twitter discussions cover topics such as proto-AGI capabilities with Claude 3.5 Sonnet, the role of compute scaling for Artificial Superintelligence (ASI), and model performance nuances. New AI tools like Gemini 2.0 coder mode and LangGraph Studio enhance agent architecture and software development. Industry events include the LangChain AI Agent Conference and meetups fostering AI community connections. Company updates reveal OpenAI's financial challenges with Pro subscriptions and DeepSeek-V3's integration with Together AI APIs, showcasing an efficient 671B-parameter MoE model. Research discussions focus on scaling laws and compute efficiency in large language models.
DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens
deepseek-v3 gpt-4o claude-3.5-sonnet llama-3 deepseek-ai hugging-face openai anthropic mixture-of-experts model-training model-optimization reinforcement-learning chain-of-thought multi-token-prediction synthetic-data model-distillation fine-tuning attention-mechanisms gpu-optimization nrehiew_ denny_zhou
DeepSeek-V3 has launched with 671B MoE parameters, trained on 14.8T tokens, outperforming GPT-4o and Claude 3.5 Sonnet in benchmarks. It was trained with only 2.788M H800 GPU-hours, far less than Llama-3's 30.8M GPU-hours, showcasing major compute efficiency and cost reduction. The model is open-source and deployed via Hugging Face with API support. Innovations include native FP8 mixed-precision training, Multi-head Latent Attention scaling, distillation from synthetic reasoning data, pruning and healing for MoEs with up to 256 experts, and a new multi-token prediction objective enabling lookahead token planning. Research highlights also cover the OREO method and Natural Language Reinforcement Learning (NLRL) for multi-step reasoning and agent control.
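The headline's "$5.5m USD of compute" follows directly from the GPU-hour figure once a rental rate is assumed. A rough back-of-envelope check, where the ~$2/GPU-hour H800 rate is an assumption on our part rather than a number from this summary:

```python
# Back-of-envelope check on the "$5.5m of compute" headline figure.
deepseek_v3_gpu_hours = 2.788e6   # H800 GPU-hours (from the report)
llama3_gpu_hours = 30.8e6         # Llama-3 GPU-hours (from the report)
rate_usd_per_gpu_hour = 2.0       # assumed H800 rental rate, not from the source

cost_musd = deepseek_v3_gpu_hours * rate_usd_per_gpu_hour / 1e6
ratio = llama3_gpu_hours / deepseek_v3_gpu_hours

print(f"~${cost_musd:.1f}M of compute")   # → ~$5.6M of compute
print(f"~{ratio:.0f}x fewer GPU-hours than Llama-3")  # → ~11x fewer GPU-hours than Llama-3
```

At that assumed rate the arithmetic lands within rounding distance of the $5.5M headline, and makes the roughly 11x efficiency gap over Llama-3 explicit.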
not much happened today
qwen-o1 qvq claude-3.5-sonnet gpt-4o o3 o3-mini alibaba openai mit idsia llamaindex ollama vision benchmarking llm-calibration intentionality alignment-faking deliberative-alignment artificial-life gdpr-compliance contract-review-agent app-creation synthetic-data post-transformers smol-models agents bret-taylor
The Qwen team launched QVQ, a vision-enabled version of their experimental QwQ o1 clone, benchmarking comparably to Claude 3.5 Sonnet. Discussions include Bret Taylor's insights on autonomous software development distinct from the Copilot era. The Latent Space LIVE! talks cover highlights of 2024 AI startups, vision, open models, post-transformers, synthetic data, smol models, and agents. Twitter recaps by Claude 3.5 Sonnet highlight proposals for benchmarks measuring LLM calibration and falsehood confidence, with QVQ outperforming GPT-4o and Claude Sonnet 3.5. AI alignment debates focus on intentionality and critiques of alignment faking in models like Claude. Updates from OpenAI include new o3 and o3-mini models and a deliberative alignment strategy. The ASAL project is a collaboration between MIT, OpenAI, and Swiss AI Lab IDSIA to automate artificial life discovery. Personal stories reveal frustrations with USCIS green card denials despite high qualifications. New tools like GeminiCoder enable rapid app creation, and a contract review agent using Reflex and Llama Index checks GDPR compliance. Holiday greetings and memes were also shared.
Genesis: Generative Physics Engine for Robotics (o1-mini version)
o1 o1-preview gpt-4o claude-3.5-sonnet gemini-2.0-pro llama-3-3b llama-3-70b openai google-deepmind meta-ai-fair hugging-face function-calling structured-outputs vision performance-benchmarks sdk webrtc reasoning math code-generation transformer-architecture model-training humanoid-robots search model-efficiency dataset-sharing aidan_mclau sundarpichai adcock_brett
OpenAI launched the o1 model API featuring function calling, structured outputs, vision support, and developer messages, achieving 60% fewer reasoning tokens than its preview. The model excels in math and code with a 0.76 LiveBench Coding score, outperforming Sonnet 3.5. Beta SDKs for Go and Java and WebRTC support with 60% lower prices were also released. Google Gemini 2.0 Pro (Gemini Exp 1206) deployment accelerated, showing improved coding, math, and reasoning performance. Meta AI FAIR introduced research on training transformers directly on raw bytes using dynamic entropy-based patching. Commercial humanoid robots were successfully deployed by an industry player. Hugging Face researchers demonstrated that their 3B Llama model can outperform the 70B Llama model on MATH-500 accuracy using search techniques, highlighting efficiency gains with smaller models. Concerns about reproducibility and domain-specific limitations were noted.
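The Hugging Face result above relies on test-time search: spend inference compute sampling many candidates from a small model and let a verifier pick. The sketch below shows only the simplest variant of that idea, best-of-N sampling against a scorer, with toy stand-ins for the model and reward model; the actual work used process reward models and more elaborate search strategies.

```python
import random

def best_of_n(generate, score, prompt, n=8, seed=0):
    """Best-of-N test-time search: sample n candidate answers from a
    (small) model and keep the one the verifier/reward model scores highest."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: candidate "answers" are integers, and the scorer
# prefers values close to the true answer, 42.
toy_generate = lambda prompt, rng: rng.randint(0, 100)
toy_score = lambda answer: -abs(answer - 42)

print(best_of_n(toy_generate, toy_score, "what is 2 * 21?", n=16))
```

The efficiency claim is that `n` cheap samples from a 3B model plus a scorer can beat one sample from a 70B model, though as the entry notes, reproducibility and domain transfer remain open questions.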
OpenAI Voice Mode Can See Now - After Gemini Does
gemini-2.0-flash claude claude-3.5-sonnet llama-3-70b llama-3 mistral-large gpt-4o openai google-deepmind anthropic togethercompute scale-ai meta-ai-fair mistral-ai multimodality real-time-streaming roleplay prompt-handling model-comparison model-training creative-writing model-censorship code-execution developer-ecosystem ai-humor bindureddy
OpenAI launched Realtime Video shortly after Gemini's equivalent, blunting its impact since Gemini arrived first with lower cost and looser rate limits. Google DeepMind released Gemini 2.0 Flash featuring enhanced multimodal capabilities and real-time streaming. Anthropic introduced Clio, a system for analyzing real-world usage of Claude models. Together AI acquired CodeSandbox to launch a code interpreter tool. Discussions highlighted Meta's Llama 3.3 70B for its advanced roleplay and prompt handling, outperforming models like Mistral Large and GPT-4o in expressiveness, with lighter censorship. The AI community also engaged in humorous takes on AI outages and model competition, with ChatGPT adding a Santa mode for holiday interactions. "Anthropic is capturing the developer ecosystem, Gemini has AI enthusiast mindshare, ChatGPT reigns over AI dabblers" was a noted observation from the community.
o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning
o1-2024-12-17 o1 o1-pro 4o 4o-mini gemini-2-0-flash claude-3.5-sonnet claude-3.5 openai google google-deepmind function-calling structured-outputs vision reasoning webrtc realtime-api preference-tuning fine-tuning api model-performance aidan_mclau kevinweil simonw michpokrass morgymcg juberti
OpenAI launched the o1 API with enhanced features including vision inputs, function calling, structured outputs, and a new reasoning_effort parameter, achieving 60% fewer reasoning tokens on average. The o1 pro variant is confirmed as a distinct implementation coming soon. Improvements to the Realtime API with WebRTC integration offer easier usage, longer sessions (up to 30 minutes), and significantly reduced pricing (up to 10x cheaper with mini models). DPO Preference Tuning for fine-tuning is introduced, currently available for the 4o model. Additional updates include official Go and Java SDKs and OpenAI DevDay videos. The news also highlights discussions of the Google Gemini 2.0 Flash model's performance reaching 83.6% accuracy.
Google wakes up: Gemini 2.0 et al
gemini-2.0-flash gemini-1.5-pro gemini-exp-1206 claude-3.5-sonnet opus google-deepmind openai apple multimodality agent-development multilinguality benchmarking model-releases demis-hassabis sundar-pichai paige-bailey bindureddy
Google DeepMind launched Gemini 2.0 Flash, a new multimodal model outperforming Gemini 1.5 Pro and o1-preview, featuring vision and voice APIs, multilingual capabilities, and native tool use. It powers new AI agents like Project Astra and Project Mariner, with Project Mariner achieving state-of-the-art 83.5% on the WebVoyager benchmark. OpenAI announced ChatGPT integration with Apple devices, enabling Siri access and visual intelligence features. Claude 3.5 Sonnet is noted as a distilled version of Opus. The AI community's response at NeurIPS 2024 has been overwhelmingly positive, signaling a strong comeback for Google in AI innovation. Key topics include multimodality, agent development, multilinguality, benchmarking, and model releases.
OpenAI Sora Turbo and Sora.com
sora-turbo o1 claude-3.5-sonnet claude-3.5 gemini llama-3-3-euryale-v2.3 mistral-large behemoth endurance-v1.1 openai google nvidia hugging-face mistral-ai text-to-video-generation quantum-computing coding-capabilities transformers algorithmic-innovation storytelling roleplay model-parameter-tuning anti-monopoly-investigation sama sundarpichai bindureddy denny_zhou nrehiew_
OpenAI launched Sora Turbo, enabling text-to-video generation for ChatGPT Plus and Pro users with monthly generation limits and regional restrictions in Europe and the UK. Google announced a quantum computing breakthrough with the development of the Willow chip, potentially enabling commercial quantum applications. Discussions on o1 model performance highlighted its lag behind Claude 3.5 Sonnet and Gemini in coding tasks, with calls for algorithmic innovation beyond transformer scaling. The Llama 3.3 Euryale v2.3 model was praised for storytelling and roleplay capabilities, with users suggesting parameter tuning to reduce creative liberties and repetition. Alternatives like Mistral-Large, Behemoth, and Endurance v1.1 were also noted. Additionally, Nvidia faces an anti-monopoly investigation in China. Memes and humor around GPU issues and embargo mishaps were popular on social media.
$200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews
o1 o1-pro claude-3.5-sonnet pali-gemma-2 openai google llamaindex multimodality vision fine-tuning benchmarking model-performance image-generation document-processing model-release sama bindureddy mervenoyann fchollet
OpenAI launched the o1 model with multimodal capabilities, faster reasoning, and image input support, marking it as a state-of-the-art model despite some bugs and mixed community reviews. The new o1-pro tier offers unlimited access for $200/month with notable benchmark improvements but some performance trade-offs compared to claude-3.5-sonnet. Google released the PaliGemma 2 vision-language model family in sizes 3B, 10B, and 28B, excelling in visual question answering, image segmentation, and OCR, with day-0 support for fine-tuning. LlamaIndex announced discounts and feature updates for large-scale document processing. The AI community also reacted humorously to the new pricing tiers and model comparisons. "o1 can see now, which makes it the SOTA multimodal model" and "most users will be best served by free/Plus tiers" were notable sentiments.
not much happened today
o1-full sora gpt-4.5 gpt-4 claude-3.5-sonnet llama-3-1-nemotron-51b llama-3-1 llama-3 nemotron-51b openai google-deepmind anthropic nvidia huggingface vision model-performance neural-architecture-search model-optimization multimodality model-release model-training reinforcement-learning image-generation lucas-beyer alexander-kolesnikov xiaohua-zhai aidan_mclau giffmana joannejang sama
OpenAI announced their "12 Days of OpenAI" event with daily livestreams and potential releases including the o1 full model, Sora video model, and GPT-4.5. Google DeepMind released the GenCast weather model capable of 15-day forecasts in 8 minutes using TPU chips, and launched Genie 2, a model generating playable 3D worlds from single images. Leading vision researchers Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai moved from DeepMind to OpenAI, which is opening a Zürich office. Criticism arose over OpenAI's strategy and model quality compared to Anthropic and Claude 3.5 Sonnet. On Reddit, a modified llama.cpp supports Nvidia's Llama-3_1-Nemotron-51B, matching the performance of larger 70B models via NAS optimization.
not much happened to end the week
gemini deepseek-r1 o1 chatgpt gpt-4 claude-3.5-sonnet o1-preview o1-mini gpt4o qwq-32b google-deepmind deeplearningai amazon tesla x-ai alibaba ollama multimodality benchmarking quantization reinforcement-learning ai-safety translation reasoning interpretability model-comparison humor yoshua-bengio kevinweil ylecun
AI News for 11/29/2024-11/30/2024 covers key updates including the Gemini multimodal model advancing in musical structure understanding, a new quantized SWE-Bench for benchmarking at 1.3 bits per task, and the launch of the DeepSeek-R1 model focusing on transparent reasoning as an alternative to o1. The establishment of the 1st International Network of AI Safety Institutes highlights global collaboration on AI safety. Industry updates feature Amazon's Olympus AI model, Tesla's Optimus, and experiments with ChatGPT as a universal translator. Community reflections emphasize the impact of large language models on daily life and medical AI applications. Discussions include scaling sparse autoencoders to gpt-4 and the need for transparency in reasoning LLMs. The report also notes humor around ChatGPT's French nickname.
Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500
deepseek-r1 qwq gpt-4o claude-3.5-sonnet qwen-2.5 llama-cpp deepseek sambanova hugging-face dair-ai model-releases benchmarking fine-tuning sequential-search inference model-deployment agentic-rag external-tools multi-modal-models justin-lin clementdelangue ggerganov vikparuchuri
DeepSeek r1 leads the race for "open o1" models but has yet to release weights, while Justin Lin released QwQ, a 32B open-weight model that outperforms GPT-4o and Claude 3.5 Sonnet on benchmarks. QwQ appears to be a fine-tuned version of Qwen 2.5, emphasizing sequential search and reflection for complex problem-solving. SambaNova promotes its RDUs as superior to GPUs for inference tasks, highlighting the shift from training to inference in AI systems. On Twitter, Hugging Face announced CPU deployment for llama.cpp instances, Marker v1 was released as a faster, more accurate PDF conversion tool, and Agentic RAG developments focus on integrating external tools and advanced LLM chains for improved response accuracy. The open-source AI community sees growing momentum with models like Flux gaining popularity, reflecting a shift towards multi-modal AI models spanning image, video, audio, and biology.
Anthropic launches the Model Context Protocol
claude-3.5-sonnet claude-desktop anthropic amazon zed sourcegraph replit model-context-protocol integration json-rpc agentic-behaviors security tool-discovery open-protocol api-integration system-integration prompt-templates model-routing alex-albert matt-pocock hwchase17
Anthropic has launched the Model Context Protocol (MCP), an open protocol designed to enable seamless integration between large language model applications and external data sources and tools. MCP supports diverse resources such as file contents, database records, API responses, live system data, screenshots, and logs, identified by unique URIs. It also includes reusable prompt templates, system and API tools, and JSON-RPC 2.0 transports with streaming support. MCP allows servers to request LLM completions through clients with priorities on cost, speed, and intelligence, hinting at an upcoming model router by Anthropic. Launch partners like Zed, Sourcegraph, and Replit have reviewed MCP favorably, while some developers express skepticism about its provider exclusivity and adoption potential. The protocol emphasizes security, testing, and dynamic tool discovery, with guides and videos available from community members such as Alex Albert and Matt Pocock. This development follows Anthropic's recent $4 billion fundraise from Amazon and aims to advance terminal-level integration for Claude Desktop.
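Since MCP messages ride on JSON-RPC 2.0, a minimal exchange is easy to sketch. The `tools/list` method name below follows the MCP spec; the file-reading tool itself is a made-up example of what a server might expose, not part of the protocol.

```python
import json

# Client asks an MCP server which tools it exposes.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server reply; JSON-RPC requires the response to echo the request id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "read_file",  # hypothetical tool, for illustration only
                "description": "Read a file's contents",
                "inputSchema": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            }
        ]
    },
}

print(json.dumps(response, indent=2))
```

The `inputSchema` is plain JSON Schema, which is what lets clients discover tools dynamically and validate arguments before calling them, the "dynamic tool discovery" the protocol emphasizes.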
Gemini (Experimental-1114) retakes #1 LLM rank with 1344 Elo
claude-3-sonnet gpt-4 gemini-1.5 claude-3.5-sonnet anthropic openai langchain meta-ai-fair benchmarking prompt-engineering rag visuotactile-perception ai-governance theoretical-alignment ethical-alignment jailbreak-robustness model-releases alignment richardmcngo andrewyng philschmid
Anthropic released jailbreak-robustness benchmarking for Claude 3.5 Sonnet, emphasizing adaptive defenses. OpenAI enhanced GPT-4 with a new RAG technique for contiguous chunk retrieval. LangChain launched Promptim for prompt optimization. Meta AI introduced NeuralFeels, using neural fields for visuotactile perception. RichardMCNgo resigned from OpenAI, citing concerns about AI governance and theoretical alignment. Discussions emphasized the importance of truthful public information and ethical alignment in AI deployment. The latest Gemini update marks a new #1 LLM amid ongoing alignment challenges. The AI community continues to focus on benchmarking, prompt engineering, and alignment.
Common Corpus: 2T Open Tokens with Provenance
qwen-2.5-coder claude-3.5-sonnet janusflow-1.3b ocronos-vintage pleais huggingface langchainai deepseek alibaba anthropic provenance ocr multilingual-datasets prompt-engineering multimodality image-generation code-generation quantization model-scaling inference-efficiency tim-dettmers tom-doerr omarsar0 swyx madiator reach_vb
Pleias released Common Corpus on Hugging Face, the largest fully open multilingual dataset with over 2 trillion tokens and detailed provenance information. They also introduced OCRonos-Vintage, a 124M-parameter OCR-correction model that efficiently fixes digitization errors on CPU and GPU, unlocking knowledge from PDFs. On AI tools, LangChainAI launched Prompt Canvas for collaborative prompt engineering, while DeepSeek released JanusFlow 1.3B, a unified multimodal LLM integrating autoregressive and rectified-flow models for enhanced image understanding and generation. Alibaba Cloud announced Qwen2.5-Coder, a code-focused LLM with advanced coding capabilities, and Claude 3.5 Sonnet was highlighted for superior code generation. Discussions of quantization challenges and scaling laws for precision by Tim Dettmers and others emphasized the impact of low-precision training on model scalability and inference efficiency. Insights from the "Scaling Laws for Precision" paper and alternative efficiency methods were also noted.
not much happened today
claude-3.5-sonnet opencoder anthropic microsoft sambanova openai langchain llamaindex multi-agent-systems natural-language-interfaces batch-processing harmful-content-detection secret-management retrieval-augmented-generation error-analysis memory-management web-scraping autonomous-agents sophiamyang tom_doerr omarsar0 _akhaliq andrewyng giffmana
This week in AI news, Anthropic launched Claude Sonnet 3.5 with the ability to control desktop apps via natural language. Microsoft introduced Magentic-One, a multi-agent system built on the AutoGen framework. OpenCoder was unveiled as an AI-powered code cookbook for large language models. SambaNova is sponsoring a hackathon with prizes up to $5000 for building real-time AI agents. @sophiamyang announced Mistral's new Batch and Moderation APIs, with 50% lower cost and multi-dimensional harmful-text detection. Open-source tools like Infisical for secret management, CrewAI for autonomous-agent orchestration, and Crawlee for web scraping were released. Research highlights include SCIPE for error analysis in LLM chains, the Context Refinement Agent for improved retrieval-augmented generation, and MemGPT for managing LLM memory. The week also saw a legal win for OpenAI in the RawStory copyright case, affirming that facts used in LLM training are not copyrightable.
not much happened today
smollm2 llama-3-2 stable-diffusion-3.5 claude-3.5-sonnet gemini openai anthropic google meta-ai-fair suno-ai perplexity-ai on-device-ai model-performance robotics multimodality ai-regulation model-releases natural-language-processing prompt-engineering agentic-ai ai-application model-optimization sam-altman akhaliq arav-srinivas labenz loubnabenallal1 alexalbert fchollet stasbekman svpino rohanpaul_ai hamelhusain
ChatGPT Search was launched by Sam Altman, who called it his favorite feature since ChatGPT's original launch, doubling his usage. Comparisons were made between ChatGPT Search and Perplexity with improvements noted in Perplexity's web navigation. Google introduced a "Grounding" feature in the Gemini API & AI Studio enabling Gemini models to access real-time web information. Despite Gemini's leaderboard performance, developer adoption lags behind OpenAI and Anthropic. SmolLM2, a new small, powerful on-device language model, outperforms Meta's Llama 3.2 1B. A Claude desktop app was released for Mac and Windows. Meta AI announced robotics advancements including Meta Sparsh, Meta Digit 360, and Meta Digit Plexus. Stable Diffusion 3.5 Medium, a 2B parameter model with a permissive license, was released. Insights on AGI development suggest initial inferiority but rapid improvement. Anthropic advocates for early targeted AI regulation. Discussions on ML specialization predict training will concentrate among few companies, while inference becomes commoditized. New AI tools include Suno AI Personas for music creation, PromptQL for natural language querying over data, and Agent S for desktop task automation. Humor was shared about Python environment upgrades.
The AI Search Wars Have Begun — SearchGPT, Gemini Grounding, and more
gpt-4o o1-preview claude-3.5-sonnet universal-2 openai google gemini nyt perplexity-ai glean nvidia langchain langgraph weights-biases cohere weaviate fine-tuning synthetic-data distillation hallucinations benchmarking speech-to-text robotics neural-networks ai-agents sam-altman alexalbert__ _jasonwei svpino drjimfan virattt
ChatGPT launched its search functionality across all platforms using a fine-tuned version of GPT-4o with synthetic data generation and distillation from o1-preview. This feature includes a Chrome extension promoted by Sam Altman but has issues with hallucinations. The launch coincides with Gemini introducing Search Grounding after delays. Notably, The New York Times is not a partner due to a lawsuit against OpenAI. The AI search competition intensifies with consumer and B2B players like Perplexity and Glean. Additionally, Claude 3.5 Sonnet achieved a new benchmark record on SWE-bench Verified, and a new hallucination evaluation benchmark, SimpleQA, was introduced. Other highlights include the Universal-2 speech-to-text model with 660M parameters and HOVER, a neural whole-body controller for humanoid robots trained in NVIDIA Isaac simulation. AI hedge fund teams using LangChain and LangGraph were also showcased. The news is sponsored by the RAG++ course featuring experts from Weights & Biases, Cohere, and Weaviate.
Creating a LLM-as-a-Judge
claude-3.5-sonnet claude-3.5 notebooklm simpleqa recraft-v3 anthropic openai deepmind apple zep perplexity-ai github critique-shadowing llm-judging domain-experts dataset-creation prompt-engineering error-analysis temporal-knowledge-graphs memory-layer ai-agent-memory hallucination-reduction integration hamel-husain swyx
Anthropic released details on Claude 3.5's SWE-bench performance with its SWE-agent harness, while OpenAI introduced SimpleQA and Google launched NotebookLM. Apple announced new M4 MacBooks, and a new SOTA image model, Recraft v3, emerged. Hamel Husain presented a detailed 6,000-word treatise on creating LLM judges using a method called critique shadowing to align LLMs with domain experts, addressing the problem of untrusted and unused data in AI teams. The workflow involves expert-reviewed datasets and iterative prompt refinement. Additionally, Zep introduced a temporal knowledge graph memory layer to improve AI agent memory and reduce hallucinations. Anthropic also integrated Claude 3.5 Sonnet with GitHub Copilot, expanding access to Copilot Chat users.
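Critique shadowing, at its core, is a loop of judge prompt → critique plus verdict → comparison against expert labels. A minimal sketch of that loop (prompt wording and function names are hypothetical, not Husain's actual code):

```python
# Minimal sketch of a critique-shadowing loop (names hypothetical):
# a judge prompt elicits a critique plus a pass/fail verdict, and the
# judge's verdicts are scored against domain-expert labels.

JUDGE_PROMPT = """You are reviewing an AI assistant's answer.
Question: {question}
Answer: {answer}
Write a short critique, then end with exactly one line:
Verdict: PASS or Verdict: FAIL"""

def parse_verdict(judge_output: str) -> bool:
    """Extract the pass/fail verdict from the judge's free-text critique."""
    last = judge_output.strip().splitlines()[-1]
    return last.strip().upper().endswith("PASS")

def alignment_rate(judge_verdicts, expert_labels) -> float:
    """Fraction of examples where the judge agrees with the expert."""
    agree = sum(j == e for j, e in zip(judge_verdicts, expert_labels))
    return agree / len(expert_labels)

# Simulated judge outputs (in practice these come from an LLM call):
outputs = [
    "The answer cites the right policy.\nVerdict: PASS",
    "The answer invents a refund rule.\nVerdict: FAIL",
]
verdicts = [parse_verdict(o) for o in outputs]
print(alignment_rate(verdicts, [True, False]))  # 1.0 when judge matches experts
```

The point of the iteration is to edit the judge prompt until the alignment rate against expert labels stops improving.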
not much happened this weekend
claude-3.5-sonnet llama-3 llama-3-8b notebookllama min-omni-2 moondream openai anthropic hugging-face mistral-ai google-deepmind langchain deepmind microsoft pattern-recognition reinforcement-learning prompt-optimization text-to-speech model-optimization tensor-parallelism hyperparameters multimodal modal-alignment multimodal-fine-tuning ai-productivity privacy generative-ai rag retrieval-augmentation enterprise-text-to-sql amanda-askell philschmid stasbekman francois-fleuret mervenoyann reach_vb dzhng aravsrinivas sama lateinteraction andrew-y-ng bindureddy jerryjliu0
Moondream, a 1.6B vision language model, secured seed funding, highlighting a trend in moon-themed tiny models alongside Moonshine (a 27-61M parameter ASR model). Claude 3.5 Sonnet was used for AI Twitter recaps. Discussions included pattern recognition vs. intelligence in LLMs, reinforcement learning for prompt optimization, and NotebookLlama, an open-source NotebookLM variant using LLaMA models for tasks like text-to-speech. Advances in model optimization with async-TP in PyTorch for tensor parallelism and hyperparameter tuning were noted. Mini-Omni 2 demonstrated multimodal capabilities across image, audio, and text for voice conversations with emphasis on modal alignment and multimodal fine-tuning. AI productivity tools like an AI email writer and LlamaCloud-based research assistants were introduced. Practical skill development and privacy-conscious AI tool usage with Llama 3 8B were also emphasized. Generative AI tools such as #AIPythonforBeginners and GenAI Agents with LangGraph were shared. Business insights covered rapid execution in AI product development and emerging AI-related job roles. Challenges in enterprise-grade text-to-SQL and advanced retrieval methods were discussed with tutorials on RAG applications using LangChain and MongoDB.
not much happened today
claude-3.5-sonnet claude-3.5-haiku o1-preview mochi-1 stable-diffusion-3.5 embed-3 kerashub differential-transformer anthropic openai cohere microsoft computer-use coding-performance video-generation fine-tuning multimodality transformers attention-mechanisms model-optimization alexalbert fchollet rasbt
Anthropic released upgraded Claude 3.5 Sonnet and Claude 3.5 Haiku models featuring a new computer use capability that allows interaction with computer interfaces via screenshots and actions like mouse movement and typing. Claude 3.5 Sonnet achieved state-of-the-art coding performance on SWE-bench Verified with a 49% score, surpassing OpenAI's o1-preview. Anthropic focuses on teaching general computer skills rather than task-specific tools, with expected rapid improvements. Other releases include Mochi 1, an open-source video generation model, Stable Diffusion 3.5 with Large and Medium variants, and Embed 3 by Cohere, a multimodal embedding model for text and image search. KerasHub was launched by François Chollet, unifying KerasNLP and KerasCV with 37 pretrained models. Microsoft introduced the Differential Transformer to reduce attention noise via differential attention maps, and research on transformer attention layers was shared by Sebastian Raschka.
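The screenshot-and-action loop described above can be sketched as a toy dispatch cycle; everything here (action names, the scripted stand-in model) is hypothetical and only mirrors the shape of the capability, not Anthropic's actual API:

```python
# Toy sketch of a screenshot -> action agent loop (all names hypothetical;
# this mimics the shape of computer use, not Anthropic's actual API).

def dispatch(action: dict, log: list) -> None:
    """Apply one model-proposed action to the (simulated) desktop."""
    kind = action["type"]
    if kind == "mouse_move":
        log.append(f"move to {action['x']},{action['y']}")
    elif kind == "click":
        log.append("click")
    elif kind == "type":
        log.append(f"type {action['text']!r}")
    else:
        raise ValueError(f"unknown action: {kind}")

def fake_model(screenshot: bytes) -> list:
    """Scripted stand-in for the model: given a screenshot, return actions."""
    return [
        {"type": "mouse_move", "x": 120, "y": 240},
        {"type": "click"},
        {"type": "type", "text": "hello"},
    ]

log = []
for action in fake_model(b"<png bytes>"):
    dispatch(action, log)
print(log)
```

In the real system this loop repeats: after each action a fresh screenshot is sent back to the model until the task is done.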
Claude 3.5 Sonnet (New) gets Computer Use
claude-3.5-sonnet claude-3.5-haiku llama-3.1 nemotron anthropic zep nvidia coding benchmarks computer-use vision multimodal-memory model-updates ai-integration philschmid swyx
Anthropic announced new Claude 3.5 models: 3.5 Sonnet and 3.5 Haiku, improving coding performance significantly, with Sonnet topping several coding benchmarks like Aider and Vectara. The new Computer Use API enables controlling computers via vision, scoring 14.9% on the OSWorld benchmark versus 7.8% for the next-best AI system, showcasing progress in AI-driven computer interaction. Zep launched a cloud edition for AI agent memory management, highlighting challenges in multimodal memory. The update also mentions Llama 3.1 and Nemotron models from NVIDIA.
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
bitnet-b1.58 llama-3.1-nemotron-70b-instruct gpt-4o claude-3.5-sonnet uc-berkeley deepmind openai microsoft nvidia archetype-ai boston-dynamics toyota-research google adobe openai mistral tesla meta-ai-fair model-optimization on-device-ai fine-tuning large-corpus-processing gpu-acceleration frameworks model-benchmarking rohanpaul_ai adcock_brett david-patterson
UC Berkeley's EPIC lab introduces innovative LLM data operators with projects like LOTUS and DocETL, focusing on effective programming and computation over large data corpora. This approach contrasts GPU-rich big labs like DeepMind and OpenAI with GPU-poor compound AI systems. Microsoft open-sourced BitNet b1.58, a 1-bit ternary parameter LLM enabling 4-20x faster training and on-device inference at human reading speeds. Nvidia released Llama-3.1-Nemotron-70B-Instruct, a fine-tuned open-source model outperforming GPT-4o and Claude 3.5 Sonnet. These developments highlight advances in model optimization, on-device AI, and fine-tuning.
DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality
nemotron-70b claude claude-3.5-sonnet gpt-4o deepseek meta-ai-fair wandb nvidia anthropic hugging-face perplexity-ai multimodality image-generation speech-synthesis fine-tuning model-merging benchmarking open-source model-optimization reinforcement-learning bindureddy aravsrinivas danielhanchen clementdelangue cwolferesearch
DeepSeek Janus and Meta SpiRit-LM are two notable multimodality AI models recently released, showcasing advances in image generation and speech synthesis respectively. DeepSeek Janus separates vision encoders for image understanding and generation, achieving better results in both tasks. Meta's SpiRit-LM introduces an expressive speech and writing model generating pitch and style units, improving over standard TTS. Additionally, W&B Weave offers comprehensive LLM observability and multimodality fine-tuning tools. Industry updates include Nvidia's Nemotron 70b model underperforming, Meta open-sourcing Movie Gen Bench for media generation benchmarking, Perplexity launching internal search with multi-step reasoning, and Anthropic updating Claude apps. Open source progress includes Hugging Face's gradient accumulation fix in transformers and advocacy for open source AI to prevent Big Tech dominance. "Model merging for combining skills of multiple models" is also highlighted.
not much happened today
claudette llama-3-1 yi-lightning gpt-4o claude-3.5-sonnet answer-ai tencent notebooklm motherduck perplexity dropbox openai meta-ai-fair yi-ai zyphra-ai anthropic langchain openai synthetic-data fine-tuning sql audio-processing on-device-ai dataset-release transformer llm-reasoning ai-safety code-generation ai-pricing ai-job-market fchollet aravsrinivas svpino swyx
Answer.ai launched fastdata, a synthetic data generation library using claudette and Tencent's Billion Persona paper. NotebookLM became customizable, and Motherduck introduced notable LLMs in SQL implementations. Perplexity and Dropbox announced competitors to Glean. OpenAI unveiled audio chat completions priced at 24 cents per minute. Meta AI released Llama 3.1, powering Lenovo AI Now's on-device agent. Yi-Lightning model ranked #6 globally, surpassing GPT-4o. Zyphra AI released the large Zyda-2 dataset with 5 trillion tokens. François Chollet clarified transformer architecture as set-processing, not sequence-processing. Research suggests memorization aids LLM reasoning. Anthropic updated its Responsible Scaling Policy for AI safety. Tools like Perplexity Finance, Open Canvas by LangChain, and AlphaCodium code generation tool were highlighted. Approximately $500 million was raised for AI agent startups, with ongoing discussions on AI's job market impact. Combining prompt caching with the Batches API can yield a 95% discount on Claude 3.5 Sonnet tokens.
Did Nvidia's Nemotron 70B train on test?
nemotron-70b llama-3.1-70b llama-3.1 ministral-3b ministral-8b gpt-4o claude-3.5-sonnet claude-3.5 nvidia mistral-ai hugging-face zep benchmarking reinforcement-learning reward-models temporal-knowledge-graphs memory-layers context-windows model-releases open-source reach_vb philschmid swyx
NVIDIA's Nemotron-70B model has drawn scrutiny despite strong benchmark performances on Arena Hard, AlpacaEval, and MT-Bench, with some standard benchmarks like GPQA and MMLU Pro showing no improvement over the base Llama-3.1-70B. The new HelpSteer2-Preference dataset improves some benchmarks with minimal losses elsewhere. Meanwhile, Mistral released Ministral 3B and 8B models featuring 128k context length and outperforming Llama-3.1 and GPT-4o on various benchmarks under the Mistral Commercial License. NVIDIA's Nemotron 70B also surpasses GPT-4o and Claude-3.5-Sonnet on key benchmarks using RLHF (REINFORCE) training. Additionally, Zep introduced Graphiti, an open-source temporal knowledge graph memory layer for AI agents, built on Neo4j.
not much happened today
aria o1-preview o1-mini gemini-1.5-pro gemini-1.5-flash gemini-1.5 claude-3.5-sonnet rhymes-ai openai anthropic google meta-ai-fair oxylabs multimodality mixture-of-experts long-context retrieval-augmented-generation benchmarking software-engineering llm-evaluation prompt-engineering web-scraping python production-applications mervenoyann osanseviero dbrxmosaicai ylecun ofirpress clefourrier omarsar0 rohanpaul_ai svpino finbarrtimbers _philschmid
Rhymes AI released Aria, a new 25.3B parameter multimodal MoE model supporting text, code, image, and video with a 64k token context window and Apache-2.0 license. OpenAI's o1-preview and o1-mini models show consistent improvement over Anthropic and Google Gemini 1.5 Pro/Flash on long context RAG benchmarks up to 128k tokens, while Google Gemini 1.5 models excel at extreme context lengths up to 2 million tokens. Meta AI expanded rollout to 21 countries with new language support but remains unavailable in the EU. The one-year anniversary of SWE-bench benchmark for software engineering tasks was celebrated, alongside the introduction of SWE-bench Multimodal. New AI tools include OxyCopilot by Oxylabs for web scraping, Taipy for Python-based production apps, and Latitude for prompt engineering. Industry insights highlight changing AI funding dynamics and OpenAI's strategic focus on consumer products like ChatGPT. "all recaps done by Claude 3.5 Sonnet, best of 4 runs."
The AI Nobel Prize
claude-3.5-sonnet reka-flash got openai anthropic reka-ai zep artificial-neural-networks nobel-prize knowledge-graphs memory-layers real-time-voice-api vision fine-tuning prompt-caching multimodality function-calling ocr open-source single-sign-on software-testing ai-assisted-coding ai-ethics geoff-hinton john-hopfield philschmid alexalbert mervenoyann clementdelangue svpino bindureddy ylecun rohanpaul_ai
Geoff Hinton and John Hopfield won the Nobel Prize in Physics for their work on Artificial Neural Networks. The award citation spans 14 pages highlighting their contributions. Zep released a new community edition of their low-latency memory layer for AI agents, emphasizing knowledge graphs for memory. At OpenAI's DevDay, new features like real-time voice API, vision model fine-tuning, and prompt caching with a 50% discount on reused tokens were introduced. Anthropic's Claude 3.5 Sonnet was recognized as the best model currently. Reka AI Labs updated their Reka Flash model with enhanced multimodal and function calling capabilities. The GOT (General OCR Theory) model achieved 98.79% accuracy on OCR benchmarks. Discussions on open-source AI models highlighted their role in fostering competition and decentralization. Software development insights included the importance of Single Sign-On (SSO), thorough testing, and AI-assisted coding workflows. Ethical and societal topics covered critiques of tax policies and the appointment of France's first Minister of AI.
not much happened this weekend
o1-preview claude-3.5-sonnet 21b-flash-model openai meta-ai-fair reka langchainai entropix prompting-techniques finetuning entropy-based-sampling temporal-understanding native-audio tool-use instruction-chaining multimodality retrieval-augmented-generation synthetic-data-generation rnn parallel-training biologically-inspired-ai-safety text-to-video-generation video-editing lex-fridman imrat jjitsev giffmana _philschmid karpathy rasbt adcock_brett glennko rohanpaul_ai labenz
AI news from 10/4/2024 to 10/7/2024 highlights several developments: OpenAI's o1-preview shows strong performance on complex tasks but struggles with simpler ones, while Claude 3.5 Sonnet can match its reasoning through advanced prompting techniques. Meta introduced Movie Gen, a cutting-edge media foundation model for text-to-video generation and editing. Reka updated their 21B Flash Model with temporal video understanding, native audio, and tool use capabilities. Interest grows in "open o1" reproductions focusing on prompting and finetuning, with Entropix exploring entropy-based sampling. LangChainAI demonstrated a Retrieval Agent for complex Q&A, and synthetic data generation research surveyed 417 models. A resurgence in RNNs shows efficient parallel training making them competitive with Transformers. Biologically-inspired AI safety approaches were also noted. "A quiet weekend and air conditioning is all you need."
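The entropy-based sampling idea Entropix explores can be sketched simply: compute the entropy of the next-token distribution and branch when the model is uncertain. The threshold and strategy names below are illustrative, not Entropix's actual code:

```python
# Sketch of entropy-based sampling: branch on next-token uncertainty.
# Threshold and strategy names are illustrative, not Entropix's code.
import math

def entropy(probs: list) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_strategy(probs: list, threshold: float = 1.0) -> str:
    """High entropy = model is unsure -> switch sampling behavior."""
    return "explore" if entropy(probs) > threshold else "greedy"

confident = [0.97, 0.01, 0.01, 0.01]   # peaked distribution
uncertain = [0.25, 0.25, 0.25, 0.25]   # uniform distribution
print(choose_strategy(confident))  # greedy
print(choose_strategy(uncertain))  # explore
```

In the "open o1" framing, the high-entropy branch is where one might inject extra thinking tokens or resample rather than commit to a low-confidence token.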
not much happened today
llama-3 o1 deepseek-2.5 gpt-4 claude-3.5-sonnet 3dtopia-xl cogvideox anthropic meta-ai-fair openai deepseek-ai llamaindex langchainai retrieval-augmented-generation prompt-caching multimodality multi-agent-systems reasoning diffusion-models image-to-video prompting enterprise-ai agentic-ai long-context model-evaluation caching model-cost-efficiency
Anthropic introduced a RAG technique called Contextual Retrieval that reduces retrieval failure rates by 67% using prompt caching. Meta is teasing multimodal Llama 3 ahead of Meta Connect. OpenAI is hiring for a multi-agent research team focusing on improved AI reasoning with their o1 models, which have sparked mixed reactions. DeepSeek 2.5 is noted as a cost-effective alternative to GPT-4 and Claude 3.5 Sonnet. New models like 3DTopia-XL for 3D asset generation and CogVideoX for image-to-video conversion were highlighted. Techniques to boost reasoning by re-reading questions and combining retrieval with prompt caching were shared. Industry insights emphasize the necessity of AI adoption in enterprises and the disruption of traditional ML businesses. Tools like LangChainAI's LangGraph Templates and LlamaIndex's LlamaParse Premium enhance agentic applications and multimodal content extraction. Discussions on LLM evals and caching highlight production challenges and improvements. "Companies not allowing developers to use AI are unlikely to succeed" was a key sentiment.
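Contextual Retrieval's indexing step can be sketched as prepending an LLM-written, document-situating context to each chunk before embedding; the prompt wording below is hypothetical, and the shared full-document prefix is exactly what prompt caching makes cheap to reuse across chunks:

```python
# Sketch of contextual retrieval's indexing step: each chunk is indexed
# together with a short LLM-written context situating it in the full
# document. Prompt wording is hypothetical; the full-document prefix is
# the part prompt caching lets you reuse across every chunk.

CONTEXT_PROMPT = """<document>
{document}
</document>
Here is the chunk we want to situate within the whole document:
<chunk>
{chunk}
</chunk>
Give a short context to situate this chunk for search retrieval."""

def build_context_prompt(document: str, chunk: str) -> str:
    return CONTEXT_PROMPT.format(document=document, chunk=chunk)

def contextualize(chunk: str, llm_context: str) -> str:
    """Text actually embedded/indexed: generated context + original chunk."""
    return f"{llm_context}\n{chunk}"

prompt = build_context_prompt("Q2 report ...", "Revenue grew 3%.")
indexed = contextualize("Revenue grew 3%.", "From AcmeCo's Q2 2023 report.")
print(indexed.splitlines()[0])  # the situating context comes first
```

The bare chunk "Revenue grew 3%." is nearly unretrievable on its own; prefixed with its document context, queries about AcmeCo's Q2 results can now match it.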
Learnings from o1 AMA
o1-preview o1-mini claude-3.5-sonnet gpt-4o openai weights-biases cohere weaviate reinforcement-learning chain-of-thought reasoning model-performance prompting code-editing rag hybrid-search sama rohanpaul_ai gdb andrew-mayne
OpenAI released the o1 model series, touted as their "most capable and aligned models yet," trained with reinforcement learning to enhance reasoning. The o1-preview model scored 21% on ARC-AGI, ~80% on aider code editing (surpassing Claude 3.5 Sonnet's 77%), and ~52% on Cognition-Golden, showcasing a shift from memorizing answers to memorizing reasoning. The model employs a unique chain-of-thought approach enabling "System II thinking" for better problem-solving. Experts like Andrew Mayne advise framing o1 as a smart friend providing thoughtful explanations. Additionally, an advanced RAG course sponsored by Weights & Biases, Cohere, and Weaviate offers strategies for hybrid search and prompting to optimize AI solutions.
Reflection 70B, by Matt from IT Department
llama-3.1-70b llama-3 claude-3.5-sonnet hyperwrite glaive fine-tuning chain-of-thought instruction-following synthetic-data quantization model-evaluation prompt-engineering matt-shumer sahil-chaudhary
Reflection Tuning technique has been used by a two-person team from Hyperwrite and Glaive to finetune Llama 3.1 70B, showing strong performance improvements with minimal synthetic data. The approach builds on adding "thinking" and "reflection" steps to outputs, related to the Chain of Thought method. Despite some criticisms like contamination concerns, worse coding performance, and reliance on system prompts, the model has received positive reception and comparisons to Claude 3.5 Sonnet. The work highlights efficient instruction tuning and synthetic data generation for large models.
$1150m for SSI, Sakana, You.com + Claude 500m context
olmo llama2-13b-chat claude claude-3.5-sonnet safe-superintelligence sakana-ai you-com perplexity-ai anthropic ai2 mixture-of-experts model-architecture model-training gpu-costs retrieval-augmented-generation video-generation ai-alignment enterprise-ai agentic-ai command-and-control ilya-sutskever mervenoyann yuchenj_uw rohanpaul_ai ctojunior omarsar0
Safe Superintelligence raised $1 billion at a $5 billion valuation, focusing on safety and search approaches as hinted by Ilya Sutskever. Sakana AI secured a $100 million Series A funding round, emphasizing nature-inspired collective intelligence. You.com pivoted to a ChatGPT-like productivity agent after a $50 million Series B round, while Perplexity AI raised over $250 million this summer. Anthropic launched Claude for Enterprise with a 500K token context window. AI2 released a 64-expert Mixture-of-Experts (MoE) model called OLMoE, outperforming Llama2-13B-Chat. Key AI research trends include efficient MoE architectures, challenges in AI alignment and GPU costs, and emerging AI agents for autonomous tasks. Innovations in AI development feature command and control for video generation, Retrieval-Augmented Generation (RAG) efficiency, and GitHub integration under Anthropic's Enterprise plan. "Our logo is meant to invoke the idea of a school of fish coming together and forming a coherent entity from simple rules as we want to make use of ideas from nature such as evolution and collective intelligence in our research."
not much happened today
gpt-4o claude-3.5-sonnet phi-3.5-mini phi-3.5-moe phi-3.5-vision llama-3-1-405b qwen2-math-72b openai anthropic microsoft meta-ai-fair hugging-face langchain box fine-tuning benchmarking model-comparison model-performance diffusion-models reinforcement-learning zero-shot-learning math model-efficiency ai-regulation ai-safety ai-engineering prompt-engineering swyx ylecun
OpenAI launched GPT-4o finetuning with a case study on Cosine. Anthropic released Claude 3.5 Sonnet with 8k token output. Microsoft Phi team introduced Phi-3.5 in three variants: Mini (3.8B), MoE (16x3.8B), and Vision (4.2B), noted for sample efficiency. Meta released Llama 3.1 405B, deployable on Google Cloud Vertex AI, offering GPT-4 level capabilities. Qwen2-Math-72B achieved state-of-the-art math benchmark performance with a Gradio demo. Discussions included model comparisons like ViT vs CNN and Mamba architecture. Tools updates featured DSPy roadmap, Flux Schnell improving diffusion speed on M1 Max, and LangChain community events. Research highlights zero-shot DUP prompting for math reasoning and fine-tuning best practices. AI ethics covered California's AI Safety Bill SB 1047 and regulatory concerns from Yann LeCun. Commentary on AI engineer roles by Swyx. "Chat with PDF" feature now available for Box Enterprise Plus users.
not much happened today
grok-2 claude-3.5-sonnet claude-3.5 gpt-4 chatgpt-4o-latest anthropic x-ai google-deepmind openai mistral-ai meta-ai-fair salesforce box prompt-caching model-performance vision fine-tuning multilinguality ai-safety design-automation document-processing ai-agents ai-integration ai-job-market ai-acceleration humor demis-hassabis francois-chollet
Anthropic rolled out prompt caching in its API, reducing input costs by up to 90% and latency by 80%, enabling instant fine-tuning with longer prompts. xAI released Grok-2, a new model competing with frontier models from Google DeepMind, OpenAI, Anthropic, Mistral AI, and Meta AI Fair, supporting vision and text inputs and integrating external image generation models. Claude 3.5 Sonnet is reported to outperform GPT-4 in coding and reasoning, while ChatGPT-4o-latest shows reasoning improvements. François Chollet proposed a theory defining intelligence as the efficiency of operationalizing past information for future tasks. The Aya project involves 3000 collaborators building multilingual AI datasets. Demis Hassabis discussed AI hype and safe AI development in a podcast. Tools like Dora AI for Figma and Box's AI API enhance design automation and document processing. Salesforce released DEI, an open AI software engineering agents framework with a 55% resolve rate on SWE-Bench Lite. Industry trends highlight rapid AI integration, networking importance in the AI job market, and potential OpenAI GPT-4 expansion in response to competitors. Memes include humor about Apple Vision Pro.
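The mechanics behind the "up to 90%" figure: a long, stable prefix is marked cacheable, and later requests read it back at a fraction of the base input price. A minimal sketch of the request shape as a plain dict; field names follow Anthropic's announcement but treat them as assumptions rather than a definitive client-SDK call:

```python
# Shape of a prompt-caching request (plain dict; field names follow
# Anthropic's announcement, treated here as assumptions, not an SDK call).
# The long, stable prefix is marked cacheable; only the suffix changes
# between calls, so repeated requests reuse the cached prefix cheaply.
long_reference_doc = "...many thousands of tokens of stable context..."

request = {
    "model": "claude-3-5-sonnet-20240620",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": long_reference_doc,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    "messages": [
        {"role": "user", "content": "What does section 4 say?"}  # varies
    ],
}
print(request["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

The latency win comes from the same place as the cost win: the cached prefix is not re-processed on each call, only the short user turn is.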
not much happened today
llama-3 llama-3-1 grok-2 claude-3.5-sonnet gpt-4-turbo nous-research nvidia salesforce goodfire-ai anthropic x-ai google-deepmind box langchain fine-tuning prompt-caching mechanistic-interpretability model-performance multimodality agent-frameworks software-engineering-agents api document-processing text-generation model-releases vision image-generation efficiency scientific-discovery fchollet demis-hassabis
GPT-5 delayed again amid a quiet news day. Nous Research released Hermes 3 finetune of Llama 3 base models, rivaling FAIR's instruct tunes but sparking debate over emergent existential crisis behavior with 6% roleplay data. Nvidia introduced Minitron finetune of Llama 3.1. Salesforce launched a DEI agent scoring 55% on SWE-Bench Lite. Goodfire AI secured $7M seed funding for mechanistic interpretability work. Anthropic rolled out prompt caching in their API, cutting input costs by up to 90% and latency by 80%, aiding coding assistants and large document processing. xAI released Grok-2, matching Claude 3.5 Sonnet and GPT-4 Turbo on LMSYS leaderboard with vision+text inputs and image generation integration. Claude 3.5 Sonnet reportedly outperforms GPT-4 in coding and reasoning. François Chollet defined intelligence as efficient operationalization of past info for future tasks. Salesforce's DEI framework surpasses individual agent performance. Google DeepMind's Demis Hassabis discussed AGI's role in scientific discovery and safe AI development. Dora AI plugin generates landing pages in under 60 seconds, boosting web team efficiency. Box AI API beta enables document chat, data extraction, and content summarization. LangChain updated Python & JavaScript integration docs.
Grok 2! and ChatGPT-4o-latest confuses everybody
gpt-4o grok-2 claude-3.5-sonnet flux-1 stable-diffusion-3 gemini-advanced openai x-ai black-forest-labs google-deepmind benchmarking model-performance tokenization security-vulnerabilities multi-agent-systems research-automation text-to-image conversational-ai model-integration ylecun rohanpaul_ai karpathy
OpenAI quietly released a new GPT-4o model in ChatGPT, distinct from the API version, reclaiming the #1 spot on Lmsys arena benchmarks across multiple categories including math, coding, and instruction-following. Meanwhile, xAI launched Grok 2, outperforming Claude 3.5 Sonnet and previous GPT-4o versions, with plans for enterprise API release. Grok 2 integrates Black Forest Labs' Flux.1, an open-source text-to-image model surpassing Stable Diffusion 3. Google DeepMind announced Gemini Advanced with enhanced conversational features and Pixel device integration. Yann LeCun highlighted LLM limitations in learning and creativity, Rohan Paul discussed an AI Scientist system generating publishable ML research at low cost, and Andrej Karpathy warned of security risks in LLM tokenizers akin to SQL injection.
not much happened today
qwen2-math-72b gpt-4o claude-3.5-sonnet gemini-1.5-pro llama-3.1-405b idefics3-llama-8b anthropic google mistral-ai llamaindex math fine-tuning synthetic-data reinforcement-learning bug-bounty visual-question-answering open-source retrieval-augmented-generation agentic-ai ai-safety policy rohanpaul_ai anthropicai mervenoyann jeremyphoward omarsar0 ylecun bindureddy
Qwen2-Math-72B outperforms GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B on math benchmarks using synthetic data and advanced optimization techniques. Google AI cuts pricing for Gemini 1.5 Flash by up to 78%. Anthropic expands its bug bounty program targeting universal jailbreaks in next-gen safety systems. Tutorial on QLoRA fine-tuning of IDEFICS3-Llama 8B for visual question answering released. A Chinese open weights model surpasses previous MATH benchmark records. Surveys on Mamba models and LLM-based agents for software engineering highlight advancements and applications. Open-source tools like R2R RAG engine and LlamaIndex Workflows simplify building complex AI applications. Mistral AI introduces customizable AI agents. Concerns raised about California bill SB 1047's focus on existential risk and debates on banning open-source AI. Memes and humor continue in AI communities.
GPT4o August + 100% Structured Outputs for All (GPT4o August edition)
gpt-4o-2024-08-06 llama-3-1-405b llama-3 claude-3.5-sonnet gemini-1.5-pro gpt-4o yi-large-turbo openai meta-ai-fair google-deepmind yi-large nvidia groq langchain jamai langsmith structured-output context-windows model-pricing benchmarking parameter-efficient-expert-retrieval retrieval-augmented-generation mixture-of-experts model-performance ai-hardware model-deployment filtering multi-lingual vision john-carmack jonathan-ross rohanpaul_ai
OpenAI released the new gpt-4o-2024-08-06 model with 16k max output tokens and 33-50% lower pricing than the previous 4o-May version, featuring a new Structured Output API that improves output quality and reduces retry costs. Meta AI launched Llama 3.1, a 405-billion parameter model surpassing GPT-4 and Claude 3.5 Sonnet on benchmarks, alongside expanding the Llama Impact Grant program. Google DeepMind quietly released Gemini 1.5 Pro, outperforming GPT-4o, Claude-3.5, and Llama 3.1 on LMSYS benchmarks and leading the Vision Leaderboard. Yi-Large Turbo was introduced as a cost-effective upgrade priced at $0.19 per million tokens. In hardware, NVIDIA H100 GPUs were highlighted by John Carmack for their massive AI workload power, and Groq announced plans to deploy 108,000 LPUs by Q1 2025. New AI tools and techniques include RAG (Retrieval-Augmented Generation), the JamAI Base platform for Mixture of Agents systems, and LangSmith's enhanced filtering capabilities. Google DeepMind also introduced PEER (Parameter Efficient Expert Retrieval) architecture.
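The Structured Output API constrains decoding to a JSON Schema so the model cannot emit malformed output. A minimal sketch of what such a request body looks like as a plain dict; field names follow OpenAI's json_schema response-format announcement and should be treated as assumptions, not a client-library call:

```python
# Sketch of a Structured Outputs request body (plain dict; field names
# follow OpenAI's json_schema response format announcement, treated
# here as assumptions rather than a client-library call).
request = {
    "model": "gpt-4o-2024-08-06",
    "messages": [{"role": "user", "content": "Extract the event."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "event",
            "strict": True,  # constrained decoding: output must match schema
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "date": {"type": "string"},
                },
                "required": ["name", "date"],
                "additionalProperties": False,
            },
        },
    },
}
print(request["response_format"]["type"])  # json_schema
```

The retry-cost savings follow directly: with schema-constrained decoding there is no "model returned invalid JSON, try again" loop to pay for.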
SciCode: HumanEval gets a STEM PhD upgrade
gpt-4 claude-3.5-sonnet llama-3-7b llama-3 dolphin-2.9.3-yi-1.5-34b-32k-gguf anthropic hugging-face nvidia benchmarks coding model-training gpu-optimization model-performance synthetic-data compiler-optimization zero-shot-learning yi-tay rohanpaul_ai alexalbert__ tri_dao abacaj
PhD-level benchmarks highlight the difficulty of coding scientific problems for LLMs, with GPT-4 and Claude 3.5 Sonnet scoring under 5% on the new SciCode benchmark. Anthropic doubled the max output token limit for Claude 3.5 Sonnet to 8192 tokens. The Q-GaLore method enables training LLaMA-7B on a single 16GB GPU. The Mosaic compiler now generates efficient code for NVIDIA H100 GPUs. The Dolphin 2.9.3-Yi-1.5-34B-32k-GGUF model on Hugging Face has over 111k downloads. Llama 3 shows strong performance, achieving 90% zero-shot accuracy on the MATH dataset. Discussions continue on the limitations and forms of synthetic data for model training.
Qdrant's BM42: "Please don't trust us"
claude-3.5-sonnet gemma-2 nano-llava-1.5 qdrant cohere stripe anthropic hugging-face stablequan_ai semantic-search benchmarking dataset-quality model-evaluation model-optimization vision fine-tuning context-windows nils-reimers jeremyphoward hamelhusain rohanpaul_ai
Qdrant attempted to replace BM25 and SPLADE with a new method called "BM42" combining transformer attention and collection-wide statistics for semantic and keyword search, but their evaluation using the Quora dataset was flawed. Nils Reimers from Cohere reran BM42 on better datasets and found it underperformed. Qdrant acknowledged the errors but still ran a suboptimal BM25 implementation. This highlights the importance of dataset choice and evaluation sanity checks in search model claims. Additionally, Stripe faced criticism for AI/ML model failures causing account and payment issues, prompting calls for alternatives. Anthropic revealed that Claude 3.5 Sonnet suppresses some answer parts with backend tags, sparking debate. Gemma 2 model optimizations allow 2x faster fine-tuning with 63% less memory and longer context windows, running up to 34B parameters on consumer GPUs. nanoLLaVA-1.5 was announced as a compact 1B parameter vision model with significant improvements.
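For context on what BM42 was measured against, here is a minimal Okapi BM25 scorer over whitespace tokens (a reference sketch, not Qdrant's or Cohere's implementation; a real system would add stemming and a proper tokenizer):

```python
# Minimal Okapi BM25 over whitespace tokens (reference sketch only).
import math
from collections import Counter

def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score each doc against the query with the standard Okapi formula."""
    toks = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(t) for t in toks) / N
    df = Counter()                       # document frequency per term
    for t in toks:
        df.update(set(t))
    scores = []
    for t in toks:
        tf = Counter(t)                  # term frequency in this doc
        s = 0.0
        for q in query.lower().split():
            if q not in tf:
                continue
            idf = math.log(1 + (N - df[q] + 0.5) / (df[q] + 0.5))
            s += idf * tf[q] * (k1 + 1) / (
                tf[q] + k1 * (1 - b + b * len(t) / avgdl)
            )
        scores.append(s)
    return scores

docs = ["the cat sat on the mat", "dogs chase cats in the park", "quarterly revenue rose"]
scores = bm25_scores("cat", docs)
print(scores.index(max(scores)))  # doc 0 contains the exact token "cat"
```

Note that doc 1 ("cats") scores zero here, which is exactly the lexical brittleness hybrid methods like BM42 try to fix, and why dataset choice (short Quora questions vs. longer documents) dominates any such comparison.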
GraphRAG: The Marriage of Knowledge Graphs and RAG
gemma-2 llama-3-70b claude-3.5-sonnet nemotron-340b qwen2-72b llama-3 microsoft-research anthropic nvidia hugging-face retrieval-augmented-generation knowledge-graphs token-usage inference-time attention-mechanisms instruction-following coding math long-range-reasoning synthetic-data dataset-release fine-tuning context-windows function-calling travis-fischer rasbt alexandr-wang osanseviero rohanpaul_ai hamelhusain svpino aaaazzam omarsar0
Microsoft Research open sourced GraphRAG, a retrieval augmented generation (RAG) technique that extracts knowledge graphs from sources and clusters them for improved LLM answers, though it increases token usage and inference time. Gemma 2 models were released focusing on efficient small LLMs with innovations like sliding window attention and RMS norm, nearly matching the larger Llama 3 70B. Anthropic's Claude 3.5 Sonnet leads in instruction following and coding benchmarks, while Nvidia's Nemotron 340B model was released in June. Qwen2-72B tops the HuggingFace Open LLM leaderboard excelling in math and long-range reasoning. Discussions on RAG highlighted its limitations and improvements in context usage via function calls. A persona-driven synthetic data generation approach introduced 1 billion personas, with a fine-tuned model matching GPT-4 performance on math benchmarks at 7B scale. The 200GB AutoMathText dataset was also noted for math data synthesis.
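The GraphRAG pipeline shape (triples → graph → communities → per-community summaries) can be illustrated with a toy; real GraphRAG uses LLM-based entity extraction and Leiden community detection, for which connected components stand in here:

```python
# Toy of the GraphRAG shape: entity triples -> graph -> communities.
# Real GraphRAG uses LLM extraction and Leiden clustering; connected
# components stand in for community detection here.
from collections import defaultdict

triples = [
    ("alice", "works_at", "acme"),
    ("acme", "based_in", "berlin"),
    ("bob", "maintains", "libfoo"),
]

adj = defaultdict(set)
for s, _, o in triples:        # build an undirected adjacency map
    adj[s].add(o)
    adj[o].add(s)

def communities(adj) -> list:
    """Connected components as a stand-in for community detection."""
    seen, out = set(), []
    for node in list(adj):
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        out.append(comp)
    return out

comps = communities(adj)
print(sorted(len(c) for c in comps))  # [2, 3]: {bob, libfoo} and {alice, acme, berlin}
```

At query time, GraphRAG summarizes each community with an LLM and answers from those summaries, which is where the extra token usage and inference time come from.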
Gemma 2: The Open Model for Everyone
gemma-2 qwen-72b mixtral-8x22b-instruct claude-3.5-sonnet google-deepmind alibaba mistral-ai anthropic knowledge-distillation attention-mechanisms multilingual-models multimodality model-training model-optimization memory-optimization fine-tuning kathleen-kenealy daniel-han
Gemma 2, a 27B-parameter model from Google DeepMind, was released with innovations like 1:1 alternation of local and global attention layers and logit soft-capping, leveraging knowledge distillation to train its smaller models on over 50× the compute-optimal token count. The model supports multilingual and multimodal capabilities, with fine-tuning success on over 200 Indic language variants. The Open LLM Leaderboard highlights Alibaba's Qwen 72B as the top model, with Mistral AI's Mixtral-8x22B-Instruct also ranking highly. Anthropic launched Claude 3.5 Sonnet, improving intelligence at mid-tier cost and speed. Research on eliminating matrix multiplication in LLMs promises significant memory savings without performance loss. Kathleen Kenealy and Daniel Han provided insights on Gemma 2's tokenizer and attention scaling, respectively.
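Logit soft-capping is a one-line transform: logits are squashed through tanh so they can never exceed a fixed bound, keeping attention and output distributions numerically tame. A minimal sketch, using the cap values reported for Gemma 2 (50.0 for attention logits, 30.0 for final logits):

```python
import math

def soft_cap(logit: float, cap: float) -> float:
    # Bound logits to (-cap, cap) smoothly and monotonically:
    # near zero the transform is ~identity, large values saturate at +/-cap.
    return cap * math.tanh(logit / cap)

# Gemma 2 reportedly uses cap=50.0 for attention logits, 30.0 for final logits.
print(soft_cap(10.0, 50.0))   # ~9.87: small logits pass almost unchanged
print(soft_cap(500.0, 50.0))  # just under 50.0: outliers saturate at the cap
```

Unlike hard clipping, the tanh form keeps gradients nonzero everywhere, which matters during distillation training.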
Shall I compare thee to a Sonnet's day?
claude-3.5-sonnet claude-3.5 gpt-4o gemini-1.5-pro anthropic lmsys glif comfyui hard-prompts json json-extraction meme-generation instruction-following app-development fusion-energy nuclear-fission productivity fchollet mustafasuleyman
Claude 3.5 Sonnet from Anthropic achieves top rankings in the coding and hard-prompt arenas, surpassing GPT-4o and competing with Gemini 1.5 Pro at lower cost. Glif demonstrates a fully automated Wojak meme generator that uses Claude 3.5 for JSON generation and ComfyUI for images, showcasing new JSON-extractor capabilities. Artifacts enables rapid creation of niche apps, exemplified by a dual-monitor visualizer built in under 5 minutes. François Chollet argues that fusion energy is not a near-term solution compared to existing nuclear fission plants. Mustafa Suleyman notes that 75% of desk workers now use AI, marking a shift toward AI-assisted productivity.
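Reliably pulling structured JSON out of a chatty model reply is the core plumbing behind pipelines like the meme generator. A common pattern (an illustrative sketch, not Glif's actual implementation) is to grab the first balanced `{...}` span and parse it:

```python
import json

def extract_json(text: str) -> dict:
    # Find the first balanced {...} span in a model reply and parse it.
    # Naive: ignores braces inside JSON strings; fine for simple payloads.
    start = text.index("{")  # raises ValueError if no object at all
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start : i + 1])
    raise ValueError("no balanced JSON object found")

reply = 'Sure! Here is the meme spec: {"top": "me", "bottom": "also me"} Enjoy!'
spec = extract_json(reply)
```

The parsed dict can then be handed straight to an image pipeline such as ComfyUI, with the surrounding chatter discarded.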
Gemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canary
gemini-nano gemini-pro claude-3.5-sonnet gpt-4o deepseek-coder-v2 glm-0520 nemotron-4-340b gpt-4-turbo-0409 google gemini huggingface anthropic deepseek zhipu-ai tsinghua nvidia model-quantization prompt-api optimization model-weights benchmarking code-generation math synthetic-data automatic-differentiation retrieval-augmented-generation mitigating-memorization tree-search inference-time-algorithms adcock_brett dair_ai lmsysorg
The latest Chrome Canary includes a feature flag for Gemini Nano, with a prompt API and an on-device optimization guide; the Nano 1 and Nano 2 models weigh in at 1.8B and 3.25B parameters respectively and show decent performance relative to Gemini Pro. The base and instruct-tuned model weights have been extracted and posted to HuggingFace. In model releases, Anthropic launched Claude 3.5 Sonnet, which outperforms GPT-4o on some benchmarks, runs twice as fast as Opus, and is free to try. DeepSeek-Coder-V2 achieves 90.2% on HumanEval and 75.7% on MATH, surpassing GPT-4-Turbo-0409, with models up to 236B parameters and 128K context length. GLM-0520 from Zhipu AI/Tsinghua ranks highly in coding and overall benchmarks. NVIDIA announced Nemotron-4 340B, an open model family for synthetic data generation. Research highlights include TextGrad, a framework for automatic differentiation via textual feedback; PlanRAG, an iterative plan-then-RAG decision-making technique; a paper on goldfish loss for mitigating memorization in LLMs; and a tree search algorithm for language model agents.
Shazeer et al (2024): you are overpaying for inference >13x
claude-3.5-sonnet claude-3-opus character.ai anthropic memory-efficiency kv-cache attention-mechanisms stateful-caching int8-precision transformer-architecture scaling overfitting architecture noam-shazeer kevin-a-fischer sebastien-bubeck _aidan_clark_ andrej-karpathy
Noam Shazeer explains how Character.ai serves LLM inference at roughly 20% of Google Search's request volume while cutting serving costs by a factor of 33 compared to late 2022; leading commercial APIs would cost at least 13.5× more. Key memory-efficiency techniques include multi-query attention (MQA) rather than GQA, reducing KV cache size by 8×; hybrid attention horizons; cross-layer KV-sharing; stateful caching with a 95% cache hit rate; and native int8 precision with custom kernels. Anthropic released Claude 3.5 Sonnet, which outperforms Claude 3 Opus at twice the speed and one-fifth the cost, passes 64% of internal pull-request tests, and introduces new features like Artifacts for real-time doc and code generation. Discussions of LLM architecture highlight the dominance of transformers, challenges in scaling and overfitting, and the importance of architecture work for progress.
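The KV-cache savings compound multiplicatively. Back-of-envelope arithmetic (with hypothetical model dimensions, not Character.ai's actual config) shows how MQA and int8 stack before the hybrid-horizon and cross-layer-sharing tricks are even applied:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    # 2 tensors (K and V) per layer, one per KV head, per cached position.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical config: 32 layers, 8 query heads, head_dim 128, 8K context.
baseline = kv_cache_bytes(32, 8, 128, 8192, 2)   # full MHA, fp16
mqa_int8 = kv_cache_bytes(32, 1, 128, 8192, 1)   # 1 shared KV head, int8

print(baseline // mqa_int8)  # 16x: 8x from MQA, 2x from int8
```

Here a 1 GiB-per-request cache drops to 64 MiB, which is what makes the 95% stateful-cache hit rate economical to keep resident.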
Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts
claude-3.5-sonnet claude-3-opus gpt-4o anthropic openai cognition benchmarking model-performance coding model-optimization fine-tuning instruction-following model-efficiency model-release api performance-optimization alex-albert
Claude 3.5 Sonnet, released by Anthropic, is positioned as a Pareto improvement over Claude 3 Opus, operating at twice the speed and costing one-fifth as much. It achieves state-of-the-art results on benchmarks like GPQA, MMLU, and HumanEval, surpassing even GPT-4o and Claude 3 Opus on vision tasks. The model demonstrates significant advances in coding capabilities, passing 64% of test cases compared to 38% for Claude 3 Opus, and is capable of autonomously fixing pull requests. Anthropic also introduced the Artifacts feature, enabling users to interact with AI-generated content such as code snippets and documents in a dynamic workspace, similar to OpenAI's Code Interpreter. This release highlights improvements in performance, cost-efficiency, and coding proficiency, signaling a growing role for LLMs in software development.