Topic: "instruction-following"

OpenAI GPT Image-1.5 claims to beat Nano Banana Pro, #1 across all Arenas, but completely fails Vibe Checks

Mistral 3: Mistral Large 3 + Ministral 3B/8B/14B open weights models

AI Engineer Code Summit

Gemini 3 Pro — new GDM frontier model 6, Gemini 3 Deep Think, and Antigravity IDE

GPT 5.1 in ChatGPT: No evals, but adaptive thinking and instruction following

not much happened today

MiniMax M2 230BA10B — 8% of Claude Sonnet's price, ~2x faster, new SOTA open model

not much happened today

Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched

OpenAI Realtime API GA and new `gpt-realtime` model, 20% cheaper than 4o

Western Open Models get Funding: Cohere $500m @ 6.8B, AI2 gets $152m NSF+NVIDIA grants

Qwen-Image: SOTA text rendering + 4o-imagegen-level Editing Open Weights MMDiT

not much happened today

OpenAI releases Deep Research API (o3/o4-mini)

The Quiet Rise of Claude Code vs Codex

Anthropic releases Claude 4 Sonnet and Opus: Memory, Agent Capabilities, Claude Code, Redteam Drama

not much happened today

Gemini's AlphaEvolve agent uses Gemini 2.0 to find new Math and cuts Gemini cost 1% — without RL

Granola launches team notes, while Notion launches meeting transcription

QwQ-32B claims to match DeepSeek R1-671B

SOTA Video Gen: Veo 2 and Kling 2 are GA for developers

GPT 4.1: The New OpenAI Workhorse

not much happened today

not much happened today

Gemini 2.5 Pro + 4o Native Image Gen

lots of little things happened this week

LLaDA: Large Language Diffusion Models

Mistral Small 3 24B and Tulu 3 405B

Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)

not much happened today

AIPhone 16: the Visual Intelligence Phone

Reflection 70B, by Matt from IT Department

Apple Intelligence Beta + Segment Anything Model 2

Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o version)

Microsoft AgentInstruct + Orca 3

Not much happened today.

GraphRAG: The Marriage of Knowledge Graphs and RAG

Shall I compare thee to a Sonnet's day?

Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts

Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata

Qwen 2 beats Llama 3 (and we don't know how)

OpenAI's Instruction Hierarchy for the LLM OS

Perplexity, the newest AI unicorn

Not much happened today

Claude 3 just destroyed GPT 4 (see for yourself)

Miqu confirmed to be an early Mistral-medium checkpoint