2025

April

  • Cognition's DeepWiki, a free encyclopedia of all GitHub repos
  • gpt-image-1 - ChatGPT's imagegen model, confusingly NOT 4o, now available in API
  • not much happened today
  • not much happened today; New email provider for AINews
  • Grok 3 & 3-mini now API Available
  • Gemini 2.5 Flash completes the total domination of the Pareto Frontier
  • OpenAI o3, o4-mini, and Codex CLI
  • QwQ-32B claims to match DeepSeek R1-671B
  • SOTA Video Gen: Veo 2 and Kling 2 are GA for developers
  • GPT 4.1: The New OpenAI Workhorse
  • not much happened today
  • not much happened today
  • Google's Agent2Agent Protocol (A2A)
  • DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
  • Llama 4's Controversial Weekend Release
  • not much happened today
  • not much happened today
  • not much happened today
  • >$41B raised today (OpenAI @ 300b, Cursor @ 9.5b, Etched @ 1.5b)

March

  • not much happened today
  • not much happened today
  • OpenAI adopts MCP
  • Gemini 2.5 Pro + 4o Native Image Gen
  • Halfmoon is Reve Image: a new SOTA Image Model from ex-Adobe/Stability trio
  • lots of little things happened this week
  • Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI
  • Every 7 Months: The Moore's Law for Agent Autonomy
  • not much happened today
  • Cohere's Command A claims #3 open model spot (after DeepSeek and Gemma)
  • not much happened today
  • not much happened today
  • Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen
  • The new OpenAI Agents Platform
  • not much happened today
  • DeepSeek's Open Source Stack
  • not much happened today
  • not much happened today
  • Anthropic's $61.5B Series E
  • not much happened today

February

  • GPT 4.5 — Chonky Orion ships!
  • lots of small launches
  • not much happened today
  • Claude 3.7 Sonnet
  • AI Engineer Summit Day 1
  • not much happened today
  • The Ultra-Scale Playbook: Training LLMs on GPU Clusters
  • X.ai Grok 3 and Mira Murati's Thinking Machines
  • LLaDA: Large Language Diffusion Models
  • not much happened today
  • Reasoning Models are Near-Superhuman Coders (OpenAI IOI, Nvidia Kernels)
  • small news items
  • not much happened today
  • not much happened today
  • not much happened today
  • s1: Simple test-time scaling (and Kyutai Hibiki)
  • Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking
  • How To Scale Your Model, by DeepMind
  • OpenAI takes on Gemini's Deep Research
  • o3-mini launches, OpenAI on "wrong side of history"

January

  • Mistral Small 3 24B and Tulu 3 405B
  • not much happened today
  • not much happened today
  • DeepSeek #1 on US App Store, Nvidia stock tanks -17%
  • TinyZero: Reproduce DeepSeek R1-Zero for $30
  • OpenAI launches Operator, its first Agent
  • Bespoke-Stratos + Sky-T1: The Vicuna+Alpaca moment for reasoning
  • Project Stargate: $500b datacenter (1.7% of US GDP) and Gemini 2 Flash Thinking 2
  • DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
  • not much happened today
  • not much happened today
  • Titans: Learning to Memorize at Test Time
  • small little news items
  • not much happened today
  • Moondream 2025.1.9: Structured Text, Enhanced OCR, Gaze Detection in a 2B Model
  • not much happened today
  • not much happened today
  • not much happened today
  • PRIME: Process Reinforcement through Implicit Rewards
  • not much happened today

2024

December

  • not much happened to end the year
  • not much happened today
  • not much happened today
  • DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens
  • not much happened today
  • not much happened this weekend
  • o3 solves AIME, GPQA, Codeforces, makes 11 years of progress in ARC-AGI and 25% in FrontierMath
  • ModernBert: small new Retriever/Classifier workhorse, 8k context, 2T tokens
  • Genesis: Generative Physics Engine for Robotics (o1-mini version)
  • Genesis: Generative Physics Engine for Robotics (o1-2024-12-17)
  • OpenAI Voice Mode Can See Now - After Gemini Does
  • o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning
  • Meta Apollo - Video Understanding up to 1 hour, SOTA Open Weights
  • Meta BLT: Tokenizer-free, Byte-level LLM
  • Google wakes up: Gemini 2.0 et al
  • ChatGPT Canvas GA
  • OpenAI Sora Turbo and Sora.com
  • Meta Llama 3.3: 405B/Nova Pro performance at 70B price
  • $200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews
  • not much happened today
  • Olympus has dropped (aka, Amazon Nova Micro|Lite|Pro|Premier|Canvas|Reel)
  • not much happened today

November

  • not much happened to end the week
  • Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500
  • OLMo 2 - new SOTA Fully Open LLM
  • Anthropic launches the Model Context Protocol
  • Vision Everywhere: Apple AIMv2 and Jina CLIP v2
  • LMSys killed Model Versioning (gpt 4o 1120, gemini exp 1121)
  • DeepSeek-R1 claims to beat o1-preview AND will be open sourced
  • Perplexity starts Shopping for you
  • Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11
  • Stripe lets Agents spend money with StripeAgentToolkit
  • Gemini (Experimental-1114) retakes #1 LLM rank with 1344 Elo
  • Common Corpus: 2T Open Tokens with Provenance
  • BitNet was a lie?
  • FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
  • not much happened today
  • not much happened today
  • Not much happened today
  • Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
  • OpenAI beats Anthropic to releasing Speculative Decoding
  • not much happened today
  • The AI Search Wars Have Begun — SearchGPT, Gemini Grounding, and more

October

  • Creating a LLM-as-a-Judge
  • GitHub Copilot Strikes Back
  • not much happened this weekend
  • not much happened today
  • s{imple|table|calable} Consistency Models
  • not much happened today
  • Claude 3.5 Sonnet (New) gets Computer Use
  • DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
  • DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality
  • not much happened today
  • Did Nvidia's Nemotron 70B train on test?
  • not much happened today
  • Not much (in AI) happened this weekend
  • not much happened today
  • State of AI 2024
  • not much happened today
  • The AI Nobel Prize
  • not much happened this weekend
  • Contextual Document Embeddings: `cde-small-v1`
  • Canvas: OpenAI's answer to Claude Artifacts
  • Not much technical happened today
  • OpenAI Realtime API and other Dev Day Goodies
  • Liquid Foundation Models: A New Transformers alternative + AINews Pod 2

September

  • not much happened today
  • not much happened today
  • Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)
  • ChatGPT Advanced Voice Mode
  • a calm before the storm
  • not much happened today
  • not much happened today
  • o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release
  • nothing much happened today
  • a quiet weekend
  • Learnings from o1 AMA
  • o1: OpenAI's new general reasoning models
  • Pixtral 12B: Mistral beats Llama to Multimodality
  • not much happened today + AINews Podcast?
  • AIPhone 16: the Visual Intelligence Phone
  • Reflection 70B, by Matt from IT Department
  • Replit Agent - How did everybody beat Devin to market?
  • $1150m for SSI, Sakana, You.com + Claude 500k context
  • Everybody shipped small things this holiday weekend

August

  • not much happened today
  • Summer of Code AI: $1.6b raised, 1 usable product
  • Cerebras Inference: Faster, Better, AND Cheaper
  • CogVideoX: Zhipu's Open Source Sora
  • not much happened this weekend
  • Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1
  • super quiet day
  • Ideogram 2 + Berkeley Function Calling Leaderboard V2
  • not much happened today
  • The DSPy Roadmap
  • not much happened today
  • not much happened today
  • Grok 2! and ChatGPT-4o-latest confuses everybody
  • Gemini Live
  • a quiet weekend
  • not much happened today
  • Too Cheap To Meter: AI prices cut 50-70% in last 30 days
  • not much happened today
  • GPT4o August + 100% Structured Outputs for All (GPT4o mini edition)
  • GPT4o August + 100% Structured Outputs for All (GPT4o August edition)
  • How Carlini Uses AI
  • Execuhires: Tempting The Wrath of Khan
  • Rombach et al: FLUX.1 [pro|dev|schnell], $31m seed for Black Forest Labs
  • Gemma 2 2B + Scope + Shield

July

  • not much happened today
  • Apple Intelligence Beta + Segment Anything Model 2
  • AlphaProof + AlphaGeometry2 reach 1 point short of IMO Gold
  • Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B
  • Llama 3.1: The Synthetic Data Model
  • Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model
  • DataComp-LM: the best open-data 7B model/benchmark/dataset
  • Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o-mini version)
  • Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o version)
  • Gemma 2 tops /r/LocalLlama vibe check
  • SciCode: HumanEval gets a STEM PhD upgrade
  • Microsoft AgentInstruct + Orca 3
  • We Solved Hallucinations
  • FlashAttention 3, PaliGemma, OpenAI's 5 Levels to Superintelligence
  • Nothing much happened today
  • Test-Time Training, MobileLLM, Lilian Weng on Hallucination (Plus: Turbopuffer)
  • Problems with MMLU-Pro
  • Qdrant's BM42: "Please don't trust us"
  • Not much happened today.
  • GraphRAG: The Marriage of Knowledge Graphs and RAG
  • RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)

June

  • That GPT-4o Demo
  • Gemma 2: The Open Model for Everyone
  • Mozilla's AI Second Act
  • Shall I compare thee to a Sonnet's day?
  • Gemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canary
  • Shazeer et al (2024): you are overpaying for inference >13x
  • Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts
  • There's Ilya!
  • Gemini launches context caching... or does it?
  • Is this... OpenQ*?
  • Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata
  • Hybrid SSM/Transformers > Pure SSMs/Pure Transformers
  • The Last Hurrah of Stable Diffusion?
  • Francois Chollet launches $1m ARC Prize
  • Talaria: Apple's new MLOps Superweapon
  • HippoRAG: First, do know(ledge) Graph
  • Qwen 2 beats Llama 3 (and we don't know how)
  • 5 small news items
  • Not much happened today
  • Mamba-2: State Space Duality

May

  • Ways to use Anthropic's Tool Use GA
  • Contextual Position Encoding (CoPE)
  • 1 TRILLION token context, real time, on device?
  • Somebody give Andrej some H100s already
  • Life after DPO (RewardBench)
  • Ten Commandments for Deploying Fine-Tuned Models
  • Clémentine Fourrier on LLM evals
  • ALL of AI Engineering in One Place
  • Anthropic's "LLM Genome Project": learning & clamping 34m features on Claude Sonnet
  • Skyfall
  • Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model
  • Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing
  • Not much happened today
  • Google I/O in 60 seconds
  • GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)
  • GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)
  • Quis promptum ipso promptiet?
  • LMSys advances Llama 3 eval analysis
  • OpenAI's PR Campaign?
  • Kolmogorov-Arnold Networks: MLP killers or just spicy MLPs?
  • DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost
  • $100k to predict LMSYS human preferences in a Kaggle contest
  • Evals: The Next Generation
  • Not much happened today
  • LLMs-as-Juries

April

  • A quiet weekend
  • Apple's OpenELM beats OLMo with 50% of its dataset, using DeLighT
  • Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM
  • OpenAI's Instruction Hierarchy for the LLM OS
  • Perplexity, the newest AI unicorn
  • FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)
  • Llama-3-70b is GPT-4-level Open Model
  • Meta Llama 3 (8B, 70B)
  • Mixtral 8x22B Instruct sparks efficiency memes
  • Lilian Weng on Video Diffusion
  • Multi-modal, Multi-Aspect, Multi-Form-Factor AI
  • Zero to GPT in 1 Year
  • Mergestral, Meta MTIAv2, Cohere Rerank 3, Google Infini-Attention
  • Music's Dall-E moment
  • Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence
  • Anime pfp anon eclipses $10k A::B prompting challenge
  • Mixture of Depths: Dynamically allocating compute in transformer-based language models
  • Cohere Command R+, Anthropic Claude Tool Use, OpenAI Finetuning
  • ReALM: Reference Resolution As Language Modeling
  • Not much happened today
  • AdamW -> AaronD?

March

  • Evals-based AI Engineering
  • Jamba: Mixture of Architectures dethrones Mixtral
  • DBRX: Best open model (just not most efficient)
  • Claude 3 is officially America's Next Top Model
  • Andrew likes Agents
  • not much happened today
  • Welcome /r/LocalLlama!
  • Shipping and Dipping: Inflection + Stability edition
  • World_sim.exe
  • Grok-1 in Bio
  • MM1: Apple's first Large Multimodal Model
  • Not much happened piday
  • DeepMind SIMA: one AI, 9 games, 600 tasks, vision+language ONLY
  • The world's first fully autonomous AI Engineer
  • Fixing Gemma
  • FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs
  • Inflection-2.5 at 94% of GPT4, and Pi at 6m MAU
  • Not much happened today
  • Stable Diffusion 3 — Rombach & Esser did it again!
  • Claude 3 just destroyed GPT 4 (see for yourself)
  • The Era of 1-bit LLMs
  • Dia de las Secuelas (StarCoder, The Stack, Dune, SemiAnalysis)

February

  • ... and welcome AI Twitter!
  • Welcome Interconnects and OpenRouter
  • Mistral Large disappoints
  • One Year of Latent Space
  • Ring Attention for >1M Context
  • Google AI: Win some (Gemma, 1.5 Pro), Lose some (Image gen)
  • Karpathy emerges from stealth?
  • Companies liable for AI hallucination is Good Actually for AI Engineers
  • Sora pushes SOTA
  • AI gets Memory
  • The Dissection of Smaug (72B)
  • Gemini Ultra is out, to mixed reviews
  • MetaVoice & RIP Bard
  • Qwen 1.5 Released
  • Less Lazy AI
  • The Core Skills of AI Engineering
  • AI2 releases OLMo - the 4th open-everything LLM
  • Trust in GPTs at all time low

January

  • Miqu confirmed to be an early Mistral-medium checkpoint
  • CodeLLama 70B beats GPT4 on HumanEval
  • RWKV "Eagle" v5: Your move, Mamba
  • GPT4Turbo A/B Test: gpt-4-0125-preview
  • GPT4Turbo A/B Test: gpt-4-1106-preview
  • Adept Fuyu-Heavy: Multimodal model for Agents
  • Google Solves Text to Video
  • RIP Latent Diffusion, Hello Hourglass Diffusion
  • Nightshade poisons AI art... kinda?
  • Sama says: GPT-5 soon
  • 1/17/2024: Help crowdsource function calling datasets
  • 1/16/2024: ArtificialAnalysis - a new model/host benchmark site
  • 1/16/2024: TIES-Merging
  • 1/13-14/2024: Don't sleep on #prompt-engineering
  • 1/12/2024: Anthropic coins Sleeper Agents
  • 1/11/2024: Mixing Experts vs Merging Models
  • 1/10/2024: All the best papers for AI Engineers
  • 1/9/2024: Nous Research lands $5m for Open Source AI
  • 1/8/2024: The Four Wars of the AI Stack
  • 1/6-7/2024: LLaMA Pro - an alternative to PEFT/RAG??
  • 1/4/2024: Jeff Bezos backs Perplexity's $520m Series B
  • 1/3/2024: RIP Coqui
  • 1/2/2024: Smol tweaks to Smol Talk
  • 1/1/2024: How to start with Open Source AI
  • 12/31/2023: Happy New Year

2023

December

  • 12/30/2023: Mega List of all LLMs