Topic: "reinforcement-learning"

Anthropic launches the MCP Apps open spec, in Claude.ai

not much happened today

OpenEvidence, the ‘ChatGPT for doctors,’ raises $250m at $12B valuation, 12x from $1b last Feb

not much happened today

not much happened today

not much happened today

NVIDIA Nemotron 3: hybrid Mamba-Transformer completely open source models from 30B to 500B

not much happened today

MCP -> Agentic AI Foundation, Mistral Devstral 2

not much happened today

not much happened today

DeepSeek V3.2 & 3.2-Speciale: GPT5-High Open Weights, Context Management, Plans for Compute Scaling

not much happened today

not much happened today

not much happened today

not much happened today

Cursor 2.0 & Composer-1: Fast Models and New Agents UI

not much happened today

not much happened today

not much happened today

OpenAI Titan XPU: 10GW of self-designed chips with Broadcom

not much happened today

Air Street's State of AI 2025 Report

not much happened today

Alibaba Yunqi: 7 models released in 4 days (Qwen3-Max, Qwen3-Omni, Qwen3-VL) and $52B roadmap

NVIDIA to invest $100B in OpenAI for 10GW of Vera Rubin rollout

Softbank, NVIDIA and US Govt take 2%, 5% and 10% of Intel, will develop Intel x86 RTX SOCs for consumer & datacenters

not much happened today

not much happened today

Oracle jumps +36% in a day after winning $300B OpenAI contract

not much happened today

Anthropic raises $13B at $183B Series F

OpenAI updates Codex, VSCode Extension that can sync tasks with Codex Cloud

not much happened today

DeepSeek V3.1: 840B token continued pretrain, beating Claude 4 Sonnet at 11% of its cost

not much happened today

OpenAI's IMO Gold model also wins IOI Gold

not much happened today

GLM-4.5: Deeper, Headier, & better than Kimi/Qwen/DeepSeek (SOTA China LLM?)

not much happened today

not much happened today

OAI and GDM announce IMO Gold-level results with natural language reasoning, no specialized training or tools, under human time limits

ChatGPT Agent: new o* model + unified Deep Research browser + Operator computer use + Code Interpreter terminal

not much happened today

not much happened today

OpenAI releases Deep Research API (o3/o4-mini)

Bartz v. Anthropic PBC — "Training use is Fair Use"

Not much happened today

not much happened today

Apple exposes Foundation Models API and... no new Siri

not much happened today

DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release

not much happened today

Mistral's Agents API and the 2025 LLM OS

ChatGPT Codex, OpenAI's first cloud SWE agent

codex-1 openai-o3 codex-mini gemma-3 blip3-o qwen-2.5 marigold-iid deepseek-v3 lightlab gemini-2.0 lumina-next openai runway salesforce qwen deepseek google google-deepmind j1 software-engineering parallel-processing multimodality diffusion-models depth-estimation scaling-laws reinforcement-learning fine-tuning model-performance multi-turn-conversation reasoning audio-processing sama kevinweil omarsar0 iscienceluvr akhaliq osanseviero c_valenzuelab mervenoyann arankomatsuzaki jasonwei demishassabis philschmid swyx teortaxestex jaseweston

OpenAI launched Codex, a cloud-based software engineering agent powered by codex-1 (an optimized version of OpenAI o3). It is available in research preview to ChatGPT Pro, Enterprise, and Team users and can run tasks such as refactoring and bug fixing in parallel. The Codex CLI was enhanced with quick sign-in and a new low-latency model, codex-mini. Gemma 3 was highlighted as the best open model runnable on a single GPU. Runway released the Gen-4 References API for style transfer in generation. Salesforce introduced BLIP3-o, a unified multimodal model family using diffusion transformers for CLIP image features. The Qwen 2.5 models (1.5B and 3B versions) were integrated into the PocketPal app with various chat templates. Marigold IID, a new state-of-the-art open-source depth estimation model, was released. In research, DeepSeek shared insights on scaling and hardware for DeepSeek-V3. Google unveiled LightLab, a diffusion-based method for controlling light sources in images. Google DeepMind's AlphaEvolve uses Gemini 2.0 to discover new math and reduce costs without reinforcement learning. Omni-R1 studied audio's role in fine-tuning audio LLMs. Qwen proposed a parallel scaling law inspired by classifier-free guidance. Salesforce released Lumina-Next on the Qwen base, outperforming Janus-Pro. A study found that LLM performance degrades in multi-turn conversations due to unreliability. J1 incentivizes thinking in LLM-as-a-Judge models via reinforcement learning. A new Qwen study correlates question and strategy similarity to predict reasoning strategies.

Gemini's AlphaEvolve agent uses Gemini 2.0 to find new Math and cuts Gemini cost 1% — without RL

Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning

not much happened today

not much happened today

AI Engineer World's Fair: Second Run, Twice The Fun

not much happened today

LlamaCon: Meta AI gets into the Llama API platform business

Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1

Cognition's DeepWiki, a free encyclopedia of all GitHub repos

not much happened today

Grok 3 & 3-mini now API Available

Gemini 2.5 Flash completes the total domination of the Pareto Frontier

OpenAI o3, o4-mini, and Codex CLI

QwQ-32B claims to match DeepSeek R1-671B

not much happened today

Google's Agent2Agent Protocol (A2A)

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

lots of little things happened this week

not much happened today

The new OpenAI Agents Platform

not much happened today

AI Engineer Summit Day 1

X.ai Grok 3 and Mira Murati's Thinking Machines

Reasoning Models are Near-Superhuman Coders (OpenAI IOI, Nvidia Kernels)

small news items

not much happened today

Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking

How To Scale Your Model, by DeepMind

OpenAI takes on Gemini's Deep Research

Mistral Small 3 24B and Tulu 3 405B

not much happened today

TinyZero: Reproduce DeepSeek R1-Zero for $30

Bespoke-Stratos + Sky-T1: The Vicuna+Alpaca moment for reasoning

DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level

not much happened today

not much happened today

PRIME: Process Reinforcement through Implicit Rewards

not much happened to end the year

DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens

Meta BLT: Tokenizer-free, Byte-level LLM

Meta Llama 3.3: 405B/Nova Pro performance at 70B price

not much happened today

not much happened to end the week

OLMo 2 - new SOTA Fully Open LLM

Vision Everywhere: Apple AIMv2 and Jina CLIP v2

not much happened this weekend

DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality

Did Nvidia's Nemotron 70B train on test?

Liquid Foundation Models: A New Transformers alternative + AINews Pod 2

a calm before the storm

nothing much happened today

a quiet weekend

Learnings from o1 AMA

not much happened today

not much happened today

Llama 3.1: The Synthetic Data Model

Gemini launches context caching... or does it?

HippoRAG: First, do know(ledge) Graph

Not much happened today

$100k to predict LMSYS human preferences in a Kaggle contest

The world's first fully autonomous AI Engineer

FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs

Mistral Large disappoints

RWKV "Eagle" v5: Your move, Mamba

1/12/2024: Anthropic coins Sleeper Agents