Topic: "reinforcement-learning"

    Apple exposes Foundation Models API and... no new Siri
    not much happened today
    DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release
    not much happened today
    Mistral's Agents API and the 2025 LLM OS
    ChatGPT Codex, OpenAI's first cloud SWE agent
    Gemini's AlphaEvolve agent uses Gemini 2.0 to find new Math and cuts Gemini cost 1% — without RL
    Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning
    not much happened today
    not much happened today
    AI Engineer World's Fair: Second Run, Twice The Fun
    not much happened today
    LlamaCon: Meta AI gets into the Llama API platform business
    Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1
    Cognition's DeepWiki, a free encyclopedia of all GitHub repos
    not much happened today
    Grok 3 & 3-mini now API Available
    Gemini 2.5 Flash completes the total domination of the Pareto Frontier
    OpenAI o3, o4-mini, and Codex CLI
    QwQ-32B claims to match DeepSeek R1-671B
    not much happened today
    Google's Agent2Agent Protocol (A2A)
    DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
    lots of little things happened this week
    not much happened today
    The new OpenAI Agents Platform
    not much happened today
    AI Engineer Summit Day 1
    X.ai Grok 3 and Mira Murati's Thinking Machines
    Reasoning Models are Near-Superhuman Coders (OpenAI IOI, Nvidia Kernels)
    small news items
    not much happened today
    Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking
    How To Scale Your Model, by DeepMind
    OpenAI takes on Gemini's Deep Research
    Mistral Small 3 24B and Tulu 3 405B
    not much happened today
    TinyZero: Reproduce DeepSeek R1-Zero for $30
    Bespoke-Stratos + Sky-T1: The Vicuna+Alpaca moment for reasoning
    DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
    not much happened today
    not much happened today
    PRIME: Process Reinforcement through Implicit Rewards
    not much happened to end the year
    DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens
    Meta BLT: Tokenizer-free, Byte-level LLM
    Meta Llama 3.3: 405B/Nova Pro performance at 70B price
    not much happened today
    not much happened to end the week
    OLMo 2 - new SOTA Fully Open LLM
    Vision Everywhere: Apple AIMv2 and Jina CLIP v2
    not much happened this weekend
    DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality
    Did Nvidia's Nemotron 70B train on test?
    Liquid Foundation Models: A New Transformers alternative + AINews Pod 2
    a calm before the storm
    nothing much happened today
    a quiet weekend
    Learnings from o1 AMA
    not much happened today
    not much happened today
    Llama 3.1: The Synthetic Data Model
    Gemini launches context caching... or does it?
    HippoRAG: First, do know(ledge) Graph
    Not much happened today
    $100k to predict LMSYS human preferences in a Kaggle contest
    The world's first fully autonomous AI Engineer
    FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs
    Mistral Large disappoints
    RWKV "Eagle" v5: Your move, Mamba
    1/12/2024: Anthropic coins Sleeper Agents