Topic: "model-performance"

    Execuhires Round 2: Scale-Meta, Lamini-AMD, and Instacart-OpenAI
    not much happened today
    Gemini 2.5 Pro (06-05) launched at AI Engineer World's Fair
    DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release
    not much happened today
    Mistral's Agents API and the 2025 LLM OS
    Google I/O: new Gemini native voice, Flash, DeepThink, AI Mode (DeepSearch+Mariner+Astra)
    not much happened today
    ChatGPT Codex, OpenAI's first cloud SWE agent
    Granola launches team notes, while Notion launches meeting transcription
    not much happened today
    not much happened today
    not much happened today
    Cursor @ $9b, OpenAI Buys Windsurf @ $3b
    not much happened today
    gpt-image-1 - ChatGPT's imagegen model, confusingly NOT 4o, now available in API
    not much happened today
    DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
    Llama 4's Controversial Weekend Release
    not much happened today
    >$41B raised today (OpenAI @ 300b, Cursor @ 9.5b, Etched @ 1.5b)
    not much happened today
    OpenAI adopts MCP
    Halfmoon is Reve Image: a new SOTA Image Model from ex-Adobe/Stability trio
    Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI
    Cohere's Command A claims #3 open model spot (after DeepSeek and Gemma)
    not much happened today
    Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen
    DeepSeek's Open Source Stack
    not much happened today
    Anthropic's $61.5B Series E
    not much happened today
    lots of small launches
    not much happened today
    AI Engineer Summit Day 1
    X.ai Grok 3 and Mira Murati's Thinking Machines
    not much happened today
    small news items
    not much happened today
    not much happened today
    OpenAI takes on Gemini's Deep Research
    o3-mini launches, OpenAI on "wrong side of history"
    Mistral Small 3 24B and Tulu 3 405B
    TinyZero: Reproduce DeepSeek R1-Zero for $30
    Project Stargate: $500b datacenter (1.7% of US GDP) and Gemini 2 Flash Thinking 2
    not much happened today
    PRIME: Process Reinforcement through Implicit Rewards
    o3 solves AIME, GPQA, Codeforces, makes 11 years of progress in ARC-AGI and 25% in FrontierMath
    ModernBert: small new Retriever/Classifier workhorse, 8k context, 2T tokens,
    o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning
    Meta Llama 3.3: 405B/Nova Pro performance at 70B price
    $200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews
    not much happened today
    Olympus has dropped (aka, Amazon Nova Micro|Lite|Pro|Premier|Canvas|Reel)
    DeepSeek-R1 claims to beat o1-preview AND will be open sourced
    Perplexity starts Shopping for you
    BitNet was a lie?
    not much happened today
    Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)
    ChatGPT Advanced Voice Mode
    a calm before the storm
    o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release
    Learnings from o1 AMA
    o1: OpenAI's new general reasoning models
    Pixtral 12B: Mistral beats Llama to Multimodality
    not much happened today + AINews Podcast?
    not much happened this weekend
    super quiet day
    Ideogram 2 + Berkeley Function Calling Leaderboard V2
    not much happened today
    not much happened today
    not much happened today
    Grok 2! and ChatGPT-4o-latest confuses everybody
    Gemini Live
    a quiet weekend
    GPT4o August + 100% Structured Outputs for All (GPT4o August edition)
    Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model
    DataComp-LM: the best open-data 7B model/benchmark/dataset
    Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o version)
    SciCode: HumanEval gets a STEM PhD upgrade
    Microsoft AgentInstruct + Orca 3
    Problems with MMLU-Pro
    RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)
    Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts
    Gemini launches context caching... or does it?
    Hybrid SSM/Transformers > Pure SSMs/Pure Transformers
    Not much happened today
    1 TRILLION token context, real time, on device?
    Ten Commandments for Deploying Fine-Tuned Models
    Skyfall
    Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model
    Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing
    Google I/O in 60 seconds
    GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)
    Evals: The Next Generation
    Not much happened today
    Perplexity, the newest AI unicorn
    Llama-3-70b is GPT-4-level Open Model
    Mixtral 8x22B Instruct sparks efficiency memes
    Lilian Weng on Video Diffusion
    Mergestral, Meta MTIAv2, Cohere Rerank 3, Google Infini-Attention
    Claude 3 is officially America's Next Top Model
    Grok-1 in Bio
    Dia de las Secuelas (StarCoder, The Stack, Dune, SemiAnalysis)
    Miqu confirmed to be an early Mistral-medium checkpoint
    CodeLLama 70B beats GPT4 on HumanEval
    RWKV "Eagle" v5: Your move, Mamba
    1/17/2024: Help crowdsource function calling datasets
    1/11/2024: Mixing Experts vs Merging Models
    12/29/2023: TinyLlama on the way
    12/27/2023: NYT vs OpenAI
    12/26/2023: not much happened today
    12/18/2023: Gaslighting Mistral for fun and profit
    12/13/2023: SOLAR10.7B upstages Mistral7B?
    12/9/2023: The Mixtral Rush
    12/7/2023: Anthropic says "skill issue"