Topic: "model-benchmarking"

    not much happened today
    not much happened today
    not much happened today
    ChatGPT responds to GlazeGate + LMArena responds to Cohere
    Grok 3 & 3-mini now API Available
    OpenAI o3, o4-mini, and Codex CLI
    QwQ-32B claims to match DeepSeek R1-671B
    Google's Agent2Agent Protocol (A2A)
    OpenAI adopts MCP
    Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen
    not much happened today
    Vision Everywhere: Apple AIMv2 and Jina CLIP v2
    DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
    a calm before the storm
    o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release
    a quiet weekend
    Everybody shipped small things this holiday weekend
    Execuhires: Tempting The Wrath of Khan
    Rombach et al: FLUX.1 [pro|dev|schnell], $31m seed for Black Forest Labs
    DataComp-LM: the best open-data 7B model/benchmark/dataset
    Mozilla's AI Second Act
    Talaria: Apple's new MLOps Superweapon
    Not much happened today
    Contextual Position Encoding (CoPE)
    Not much happened today
    Mergestral, Meta MTIAv2, Cohere Rerank 3, Google Infini-Attention
    RWKV "Eagle" v5: Your move, Mamba
    12/25/2023: Nous Hermes 2 Yi 34B for Christmas
    12/10/2023: not much happened today