Model: "claude-3.5-sonnet"

    Anthropic releases Claude 4 Sonnet and Opus: Memory, Agent Capabilities, Claude Code, Redteam Drama
    not much happened today
    not much happened today
    lots of small launches
    not much happened today
    Mistral Small 3 24B and Tulu 3 405B
    Titans: Learning to Memorize at Test Time
    not much happened today
    PRIME: Process Reinforcement through Implicit Rewards
    DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens
    not much happened today
    Genesis: Generative Physics Engine for Robotics (o1-mini version)
    OpenAI Voice Mode Can See Now - After Gemini Does
    o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning
    Google wakes up: Gemini 2.0 et al
    OpenAI Sora Turbo and Sora.com
    $200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews
    not much happened today
    not much happened to end the week
    Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500
    Anthropic launches the Model Context Protocol
    Gemini (Experimental-1114) retakes #1 LLM rank with 1344 Elo
    Common Corpus: 2T Open Tokens with Provenance
    not much happened today
    not much happened today
    The AI Search Wars Have Begun — SearchGPT, Gemini Grounding, and more
    Creating a LLM-as-a-Judge
    not much happened this weekend
    not much happened today
    Claude 3.5 Sonnet (New) gets Computer Use
    DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
    DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality
    not much happened today
    Did Nvidia's Nemotron 70B train on test?
    not much happened today
    The AI Nobel Prize
    not much happened this weekend
    not much happened today
    Learnings from o1 AMA
    Reflection 70B, by Matt from IT Department
    $1150m for SSI, Sakana, You.com + Claude 500m context
    not much happened today
    not much happened today
    not much happened today
    Grok 2! and ChatGPT-4o-latest confuses everybody
    not much happened today
    GPT4o August + 100% Structured Outputs for All (GPT4o August edition)
    SciCode: HumanEval gets a STEM PhD upgrade
    Qdrant's BM42: "Please don't trust us"
    GraphRAG: The Marriage of Knowledge Graphs and RAG
    Gemma 2: The Open Model for Everyone
    Shall I compare thee to a Sonnet's day?
    Gemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canary
    Shazeer et al (2024): you are overpaying for inference >13x
    Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts