Model: "claude-3.5-haiku"

    not much happened today
    not much happened today
    FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
    Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
    not much happened today
    Claude 3.5 Sonnet (New) gets Computer Use