Topic: "model-optimization"

    Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params
    not much happened today
    OpenAI releases Deep Research API (o3/o4-mini)
    not much happened today
    Mary Meeker is so back: BOND Capital AI Trends report
    not much happened today
    Google I/O: new Gemini native voice, Flash, DeepThink, AI Mode (DeepSearch+Mariner+Astra)
    not much happened today
    Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning
    not much happened today
    ChatGPT responds to GlazeGate + LMArena responds to Cohere
    LlamaCon: Meta AI gets into the Llama API platform business
    not much happened today
    Cohere's Command A claims #3 open model spot (after DeepSeek and Gemma)
    DeepSeek's Open Source Stack
    not much happened today
    Mistral Small 3 24B and Tulu 3 405B
    DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
    not much happened today
    not much happened today
    DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens
    Meta BLT: Tokenizer-free, Byte-level LLM
    not much happened today
    Olympus has dropped (aka, Amazon Nova Micro|Lite|Pro|Premier|Canvas|Reel)
    Vision Everywhere: Apple AIMv2 and Jina CLIP v2
    Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
    not much happened today
    not much happened this weekend
    not much happened today
    not much happened today
    DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
    DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality
    not much happened today
    Not much technical happened today
    not much happened today
    Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)
    a calm before the storm
    Cerebras Inference: Faster, Better, AND Cheaper
    Ideogram 2 + Berkeley Function Calling Leaderboard V2
    The DSPy Roadmap
    DataComp-LM: the best open-data 7B model/benchmark/dataset
    Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o-mini version)
    Qdrant's BM42: "Please don't trust us"
    RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)
    That GPT-4o Demo
    Gemma 2: The Open Model for Everyone
    Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts
    Talaria: Apple's new MLOps Superweapon
    Not much happened today
    Skyfall
    Google I/O in 60 seconds
    GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)
    A quiet weekend
    Anime pfp anon eclipses $10k A::B prompting challenge
    Evals-based AI Engineering
    Jamba: Mixture of Architectures dethrones Mixtral
    The Era of 1-bit LLMs
    Welcome Interconnects and OpenRouter
    Mistral Large disappoints
    Karpathy emerges from stealth?
    AI2 releases OLMo - the 4th open-everything LLM
    Miqu confirmed to be an early Mistral-medium checkpoint
    RWKV "Eagle" v5: Your move, Mamba
    GPT4Turbo A/B Test: gpt-4-0125-preview
    12/25/2023: Nous Hermes 2 Yi 34B for Christmas