Topic: "mixture-of-experts"

    not much happened today
    AI Engineer World's Fair Talks Day 1
    Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1
    DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
    Llama 4's Controversial Weekend Release
    not much happened today
    not much happened today
    Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking
    How To Scale Your Model, by DeepMind
    not much happened today
    Titans: Learning to Memorize at Test Time
    not much happened today
    DeepSeek v3: 671B fine-grained MoE trained for $5.5m USD of compute on 15T tokens
    Stripe lets Agents spend money with StripeAgentToolkit
    FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
    Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
    not much happened today
    Not much technical happened today
    $1150m for SSI, Sakana, You.com + Claude 500k context
    CogVideoX: Zhipu's Open Source Sora
    Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1
    GPT4o August + 100% Structured Outputs for All (GPT4o August edition)
    Microsoft AgentInstruct + Orca 3
    RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)
    Hybrid SSM/Transformers > Pure SSMs/Pure Transformers
    Francois Chollet launches $1m ARC Prize
    Skyfall
    GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)
    DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost
    Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM
    Mixture of Depths: Dynamically allocating compute in transformer-based language models
    Not much happened today
    Jamba: Mixture of Architectures dethrones Mixtral
    DBRX: Best open model (just not most efficient)
    Grok-1 in Bio
    Not much happened piday
    Nightshade poisons AI art... kinda?
    Sama says: GPT-5 soon
    1/16/2024: TIES-Merging
    1/13-14/2024: Don't sleep on #prompt-engineering
    1/11/2024: Mixing Experts vs Merging Models
    1/8/2024: The Four Wars of the AI Stack
    12/12/2023: Towards LangChain 0.1
    12/10/2023: not much happened today
    12/8/2023: Mamba v Mistral v Hyena