Topic: "mixture-of-experts"

not much happened today

Microsoft Build: MAI-Thinking-1 and MAI Family models, Surface RTX Spark Dev Box, and OpenClaw in Windows

not much happened today

not much happened today

not much happened today

not much happened today

Context Graphs: Hype or actually Trillion-dollar opportunity?

Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

NVIDIA Nemotron 3: hybrid Mamba-Transformer completely open source models from 30B to 500B

Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves PyTorch

not much happened today

Cursor 2.0 & Composer-1: Fast Models and New Agents UI

Air Street's State of AI 2025 Report

not much happened today

GPT-5 Codex launch and OpenAI's quiet rise in Agentic Coding

Qwen3-Next-80B-A3B-Base: Towards Ultimate Training & Inference Efficiency

not much happened today

Cohere Command A Reasoning beats GPT-OSS-120B and DeepSeek R1 0528

not much happened today

OpenAI's gpt-oss 20B and 120B, Claude Opus 4.1, DeepMind Genie 3

not much happened today

not much happened today

Voxtral - Mistral's SOTA ASR model in 3B (mini) and 24B ("small") sizes beats OpenAI Whisper large-v3

not much happened today

Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params

not much happened today

not much happened today

Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview

not much happened today

AI Engineer World's Fair Talks Day 1

Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

Llama 4's Controversial Weekend Release

not much happened today

not much happened today

Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking

How To Scale Your Model, by DeepMind

not much happened today

Titans: Learning to Memorize at Test Time

not much happened today

DeepSeek v3: 671B fine-grained MoE trained for $5.5m USD of compute on 15T tokens

Stripe lets Agents spend money with StripeAgentToolkit

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data

not much happened today

Not much technical happened today

$1150m for SSI, Sakana, You.com + Claude 500K context

CogVideoX: Zhipu's Open Source Sora

Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1

GPT4o August + 100% Structured Outputs for All (GPT4o August edition)

Microsoft AgentInstruct + Orca 3

RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)

Hybrid SSM/Transformers > Pure SSMs/Pure Transformers

Francois Chollet launches $1m ARC Prize

GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)

DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost

Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM

Mixture of Depths: Dynamically allocating compute in transformer-based language models

Not much happened today

Jamba: Mixture of Architectures dethrones Mixtral

DBRX: Best open model (just not most efficient)

Not much happened piday

Nightshade poisons AI art... kinda?

Sama says: GPT-5 soon

1/16/2024: TIES-Merging

1/13-14/2024: Don't sleep on #prompt-engineering

1/11/2024: Mixing Experts vs Merging Models

1/8/2024: The Four Wars of the AI Stack

12/12/2023: Towards LangChain 0.1

12/10/2023: not much happened today

12/8/2023 - Mamba v Mistral v Hyena