Topic: "model-training"

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

Anthropic @ $30B ARR, Project GlassWing and Claude Mythos Preview — first model too dangerous to release since GPT-2

not much happened today

Context Graphs: Hype or actually Trillion-dollar opportunity?

Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager

not much happened today

NVIDIA Nemotron 3: hybrid Mamba-Transformer completely open source models from 30B to 500B

not much happened today

DeepSeek V3.2 & 3.2-Speciale: GPT5-High Open Weights, Context Management, Plans for Compute Scaling

Thinking Machines' Tinker: LoRA based LLM fine-tuning API

not much happened today

Grok 4 Fast: xAI's distilled, 40% more token efficient, 2M context, 344 tok/s frontier model

Qwen3-Next-80B-A3B-Base: Towards Ultimate Training & Inference Efficiency

not much happened today

not much happened today

OpenAI's gpt-oss 20B and 120B, Claude Opus 4.1, DeepMind Genie 3

GLM-4.5: Deeper, Headier, & better than Kimi/Qwen/DeepSeek (SOTA China LLM?)

not much happened today

OAI and GDM announce IMO Gold-level results with natural language reasoning, no specialized training or tools, under human time limits

not much happened today

Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params

Zuck goes Superintelligence Founder Mode: $100M bonuses + $100M+ salaries + NFDG Buyout?

Chinese Models Launch - MiniMax-M1, Hailuo 2 "Kangaroo", Moonshot Kimi-Dev-72B

not much happened today

not much happened today

Google's Agent2Agent Protocol (A2A)

DeepCoder: A Fully Open-Source 14B Coder at o3-mini Level

The new OpenAI Agents Platform

not much happened today

not much happened today

AI Engineer Summit Day 1

The Ultra-Scale Playbook: Training LLMs on GPU Clusters

not much happened today

TinyZero: Reproduce DeepSeek R1-Zero for $30

DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level

DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens

Genesis: Generative Physics Engine for Robotics (o1-mini version)

OpenAI Voice Mode Can See Now - After Gemini Does

not much happened today

OLMo 2 - new SOTA Fully Open LLM

Vision Everywhere: Apple AIMv2 and Jina CLIP v2

Canvas: OpenAI's answer to Claude Artifacts

Not much technical happened today

Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)

a quiet weekend

$1150m for SSI, Sakana, You.com + Claude 500k context

Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70B, and SOTA OSS 405B model

SciCode: HumanEval gets a STEM PhD upgrade

We Solved Hallucinations

Gemma 2: The Open Model for Everyone

Gemini launches context caching... or does it?

Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata

The Last Hurrah of Stable Diffusion?

Qwen 2 beats Llama 3 (and we don't know how)

Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model

Google I/O in 60 seconds

DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost

Evals: The Next Generation

OpenAI's Instruction Hierarchy for the LLM OS

Meta Llama 3 (8B, 70B)

Zero to GPT in 1 Year

DBRX: Best open model (just not most efficient)

MM1: Apple's first Large Multimodal Model

FSDP+QLoRA: the Answer to 70B-scale AI for desktop class GPUs

Dia de las Secuelas (StarCoder, The Stack, Dune, SemiAnalysis)

Mistral Large disappoints

Miqu confirmed to be an early Mistral-medium checkpoint

1/6-7/2024: LLaMA Pro - an alternative to PEFT/RAG??

12/24/2023: Dolphin Mixtral 8x7b is wild