Topic: "model-training"

    not much happened today
    not much happened today
    Google's Agent2Agent Protocol (A2A)
    DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
    The new OpenAI Agents Platform
    not much happened today
    not much happened today
    AI Engineer Summit Day 1
    The Ultra-Scale Playbook: Training LLMs on GPU Clusters
    not much happened today
    TinyZero: Reproduce DeepSeek R1-Zero for $30
    DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
    DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens
    Genesis: Generative Physics Engine for Robotics (o1-mini version)
    OpenAI Voice Mode Can See Now - After Gemini Does
    not much happened today
    OLMo 2 - new SOTA Fully Open LLM
    Vision Everywhere: Apple AIMv2 and Jina CLIP v2
    Canvas: OpenAI's answer to Claude Artifacts
    Not much technical happened today
    Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)
    a quiet weekend
    $1150m for SSI, Sakana, You.com + Claude 500m context
    Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model
    SciCode: HumanEval gets a STEM PhD upgrade
    We Solved Hallucinations
    Gemma 2: The Open Model for Everyone
    Gemini launches context caching... or does it?
    Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata
    The Last Hurrah of Stable Diffusion?
    Qwen 2 beats Llama 3 (and we don't know how)
    Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model
    Google I/O in 60 seconds
    DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost
    Evals: The Next Generation
    OpenAI's Instruction Hierarchy for the LLM OS
    Meta Llama 3 (8B, 70B)
    Zero to GPT in 1 Year
    DBRX: Best open model (just not most efficient)
    MM1: Apple's first Large Multimodal Model
    FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs
    Dia de las Secuelas (StarCoder, The Stack, Dune, SemiAnalysis)
    Mistral Large disappoints
    AI gets Memory
    Less Lazy AI
    Miqu confirmed to be an early Mistral-medium checkpoint
    1/6-7/2024: LlaMA Pro - an alternative to PEFT/RAG??
    12/24/2023: Dolphin Mixtral 8x7b is wild