Topic: "model-scaling"

    not much happened today
    LLaDA: Large Language Diffusion Models
    Meta BLT: Tokenizer-free, Byte-level LLM
    Common Corpus: 2T Open Tokens with Provenance
    not much happened today
    Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
    not much happened today
    AlphaProof + AlphaGeometry2 reach 1 point short of IMO Gold
    Test-Time Training, MobileLLM, Lilian Weng on Hallucination (Plus: Turbopuffer)
    Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata
    GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)