Topic: "reward-models"

    DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
    Did Nvidia's Nemotron 70B train on test?
    Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata
    Life after DPO (RewardBench)
    1/16/2024: TIES-Merging