Topic: "overfitting"

    not much happened today
    Shazeer et al (2024): you are overpaying for inference >13x
    DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost