Topic: "overfitting"

not much happened today

Shazeer et al (2024): you are overpaying for inference >13x

DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost