Topic: "datasets"

$100k to predict LMSYS human preferences in a Kaggle contest

FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)