All tags

Model: "deepseek-native-sparse-attention"

    The Ultra-Scale Playbook: Training LLMs on GPU Clusters