Nemotron-4-340B: NVIDIA's new large open models, built on synthetic data and well suited for generating it
nemotron-4-340b mixtral llama-3 gemini-1.5 gpt-4o mamba-2-hybrid-8b samba-3.8b-instruct dolphin-2.9.3 faro-yi-9b-dpo nvidia hugging-face mistral-ai llamaindex cohere gemini mistral synthetic-data model-alignment reward-models fine-tuning long-context model-scaling inference-speed mixture-of-agents open-source-models model-training instruction-following context-windows philipp-schmid bryan-catanzaro oleksii-kuchaiev rohanpaul_ai cognitivecompai _philschmid 01ai_yi
NVIDIA has scaled its Nemotron-4 family from 15B to a 340B dense model, trained on 9T tokens and achieving performance comparable to GPT-4. The alignment process relies on over 98% synthetic data, with only about 20K human-annotated samples used for supervised fine-tuning and reward-model training. The synthetic data generation pipeline, including synthetic prompt and preference data generation, is open-sourced. The base and instruct versions outperform Mixtral and Llama 3, and the reward model ranks above Gemini 1.5, Cohere, and GPT-4o.

Other notable releases: Mamba-2-Hybrid 8B, up to 8x faster than a comparable Transformer at inference and strong on long-context tasks; Samba-3.8B-instruct, offering effectively unlimited context length with linear complexity; the Dolphin-2.9.3 tiny models, optimized for low-resource devices; and Faro Yi 9B DPO, with a 200K context window that runs on 16GB of VRAM.

Finally, the Mixture-of-Agents technique, which layers multiple open-source LLMs as proposers feeding an aggregator, pushes open-source models past GPT-4 Omni on AlpacaEval 2.0.
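The Mixture-of-Agents idea mentioned above (several proposer LLMs generate candidate answers layer by layer, each layer seeing the previous layer's candidates, and an aggregator synthesizes a final response) can be sketched as below. This is a minimal illustration, not the reference implementation: the proposer and aggregator functions are hypothetical stand-ins for real LLM API calls.

```python
# Minimal Mixture-of-Agents sketch. Proposers generate candidates in layers,
# each layer conditioning on the previous layer's outputs; an aggregator
# then synthesizes the final answer. The callables below are hypothetical
# placeholders, not a real LLM API.

def run_layer(proposers, prompt, previous=None):
    """One MoA layer: each proposer sees the prompt plus prior candidates."""
    context = prompt if not previous else (
        prompt + "\nPrior candidates:\n" + "\n".join(previous)
    )
    return [propose(context) for propose in proposers]

def mixture_of_agents(proposers, aggregator, prompt, layers=2):
    """Run several proposer layers, then aggregate into one response."""
    candidates = None
    for _ in range(layers):
        candidates = run_layer(proposers, prompt, candidates)
    return aggregator(prompt, candidates)

# Toy stand-ins for illustration only: three "models" that return fixed
# strings, and an aggregator that picks the longest candidate.
proposers = [lambda ctx, i=i: f"answer-{i}" for i in range(3)]
aggregator = lambda prompt, candidates: max(candidates, key=len)

print(mixture_of_agents(proposers, aggregator, "What is 2+2?"))
```

In the actual technique, the aggregator is itself an LLM prompted to synthesize the candidates rather than select one; the layered structure is what lets weaker open models collectively outperform a single stronger model.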