Model: "nemotron-4-340b"
Gemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canary
gemini-nano gemini-pro claude-3.5-sonnet gpt-4o deepseek-coder-v2 glm-0520 nemotron-4-340b gpt-4-turbo-0409 google gemini huggingface anthropic deepseek zhipu-ai tsinghua nvidia model-quantization prompt-api optimization model-weights benchmarking code-generation math synthetic-data automatic-differentiation retrieval-augmented-generation mitigating-memorization tree-search inference-time-algorithms adcock_brett dair_ai lmsysorg
The latest Chrome Canary now includes a feature flag for Gemini Nano, along with a prompt API and an on-device optimization guide. The Nano 1 and Nano 2 models have 1.8B and 3.25B parameters respectively and are reported at 50-90% of Gemini Pro's performance. The base and instruct-tuned model weights have been extracted and posted to HuggingFace. In AI model releases, Anthropic launched Claude 3.5 Sonnet, which outperforms GPT-4o on some benchmarks, is twice as fast as Opus, and is free to try. DeepSeek-Coder-V2 achieves 90.2% on HumanEval and 75.7% on MATH, surpassing GPT-4-Turbo-0409, with models up to 236B parameters and 128K context length. GLM-0520 from Zhipu AI/Tsinghua ranks highly in coding and overall benchmarks. NVIDIA announced Nemotron-4 340B, an open model family for synthetic data generation. Research highlights include TextGrad, a framework for automatic differentiation on textual feedback; PlanRAG, an iterative plan-then-RAG decision-making technique; a paper on goldfish loss to mitigate memorization in LLMs; and a tree search algorithm for language model agents.
Is this... OpenQ*?
deepseek-coder-v2 llama-3-8b nemotron-4-340b stable-diffusion-3-medium deepseek_ai anthropic runwayml openai apple nvidia stability-ai luma-labs reward-tampering test-time-search mathematical-reasoning process-supervision fine-tuning on-device-ai video-generation cost-efficiency context-length coding image-understanding multimodality adcock_brett clementdelangue svpino
DeepSeek-Coder-V2 promises performance that beats GPT-4 Turbo at a fraction of the cost. Anthropic released new research on reward tampering. Runway launched its response to Sora, the Gen-3 Alpha video generation model. A series of papers explores "test-time" search techniques that improve mathematical reasoning with models like Llama 3 8B. Apple announced Apple Intelligence with a smarter Siri and image/document understanding, partnered with OpenAI to integrate ChatGPT into iOS 18, and released 20 new CoreML models with LoRA fine-tuning for specialization. NVIDIA released Nemotron-4 340B, an open model matching GPT-4 performance. DeepSeek-Coder-V2 excels in coding and math, with support for 338 programming languages and a 128K context length. Stability AI released Stable Diffusion 3 Medium weights. Luma Labs launched Dream Machine for 5-second video generation from text and images.
Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata
nemotron-4-340b mixtral llama-3 gemini-1.5 gpt-4o mamba-2-hybrid-8b samba-3.8b-instruct dolphin-2.9.3 faro-yi-9b-dpo nvidia hugging-face mistral-ai llamaindex cohere gemini mistral synthetic-data model-alignment reward-models fine-tuning long-context model-scaling inference-speed mixture-of-agents open-source-models model-training instruction-following context-windows philipp-schmid bryan-catanzaro oleksii-kuchaiev rohanpaul_ai cognitivecompai _philschmid 01ai_yi
NVIDIA has scaled up its Nemotron-4 model from 15B to a massive 340B dense model, trained on 9T tokens, achieving performance comparable to GPT-4. The model alignment process uses over 98% synthetic data, with only about 20K human-annotated samples for fine-tuning and reward model training. The synthetic data generation pipeline is open-sourced, including synthetic prompts and preference data generation. The base and instruct versions outperform Mixtral and Llama 3, while the reward model ranks above Gemini 1.5, Cohere, and GPT-4o. Other notable models include Mamba-2-Hybrid 8B, which is up to 8x faster than Transformers and excels on long-context tasks; Samba-3.8B-instruct, which offers infinite context length with linear complexity; the Dolphin-2.9.3 tiny models, optimized for low-resource devices; and Faro Yi 9B DPO, which has a 200K context window and runs efficiently on 16GB of VRAM. The Mixture-of-Agents technique boosts open-source LLMs beyond GPT-4o on AlpacaEval 2.0.