All tags
Topic: "pruning"
not much happened today
llama-3-2 llama-3 gemma-2 phi-3-5-mini claude-3-haiku gpt-4o-mini molmo gemini-1.5 gemini meta-ai-fair openai allenai google-deepmind multimodality model-optimization benchmarks ai-safety model-distillation pruning adapter-layers open-source-models performance context-windows mira-murati demis-hassabis ylecun sama
Meta AI released Llama 3.2 models including 1B, 3B text-only and 11B, 90B vision variants with 128K token context length and adapter layers for image-text integration. These models outperform competitors like Gemma 2 and Phi 3.5-mini, and are supported on major platforms including AWS, Azure, and Google Cloud. OpenAI CTO Mira Murati announced her departure. Allen AI released Molmo, an open-source multimodal model family outperforming proprietary systems. Google improved Gemini 1.5 with Flash and Pro models. Meta showcased Project Orion AR glasses and hinted at a Quest 3S priced at $300. Discussions covered new benchmarks for multimodal models, model optimization, and AI safety and alignment.
Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1
llama-3-1-8b llama-3-1 jamba-1.5 claude-3 dracarys-70b dracarys-72b mistral-nemo-minitron-8b mistral-7b nvidia meta-ai-fair ai21-labs anthropic hugging-face pruning knowledge-distillation weight-pruning activation-based-pruning width-pruning kl-divergence teacher-correction prompt-optimization multilinguality long-context mixture-of-experts model-fine-tuning
Nvidia and Meta researchers updated their Llama 3 results with a paper demonstrating the effectiveness of combining weight pruning and knowledge distillation to reduce training costs by training only the largest model from scratch and deriving smaller models via pruning and distillation. The process involves teacher correction, activation-based pruning (favoring width pruning), and retraining with distillation using KL Divergence loss, resulting in better-performing models at comparable sizes. However, distillation incurs some accuracy tradeoffs. Additionally, AI21 Labs launched Jamba 1.5, a hybrid SSM-Transformer MoE model with large context windows and multilingual support. Anthropic updated Claude 3 with LaTeX rendering and prompt caching. An open-source coding-focused LLM, Dracarys, was released in 70B and 72B sizes, showing improved coding performance. The Mistral Nemo Minitron 8B model outperforms Llama 3.1 8B and Mistral 7B on the Hugging Face leaderboard, highlighting pruning and distillation benefits. Research on prompt optimization reveals the complexity of prompt search spaces and the surprising effectiveness of simple algorithms like AutoPrompt/GCG.