Topic: "prompt-optimization"
not much happened this weekend
claude-3.5-sonnet llama-3 llama-3-8b notebookllama min-omni-2 moondream openai anthropic hugging-face mistral-ai google-deepmind langchain deepmind microsoft pattern-recognition reinforcement-learning prompt-optimization text-to-speech model-optimization tensor-parallelism hyperparameters multimodal modal-alignment multimodal-fine-tuning ai-productivity privacy generative-ai rag retrieval-augmentation enterprise-text-to-sql amanda-askell philschmid stasbekman francois-fleuret mervenoyann reach_vb dzhng aravsrinivas sama lateinteraction andrew-y-ng bindureddy jerryjliu0
Moondream, a 1.6B-parameter vision language model, secured seed funding, continuing a trend of moon-themed tiny models alongside Moonshine (a 27M to 61M parameter ASR model). Claude 3.5 Sonnet was used for the AI Twitter recaps. Discussions covered pattern recognition vs. intelligence in LLMs, reinforcement learning for prompt optimization, and NotebookLlama, an open-source NotebookLM variant using LLaMA models for tasks like text-to-speech. Model optimization advances were noted, including async-TP in PyTorch for tensor parallelism, and hyperparameter tuning. Mini-Omni 2 demonstrated multimodal capabilities across image, audio, and text for voice conversations, with an emphasis on modal alignment and multimodal fine-tuning. AI productivity tools such as an AI email writer and LlamaCloud-based research assistants were introduced. Posts also emphasized practical skill development and privacy-conscious AI tool usage with Llama3-8B. Generative AI resources such as #AIPythonforBeginners and GenAI Agents with LangGraph were shared. Business insights covered rapid execution in AI product development and emerging AI-related job roles. Challenges in enterprise-grade text-to-SQL and advanced retrieval methods were discussed, along with tutorials on RAG applications using LangChain and MongoDB.
Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1
llama-3-1-8b llama-3-1 jamba-1.5 claude-3 dracarys-70b dracarys-72b mistral-nemo-minitron-8b mistral-7b nvidia meta-ai-fair ai21-labs anthropic hugging-face pruning knowledge-distillation weight-pruning activation-based-pruning width-pruning kl-divergence teacher-correction prompt-optimization multilinguality long-context mixture-of-experts model-fine-tuning
Nvidia and Meta researchers updated their Llama 3 results with a paper demonstrating that combining weight pruning with knowledge distillation reduces training costs: only the largest model is trained from scratch, and smaller models are derived from it via pruning and distillation. The process involves teacher correction, activation-based pruning (favoring width pruning), and retraining with distillation using a KL divergence loss, resulting in better-performing models at comparable sizes. However, distillation incurs some accuracy tradeoffs. Additionally, AI21 Labs launched Jamba 1.5, a hybrid SSM-Transformer MoE model with large context windows and multilingual support. Anthropic updated Claude 3 with LaTeX rendering and prompt caching. Dracarys, an open-source coding-focused LLM, was released in 70B and 72B sizes, showing improved coding performance. The Mistral Nemo Minitron 8B model outperforms Llama 3.1 8B and Mistral 7B on the Hugging Face leaderboard, highlighting the benefits of pruning and distillation. Research on prompt optimization reveals the complexity of prompt search spaces and the surprising effectiveness of simple algorithms like AutoPrompt/GCG.
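The "retraining with distillation using KL divergence loss" step can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not say which KL direction, temperature, or averaging scheme is used, so the forward KL(teacher || student) over a single token position, the `temperature` parameter, and the function names below are all assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of raw logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def kl_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) for one token position.

    Assumption: forward KL with a shared temperature; the source does not
    specify the exact formulation used in the paper.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher distribution
    log_q = log_softmax(student_logits, temperature)  # student distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    # Hypothetical logits over a 4-token vocabulary.
    teacher = [2.0, 1.0, 0.1, -1.0]
    student = [1.5, 1.2, 0.0, -0.5]
    print(kl_distillation_loss(teacher, student))
```

The loss is zero when the student's distribution matches the teacher's and grows as they diverge. In a real training loop it would be averaged over all token positions in a batch and minimized with respect to the pruned student's weights.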