All tags
Model: "mistral-nemo-minitron-8b"
Everybody shipped small things this holiday weekend
gpt-4o-voice gemini claude jamba-1.5 mistral-nemo-minitron-8b xai google anthropic openai cognition ai21-labs nvidia langchain fine-tuning long-context parameter-efficient-fine-tuning latex-rendering real-time-audio virtual-try-on resource-tags low-code ai-agents workspace-organization model-benchmarking dario-amodei scott-wu fchollet svpino
xAI announced the Colossus 100k H100 cluster, capable of training an FP8 GPT-4-class model in 4 days. Google introduced Structured Output for Gemini. Anthropic discussed Claude's reported performance issues, possibly caused by API prompt modifications. OpenAI enhanced controls for File Search in their Assistants API. Cognition and Anthropic leaders appeared on podcasts. The viral Kwai-Kolors virtual try-on model and Mini-Omni, an open-source real-time audio conversational model (similar to gpt-4o-voice), were released. Tutorials on parameter-efficient fine-tuning with LoRA and QLoRA, long-context embedding challenges, and Claude's LaTeX rendering feature were highlighted. AI21 Labs released Jamba 1.5 models with a 256K context window and faster long-context performance. NVIDIA debuted Mistral-NeMo-Minitron-8B on the Open LLM Leaderboard. LangChain introduced resource tags for workspace organization, and svpino shared a low-code AI app toolkit. Legal AI agents and financial-agent evaluations using LangSmith were also featured.
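The parameter-efficient fine-tuning tutorials above center on LoRA, whose core idea can be sketched in a few lines: freeze the pretrained weight and learn only a low-rank additive update. This is an illustrative NumPy sketch, not code from any linked tutorial; the class name and hyperparameters are made up for the example.

```python
import numpy as np

# Minimal LoRA sketch: frozen base weight W plus a trainable low-rank
# update B @ A, scaled by alpha / r (the LoRA paper's convention).
class LoRALinear:
    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
        self.A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small init
        self.B = np.zeros((d_out, r))                # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # Base path is untouched; only A and B would receive gradients.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(16, 8)
x = np.ones((2, 16))
# Because B starts at zero, the LoRA path contributes nothing initially,
# so the adapted layer exactly reproduces the frozen base projection.
assert np.allclose(layer(x), x @ layer.W.T)
```

Only `A` and `B` are trained, which is why LoRA (and its quantized variant QLoRA) fits large models on modest hardware.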
not much happened this weekend
jamba-1.5 dream-machine-1.5 ideogram-v2 mistral-nemo-minitron-8b mistral-7b llama-3-8b nous-research cursor-ai gdm george-hotz agibot unitree eth-zurich disney uc-san-diego ai21-labs luma-labs ideogram nvidia mistral-ai meta-ai-fair distributed-ai optimizer inter-gpu-communication low-latency-training open-source humanoid-robots robotics physics-based-motion teleoperation multilingual-models long-context text-to-video text-to-image model-performance george-hotz adcock_brett aman
Nous Research announced DisTrO, a new optimizer that drastically reduces inter-GPU communication by 1,000x to 10,000x, enabling efficient training over slow networks and offering an alternative to Google DeepMind's DiLoCo. Cursor AI gained viral attention from an 8-year-old user and announced a new fundraise, with co-host Aman returning to their podcast. George Hotz launched tinybox for sale. In robotics, AGIBOT revealed 5 new humanoid robots with open-source plans, and Unitree showcased its G1 humanoid robot nearing mass production at $16,000. ETH Zurich and Disney developed an AI system for physics-based robot motion generation from text or images. UC San Diego released ACE, an open-source teleoperation system for controlling multiple robots. AI21 Labs unveiled Jamba 1.5, a multilingual model with a 256K context length and permissive licensing. Luma Labs released Dream Machine 1.5 with improved text-to-video generation. Ideogram launched v2 of its text-to-image model with near-perfect text rendering. Nvidia and Mistral released Mistral-NeMo-Minitron 8B, a small model that outperforms Mistral-7B and Llama-3-8B on the Open LLM Leaderboard.
Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1
llama-3-1-8b llama-3-1 jamba-1.5 claude-3 dracarys-70b dracarys-72b mistral-nemo-minitron-8b mistral-7b nvidia meta-ai-fair ai21-labs anthropic hugging-face pruning knowledge-distillation weight-pruning activation-based-pruning width-pruning kl-divergence teacher-correction prompt-optimization multilinguality long-context mixture-of-experts model-fine-tuning
Nvidia and Meta researchers updated their results for Llama 3.1 with a paper demonstrating the effectiveness of combining weight pruning and knowledge distillation to cut training costs: only the largest model is trained from scratch, and smaller models are derived from it via pruning and distillation. The process involves teacher correction, activation-based pruning (favoring width pruning over depth pruning), and retraining with distillation using a KL-divergence loss, yielding better-performing models at comparable sizes, though distillation incurs some accuracy tradeoffs. Additionally, AI21 Labs launched Jamba 1.5, a hybrid SSM-Transformer MoE model with large context windows and multilingual support. Anthropic updated Claude 3 with LaTeX rendering and prompt caching. Dracarys, an open-source coding-focused LLM, was released in 70B and 72B sizes with improved coding performance. Mistral-NeMo-Minitron 8B outperforms Llama 3.1 8B and Mistral 7B on the Hugging Face leaderboard, highlighting the benefits of pruning and distillation. Research on prompt optimization reveals the complexity of prompt search spaces and the surprising effectiveness of simple algorithms like AutoPrompt/GCG.
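The retraining-with-distillation step described above amounts to minimizing a KL-divergence loss between the teacher's and the pruned student's softened output distributions. The following is an illustrative NumPy sketch of that loss, not the paper's implementation; the temperature and toy logits are assumptions for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kd_kl_loss(teacher_logits, student_logits, T=2.0):
    # Forward KL(p_teacher || p_student) on temperature-softened
    # distributions, averaged over the batch. The T**2 factor is the
    # standard correction that keeps gradient magnitudes comparable
    # across temperatures.
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return T**2 * kl.mean()

# Toy logits standing in for teacher and pruned-student outputs.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
student = np.array([[3.5, 1.2, 0.4], [0.3, 2.5, 0.2]])

loss = kd_kl_loss(teacher, student)
assert loss >= 0.0                                  # KL is non-negative
assert np.isclose(kd_kl_loss(teacher, teacher), 0)  # perfect match => 0
```

Minimizing this loss pushes the pruned student's distribution toward the teacher's, which is how the smaller derived models recover most of the large model's quality.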