All tags
Model: "phi"
Talaria: Apple's new MLOps Superweapon
gemma mixtral phi dbrx apple google mistral-ai microsoft mosaic quantization on-device-ai adapter-models model-optimization model-latency lossless-quantization low-bit-palletization token-generation model-benchmarking human-evaluation craig-federighi andrej-karpathy
Apple Intelligence introduces a small (~3B parameters) on-device model and a larger server model running on Apple Silicon with Private Cloud Compute, aiming to surpass Google Gemma, Mistral Mixtral, Microsoft Phi, and Mosaic DBRX. The on-device model features a novel lossless quantization strategy using mixed 2-bit and 4-bit LoRA adapters averaging 3.5 bits-per-weight, enabling dynamic adapter hot-swapping and efficient memory management. Apple credits the Talaria tool for optimizing quantization and model latency, achieving about 0.6 ms time-to-first-token latency and 30 tokens per second generation rate on iPhone 15 Pro. Apple focuses on an "adapter for everything" strategy with initial deployment on SiriKit and App Intents. Performance benchmarks rely on human graders, emphasizing consumer-level adequacy over academic dominance. The Apple ML blog also mentions an Xcode code-focused model and a diffusion model for Genmoji.
DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost
deepseek-v2 llama-3-120b llama-3-400b gpt-4 mistral phi claude gemini mai-1 med-gemini deepseek-ai mistral-ai microsoft openai scale-ai tesla nvidia google-deepmind mixture-of-experts multi-head-attention model-inference benchmarking overfitting robotics teleoperation open-source multimodality hallucination-detection fine-tuning medical-ai model-training erhartford maximelabonne bindureddy adcock_brett drjimfan clementdelangue omarsar0 rohanpaul_ai
DeepSeek V2 introduces a new state-of-the-art MoE model with 236B parameters and a novel Multi-Head Latent Attention mechanism, achieving faster inference and surpassing GPT-4 on AlignBench. Llama 3 120B shows strong creative writing skills, while Microsoft is reportedly developing a 500B parameter LLM called MAI-1. Research from Scale AI highlights overfitting issues in models like Mistral and Phi, whereas GPT-4, Claude, Gemini, and Llama maintain benchmark robustness. In robotics, Tesla Optimus advances with superior data collection and teleoperation, LeRobot marks a move toward open-source robotics AI, and Nvidia's DrEureka automates robot skill training. Multimodal LLM hallucinations are surveyed with new mitigation strategies, and Google's Med-Gemini achieves SOTA on medical benchmarks with fine-tuned multimodal models.