All tags
Model: "llama-3-400b"
Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o version)
gpt-4o-mini mistral-nemo llama-3 llama-3-400b deepseek-v2 openai nvidia mistral-ai togethercompute deepseek-ai lmsys model-quantization context-windows instruction-following model-performance cost-efficiency multimodality benchmarking open-source model-release sam-altman
GPT-4o mini launches at a 99% price reduction relative to text-davinci-003 and roughly 3.5% of GPT-4o's price, while matching Opus-level benchmark scores. It supports 16k output tokens, runs faster than previous models, and will soon support text, image, video, and audio inputs and outputs. Mistral NeMo, a 12B-parameter model built with Nvidia, features a 128k-token context window, an FP8 checkpoint, and strong benchmark performance. Together's Lite and Turbo endpoints offer fp8/int4 quantizations of Llama 3 with up to 4x throughput at significantly reduced cost. DeepSeek V2 is now open-sourced. Upcoming releases include at least five unreleased models, and Llama 4 leaks are circulating ahead of ICML 2024.
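Together has not published its exact quantization recipe here, but as a rough illustration of why int4 weights cut serving costs, below is a minimal symmetric per-tensor int4 quantizer. All names and numbers are illustrative, not Together's implementation:

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor int4: map floats to integers in [-8, 7] plus one scale."""
    scale = np.abs(w).max() / 7.0                             # keep +max representable
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)   # int4 values stored in int8
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = (np.random.randn(4096, 4096) * 0.02).astype(np.float32)   # a fake weight matrix
q, scale = quantize_int4(w)
err = np.abs(w - dequantize(q, scale)).mean()
print(f"mean abs error: {err:.6f}")
print(f"memory: fp32 {w.nbytes / 2**20:.0f} MB -> int4 {w.size * 0.5 / 2**20:.0f} MB")
```

Production stacks typically quantize per-channel or per-group rather than per-tensor and keep activations at higher precision; the memory printout shows the 8x shrink versus fp32 that drives the throughput and cost gains.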
DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost
deepseek-v2 llama-3-120b llama-3-400b gpt-4 mistral phi claude gemini mai-1 med-gemini deepseek-ai mistral-ai microsoft openai scale-ai tesla nvidia google-deepmind mixture-of-experts multi-head-attention model-inference benchmarking overfitting robotics teleoperation open-source multimodality hallucination-detection fine-tuning medical-ai model-training erhartford maximelabonne bindureddy adcock_brett drjimfan clementdelangue omarsar0 rohanpaul_ai
DeepSeek-V2 introduces a new state-of-the-art MoE model with 236B total parameters (21B active per token) and a novel Multi-head Latent Attention (MLA) mechanism, achieving faster inference and surpassing GPT-4 on AlignBench. Llama 3 120B shows strong creative-writing ability, while Microsoft is reportedly developing a 500B-parameter LLM called MAI-1. Research from Scale AI finds benchmark overfitting in models like Mistral and Phi, whereas GPT-4, Claude, Gemini, and Llama remain robust. In robotics, Tesla's Optimus advances with superior data collection and teleoperation, Hugging Face's LeRobot marks a move toward open-source robotics AI, and Nvidia's DrEureka uses LLMs to automate robot skill training. A new survey catalogs multimodal LLM hallucinations and mitigation strategies, and Google's Med-Gemini achieves SOTA on medical benchmarks with fine-tuned multimodal models.
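The core idea of MLA is to cache one small low-rank latent per token and reconstruct keys and values from it on the fly, instead of caching full per-head K/V. A minimal PyTorch sketch of that caching idea follows; dimensions are illustrative, and the paper's RoPE decoupling and other details are omitted:

```python
import torch
import torch.nn as nn

class MLASketch(nn.Module):
    """Simplified Multi-head Latent Attention: cache one small latent per token
    instead of full per-head K/V (RoPE decoupling from the paper is omitted)."""
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_down_kv = nn.Linear(d_model, d_latent, bias=False)  # compress -> cached
        self.w_up_k = nn.Linear(d_latent, d_model, bias=False)     # expand at attention time
        self.w_up_v = nn.Linear(d_latent, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x, latent_cache=None):
        b, t, d = x.shape
        latent = self.w_down_kv(x)                    # (b, t, d_latent)
        if latent_cache is not None:                  # decode step: append to cache
            latent = torch.cat([latent_cache, latent], dim=1)
        split = lambda z: z.view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        q = split(self.w_q(x))
        k, v = split(self.w_up_k(latent)), split(self.w_up_v(latent))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # causal mask: query i may attend to all cached positions plus new keys up to i
        total = k.size(-2)
        mask = torch.ones(t, total, dtype=torch.bool).tril(diagonal=total - t)
        scores = scores.masked_fill(~mask, float("-inf"))
        out = (scores.softmax(-1) @ v).transpose(1, 2).reshape(b, t, d)
        return self.w_o(out), latent                  # the latent is the KV cache

attn = MLASketch()
y, cache = attn(torch.randn(1, 16, 512))              # prefill 16 tokens
y2, cache = attn(torch.randn(1, 1, 512), cache)       # one decode step reuses the cache
```

The cache holds t x d_latent values instead of 2 x n_heads x d_head x t, which is where the claimed inference speedup comes from.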
Meta Llama 3 (8B, 70B)
llama-3-8b llama-3-70b llama-3-400b stable-diffusion-3 mixtral-8x22b-instruct-v0.1 vasa-1 meta-ai-fair stability-ai boston-dynamics microsoft mistral-ai hugging-face transformer tokenization model-training benchmarking robotics natural-language-processing real-time-processing synthetic-data dataset-cleaning behavior-trees ai-safety model-accuracy api model-release humor helen-toner
Meta released the Llama 3 8B and 70B models, with a 400B variant still in training that is touted as potentially the first GPT-4-level open-source model. Stability AI launched the Stable Diffusion 3 API, with model weights to follow, showing realism competitive with Midjourney V6. Boston Dynamics unveiled its electric Atlas humanoid robot, and Microsoft introduced VASA-1, a model that generates lifelike talking faces at 40fps on a single RTX 4090. Mistral AI, a European OpenAI rival, is reportedly seeking funding at a $5B valuation, and its Mixtral-8x22B-Instruct-v0.1 model achieves 100% accuracy on 64K-context benchmarks. AI safety discussions include calls from former OpenAI board member Helen Toner for audits of top AI companies, and the Mormon Church released principles for AI use. New AI development tools include Ctrl-Adapter for diffusion models, Distilabel 1.0.0 for synthetic dataset pipelines, Data Bonsai for data cleaning with LLMs, and Dendron for building LLM agents with behavior trees. Memes round out the issue with AI-development humor and cultural references. Technically, Llama 3 brings improved reasoning, a 128K-token tokenizer vocabulary, training on 8K-token sequences, and grouped query attention (see the sketch below).
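Grouped query attention, listed among the Llama 3 changes, lets several query heads share one key/value head, shrinking the KV cache by the group factor. A minimal PyTorch sketch; the 32-query/8-KV split mirrors the commonly cited Llama 3 8B ratio but is illustrative here:

```python
import torch

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (b, n_q_heads, t, d); k, v: (b, n_kv_heads, t, d).
    Each KV head serves n_q_heads // n_kv_heads query heads."""
    b, n_q_heads, t, d = q.shape
    group = n_q_heads // n_kv_heads
    # repeat each KV head `group` times so it lines up with its query heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    mask = torch.ones(t, t, dtype=torch.bool).tril()   # causal mask
    scores = scores.masked_fill(~mask, float("-inf"))
    return scores.softmax(-1) @ v

# 32 query heads sharing 8 KV heads -> the KV cache is 4x smaller than full MHA
q = torch.randn(1, 32, 16, 128)
k = torch.randn(1, 8, 16, 128)
v = torch.randn(1, 8, 16, 128)
print(grouped_query_attention(q, k, v, n_kv_heads=8).shape)  # (1, 32, 16, 128)
```

Only k and v are stored during generation, so the group factor translates directly into longer feasible contexts or larger batch sizes at the same memory budget.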