Model: "mistral-nemo-12b"
Pixtral 12B: Mistral beats Llama to Multimodality
pixtral-12b mistral-nemo-12b llama-3-1-70b llama-3-1-8b deepseek-v2-5 gpt-4-turbo llama-3-1 strawberry claude mistral-ai meta-ai-fair hugging-face arcee-ai deepseek-ai openai anthropic vision multimodality ocr benchmarking model-release model-architecture model-performance fine-tuning model-deployment reasoning code-generation api access-control reach_vb devendra_chapilot _philschmid rohanpaul_ai
Mistral AI released Pixtral 12B, an open-weights vision-language model built on a Mistral NeMo 12B text backbone with a 400M-parameter vision adapter, a 131,072-token vocabulary, and support for images up to 1024x1024 pixels. The release notably beat Meta AI to shipping an open multimodal model. At the Mistral AI Summit, architecture details and benchmark results were shared, showing strong OCR and screen-understanding capabilities. Separately, Arcee AI announced SuperNova, distilled Llama 3.1 70B and 8B models that outperform Meta's Llama 3.1 70B Instruct on benchmarks. DeepSeek released DeepSeek-V2.5, which scores 89 on HumanEval, surpassing GPT-4-Turbo, Opus, and Llama 3.1 on coding tasks. OpenAI plans to ship Strawberry as part of ChatGPT soon, though its capabilities are debated. Anthropic introduced Workspaces for managing multiple Claude deployments with finer-grained access controls.
Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B
mistral-large-2 mistral-nemo-12b llama-3-1-8b llama-3-1-70b llama-3-1 llama-3-1-405b yi-34b-200k gpt-4o mistral-ai meta-ai-fair groq togethercompute code-generation math function-calling reasoning context-windows model-deprecation pretraining posttraining benchmarking
Mistral Large 2 is a 123B-parameter model released with open weights under a research license, focused on code generation, math, and function calling, and it extends the context window to 128k tokens, up from Mistral Large 1's 32k. Mistral claims stronger function-calling capabilities than GPT-4o and improved reasoning. Meanwhile, Meta officially released the Llama 3.1 family, including Llama 3.1 70B and Llama 3.1 8B, with detailed pre-training and post-training insights. The Llama 3.1 8B model's 128k-context performance was found underwhelming compared to Mistral NeMo and Yi 34B 200K. Mistral is deprecating its older Apache-licensed open-source models to focus on Large 2 and Mistral NeMo 12B, and the community has been active with discussions and benchmark comparisons.
DataComp-LM: the best open-data 7B model/benchmark/dataset
mistral-nemo-12b gpt-4o-mini deepseek-v2-0628 mistral-7b llama-3 gemma-2 qwen-2 datacomp hugging-face openai nvidia mistral-ai deepseek dataset-design scaling-laws model-benchmarking model-performance fine-tuning multilinguality function-calling context-windows open-source-models model-optimization cost-efficiency benchmarking sam-altman guillaume-lample philschmid miramurati
The DataComp team released a competitive 7B open-data language model trained on just 2.5T tokens selected from the 240-trillion-token DCLM-POOL dataset, showing better scaling trends than FineWeb. OpenAI launched GPT-4o mini, a cost-effective model scoring 82% on MMLU with performance approaching GPT-4-Turbo, aimed at developers building broad applications. NVIDIA and Mistral jointly released Mistral NeMo 12B, featuring a 128k-token context window, an FP8 checkpoint, multilingual support, and an Apache 2.0 license. DeepSeek announced DeepSeek-V2-0628 as the top open-source model on the LMSYS Chatbot Arena leaderboard, with strong rankings in coding, math, and hard prompts. Together, these releases highlight advances in dataset design, model efficiency, and open-source contributions in the AI community.