All tags
Model: "pixtral-12b"
a quiet weekend
o1 datagemma aloha demostart firefly-ai-video-model pixtral-12b gamegen-o openai google-deepmind adobe mistral-ai tencent supermaven 11x cohere anthropic latent-space-university stanford microsoft mila notre-dame reinforcement-learning chain-of-thought reasoning robotics diffusion-models multimodality video-generation model-training reflection-tuning mathematical-reasoning model-benchmarking fine-tuning george-hotz terence-tao adcock_brett rohanpaul_ai bindureddy fchollet philschmid
OpenAI released the new o1 model, leveraging reinforcement learning and chain-of-thought prompting to excel in reasoning benchmarks, achieving an IQ-like score of 120. Google DeepMind introduced DataGemma to reduce hallucinations by connecting LLMs with real-world data, and unveiled ALOHA and DemoStart for robot dexterity using diffusion methods. Adobe previewed its Firefly AI Video Model with text-to-video and generative extend features. Mistral launched the multimodal Pixtral 12B model, and Tencent presented the GameGen-O open-world video game generation model. Several research papers from Stanford, OpenAI, Microsoft, Mila, and Notre Dame focus on advanced reasoning, self-verification, and reflection tuning techniques. Experts like Terence Tao and George Hotz have shared mixed but optimistic views on o1's capabilities. Seed funding rounds include Supermaven ($12M) and 11x ($24M).
Pixtral 12B: Mistral beats Llama to Multimodality
pixtral-12b mistral-nemo-12b llama-3-1-70b llama-3-1-8b deeps-eek-v2-5 gpt-4-turbo llama-3-1 strawberry claude mistral-ai meta-ai-fair hugging-face arcee-ai deepseek-ai openai anthropic vision multimodality ocr benchmarking model-release model-architecture model-performance fine-tuning model-deployment reasoning code-generation api access-control reach_vb devendra_chapilot _philschmid rohanpaul_ai
Mistral AI released Pixtral 12B, an open-weights vision-language model with a Mistral Nemo 12B text backbone and a 400M vision adapter, featuring a large vocabulary of 131,072 tokens and support for 1024x1024 pixel images. This release notably beat Meta AI in launching an open multimodal model. At the Mistral AI Summit, architecture details and benchmark performances were shared, showing strong OCR and screen understanding capabilities. Additionally, Arcee AI announced SuperNova, a distilled Llama 3.1 70B & 8B model outperforming Meta's Llama 3.1 70B instruct on benchmarks. DeepSeek released DeepSeek-V2.5, scoring 89 on HumanEval, surpassing GPT-4-Turbo, Opus, and Llama 3.1 in coding tasks. OpenAI plans to release Strawberry as part of ChatGPT soon, though its capabilities are debated. Anthropic introduced Workspaces for managing multiple Claude deployments with enhanced access controls.