All tags
Company: "ai2"
Mistral Small 3 24B and Tulu 3 405B
mistral-small-3 tulu-3-405b llama-3 tiny-swallow-1.5b qwen-2.5-max deepseek-v3 claude-3.5-sonnet gemini-1.5-pro gpt4o-mini llama-3-3-70b mistral-ai ai2 sakana-ai alibaba_qwen deepseek ollama llamaindex reinforcement-learning model-fine-tuning local-inference model-performance model-optimization on-device-ai instruction-following api training-data natural-language-processing clementdelangue dchaplot reach_vb
Mistral AI released Mistral Small 3, a 24B parameter model optimized for local inference with low latency and 81% accuracy on MMLU, competing with Llama 3.3 70B, Qwen-2.5 32B, and GPT4o-mini. AI2 released Tülu 3 405B, a large finetuned model of Llama 3 using Reinforcement Learning from Verifiable Rewards (RVLR), competitive with DeepSeek v3. Sakana AI launched TinySwallow-1.5B, a Japanese language model using TAID for on-device use. Alibaba_Qwen released Qwen 2.5 Max, trained on 20 trillion tokens, with performance comparable to DeepSeek V3, Claude 3.5 Sonnet, and Gemini 1.5 Pro, and updated API pricing. These releases highlight advances in open models, efficient inference, and reinforcement learning techniques.
OLMo 2 - new SOTA Fully Open LLM
llama-3-1-8b olmo-2 qwen2-5-72b-instruct smolvlm tulu-3 ai2 huggingface intel reinforcement-learning quantization learning-rate-annealing ocr fine-tuning model-training vision
AI2 has updated OLMo-2 to roughly Llama 3.1 8B equivalent, training with 5T tokens and using learning rate annealing and new high-quality data (Dolmino). They credit Tülu 3 and its "Reinforcement Learning with Verifiable Rewards" approach. On Reddit, Qwen2.5-72B instruct model shows near lossless performance with AutoRound 4-bit quantization, available on HuggingFace in 4-bit and 2-bit versions, with discussions on MMLU benchmark and quantization-aware training. HuggingFace released SmolVLM, a 2B parameter vision-language model running efficiently on consumer GPUs, supporting fine-tuning on Google Colab and demonstrating strong OCR capabilities with adjustable resolution and quantization options.
Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)
llama-3-2 llama-3-1 claude-3-haiku gpt-4o-mini molmo-72b molmo-7b gemma-2 phi-3-5 llama-3-2-vision llama-3-2-3b llama-3-2-20b meta-ai-fair ai2 qualcomm mediatek arm ollama together-ai fireworks-ai weights-biases cohere weaviate multimodality vision context-windows quantization model-release tokenization model-performance model-optimization rag model-training instruction-following mira-murati daniel-han
Meta released Llama 3.2 with new multimodal versions including 3B and 20B vision adapters on a frozen Llama 3.1, showing competitive performance against Claude Haiku and GPT-4o-mini. AI2 launched multimodal Molmo 72B and 7B models outperforming Llama 3.2 in vision tasks. Meta also introduced new 128k-context 1B and 3B models competing with Gemma 2 and Phi 3.5, with collaborations hinted with Qualcomm, Mediatek, and Arm for on-device AI. The release includes a 9 trillion token count for Llama 1B and 3B. Partner launches include Ollama, Together AI offering free 11B model access, and Fireworks AI. Additionally, a new RAG++ course from Weights & Biases, Cohere, and Weaviate offers systematic evaluation and deployment guidance for retrieval-augmented generation systems based on extensive production experience.
$1150m for SSI, Sakana, You.com + Claude 500m context
olmo llama2-13b-chat claude claude-3.5-sonnet safe-superintelligence sakana-ai you-com perplexity-ai anthropic ai2 mixture-of-experts model-architecture model-training gpu-costs retrieval-augmented-generation video-generation ai-alignment enterprise-ai agentic-ai command-and-control ilya-sutskever mervenoyann yuchenj_uw rohanpaul_ai ctojunior omarsar0
Safe Superintelligence raised $1 billion at a $5 billion valuation, focusing on safety and search approaches as hinted by Ilya Sutskever. Sakana AI secured a $100 million Series A funding round, emphasizing nature-inspired collective intelligence. You.com pivoted to a ChatGPT-like productivity agent after a $50 million Series B round, while Perplexity AI raised over $250 million this summer. Anthropic launched Claude for Enterprise with a 500 million token context window. AI2 released a 64-expert Mixture-of-Experts (MoE) model called OLMo, outperforming Llama2-13B-Chat. Key AI research trends include efficient MoE architectures, challenges in AI alignment and GPU costs, and emerging AI agents for autonomous tasks. Innovations in AI development feature command and control for video generation, Retrieval-Augmented Generation (RAG) efficiency, and GitHub integration under Anthropic's Enterprise plan. "Our logo is meant to invoke the idea of a school of fish coming together and forming a coherent entity from simple rules as we want to make use of ideas from nature such as evolution and collective intelligence in our research."
$100k to predict LMSYS human preferences in a Kaggle contest
llama-3-70b llama-3 gpt-4 claude-3-opus prometheus-2 groq openai lmsys scale-ai ai2 nvidia benchmarking datasets fine-tuning reinforcement-learning model-alignment hallucination parameter-efficient-fine-tuning scalable-training factuality chatbot-performance bindureddy drjimfan percyliang seungonekim mobicham clefourrier
Llama 3 models are making breakthroughs with Groq's 70B model achieving record low costs per million tokens. A new Kaggle competition offers a $100,000 prize to develop models predicting human preferences from a dataset of over 55,000 user-LLM conversations. Open source evaluator LLMs like Prometheus 2 outperform proprietary models such as GPT-4 and Claude 3 Opus in judgment tasks. New datasets like WildChat1M provide over 1 million ChatGPT interaction logs with diverse and toxic examples. Techniques like LoRA fine-tuning show significant performance gains, and NVIDIA's NeMo-Aligner toolkit enables scalable LLM alignment across hundreds of GPUs. Factuality-aware alignment methods are proposed to reduce hallucinations in LLM outputs.
The Core Skills of AI Engineering
miqumaid olmo aphrodite awq exl2 mistral-medium internlm ssd-1b lora qlora loftq ai2 hugging-face ai-engineering quantization fine-tuning open-source model-deployment data-quality tokenization prompt-adherence distillation ai-security batching hardware role-playing eugene-yan
AI Discords for 2/2/2024 analyzed 21 guilds, 312 channels, and 4782 messages saving an estimated 382 minutes of reading time. Discussions included Eugene Yan initiating a deep dive into AI engineering challenges, highlighting overlaps between software engineering and data science skills. The TheBloke Discord featured talks on MiquMaid, OLMo (an open-source 65B LLM by AI2 under Apache 2.0), Aphrodite model batching, AWQ quantization, and LoRA fine-tuning techniques like QLoRA and LoftQ. The LAION Discord discussed SSD-1B distillation issues, data quality optimization with captioning datasets like BLIP, COCO, and LLaVA, and tokenization strategies for prompt adherence in image generation. Other topics included AI security with watermarking, superconductors and carbon nanotubes for hardware, and deployment of LLMs via Hugging Face tools.
AI2 releases OLMo - the 4th open-everything LLM
olmo-1b olmo-7b olmo-65b miqu-70b mistral-medium distilbert-base-uncased ai2 allenai mistral-ai tsmc asml zeiss fine-tuning gpu-shortage embedding-chunking json-generation model-optimization reproducible-research self-correction vram-constraints programming-languages nathan-lambert lhc1921 mrdragonfox yashkhare_ gbourdin
AI2 is gaining attention in 2024 with its new OLMo models, including 1B and 7B sizes and a 65B model forthcoming, emphasizing open and reproducible research akin to Pythia. The Miqu-70B model, especially the Mistral Medium variant, is praised for self-correction and speed optimizations. Discussions in TheBloke Discord covered programming language preferences, VRAM constraints for large models, and fine-tuning experiments with Distilbert-base-uncased. The Mistral Discord highlighted challenges in the GPU shortage affecting semiconductor production involving TSMC, ASML, and Zeiss, debates on open-source versus proprietary models, and fine-tuning techniques including LoRA for low-resource languages. Community insights also touched on embedding chunking strategies and JSON output improvements.