Company: "aiatmeta"
Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning
intellect-2 dreamo qwen gemini-2.5-pro dynamic-byte-latent-transformer gen-4-references mistral-medium-3 le-chat-enterprise primeintellect bytedance qwen gemma meta-ai-fair runwayml mistral-ai google distributed-training reinforcement-learning gpu-clusters model-optimization quantization multimodality agentic-ai video-understanding fine-tuning _akhaliq reach_vb osanseviero aiatmeta c_valenzuelab lmarena_ai adcock_brett
Prime Intellect released INTELLECT-2, a decentralized GPU training and RL framework that reflects its vision of distributed AI training free of colocation limits. ByteDance launched DreamO, a unified image customization model, on Hugging Face. Qwen released quantized models in GPTQ, GGUF, and AWQ formats. Gemma surpassed 150 million downloads on Hugging Face. Meta released weights for the Dynamic Byte Latent Transformer, along with the Collaborative Reasoner framework, to improve language model efficiency and reasoning. RunwayML introduced Gen-4 References, a near-realtime model that requires no fine-tuning. Mistral AI released Mistral Medium 3, a strong multimodal model, and Le Chat Enterprise, an agentic AI assistant for business. Google updated Gemini 2.5 Pro Preview with video understanding and UI improvements. The framing "Airbnb for spare GPUs from all over the world" captures both the ongoing challenges and the potential of distributed GPU training.
not much happened today
oute-tts-0.3-1b oute-tts-0.3-500m olm-1b qwen-2.5-0.5b hover gpt-4o deepseek-v3 harvey meta-ai-fair stability-ai alibaba deepseek hugging-face text-to-speech zero-shot-learning multilinguality emotion-control motor-control reinforcement-learning local-ai distributed-inference pipeline-parallelism mathematical-reasoning process-reward-models legal-ai education-ai ai-security humor reach_vb drjimfan vikhyatk mervenoyann aiatmeta iscienceluvr alibaba_qwen awnihannun ajeya_cotra emollick qtnx_ designerx
Harvey secured a new $300M funding round. OuteTTS 0.3 1B & 500M text-to-speech models were released featuring zero-shot voice cloning, multilingual support (en, jp, ko, zh, fr, de), and emotion control, powered by OLMo-1B and Qwen 2.5 0.5B. The HOVER model, a 1.5M-parameter neural net for agile motor control, was introduced, leveraging human motion capture datasets and massively parallel reinforcement learning. kokoro.js enables running AI models locally in browsers with minimal dependencies. Meta AI awarded $200K LLM evaluation grants for projects on regional language understanding, complex reasoning, and interactive programming environments. Stability AI's Twitter account was hacked, prompting security warnings. Alibaba Qwen improved Process Reward Models (PRMs) for better mathematical reasoning using a consensus filtering mechanism. DeepSeek V3 uses pipeline parallelism to enhance distributed inference and long-context generation efficiency. Discussions on AI policy in legal frameworks and AI's role in democratizing education were highlighted. Lighthearted AI-related humor was also shared.
Miqu confirmed to be an early Mistral-medium checkpoint
miqu-1-70b mistral-medium llama-2-70b-chat mixtral sqlcoder-70b codellama-70b bagelmistery-tour-v2 psyfighter-v2 mistral-ai hugging-face nous-research aiatmeta instruction-following sampling-methods fp16-quantization fine-tuning model-training context-length text-to-sql model-performance model-optimization intrstllrninja
Miqu, an open-access model, scores 74 on MMLU and 84.5 on EQ-Bench, sparking debates about its performance compared to Mistral Medium. Mistral's CEO confirmed that Miqu is an early Mistral-medium checkpoint. Discussions in the TheBloke Discord highlight Miqu's strength in instruction-following and cover sampling methods like dynatemp and min-p. Developers also discuss browser preferences and Discord UI themes. Role-playing with models like BagelMistery Tour v2 and Psyfighter v2 is popular, alongside technical talks on fp16 quantization of Miqu-1-70b. Training and fine-tuning tips involving Unsloth and Mistral 7B are shared. In the Nous Research AI Discord, the Activation Beacon method is discussed for extending LLM context length from 4K to 400K tokens. SQLCoder-70B, fine-tuned from CodeLlama-70B, leads in text-to-SQL generation and is available on Hugging Face. An EQ-Bench score of 83.5 was also reported for Miqu, fueling speculation about its capabilities.