Model: "grok"
not much happened today; New email provider for AINews
gpt-4.1 gpt-4o gpt-4o-mini gemini-2.5-flash seaweed-7b claude embed-4 grok smol-ai resend openai google bytedance anthropic cohere x-ai email-deliverability model-releases reasoning video-generation multimodality embedding-models agentic-workflows document-processing function-calling tool-use ai-coding adcock_brett swyx jerryjliu0 alexalbert omarsar0
Smol AI is migrating its AI news email service to Resend to improve deliverability and enable new features like personalizable AI news and a "Hacker News of AI." Recent AI model updates include OpenAI's API-only GPT-4.1, Google's Gemini 2.5 Flash reasoning model, ByteDance's 7B-parameter Seaweed video model, Anthropic's work on Claude's values system, Cohere's Embed 4 multimodal embedding model, and xAI's Grok updates with Memory and Studio features. Discussions also cover agentic workflows for document automation and AI coding patterns.
DBRX: Best open model (just not most efficient)
dbrx grok mixtral llama-2 mpt-7b gpt-4 databricks hugging-face mistral-ai mosaicml openai mixture-of-experts model-efficiency tokenization model-training code-generation model-architecture open-source-models benchmarking fine-tuning
Databricks Mosaic has released a new open-source model called DBRX that outperforms Grok, Mixtral, and Llama 2 on evaluations while being about 2x more efficient than Llama 2 and Grok. The model was trained on 12 trillion tokens using 3,000 H100 GPUs over 2 months, with an estimated compute cost of $10 million. It uses OpenAI's 100k tiktoken tokenizer and shows strong zero-shot code generation performance, even beating GPT-4 on the HumanEval benchmark. The DBRX team also upstreamed its work to the open-source MegaBlocks project. Despite its scale and efficiency, DBRX's performance on MMLU is only slightly better than Mixtral's, raising questions about its scaling efficiency. "If it activates 36B params, the model's perf should be equivalent to a 72B dense model or even 80B," says Qwen's tech lead. The focus of DBRX is on enabling users to train models efficiently: MoE training is about 2x more FLOP-efficient than dense training and reaches similar quality with nearly 4x less compute than previous MPT models. This release is part of the ongoing competition for open-source AI leadership, including models like Dolly, MPT, and Mistral.
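As a rough sanity check on the training figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-envelope estimate: it assumes "2 months" means 60 days and that all 3,000 GPUs ran the whole time, neither of which the summary states, and the function names are made up for illustration.

```python
# Back-of-envelope arithmetic from the figures quoted in the summary.
# Assumptions (NOT from the source): 2 months = 60 days, and every GPU
# is busy for the entire run. Function names are hypothetical.


def gpu_hours(num_gpus: int, days: int) -> int:
    """Total GPU-hours if every GPU runs 24 hours a day."""
    return num_gpus * days * 24


def cost_per_gpu_hour(total_cost_usd: float, hours: int) -> float:
    """Implied price of one GPU-hour for a given total compute cost."""
    return total_cost_usd / hours


def tokens_per_gpu_hour(total_tokens: float, hours: int) -> float:
    """Average number of training tokens processed per GPU-hour."""
    return total_tokens / hours


if __name__ == "__main__":
    hours = gpu_hours(num_gpus=3_000, days=60)
    print(f"GPU-hours:           {hours:,}")
    print(f"USD per GPU-hour:    {cost_per_gpu_hour(10_000_000, hours):.2f}")
    print(f"Tokens per GPU-hour: {tokens_per_gpu_hour(12e12, hours):,.0f}")
```

Under these assumptions the quoted $10 million works out to a little over $2 per H100-hour, which is one way to judge whether the cost estimate is plausible.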