Person: "dbrxmosaicai"
not much happened today
aria o1-preview o1-mini gemini-1.5-pro gemini-1.5-flash gemini-1.5 claude-3.5-sonnet rhymes-ai openai anthropic google meta-ai-fair oxylabs multimodality mixture-of-experts long-context retrieval-augmented-generation benchmarking software-engineering llm-evaluation prompt-engineering web-scraping python production-applications mervenoyann osanseviero dbrxmosaicai ylecun ofirpress clefourrier omarsar0 rohanpaul_ai svpino finbarrtimbers _philschmid
Rhymes AI released Aria, a new 25.3B-parameter multimodal MoE model supporting text, code, image, and video with a 64k-token context window and an Apache-2.0 license. OpenAI's o1-preview and o1-mini models consistently outperform Anthropic's models and Google's Gemini 1.5 Pro/Flash on long-context RAG benchmarks up to 128k tokens, while Google's Gemini 1.5 models excel at extreme context lengths up to 2 million tokens. Meta AI expanded its rollout to 21 countries with new language support but remains unavailable in the EU. The one-year anniversary of the SWE-bench benchmark for software engineering tasks was celebrated, alongside the introduction of SWE-bench Multimodal. New AI tools include OxyCopilot by Oxylabs for web scraping, Taipy for Python-based production apps, and Latitude for prompt engineering. Industry insights highlight changing AI funding dynamics and OpenAI's strategic focus on consumer products like ChatGPT. "all recaps done by Claude 3.5 Sonnet, best of 4 runs."
Gemini Live
gemini-1.5-pro genie falcon-mamba gemini-1.5 llamaindex google anthropic tii supabase perplexity-ai llamaindex openai hugging-face multimodality benchmarking long-context retrieval-augmented-generation open-source model-releases model-integration model-performance software-engineering linear-algebra hugging-face-hub debugging omarsar0 osanseviero dbrxmosaicai alphasignalai perplexity_ai _jasonwei svpino
Google launched Gemini Live on Android for Gemini Advanced subscribers during the Pixel 9 event, featuring integrations with Google Workspace apps and other Google services. The rollout began on August 12, 2024, with iOS support planned. Cosine released Genie, an AI software engineering system achieving a 57% improvement on SWE-Bench. TII introduced Falcon Mamba, a 7B attention-free open-access model that scales to long sequences. Benchmarking showed that longer context lengths do not always improve Retrieval-Augmented Generation performance. Supabase launched an AI-powered Postgres service, dubbed the "ChatGPT of databases," that is fully open source. Perplexity AI partnered with Polymarket to integrate real-time probability predictions into search results. A tutorial demonstrated a multimodal recipe recommender built with Qdrant, LlamaIndex, and Gemini. An OpenAI engineer shared tips for success, emphasizing debugging and hard work. The connection between matrices and graphs in linear algebra was highlighted for the insight it gives into nonnegative matrices and strongly connected components. Keras 3.5.0 was released with Hugging Face Hub integration for model saving and loading.
We Solved Hallucinations
gpt-2 flashattention-3 lynx meta-ai-fair nvidia princeton colfax patronus-ai databricks mosaic-ai openai compute-hardware gpu-optimization flashattention llm-evaluation hallucination-detection vision benchmarking synthetic-data model-training karpathy tri_dao giffmana vikhyatk dbrxmosaicai
Reddit's URL structure causes link errors in AI-generated summaries, especially for NSFW content, and the problem affects models like Claude and GPT-4. The team fixed this glitch while still leveraging LLMs to summarize Reddit content. GPT-2 training costs have dropped dramatically to ~$672 thanks to H100 GPUs and software improvements like CUDA and FlashAttention. FlashAttention-3 was released, achieving up to 740 TFLOPS on H100 GPUs, with FP8 nearing 1.2 PFLOPS; it was developed collaboratively by Meta, NVIDIA, Princeton, and Colfax. Hopper GPUs enable major speedups with new hardware features. Recent research suggests that synthetic data may not improve vision tasks. The Avocado360 benchmark evaluates vision-language models' ability to detect avocados in images. Lynx, a hallucination detection model for LLMs, was introduced for real-world healthcare and fintech applications; it was trained by Patronus AI on Databricks Mosaic AI using Composer.