Model: "claude-3.5-haiku"
not much happened today
gemini-2.0-flash imagen-3 mistral-small-3.1 mistral-3 gpt-4o-mini claude-3.5-haiku olm0-32b qwen-2.5 shieldgemma-2 julian fasttransform nvidia google mistral-ai allen-ai anthropic langchainai perplexity-ai kalshi stripe qodoai multimodality image-generation context-windows model-pricing open-source-models image-classification frameworks python-libraries partnerships jeremyphoward karpathy abacaj mervenoyann
Several AI updates landed on Day 1 of Nvidia GTC. Google's Gemini 2.0 Flash adds image input and output, but Google does not recommend it for text-to-image tasks and points to Imagen 3 instead. Mistral AI released Mistral Small 3.1 with a 128k-token context window and competitive pricing. Allen AI launched OLMo-32B, an open LLM that outperforms GPT-4o mini and Qwen 2.5. ShieldGemma 2 was introduced for image safety classification. LangChainAI announced multiple updates, including Julian, powered by LangGraph, and an integration with Anthropic's Model Context Protocol (MCP). Jeremy Howard released fasttransform, a Python library for data transformations. Perplexity AI partnered with Kalshi for NCAA March Madness predictions.
not much happened today
zonos-v0.1 audiobox-aesthetics moshi sonar llama-3-70b gpt-4o-mini claude-3.5-haiku gpt-4o claude-3.5-sonnet deepseek-r1-distilled-qwen-1.5b reasonflux-32b o1-preview zyphra-ai meta-ai-fair kyutai-labs perplexity-ai cerebras uc-berkeley brilliant-labs google-deepmind text-to-speech speech-to-speech benchmarking model-performance reinforcement-learning math real-time-processing open-source cross-platform-integration multilinguality zero-shot-learning danhendrycks
Zyphra AI launched Zonos-v0.1, a leading open-weight text-to-speech model that supports multiple languages and zero-shot voice cloning. Meta FAIR released the open-source Audiobox Aesthetics model, trained on 562 hours of audio data. Kyutai Labs introduced Moshi, a low-latency, real-time speech-to-speech system. Perplexity AI announced the Sonar model, built on Llama 3.3 70B and served on Cerebras infrastructure at 1200 tokens/second, which it reports outperforms top models such as GPT-4o and Claude 3.5 Sonnet. UC Berkeley open-sourced a 1.5B model trained with reinforcement learning that beats o1-preview on math tasks. ReasonFlux-32B achieved 91.2% on the MATH benchmark, outperforming OpenAI's o1-preview. CrossPoster, an AI agent for cross-platform posting built on LlamaIndex workflows, was released. Brilliant Labs integrated the Google DeepMind Gemini Live API into smart glasses for real-time translation and object identification.
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
o1 claude-3.5-haiku gpt-4o epoch-ai openai microsoft anthropic x-ai langchainai benchmarking math moravecs-paradox mixture-of-experts chain-of-thought agent-framework financial-metrics-api pdf-processing few-shot-learning code-generation karpathy philschmid adcock_brett dylan522p
Epoch AI collaborated with over 60 leading mathematicians to create the FrontierMath benchmark, a fresh set of hundreds of original math problems with easy-to-verify answers, designed to challenge current AI models. All tested models, including o1, perform poorly on it, which highlights the difficulty of complex problem-solving and Moravec's paradox in AI. Key AI developments include Mixture-of-Transformers (MoT), a sparse multi-modal transformer architecture that reduces computational costs, and improvements to Chain-of-Thought (CoT) prompting that use incorrect reasoning paired with explanations. Industry news covers OpenAI acquiring the chat.com domain, Microsoft launching the Magentic-One agent framework, Anthropic releasing Claude 3.5 Haiku, which outperforms GPT-4o on some benchmarks, and xAI securing 150MW of grid power with support from Elon Musk and Trump. LangChain AI introduced new tools, including a Financial Metrics API, Document GPT with PDF upload and Q&A, and the LangPost AI agent for LinkedIn posts. xAI also demonstrated Grok Engineer, a code-generation tool compatible with the OpenAI and Anthropic APIs.
Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
claude-3.5-haiku llama-3-1 llama-3-2 mlx-lm tencent anthropic meta-ai-fair togethercompute llamaindex mixture-of-experts synthetic-data model-scaling model-architecture model-optimization kv-cache-quantization react fine-tuning scaling-laws model-efficiency model-deployment multimodality
Tencent released Hunyuan-Large, a notable MoE model with more than 300B parameters, pretrained on 7T tokens, including 1.5T tokens of synthetic data generated via Evol-Instruct. The model introduces novel techniques such as "recycle routing" and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and the model avoids China-sensitive queries. Meanwhile, Anthropic launched Claude 3.5 Haiku, now available on multiple platforms; it was praised for intelligence and speed but criticized for a 10x price increase. Meta opened Llama AI to the U.S. defense sector, and a Llama Impact Hackathon offers a $15K prize for projects using Llama 3.1 and 3.2 Vision. LlamaIndex released a React chat UI component with Tailwind CSS and LLM backend integrations. MLX LM improves text generation speed and efficiency with KV cache quantization.
not much happened today
claude-3.5-sonnet claude-3.5-haiku o1-preview mochi-1 stable-diffusion-3.5 embed-3 kerashub differential-transformer anthropic openai cohere microsoft computer-use coding-performance video-generation fine-tuning multimodality transformers attention-mechanisms model-optimization alexalbert fchollet rasbt
Anthropic released upgraded Claude 3.5 Sonnet and Claude 3.5 Haiku models, along with a new computer use capability that lets the model interact with computer interfaces through screenshots and actions such as mouse movement and typing. Claude 3.5 Sonnet achieved state-of-the-art coding performance on SWE-bench Verified with a 49% score, surpassing OpenAI's o1-preview. Anthropic is focusing on teaching general computer skills rather than building task-specific tools, and expects rapid improvements. Other releases include Mochi 1, an open-source video generation model; Stable Diffusion 3.5, with Large and Medium variants; and Embed 3 by Cohere, a multimodal embedding model for text and image search. François Chollet launched KerasHub, which unifies KerasNLP and KerasCV and ships with 37 pretrained models. Microsoft introduced the Differential Transformer, which reduces attention noise via differential attention maps, and rasbt shared research on transformer attention layers.
Claude 3.5 Sonnet (New) gets Computer Use
claude-3.5-sonnet claude-3.5-haiku llama-3.1 nemotron anthropic zep nvidia coding benchmarks computer-use vision multimodal-memory model-updates ai-integration philschmid swyx
Anthropic announced new Claude 3.5 models, 3.5 Sonnet and 3.5 Haiku, that significantly improve coding performance, with Sonnet topping several benchmarks such as Aider and Vectara. The new Computer Use API lets the model control computers via vision and scores notably higher than other AI systems, showing progress in AI-driven computer interaction. Zep launched a cloud edition of its memory management service for AI agents, highlighting challenges in multimodal memory. The update also mentions Llama 3.1 and Nemotron models from NVIDIA.