All tags
Topic: "voice-synthesis"
The new OpenAI Agents Platform
reka-flash-3 o1-mini claude-3-7-sonnet llama-3-3-70b sonic-2 qwen-chat olympiccoder openai reka-ai hugging-face deepseek togethercompute alibaba ai-agents api model-releases fine-tuning reinforcement-learning model-training model-inference multimodality voice-synthesis gpu-clusters model-distillation performance-optimization open-source sama reach_vb
OpenAI introduced a comprehensive suite of new tools for AI agents, including the Responses API, Web Search Tool, Computer Use Tool, File Search Tool, and an open-source Agents SDK with integrated observability tools, marking a significant step towards the "Year of Agents." Meanwhile, Reka AI open-sourced Reka Flash 3, a 21B parameter reasoning model that outperforms o1-mini and powers their Nexus platform, with weights available on Hugging Face. The OlympicCoder series surpassed Claude 3.7 Sonnet and much larger models on competitive coding benchmarks. DeepSeek built a 32K GPU cluster capable of training V3-level models in under a week and is exploring AI distillation. Hugging Face announced Cerebras inference support, achieving over 2,000 tokens/s on Llama 3.3 70B, 70x faster than leading GPUs. Reka's Sonic-2 voice AI model delivers 40ms latency via the Together API. Alibaba's Qwen Chat enhanced its multimodal interface with video understanding up to 500MB, voice-to-text, guest mode, and expanded file uploads. Sama praised OpenAI's new API as "one of the most well-designed and useful APIs ever."
ChatGPT Advanced Voice Mode
o1-preview qwen-2.5 llama-3 claude-3.5 openai anthropic scale-ai togethercompute kyutai-labs voice-synthesis planning multilingual-datasets retrieval-augmented-generation open-source speech-assistants enterprise-ai price-cuts benchmarking model-performance sam-altman omarsar0 bindureddy rohanpaul_ai _philschmid alexandr_wang svpino ylecun _akhaliq
OpenAI rolled out ChatGPT Advanced Voice Mode with 5 new voices and improved accent and language support, available widely in the US. Ahead of rumored updates for Llama 3 and Claude 3.5, Gemini Pro saw a significant price cut aligning with the new intelligence frontier pricing. OpenAI's o1-preview model showed promising planning task performance with 52.8% accuracy on Randomized Mystery Blocksworld. Anthropic is rumored to release a new model, generating community excitement. Qwen 2.5 was released with models up to 32B parameters and support for 128K tokens, matching GPT-4 0613 benchmarks. Research highlights include PlanBench evaluation of o1-preview, OpenAI's release of a multilingual MMMLU dataset covering 14 languages, and RAGLAB framework standardizing Retrieval-Augmented Generation research. New AI tools include PDF2Audio for converting PDFs to audio, an open-source AI starter kit for local model deployment, and Moshi, a speech-based AI assistant from Kyutai. Industry updates feature Scale AI nearing $1B ARR with 4x YoY growth and Together Compute's enterprise platform offering faster inference and cost reductions. Insights from Sam Altman's blog post were also shared.