Topic: "speech-to-speech"
OpenAI Realtime API GA and new `gpt-realtime` model, 20% cheaper than 4o
gpt-realtime gpt-4o-realtime grok-code-fast-1 codex mai-1-preview mai-voice-1 gemini-cli openai xai microsoft google speech-to-speech instruction-following function-calling telephony webrtc voice-agents multilingual-switching voice-control benchmarks coding-models ide-integration developer-tools model-updates swyx juberti omarsar0 reach_vb pbbakkum skcd42 mohitreddy13 cline kevinweil gdb sama _philschmid
OpenAI launched the gpt-realtime model and moved the Realtime API to GA, featuring advanced speech-to-speech capabilities, new voices (Cedar, Marin), image input, SIP telephony, and a ~20% price cut. Benchmarks show improvements over gpt-4o-realtime on BigBench and ComplexFuncBench. xAI introduced Grok Code Fast 1, a speed-optimized coding model integrated with popular IDEs, while OpenAI Codex received major upgrades for local and cloud development workflows. Google's Gemini CLI improved multi-editor support, and new models such as Microsoft's MAI-1-preview and MAI-Voice-1 were announced. One quoted highlight of the improved developer tooling: "The new all-in-one WebRTC API removes the ephemeral token step and supports video on the same connection."
not much happened today
zonos-v0.1 audiobox-aesthetics moshi sonar llama-3-70b gpt-4o-mini claude-3.5-haiku gpt-4o claude-3.5-sonnet deepseek-r1-distilled-qwen-1.5b reasonflux-32b o1-preview zyphra-ai meta-ai-fair kyutai-labs perplexity-ai cerebras uc-berkeley brilliant-labs google-deepmind text-to-speech speech-to-speech benchmarking model-performance reinforcement-learning math real-time-processing open-source cross-platform-integration multilinguality zero-shot-learning danhendrycks
Zyphra AI launched Zonos-v0.1, a leading open-weight text-to-speech model supporting multiple languages and zero-shot voice cloning. Meta FAIR released the open-source Audiobox Aesthetics model, trained on 562 hours of audio data. Kyutai Labs introduced Moshi, a low-latency, real-time speech-to-speech system. Perplexity AI announced the Sonar model, based on Llama 3.3 70B and powered by Cerebras infrastructure, which outperforms top models like GPT-4o and Claude 3.5 Sonnet while running at 1200 tokens/second. UC Berkeley open-sourced a 1.5B model trained with reinforcement learning that beats o1-preview on math tasks. ReasonFlux-32B achieved 91.2% on the MATH benchmark, outperforming OpenAI o1-preview. CrossPoster, an AI agent for cross-platform posting built on LlamaIndex workflows, was released. Brilliant Labs integrated the Google DeepMind Gemini Live API into smart glasses for real-time translation and object identification.