Company: "livekit"
OpenAI Realtime API and other Dev Day Goodies
gpt-4o-realtime-preview gpt-4o openai livekit agora twilio grab automat voice-activity-detection function-calling ephemeral-sessions auto-truncation vision-fine-tuning model-distillation prompt-caching audio-processing
OpenAI launched the Realtime API (gpt-4o-realtime-preview), which processes text and audio tokens, alongside pricing details and plans for future vision and video support. The API supports configurable voice activity detection modes, function calling, and ephemeral sessions with auto-truncation when the context limit is reached. Partnerships with LiveKit, Agora, and Twilio provide audio components and AI virtual agent voice calls. OpenAI also introduced vision fine-tuning, which improved mapping accuracy for Grab and RPA success for Automat with as few as 100 examples, and announced model distillation and prompt caching, including free eval inference for users who opt to share their data.
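The session-level controls mentioned above (voice activity detection mode, function-calling tools) are configured by sending a `session.update` event over the Realtime API's WebSocket. Below is a minimal sketch of building such an event payload; the field names approximate the published event schema (the `server_vad` turn-detection type is real, while the `get_weather` tool and its parameters are hypothetical, for illustration only):

```python
import json

def build_session_update(threshold: float = 0.5) -> dict:
    """Sketch of a Realtime API session.update event enabling
    server-side VAD and registering one function-calling tool.
    Field names approximate the event schema; verify against the docs."""
    return {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "turn_detection": {
                "type": "server_vad",    # server-side voice activity detection
                "threshold": threshold,  # speech-detection sensitivity
            },
            "tools": [
                {
                    "type": "function",
                    "name": "get_weather",  # hypothetical example tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                }
            ],
        },
    }

# In a real client this JSON string would be sent over the WebSocket
# connection right after the session is established.
print(json.dumps(build_session_update(), indent=2))
```

In practice the client then listens for server events (model audio, function-call requests) on the same socket; with a tool registered, the model can respond with a function-call event that the client executes and answers before the conversation continues.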
Not much happened today
gemini-1.5-flash gemini-pro mixtral mamba-2 phi-3-medium phi-3-small gpt-3.5-turbo-0613 llama-3-8b llama-2-70b mistral-finetune twelve-labs livekit groq openai nea nvidia lmsys mistral-ai model-performance prompt-engineering data-curation ai-safety model-benchmarking model-optimization training sequence-models state-space-models daniel-kokotajlo rohanpaul_ai _arohan_ tri_dao _albertgu _philschmid sarahcat21 hamelhusain jachiam0 willdepue teknium1
Twelve Labs raised a $50M Series A co-led by NEA and NVIDIA's NVentures to advance multimodal AI, and LiveKit secured $22M in funding. Groq announced inference at 800k tokens/second. Daniel Kokotajlo resigned from OpenAI. Twitter users highlighted Gemini 1.5 Flash for high performance at low cost and Gemini Pro ranking #2 in Japanese language tasks. Mixtral models can run up to 8x faster on NVIDIA RTX GPUs using TensorRT-LLM. The Mamba-2 architecture introduces state space duality, allowing larger states and faster training while outperforming its predecessors. Phi-3 Medium (14B) and Small (7B) benchmark near GPT-3.5-Turbo-0613 and Llama 3 8B. Prompt engineering was emphasized as key to unlocking LLM capabilities, and data quality was called critical for model performance, with masterclasses on data curation upcoming. On AI safety, a letter from frontier AI lab employees advocated whistleblower protections, alongside debate over aligning AI to user intent versus the broader interests of humanity.
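For readers unfamiliar with the state-space-model family that Mamba-2 belongs to, the core object is a linear recurrence h_t = A h_{t-1} + B x_t with readout y_t = C h_t. The toy sketch below runs that recurrence with a fixed diagonal A; real selective SSMs like Mamba make A, B, C input-dependent and use hardware-aware scans, and Mamba-2's state space duality reformulates this computation, none of which is shown here:

```python
def ssm_scan(x, A, B, C):
    """Toy diagonal state-space recurrence:
    h_t = A*h_{t-1} + B*x_t, y_t = sum(C*h_t).
    A, B, C are lists of per-channel scalars (A is the diagonal)."""
    d = len(B)
    h = [0.0] * d
    ys = []
    for x_t in x:
        h = [A[i] * h[i] + B[i] * x_t for i in range(d)]  # state update
        ys.append(sum(C[i] * h[i] for i in range(d)))     # readout
    return ys

A = [0.9, 0.5]  # diagonal state matrix; |a| < 1 keeps the state stable
B = [1.0, 1.0]
C = [0.5, 0.5]
# Impulse response: output decays with the eigenvalues of A
print(ssm_scan([1.0, 0.0, 0.0], A, B, C))
```

Because the recurrence is linear in h, it can also be computed as a parallel scan or as a convolution over the input, which is what makes SSM training fast on long sequences.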