Topic: "audio-generation"
xAI Grok Imagine API - the #1 Video Model, Best Pricing and Latency - and merging with SpaceX
genie-3 nano-banana-pro gemini lingbot-world grok-imagine runway-gen-4.5 hunyuan-3d-3.1-pro google-deepmind x-ai runway fal interactive-simulation real-time-generation promptability character-customization world-models open-source video-generation audio-generation animation-workflows model-as-a-service 3d-generation latency coherence demishassabis sundarpichai
Google DeepMind launched Project Genie (Genie 3 + Nano Banana Pro + Gemini), a prototype for creating interactive, real-time generated worlds from text or image prompts, currently available to Google AI Ultra subscribers in the U.S. (18+) with noted limitations such as ~60s generation caps and imperfect physics. In parallel, the open-source LingBot-World offers a real-time interactive world model with <1s latency at 16 FPS and minute-level coherence, emphasizing interactivity and causal consistency. In video generation, xAI Grok Imagine debuted strongly with native audio support, 15s clip duration, and competitive pricing at $4.20/min including audio, while Runway Gen-4.5 focuses on animation workflows with new features like Motion Sketch and Character Swap. In 3D generation, fal added Hunyuan 3D 3.1 Pro/Rapid to its API offerings, extending model-as-a-service workflows into 3D pipelines.
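The $4.20/min rate quoted for Grok Imagine makes per-clip cost easy to estimate. A minimal sketch, assuming only the per-minute rate stated above (the helper name is illustrative, not part of any xAI SDK):

```python
# Cost estimate for xAI Grok Imagine at the quoted $4.20/min (audio included).
# The rate is taken from the summary above; the function name is illustrative.

GROK_IMAGINE_USD_PER_MIN = 4.20

def clip_cost_usd(duration_s: float) -> float:
    """Return the estimated generation cost in USD for a clip of the given length."""
    return round(GROK_IMAGINE_USD_PER_MIN * duration_s / 60, 4)

# A maximum-length 15s clip:
print(clip_cost_usd(15))  # 1.05
```

At that rate, the 15s maximum clip comes out to about a dollar, which is the basis of the "competitive pricing" claim versus per-second-billed video models.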
not much happened today
flux-schnell meta-ai-fair anthropic togethercompute hugging-face audio-generation quantization prompt-caching long-term-memory llm-serving-framework hallucination-detection ai-safety ai-governance geoffrey-hinton john-hopfield demis-hassabis rohanpaul_ai svpino hwchase17 shreyar philschmid mmitchell_ai bindureddy
Geoffrey Hinton and John Hopfield won the Nobel Prize in Physics for foundational work on neural networks linking AI and physics. Meta AI introduced a 13B-parameter audio generation model as part of Meta Movie Gen for video-synced audio. Anthropic launched the Message Batches API, enabling asynchronous processing of up to 10,000 queries at half the cost. Together Compute made Flux Schnell free to use for 3 months. New techniques like PrefixQuant quantization and Prompt Caching for low-latency inference were highlighted by rohanpaul_ai. LangGraph added long-term memory support for persistent document storage. The Hex-LLM framework was introduced for low-cost, high-throughput serving of Hugging Face models on TPUs. In AI safety and governance, discussions emphasized gender equality in science, and concerns were raised about premature AI regulation driven by media and Hollywood.
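The Message Batches API's 10,000-query ceiling means large jobs must be chunked client-side. A minimal sketch of preparing such batches, assuming the documented `custom_id` + `params` request shape; no network call is made here (each chunk would be passed to `client.messages.batches.create(requests=chunk)` via the `anthropic` SDK):

```python
# Sketch: chunk prompts into Message Batches API requests (max 10,000 per batch,
# billed at half the synchronous price). The custom_id/params structure follows
# Anthropic's documented batch request format; no API call is made here.

MAX_BATCH_SIZE = 10_000

def build_batch_requests(prompts, model="claude-3-5-sonnet-20240620"):
    """Wrap plain prompt strings in the batch request structure."""
    return [
        {
            "custom_id": f"query-{i}",
            "params": {
                "model": model,
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": p}],
            },
        }
        for i, p in enumerate(prompts)
    ]

def chunk_batches(requests, size=MAX_BATCH_SIZE):
    """Split a request list into batches no larger than the API limit."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]

reqs = build_batch_requests(["Summarize X", "Translate Y"])
print(len(chunk_batches(reqs)))  # 1
```

The `custom_id` is what lets you match asynchronous results back to the originating query once the batch completes.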
Cohere Command R+, Anthropic Claude Tool Use, OpenAI Finetuning
c4ai-command-r-plus claude-3 gpt-3.5-turbo gemini mistral-7b gemma-2 claude-3-5 llama-3 vicuna cohere anthropic openai microsoft stability-ai opera-software meta-ai-fair google-deepmind mistral-ai tool-use multilingual-models rag fine-tuning quantum-computing audio-generation local-inference context-windows model-size-analysis model-comparison
Cohere launched Command R+, a 104B dense model with a 128k context window focused on RAG, tool use, and multilingual capability across 10 key languages; it supports multi-step tool use and ships open weights for research. Anthropic introduced tool use in beta for Claude, supporting over 250 tools, with new cookbooks for practical applications. OpenAI upgraded its fine-tuning API and published case studies from Indeed, SK Telecom, and Harvey, promoting DIY fine-tuning and custom model training. Microsoft reported a quantum computing breakthrough with an 800x error-rate improvement and the most usable qubits to date. Stability AI released Stable Audio 2.0, improving audio generation quality and control. The Opera browser added local inference support for large language models such as Meta's Llama, Google's Gemma, and Vicuna. Reddit discussions highlighted Gemini's large context window, analysis of GPT-3.5-Turbo's model size, and a battle simulation between Claude 3 and ChatGPT using local 7B models like Mistral and Gemma.