All tags
Topic: "voice-models"
not much happened today
o1-preview, o1-mini, qwen-2.5, gpt-4o, deepseek-v2.5, gpt-4-turbo-2024-04-09, grin, llama-3-1-405b, veo, kat, openai, qwen, deepseek-ai, microsoft, kyutai-labs, perplexity-ai, together-ai, meta-ai-fair, google-deepmind, hugging-face, google, anthropic, benchmarking, math, coding, instruction-following, model-merging, model-expressiveness, moe, voice, voice-models, generative-video, competition, open-source, model-deployment, ai-agents, hyung-won-chung, noam-brown, bindureddy, akhaliq, karpathy, aravsrinivas, fchollet, cwolferesearch, philschmid, labenz, ylecun
OpenAI's o1-preview and o1-mini models lead benchmarks in Math, Hard Prompts, and Coding. The Qwen 2.5 72B model shows strong performance close to GPT-4o. DeepSeek-V2.5 tops Chinese LLMs, rivaling GPT-4-Turbo-2024-04-09. Microsoft's GRIN MoE achieves strong results with only 6.6B active parameters. The Moshi voice model from Kyutai Labs runs locally on Apple Silicon Macs. The Perplexity app introduces a voice mode with push-to-talk. LlamaCoder by Together.ai uses Llama 3.1 405B for app generation. Google DeepMind's Veo is a new generative video model for YouTube Shorts. The 2024 ARC-AGI competition increases its prize money and plans a university tour. A survey on model merging covers 50+ papers relevant to LLM alignment. The Kolmogorov–Arnold Transformer (KAT) paper proposes replacing MLP layers with KAN layers for better expressiveness. The Hugging Face Hub integrates with Google Cloud Vertex AI Model Garden for easier open-source model deployment. Agent.ai is introduced as a professional network for AI agents. "Touching grass is all you need."
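To give a flavor of the KAN idea mentioned above: instead of an MLP's fixed activation after a linear map, each input-output edge applies its own learned univariate function. The sketch below is a toy illustration only, modeling each edge function as a weighted sum of Gaussian radial basis functions; the function name, basis choice, and shapes are assumptions for illustration, not the KAT paper's implementation.

```python
import numpy as np

def kan_layer(x, coeffs, centers, width=1.0):
    """Toy KAN-style layer (illustrative, not the paper's method).

    Each edge (j, i) applies a learned univariate function to input i,
    modeled here as a weighted sum of Gaussian radial basis functions;
    output j is the sum of its edge functions over all inputs.
    x: (in_dim,), coeffs: (out_dim, in_dim, n_basis), centers: (n_basis,)
    """
    # phi[i, k] = k-th basis function evaluated at x[i]
    phi = np.exp(-(((x[:, None] - centers[None, :]) / width) ** 2))
    # y[j] = sum_i sum_k coeffs[j, i, k] * phi[i, k]
    return np.einsum("jik,ik->j", coeffs, phi)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                     # 4 inputs
centers = np.linspace(-2.0, 2.0, 5)        # 5 basis functions per edge
coeffs = rng.normal(size=(3, 4, 5))        # 3 outputs x 4 inputs x 5 bases
y = kan_layer(x, coeffs, centers)
print(y.shape)  # (3,)
```

The claimed expressiveness gain comes from learning the per-edge nonlinearities themselves (the `coeffs`), rather than composing fixed activations with linear maps.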
1 TRILLION token context, real time, on device?
gemini-1.5-pro, gemini-1.5, cartesia, mistral-ai, scale-ai, state-space-models, voice-models, multimodality, model-performance, on-device-ai, long-context, evaluation-leaderboards, learning-rate-optimization, scientific-publishing, research-vs-engineering, yann-lecun, elon-musk
Cartesia, a startup specializing in state space models (SSMs), launched a low-latency voice model that outperforms transformer-based models with 20% lower perplexity, 2x lower word error rate, and NISQA quality one point higher. This result highlights the potential of models that continuously process and reason over massive streams of multimodal data (text, audio, video) with a trillion-token context window on-device. The news also covers recent AI developments including Mistral's release of the Codestral weights, the Schedule-Free optimizers paper, and Scale AI's new Elo-style evaluation leaderboards. A debate between Yann LeCun and Elon Musk on the importance of publishing AI research versus engineering achievements was also noted, and the Gemini 1.5 Pro/Advanced models were mentioned for their strong performance.
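Why SSMs suit streaming over huge contexts: the model carries a fixed-size state forward, so each new token costs constant time and memory regardless of how long the stream is. A minimal sketch of a linear state space recurrence, with all names and dimensions chosen here for illustration (this is not Cartesia's architecture):

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Toy linear state space model run as a sequential scan.

    h_t = A @ h_{t-1} + B @ x_t ;  y_t = C @ h_t
    The state h has fixed size, so per-token cost is constant,
    which is what makes SSMs attractive for very long streams.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
d_state, d_in, d_out, T = 8, 2, 3, 16
A = 0.9 * np.eye(d_state)              # stable (decaying) dynamics
B = rng.normal(size=(d_state, d_in))   # input projection
C = rng.normal(size=(d_out, d_state))  # output projection
ys = ssm_scan(A, B, C, rng.normal(size=(T, d_in)))
print(ys.shape)  # (16, 3)
```

Contrast this with self-attention, where each new token attends over the whole history, so per-token cost grows with context length.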