Frozen AI News archive

Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI

**OpenAI** has launched three new state-of-the-art audio models in its API, including **gpt-4o-transcribe**, a speech-to-text model that outperforms Whisper, and **gpt-4o-mini-tts**, a text-to-speech model with promptable prosody that allows control over timing and emotion. The **Agents SDK** now supports audio, enabling voice agents. OpenAI also updated turn detection so that real-time voice activity detection (VAD) takes the content of speech into account. Additionally, **OpenAI's o1-pro** model is available to select developers with advanced features like vision and function calling, though at a higher compute cost. The community shows strong enthusiasm for these audio advancements, with a radio contest for TTS creations underway. Meanwhile, **Kokoro-82M v1.0** emerges as a leading open-weights TTS model with competitive pricing on Replicate.


AI News for 3/19/2025-3/20/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (227 channels, and 4533 messages) for you. Estimated reading time saved (at 200wpm): 386 minutes. You can now tag @smol_ai for AINews discussions!

As one commenter said, the best predictor of an OpenAI launch is a launch from another frontier lab. Today's OpenAI mogging takes the cake because of how broadly it revamps OpenAI's offering - if you care about voice at all, this is as sweeping a change as the Agents platform revamp from last week.

We think Justin Uberti's summary is the best one: image.png

But you should also watch the livestream:

https://www.youtube.com/watch?v=lXb0L16ISAc

The three major highlights are:

OpenAI.fm, a demo site that shows off the new promptable prosody in 4o-mini-tts:

image.png

4o-transcribe, a new (non open source?) ASR model that beats Whisper and commercial peers:

image.png

and finally, blink and you will miss it, but even turn detection got an update, so now realtime voice will use the CONTENT of speech to dynamically adjust VAD:

image.png
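To make the idea of content-aware turn detection concrete, here is a toy sketch. It is not OpenAI's implementation, which the blog post does not describe in detail. The function name `silence_timeout_ms`, the word list, and the timeout values are all hypothetical. The sketch only illustrates the general idea of waiting longer before ending the user's turn when the partial transcript looks unfinished.

```python
# Toy illustration of content-aware turn detection. Every name and number
# below is an assumption made for illustration, not OpenAI's semantic VAD.

# Trailing words that suggest the speaker is mid-thought (hypothetical list).
TRAILING_HINTS = {"um", "uh", "and", "but", "so", "because", "or", "the", "to"}


def silence_timeout_ms(partial_transcript: str,
                       short_ms: int = 300,
                       default_ms: int = 700,
                       long_ms: int = 2000) -> int:
    """Return how long to wait in silence before ending the user's turn."""
    text = partial_transcript.strip().lower()
    if not text:
        return default_ms
    # A trailing ellipsis reads as "still thinking": wait longer.
    if text.endswith("..."):
        return long_ms
    # Sentence-final punctuation suggests a finished utterance: respond fast.
    if text[-1] in ".?!":
        return short_ms
    # A dangling comma, filler, or conjunction suggests more speech is coming.
    words = text.rstrip(",;:").split()
    if text.endswith(",") or (words and words[-1] in TRAILING_HINTS):
        return long_ms
    return default_ms


if __name__ == "__main__":
    for utterance in ["What's the weather today?", "I want to book a flight to"]:
        print(repr(utterance), "->", silence_timeout_ms(utterance), "ms")
```

A fixed-threshold VAD would use the same silence window for both utterances. The point of a content-based approach is that the second one gets more time before the assistant jumps in.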

Technical detail in the blog post is light, of course: only one paragraph per point.

image.png
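For readers who want to try the promptable prosody in 4o-mini-tts outside the demo site, the sketch below builds a speech request with only the Python standard library. The helper name `build_speech_request` and the example texts are hypothetical. The endpoint path and the `model`, `voice`, `input`, and `instructions` field names reflect our reading of the OpenAI API reference, so check them against the current docs before relying on them.

```python
import json
import os
import urllib.request

# Assumed endpoint for the text-to-speech API; verify against the API reference.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, style: str, voice: str = "coral") -> urllib.request.Request:
    """Build (but do not send) a TTS request with a free-form style prompt."""
    payload = {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,          # what to say
        "instructions": style,  # how to say it: tone, pacing, emotion
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request(
        "Your order has shipped.",
        "Speak slowly, in a warm, reassuring tone.",
    )
    # Sending requires a valid API key and network access, for example:
    # with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
    #     f.write(resp.read())
    print(json.loads(req.data))
```

Keeping the spoken text and the style prompt in separate fields means the same sentence can be re-rendered in different deliveries without editing the script itself.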


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

Audio Models, Speech-to-Text, and Text-to-Speech Advancements

Model Releases, Open Source Initiatives, and Performance Benchmarks

AI Agents, Frameworks, and Tooling

AI in Robotics and Embodied Agents

LLM-Based Coding Assistants and Tools

Observations and Opinions

Humor/Memes


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. LLMs 800x Cheaper for Translation than DeepL

Theme 2. Budget 64GB VRAM GPU Server under $700

Theme 3. TikZero: AI-Generated Scientific Figures from Text

Theme 4. Creative Writing with Sub-15B LLM Models

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. Claude 3.7 Regression: Widespread User Concerns

Theme 2. OpenAI's openai.fm Text-to-Speech Model Release

Theme 3. Kitboga's AI Bot Army: Creative Use Against Scammers

Theme 4. Vibe Coding: A New Trend in AI Development


AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. LLM Pricing and Market Volatility

Theme 2. LLM Model Quirks and Fixes

Theme 3. Tools and Frameworks Evolve for LLM Development

Theme 4. Hardware Headaches and Performance Hurdles

Theme 5. AI Ethics, Policy, and Safety Debates


PART 1: High level Discord summaries

Cursor Community Discord


Unsloth AI (Daniel Han) Discord


aider (Paul Gauthier) Discord


LM Studio Discord


Perplexity AI Discord


Interconnects (Nathan Lambert) Discord


Notebook LM Discord


Nous Research AI Discord


MCP (Glama) Discord


OpenAI Discord


LMArena Discord


HuggingFace Discord


OpenRouter (Alex Atallah) Discord


GPU MODE Discord


Latent Space Discord


Eleuther Discord


Cohere Discord


Modular (Mojo 🔥) Discord


LlamaIndex Discord


Torchtune Discord


tinygrad (George Hotz) Discord


LLM Agents (Berkeley MOOC) Discord


DSPy Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Nomic.ai (GPT4All) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor Community ▷ #general (1517 messages🔥🔥🔥):

Agent Mode Down, Dan Perks, Keychron Keyboard, Vibe Coding, Pear AI vs Cursor



Unsloth AI (Daniel Han) ▷ #general (371 messages🔥🔥):

TPUs speed comparison, Gradient Accumulation fix, Gemma model version misinformation, Sophia optimizer experiments, Gemma 3 Activation Normalization



Unsloth AI (Daniel Han) ▷ #off-topic (11 messages🔥):

GTC worth it?, Gemma 3 BFloat16 ranges, Cfloat16 idea, hiddenlayer with vllm

Link mentioned: Tweet from Daniel Han (@danielhanchen): On further thoughts - I actually find this to be extremely fascinating overall! Gemma 3 is the first model I encountered to "love" using larger full bfloat16 ranges, and I'm speculating, m...


Unsloth AI (Daniel Han) ▷ #help (63 messages🔥🔥):

Gemma 3 finetuning, Data format for prompt/response pairs, Multi-image training for Gemma 3, Triton downgrade for Gemma 3, DPO examples and patching



Unsloth AI (Daniel Han) ▷ #showcase (2 messages):

Unsloth mention, Miguel's content quality


Unsloth AI (Daniel Han) ▷ #research (10 messages🔥):

PPO Understanding, Multi-turn fine-tuning dataset, Inference-time optimization, DAPO algorithm



aider (Paul Gauthier) ▷ #general (278 messages🔥🔥):

Featherless.ai configuration issues, Alternatives to Claude Sonnet, DeepSeek R1 benchmark comparison, OpenAI o1-pro pricing, Aider and Claude Code comparison



aider (Paul Gauthier) ▷ #questions-and-tips (39 messages🔥):

Claude 3.7 Sonnet, OpenRouter Gemini API, Aider's LLM Benchmarks, Local Model Codebase



aider (Paul Gauthier) ▷ #links (1 message):

LLM Blindspots, AI Coding, Cursor Rules, Sonnet Family

Link mentioned: AI Blindspots: Blindspots in LLMs I’ve noticed while AI coding. Sonnet family emphasis. Maybe I will eventually suggest Cursor rules for these problems.


LM Studio ▷ #general (82 messages🔥🔥):

LM Studio proxy settings, PCIE bandwidth on inference speed, Q8 K and V cache Quant, LM Studio RAM and VRAM reporting issues, Mistral Small 24b 2503 vision support

Link mentioned: Download LM Studio - Mac, Linux, Windows: Discover, download, and run local LLMs


LM Studio ▷ #hardware-discussion (212 messages🔥🔥):

RTX 8000, GPU VRAM upgrades, GPU Shared memory, Multi-GPU performance issues, NPU support in LM Studio



Perplexity AI ▷ #general (183 messages🔥🔥):

Perplexity on locked screen, Perplexity Sources Count, O1 Pro on Perplexity, Perplexity Deep Research Limits, GPT 4.5 Missing



Perplexity AI ▷ #sharing (9 messages🔥):

Perplexity API, Machine Guns vs Lasers, Outrageous Yellow, Elon Musk Controversies


Perplexity AI ▷ #pplx-api (10 messages🔥):

Sonar API, Sonar Deep Research Model Improvements, Sonar Search Modes, API Billing Structure, API Key Naming

Link mentioned: ppl-ai/api-discussion: Discussion forum for Perplexity API. Contribute to ppl-ai/api-discussion development by creating an account on GitHub.


Interconnects (Nathan Lambert) ▷ #news (75 messages🔥🔥):

O1 Pro API, Cursor hires srush_nlp, Nvidia open sources Canary ASR, Anthropic web search, OpenAI radio contest



Interconnects (Nathan Lambert) ▷ #random (35 messages🔥):

NVIDIA GTC AI Training and Certification, ByteCraft generative model for video games, Gemma package for fine-tuning, Uncertain Eric Substack, OpenAI new audio models



Interconnects (Nathan Lambert) ▷ #memes (19 messages🔥):

Sampling trajectories, o1pro pricing, Anthropic application cover letter


Interconnects (Nathan Lambert) ▷ #rl (1 message):

twkillian: Can't wait to feel like I can keep up with all of this


Interconnects (Nathan Lambert) ▷ #reads (5 messages):

SWEET-RL, Sam Altman Interview



Interconnects (Nathan Lambert) ▷ #posts (17 messages🔥):

post-training, GPU experiments


Interconnects (Nathan Lambert) ▷ #policy (30 messages🔥):

Allen Institute for AI's recommendation to OSTP, China's AI labeling regulations, Meta used pirated books for Llama3, Qwen2.5 Coder training data size



Notebook LM ▷ #use-cases (9 messages🔥):

Chrome extensions for web crawling, Customizing audio episodes, HY-MPS3 sequencer/arpeggiator plugin, Impact of attention span and social media


Notebook LM ▷ #general (122 messages🔥🔥):

Mindmap Feature, LaTeX rendering in NotebookLM, Table of contents on NotebookLM, Combine Notebooks, Audio option voices



Nous Research AI ▷ #general (85 messages🔥🔥):

QLoRA training for Hugging Face Transformer features, Debugging coding errors with LLMs, GGUF vs other model formats, Aphrodite Engine performance, Nvidia Blackwell RTX Pro series GPUs



Nous Research AI ▷ #ask-about-llms (36 messages🔥):

QWQ-32B Fine-Tuning, Alpaca Format for QWQ, Think Token Importance, Unsloth and QLoRA, Dataset Transformation with DeepSeek



Nous Research AI ▷ #interesting-links (2 messages):

Logan Kilpatrick YouTube video, Interesting chat


MCP (Glama) ▷ #general (112 messages🔥🔥):

Installing uv package manager, glama.json for claiming MCP servers, GitHub API rate limits, Turso database MCP server, HTTP baked into MCP

{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": [
    "your-github-username"
  ]
}
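As a minimal sketch of how a maintainer might generate the file above, the helper below writes a `glama.json` with the same two keys. The function name `write_glama_json` and the choice to write the file at the top of the repository directory are assumptions for illustration. The source shows only the file body, so check Glama's own instructions for where the file must live.

```python
import json
from pathlib import Path

# Schema URL copied from the snippet above.
GLAMA_SCHEMA = "https://glama.ai/mcp/schemas/server.json"


def write_glama_json(repo_dir: str, maintainers: list[str]) -> Path:
    """Write glama.json into repo_dir and return the path of the new file."""
    if not maintainers:
        raise ValueError("at least one maintainer username is required")
    config = {
        "$schema": GLAMA_SCHEMA,
        "maintainers": list(maintainers),  # GitHub usernames
    }
    path = Path(repo_dir) / "glama.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical usage: writes ./glama.json in the current directory.
    print(write_glama_json(".", ["your-github-username"]))
```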



MCP (Glama) ▷ #showcase (11 messages🔥):

Asana tool filtering, Notion custom headers, Unity MCP integration, Game asset MCP, Semantic Workbench extension



OpenAI ▷ #annnouncements (4 messages):

o1-pro, TTS, Audio models

Link mentioned: OpenAI.fm: An interactive demo for developers to try the new text-to-speech model in the OpenAI API


OpenAI ▷ #ai-discussions (85 messages🔥🔥):

Chinese Model Censorship, o1-pro API Pricing, Future of Software Development with AI, OpenAI Agent SDK vs MCP, Midjourney Alternatives on iOS



OpenAI ▷ #gpt-4-discussions (6 messages):

GPT Emoji insertion, Custom GPTs Reasoning, Subscription PRO issues


OpenAI ▷ #prompt-engineering (8 messages🔥):

Stock Market Prediction with ChatGPT, AI behavior origins, adaptive AI behavior


OpenAI ▷ #api-discussions (8 messages🔥):

Stock Market Prediction with AI, AI and Financial Advice Policies, AI behavior origins, Adaptive behavior in AI, AI memory


LMArena ▷ #general (110 messages🔥🔥):

AI Hallucinations, Search Engine Limitations, Gemini Pro vs Flash Thinking, AI Model Rankings, o1-pro API Pricing



HuggingFace ▷ #general (46 messages🔥):

Hugging Face Spaces, Flux Diffusion Model, HF Inference API outage, Roblox Voice Safety Classifier, Chinese/Korean/Japanese WER vs CER
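The WER vs CER topic above has a simple mechanical core, sketched here under our own assumptions since the channel discussion itself is not reproduced. Word error rate (WER) splits text on whitespace, which Chinese and Japanese text mostly lacks, so a whole sentence can count as a single "word". Character error rate (CER) compares characters instead. The function names and example sentences below are illustrative only.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring whitespace."""
    ref_chars = [c for c in reference if not c.isspace()]
    hyp_chars = [c for c in hypothesis if not c.isspace()]
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sat down"))
    # Japanese without spaces: WER sees one wrong "word", CER sees 2 of 7 characters.
    print(wer("今日は晴れです", "今日は雨です"), cer("今日は晴れです", "今日は雨です"))
```

In the Japanese example a small transcription slip yields a WER of 1.0 but a CER of 2/7, which is why character-level scoring is the usual choice for scripts without whitespace word boundaries.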



HuggingFace ▷ #i-made-this (9 messages🔥):

LLM Token Vocabulary Analysis, Neuro-sama like LLM, Telugu Speech Recognition Model, API interactions and token manipulation, Ollama-based Gradio UI



HuggingFace ▷ #computer-vision (2 messages):

GPU configuration with TensorFlow, FCOS implementation in TensorFlow, FCOS: Fully Convolutional One-Stage Object Detection

Link mentioned: Deep Learning model research implementation: FCOS: One of my current projects is working on implementing a computer vision model from the research paper which is the FCOS: Fully…


HuggingFace ▷ #smol-course (1 message):

GSM8K Dataset, Tokenizer Method, ChatML Format


HuggingFace ▷ #agents-course (42 messages🔥):

Gaussian Blur Tool, HF Agent Hackathon Details, Korean Translation PR, Local Vision Model Issues, deeplearning.ai LangGraph Course



HuggingFace ▷ #open-r1 (3 messages):

Foundation Models, LLMs from scratch


OpenRouter (Alex Atallah) ▷ #general (101 messages🔥🔥):

O1-Pro Pricing, LLM Chess Tournament, OpenRouter API Free Models, Groq API Issues, OpenAI's New Audio Models



GPU MODE ▷ #general (9 messages🔥):

Vast.ai NCU profiling, Jake in discord, Marksaroufim in discord, Vast.ai bare metal access, Ways to get NCU and NSYS


GPU MODE ▷ #triton (28 messages🔥):

tl.atomic and bfloat16, tilelang for atomic operations, Triton's bfloat16 support, cuTile NVIDIA, DeepSeek DeepGEMM



GPU MODE ▷ #cuda (1 message):

CUDA Kernels, Parallel computing


GPU MODE ▷ #torch (4 messages):

Autograd engine, Numerical stability in gradient accumulation, PyTorch pull request 149478

Link mentioned: [Distributed] Add repr methods for ParallelStyles by shink · Pull Request #149478 · pytorch/pytorch: Fixes #149470 cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o


GPU MODE ▷ #algorithms (2 messages):

GEMM activation fusion, Triton kernels optimization, Register Spillage


GPU MODE ▷ #beginner (2 messages):

Training foundation models, LLM training, Data Science in LLM


GPU MODE ▷ #jax (1 message):

Tenstorrent, JAX, MLIR compiler, Open Source Bounty Program

Link mentioned: tenstorrent/tt-forge: Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-source, general, and performant compiler. - tenstorrent...


GPU MODE ▷ #off-topic (1 message):

LLMs for GPU Development, LLM Bug Detection in Kernels, Kernel Fusion Issues


GPU MODE ▷ #irl-meetup (3 messages):

Exhibition hall meetup, Conference in Poland

Link mentioned: homepage - PP-RAI 2025: Goals of the 6th Polish Conference on Artificial Intelligence PP-RAI aims to bring together researchers from the domain of Artificial Intelligence and provide a platform for: discussion on the new for...


GPU MODE ▷ #lecture-qa (1 message):

FA3, CUTLASS, wgmma FLOPS calculation, 4096 FLOPS/cycle


GPU MODE ▷ #liger-kernel (2 messages):

Kernel development, Device meshes


GPU MODE ▷ #submissions (10 messages🔥):

Grayscale benchmarks, Conv2d benchmarks, Modal Runners on various GPUs


GPU MODE ▷ #ppc (4 messages):

Processor jump alignment, Alignment issues in Intel CPUs



GPU MODE ▷ #hardware (3 messages):

Consumer GPUs for ML/CUDA, 5080 vs Cloud Credits, Home ML Development


Latent Space ▷ #ai-general-chat (41 messages🔥):

Orpheus TTS Model, DeepSeek R1 Cost, OpenAI's O1-Pro Model, Gemma Package, Perplexity Funding Round



Latent Space ▷ #ai-announcements (1 message):

swyxio: quick pod from NVIDIA GTC https://www.youtube.com/watch?v=AOL0RIZxJF0


Eleuther ▷ #general (12 messages🔥):

Monolingual Models, AI Safety, Interpretability


Eleuther ▷ #research (25 messages🔥):

Expert Choice Routing, Quantile Estimation for Thresholds, Gaussian Quantile Function, BatchTopK SAE, Node Limited Routing

Link mentioned: The KoLMogorov Test: Compression by Code Generation: Compression is at the heart of intelligence. A theoretically optimal way to compress any sequence of data is to find the shortest program that outputs that sequence and then halts. However, such '...


Cohere ▷ #「💬」general (23 messages🔥):

Cohere Expanse 32B Knowledge Date, Critique of Comparing Cohere to OpenAI, Cohere Model via OpenRouter and Azure AI Search, Cohere model mimicking Mexican people, Connectors Support in Recent Models (cmd-R, cmd-A)


Cohere ▷ #「🔌」api-discussions (7 messages):

OpenAI API context length limitations, Cohere vs OpenAI API, Aya model usage with Ollama, Checking Cohere API free limit


Cohere ▷ #「💡」projects (1 message):

MCP Server, Cohere Command A, Positive News

Link mentioned: GitHub - VectorInstitute/mcp-goodnews: A simple MCP application that delivers curated positive and uplifting news stories.: A simple MCP application that delivers curated positive and uplifting news stories. - VectorInstitute/mcp-goodnews


Cohere ▷ #「🤝」introductions (2 messages):

RAG Federation, Agentic Apps/Research, Vector Institute


Modular (Mojo 🔥) ▷ #general (4 messages):

Photonics, Integrated CPU, Rubin GPUs, CX9, DIGITs successor


Modular (Mojo 🔥) ▷ #mojo (23 messages🔥):

debug_assert in Mojo, List bounds checking, Mojo compiler options, Undefined behavior in Mojo, Mojo test defaults



LlamaIndex ▷ #blog (2 messages):

DeepLearningAI short course, AI voice assistant pipeline


LlamaIndex ▷ #general (20 messages🔥):

LLM.as_structured_llm parallel tool calls, MariaDBChatStore, llamaparse QA


Torchtune ▷ #general (10 messages🔥):

Nvidia Delays, Gemma 3 Fine Tuning, Torchtune sprint


Torchtune ▷ #dev (2 messages):

nv-fabricmanager, driver versions


tinygrad (George Hotz) ▷ #general (5 messages):

ML4SCI/task1, Adam Optimizer

Link mentioned: gsoc_2025/ML4SCI/task1 at main · kayo09/gsoc_2025: GSOC 2025! Happy Coding! ☀️. Contribute to kayo09/gsoc_2025 development by creating an account on GitHub.


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (3 messages):

AgentX Research Track, LLM agents, Multi-agent systems, Advanced AI research


DSPy ▷ #general (1 message):

kotykd: Can I do something like this using dspy? https://arxiv.org/abs/2502.06855






{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}