All tags
Company: "deepinfra"
Voxtral: Mistral's SOTA ASR model, in 3B ("mini") and 24B ("small") sizes, beats OpenAI Whisper large-v3
voxtral-3b voxtral-24b kimi-k2 mistral-ai moonshot-ai groq together-ai deepinfra huggingface langchain transcription long-context function-calling multilingual-models mixture-of-experts inference-speed developer-tools model-integration jeremyphoward teortaxestex scaling01 zacharynado jonathanross321 reach_vb philschmid
Mistral surprises with the release of Voxtral, a transcription model that outperforms Whisper large-v3, GPT-4o mini Transcribe, and Gemini 2.5 Flash. The Voxtral models (3B and 24B) support a 32k-token context length, handle audio files up to 30-40 minutes long, offer built-in Q&A and summarization, are multilingual, and enable function calling from voice commands, powered by a Mistral Small 3.1 language-model backbone. Meanwhile, Moonshot AI's Kimi K2, a non-reasoning Mixture of Experts (MoE) model built by a team of around 200 people, gains attention for blazing-fast inference on Groq hardware, broad platform availability (including Together AI and DeepInfra), and local runs on an M4 Max Mac with 128GB of RAM. Developer-tool integrations include LangChain and Hugging Face support, highlighting Kimi K2's strong tool-use capabilities.
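As a back-of-envelope check of the numbers above, a 32k-token context covering 30-40 minutes of audio implies a rough audio tokenization rate. This is arithmetic inferred from the digest's figures, not a rate Mistral has stated:

```python
# Implied tokens-per-second rate if a 32k-token context
# covers 30-40 minutes of audio (inferred, not official).
context_tokens = 32_000

for minutes in (30, 40):
    seconds = minutes * 60
    rate = context_tokens / seconds
    print(f"{minutes} min of audio -> ~{rate:.1f} tokens/sec")
```

That works out to roughly 13-18 tokens per second of audio, which is what bounds the 30-40 minute ceiling given the 32k context window.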
Too Cheap To Meter: AI prices cut 50-70% in last 30 days
gpt-4o gpt-4o-mini llama-3-1-405b mistral-large-2 gemini-1.5-flash deepseek-v2 sonnet-3.5 exaone-3.0 minicpm-v-2.6 claude-3.5 gpt-4o-2024-08-06 llamaindex together-ai deepinfra deepseek-ai mistral-ai google-deepmind lg-ai-research price-cuts context-caching instruction-tuning vision benchmarks pytorch attention-mechanisms reinforcement-learning-from-human-feedback compute-optimal-scaling rohanpaul_ai akhaliq mervenoyann sophiamyang chhillee karpathy
Gemini 1.5 Flash has cut prices by approximately 70% to $0.075/mtok, alongside a highly competitive free tier of 1 million tokens per minute, intensifying the AI model price war. Other significant price reductions include GPT-4o (~50% cut to $2.50/mtok), GPT-4o mini (70-98.5% cut to $0.15/mtok), Llama 3.1 405b (46% cut to $2.70/mtok), and Mistral Large 2 (62% cut to $3/mtok). DeepSeek v2 introduced context caching, reducing input token costs by up to 90% to $0.014/mtok. New model releases include Llama 3.1 405b, Sonnet 3.5, EXAONE-3.0 (7.8B, instruction-tuned by LG AI Research), and MiniCPM V 2.6 (a vision-language model combining SigLIP 400M and Qwen2-7B). Benchmarks show Mistral Large performing well on ZebraLogic and Claude-3.5 leading LiveBench. FlexAttention, a new PyTorch API, simplifies and optimizes attention mechanisms. Andrej Karpathy analyzed RLHF, highlighting its limitations compared to traditional reinforcement learning. Google DeepMind research on compute-optimal scaling was also summarized.
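The quoted cuts also imply what each model cost before the reductions. A quick sketch recovering the pre-cut prices from the new price and the stated percentage; the "before" figures are derived from the digest's numbers, not quoted directly:

```python
def implied_old_price(new_price_per_mtok: float, cut_fraction: float) -> float:
    """Recover the pre-cut price: new = old * (1 - cut), so old = new / (1 - cut)."""
    return new_price_per_mtok / (1.0 - cut_fraction)

# (new $/mtok, stated cut) taken from the summary above
cuts = {
    "gpt-4o":          (2.50, 0.50),  # ~50% cut to $2.50 -> ~$5.00 before
    "llama-3.1-405b":  (2.70, 0.46),  # 46% cut to $2.70  -> ~$5.00 before
    "mistral-large-2": (3.00, 0.62),  # 62% cut to $3.00  -> ~$7.90 before
}

for model, (new, cut) in cuts.items():
    old = implied_old_price(new, cut)
    print(f"{model}: ${new:.2f}/mtok now, ~${old:.2f}/mtok before the cut")
```

The same algebra applied to DeepSeek's cached input tokens ($0.014/mtok after a 90% reduction) implies an uncached input price of about $0.14/mtok.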