Company: "vllm-project"
Oracle jumps +36% in a day after winning $300B OpenAI contract
qwen3-235b qwen3-4b qwen2.5-7b vllm oracle openai microsoft moonshot-ai vllm-project thinking-machines-lab meta reinforcement-learning model-weight-updates deterministic-inference benchmarking long-context model-optimization cuda distributed-training kimi_moonshot arankomatsuzaki qgallouedec cHHillee woosuk_k stasbekman
Oracle reported remaining performance obligations (booked revenue) up 359% to $455B and guided OCI revenue to $144B by fiscal 2030, driven in large part by a major deal with OpenAI amid tensions with Microsoft. On AI infrastructure, Moonshot AI released Kimi's checkpoint-engine, which enables rapid weight updates for 1T-parameter models across thousands of GPUs and integrates with vLLM. RLFactory introduced a plug-and-play reinforcement learning framework for tool-using agents, with smaller models outperforming larger ones in its evaluations. TRL v0.23 added context parallelism for long-context training. Thinking Machines Lab published research on deterministic inference pipelines, making vLLM inference deterministic for Qwen models. Meta launched BackendBench, a benchmark suite for evaluating PyTorch backend implementations.
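A quick way to see the nondeterminism the Thinking Machines work targets is to run the same greedy prompt alone and inside a larger batch through vLLM's offline API and compare outputs. The sketch below does just that; the model name is only an example, and this is not the batch-invariant patch itself:

```python
from vllm import LLM, SamplingParams

# Probe batch-composition nondeterminism with vLLM's offline API.
# The model name is illustrative; this only observes the behavior the
# Thinking Machines research addresses, it is not their fix.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
greedy = SamplingParams(temperature=0.0, max_tokens=64)

prompt = "Explain what makes LLM inference nondeterministic."
solo = llm.generate([prompt], greedy)[0].outputs[0].text
batched = llm.generate([prompt] + ["Unrelated filler prompt."] * 7, greedy)[0].outputs[0].text

# Without batch-invariant kernels these can differ even at temperature 0.
print("identical across batch sizes:", solo == batched)
```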
Cohere Command A Reasoning beats GPT-OSS-120B and DeepSeek R1 0528
command-a-reasoning deepseek-v3.1 cohere deepseek intel huggingface baseten vllm-project chutes-ai anycoder agentic-ai hybrid-models long-context fp8-training mixture-of-experts benchmarking quantization reasoning coding-workflows model-pricing artificialanlys reach_vb scaling01 cline ben_burtenshaw haihaoshen jon_durbin _akhaliq willccbb teortaxestex
Cohere's Command A Reasoning model outperforms GPT-OSS on open deep-research capabilities, underscoring the emphasis on agentic use cases for 2025. DeepSeek-V3.1 introduces a hybrid reasoning architecture that toggles between reasoning and non-reasoning modes, optimized for agentic workflows and coding, with extensive long-context pretraining (~630B tokens for 32k context, ~209B for 128k), FP8 training, and a Mixture-of-Experts design with ~37B activated parameters. Benchmarks show competitive performance, with notable improvements on SWE-Bench and other reasoning tasks. The model is priced at $0.56/M input and $1.68/M output tokens on the DeepSeek API and enjoys rapid ecosystem integration, including HF weights, INT4 quantization by Intel, and vLLM reasoning toggles. Community feedback highlights the hybrid design's pragmatic approach to agent and software-engineering workflows, though some note the lack of tool use in reasoning mode.
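For readers integrating it, here is a minimal sketch of the hybrid toggle through the OpenAI-compatible DeepSeek API, assuming the public deepseek-chat (non-reasoning) and deepseek-reasoner (reasoning) model names, plus a back-of-the-envelope cost check at the listed pricing:

```python
from openai import OpenAI

# Sketch only: model names and base URL follow DeepSeek's public docs and may change.
client = OpenAI(api_key="YOUR_DEEPSEEK_KEY", base_url="https://api.deepseek.com")

def ask(model: str, question: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

fast = ask("deepseek-chat", "Summarize SWE-Bench in one sentence.")      # non-reasoning mode
deep = ask("deepseek-reasoner", "Summarize SWE-Bench in one sentence.")  # reasoning mode

# Back-of-the-envelope cost at $0.56/M input and $1.68/M output tokens.
cost = lambda tokens_in, tokens_out: tokens_in / 1e6 * 0.56 + tokens_out / 1e6 * 1.68
print(f"1M in + 1M out costs about ${cost(1_000_000, 1_000_000):.2f}")
```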
not much happened today
kimi-k2 qwen3-235b-a22b qwen3-coder-480b-a35b gemini-2.5-flash-lite mistral-7b deepseek-v3 moonshot-ai alibaba google google-deepmind openai hugging-face vllm-project mixture-of-experts agentic-ai model-optimization model-training benchmarking code-generation long-context multimodality math reinforcement-learning model-architecture model-performance open-source alignment demishassabis rasbt alexwei_ yitayml
Moonshot AI released Kimi K2, a 1-trillion-parameter ultra-sparse Mixture-of-Experts (MoE) model trained with the MuonClip optimizer and a large-scale agentic data pipeline using over 20,000 tools. Shortly after, Alibaba updated its Qwen3 line with the Qwen3-235B-A22B variant, which outperforms Kimi K2 and other top models on benchmarks like GPQA and AIME despite being 4.25x smaller. Alibaba also released Qwen3-Coder-480B-A35B, an MoE model specialized for coding with a 1-million-token context window. Google DeepMind launched Gemini 2.5 Flash-Lite, a faster and more cost-efficient model that outperforms previous versions on coding, math, and multimodal tasks. The MoE architecture is going mainstream, with models from Mistral, DeepSeek, and Moonshot (Kimi K2) leading the trend. In mathematics, an advanced Gemini model achieved a gold-medal-level score at the International Mathematical Olympiad (IMO), a first for AI. An OpenAI researcher noted their IMO model "knew" when it did not have a correct solution, highlighting advances in reasoning and self-assessment.
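For context on the trend, a toy top-k routed MoE layer illustrates why only a fraction of parameters (e.g. ~22B of 235B for Qwen3-235B-A22B) is active per token. The sizes below are arbitrary and not tied to any released model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy sparse Mixture-of-Experts layer: a router picks top-k experts per token,
# so only a small fraction of the parameters runs in each forward pass.
class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.router(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize the selected experts' weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # send each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([16, 64])
```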
Perplexity starts Shopping for you
pixtral-large-124b llama-3.1-405b claude-3.6 claude-3.5 stripe perplexity-ai mistral-ai hugging-face cerebras anthropic weights-biases google vllm-project multi-modal image-generation inference context-windows model-performance model-efficiency sdk ai-integration one-click-checkout memory-optimization patrick-collison jeff-weinstein mervenoyann sophiamyang tim-dettmers omarsar0 akhaliq aravsrinivas
Stripe launched its Agent SDK, enabling AI-native shopping experiences like Perplexity Shopping for US Pro members, featuring one-click checkout and free shipping via the Perplexity Merchant Program. Mistral AI released Pixtral Large, a 124B multimodal model, now on Hugging Face and available in Le Chat, which also added image generation. Cerebras Systems opened a public inference endpoint for Llama 3.1 405B with a 128k context window and high throughput. Claude 3.6 (the community's name for the updated Claude 3.5 Sonnet) shows improvements over Claude 3.5 but still produces subtle hallucinations. The Bi-Mamba 1-bit architecture improves LLM efficiency. The wandb SDK now comes preinstalled on Google Colab, and Pixtral Large is integrated into AnyChat and supported by vLLM for efficient inference.
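Since the summary mentions the preinstalled wandb SDK, here is a minimal logging sketch that runs as-is in a Colab cell; the project and metric names are placeholders:

```python
import wandb

# Minimal logging sketch; wandb.init will prompt for an API key on first use.
run = wandb.init(project="colab-demo", config={"lr": 3e-4})
for step in range(5):
    wandb.log({"loss": 1.0 / (step + 1)}, step=step)
run.finish()
```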