Model: "qwen3-235b"

Sep 10, 2025

Oracle jumps +36% in a day after winning $300B OpenAI contract

qwen3-235b qwen3-4b qwen2.5-7b vllm oracle openai microsoft moonshot-ai vllm-project thinking-machines-lab meta reinforcement-learning model-weight-updates deterministic-inference benchmarking long-context model-optimization cuda distributed-training kimi_moonshot arankomatsuzaki qgallouedec cHHillee woosuk_k stasbekman

Oracle's OCI division reported +359% growth in revenue bookings, to $455B, with cloud revenue guidance of $144B by 2030, driven largely by a major deal with OpenAI amid tensions with Microsoft. On AI infrastructure, Moonshot AI released Kimi’s checkpoint-engine, which enables rapid weight updates on 1T-parameter models across thousands of GPUs and integrates with vLLM. RLFactory introduced a plug-and-play reinforcement learning framework for tool-using agents, reporting cases where smaller models outperform larger ones. TRL v0.23 added context parallelism for long-context training. Thinking Machines Lab published research on deterministic inference pipelines, making vLLM deterministic for Qwen models. Meta launched BackendBench, a PyTorch benchmarking tool.

Jul 30, 2025

not much happened today

glm-4.5 glm-4.5-air qwen3-coder qwen3-235b kimi-k2 grok-imagine wan-2.2 smollm3 figure-01 figure-02 vitpose++ chatgpt zhipu-ai alibaba moonshot-ai x-ai figure openai runway mlx ollama deeplearningai model-releases model-performance moe image-generation video-generation pose-estimation robotics training-code-release interactive-learning in-context-learning yuchenj_uw corbtt reach_vb ollama deeplearningai gdb sama c_valenzuelab adcock_brett skalskip92 loubnabenallal1 hojonathanho ostrisai

Chinese AI labs have released powerful open-source models, including GLM-4.5 and GLM-4.5-Air from Zhipu AI, Qwen3 Coder and Qwen3-235B from Alibaba, and Kimi K2 from Moonshot AI, highlighting a surge in permissively licensed models. Zhipu AI's GLM-4.5 is a 355B-parameter MoE model competitive with Claude 4 Opus and Gemini 2.5 Pro. Alibaba's Qwen3 Coder shows strong code generation performance with a low edit failure rate, while Moonshot AI's Kimi K2 is a 1 trillion-parameter MoE model that outperforms other open-weight models on benchmarks like LiveCodeBench. In video and image generation, xAI launched Grok Imagine, and Wan2.2 impressed with its image-to-video generation. Robotics advances include Figure's Figure-01 and Figure-02 humanoid robots and ViTPose++ for pose estimation in basketball analysis. SmolLM3 training and evaluation code was fully released under Apache 2.0. OpenAI introduced Study Mode in ChatGPT to enhance interactive learning, and Runway rolled out Runway Aleph, a new in-context video model for multi-task visual generation. On the Chinese open-source releases, @corbtt noted: "Orgs avoiding these models are at a significant competitive disadvantage."

Jul 29, 2025

not much happened today

glm-4.5 glm-4.5-air qwen3-coder qwen3-235b kimi-k2 wan-2.2 grok-imagine smollm3 figure-01 figure-02 vitpose++ zhipu-ai alibaba moonshot-ai x-ai ideogram figure smollm openai model-releases moe model-benchmarking image-generation video-generation pose-estimation robotics training-code-release apache-license yuchenj_uw corbtt cline reach_vb ollama deeplearningai ostrisai hojonathanho adcock_brett skalskip92 loubnabenallal1

Chinese labs released a wave of powerful, permissively licensed models in July, including Zhipu AI's GLM-4.5 and GLM-4.5-Air, Alibaba's Qwen3 Coder and Qwen3-235B, and Moonshot AI's Kimi K2. These models use large-scale Mixture of Experts architectures with active parameters ranging from 3B to 32B and context windows up to 256K tokens. Zhipu AI's GLM-4.5 competes with Claude 4 Opus and Gemini 2.5 Pro on benchmarks. Moonshot AI's Kimi K2 is a 1 trillion-parameter MoE model that surpasses other open-weight models on LiveCodeBench and AceBench. In video and image generation, xAI launched Grok Imagine, and Wan2.2 impressed with its image-to-video approach. Ideogram released a character consistency model. Robotics advances include Figure's Figure-01 and Figure-02 humanoid robots and ViTPose++ for pose estimation in basketball analysis. The SmolLM3 training and evaluation code was fully released under an Apache 2.0 license. "Orgs avoiding these Chinese open-source models are at a significant competitive disadvantage," noted @corbtt.

Jun 06, 2025

not much happened today

dots-llm1 qwen3-235b xiaohongshu rednote-hilab deepseek huggingface mixture-of-experts open-source model-benchmarking fine-tuning inference context-windows training-data model-architecture model-performance model-optimization

China's Xiaohongshu (Rednote) released dots.llm1, a 142B-parameter open-source Mixture-of-Experts (MoE) language model with 14B active parameters and a 32K context window, pretrained on 11.2 trillion high-quality, non-synthetic tokens. The model can be run through Docker, HuggingFace, and vLLM, and the release includes intermediate checkpoints every 1 trillion tokens, enabling flexible fine-tuning. The release's benchmarks claim it slightly surpasses Qwen3 235B on MMLU, though some in the community raised concerns about benchmark selection and about how the no-synthetic-data claim can be verified. The release is notable for its truly open-source licensing and its avoidance of synthetic data, sparking community optimism for support in frameworks such as llama.cpp and mlx.
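As a rough back-of-the-envelope sketch of the figures above (142B total parameters, 14B active, 11.2T pretraining tokens, a checkpoint every 1T tokens), the snippet below computes the share of parameters active per token and the token marks where intermediate checkpoints would fall. The function names are hypothetical, and the checkpoint schedule is an assumption: it supposes the first checkpoint lands at the 1T mark and that the final model is counted separately.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of an MoE model's parameters that are active per token."""
    if total_params_b <= 0 or not 0 < active_params_b <= total_params_b:
        raise ValueError("expected 0 < active_params_b <= total_params_b")
    return active_params_b / total_params_b


def checkpoint_marks(total_tokens_t: float, interval_t: float) -> list[float]:
    """Token counts (in trillions) of intermediate checkpoints.

    Assumption: checkpoints fall at every multiple of `interval_t`
    strictly before the end of training; the final model is not included.
    """
    if total_tokens_t <= 0 or interval_t <= 0:
        raise ValueError("token counts must be positive")
    count = int(total_tokens_t // interval_t)
    marks = [interval_t * i for i in range(1, count + 1)]
    return [m for m in marks if m < total_tokens_t]


# Figures taken from the dots.llm1 summary above.
fraction = active_fraction(total_params_b=142, active_params_b=14)
marks = checkpoint_marks(total_tokens_t=11.2, interval_t=1.0)

print(f"active share per token: {fraction:.1%}")
print(f"intermediate checkpoints: {len(marks)}")
```

Under these assumptions, roughly a tenth of the weights are used per token, which is why a 142B MoE model can have per-token compute closer to that of a 14B dense model while still needing memory for all 142B parameters.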

May 02, 2025

not much happened today

qwen3-14b qwen3-32b qwen3-235b phi-4-reasoning o3-mini command-a gemini-2.5-pro o4-mini olm-o2-1b o3 alibaba together-ai scaling01 microsoft deepseek cohere google epoch-ai-research inception-labs openai allenai quantization fine-tuning reinforcement-learning benchmarking video-generation diffusion-models model-performance model-evaluation model-release text-generation cline _philschmid iscienceluvr alexalbert__ _lewtun teortaxestex sarahookr reach_vb

The Qwen team released quantized versions of its Qwen3 models, including the 14B, 32B, and 235B sizes, with Qwen3-235B showing promising coding capabilities. Microsoft launched Phi-4-reasoning, a 14B-parameter model distilled from OpenAI's o3-mini that emphasizes supervised fine-tuning and reinforcement learning and outperforms larger models on some benchmarks. Cohere's Command A leads SQL performance on Bird Bench. Google introduced the TRAJAN eval for temporal consistency in video generation and updated the Gemini OpenAI compatibility layer. Inception Labs launched a diffusion LLM API claiming 5x speed improvements over autoregressive models. Community rankings show OpenAI's o3 model debuting strongly in web app-building tasks. Other releases include AllenAI's OLMo2 1B and additional Phi 4 variants. Highlighted takeaways: "Qwen3-235B shows promise for coding" and "Phi-4-reasoning tech report emphasizes SFT gains."

May 01, 2025

not much happened today

phi-4 phi-4-mini-reasoning qwen3-235b qwen3-moe-235b qwen3-moe-30b qwen3-dense-32b qwen3-dense-14b qwen3-dense-8b qwen3-dense-4b qwen3-dense-0.6b qwen2.5-omni-3b deepseek-prover-v2 llama llama-guard-4 prompt-guard-2 mimo-7b microsoft anthropic cursor alibaba togethercompute deepseek meta-ai-fair xiaomi openrouterai cohere reasoning model-fine-tuning model-evaluation benchmarking model-popularity open-source math model-scaling model-filtering jailbreak-prevention cline reach_vb vipulved akhaliq omarsar0 zhs05232838 huajian_xin mervenoyann karpathy random_walker sarahookr blancheminerva clefourrier

Microsoft released Phi-4-reasoning, a finetuned 14B reasoning model that lands slightly behind QwQ and is limited by data transparency and token-efficiency issues. Anthropic introduced remote MCP server support and a 45-minute Research mode in Claude. Cursor published a model popularity list. Alibaba launched Qwen3-235B and other Qwen3 variants, highlighting budget-friendly coding and reasoning capabilities, with availability on the Together AI API. Microsoft also released Phi-4-Mini-Reasoning, reporting benchmark results on AIME 2025 and OmniMath. DeepSeek announced DeepSeek-Prover V2 with state-of-the-art math problem solving, scaling to 671B parameters. Meta AI's Llama models hit 1.2 billion downloads, with new Llama Guard 4 and Prompt Guard 2 for input/output filtering and jailbreak prevention. Xiaomi released the open-source reasoning model MiMo-7B, trained on 25 trillion tokens. Discussions of AI model evaluation highlighted issues with the LMArena leaderboard, data access biases favoring proprietary models, and the difficulty of maintaining fair benchmarks, with suggestions for alternatives like OpenRouterAI rankings. Noted concerns included "LMArena slop and biased" and "61.3% of all data going to proprietary model providers."