Model: "qwen3-235b"
not much happened today
dots-llm1 qwen3-235b xiaohongshu rednote-hilab deepseek huggingface mixture-of-experts open-source model-benchmarking fine-tuning inference context-windows training-data model-architecture model-performance model-optimization
China's Xiaohongshu (Rednote) released dots.llm1, a 142B-parameter open-source Mixture-of-Experts (MoE) language model with 14B active parameters and a 32K context window, pretrained on 11.2 trillion high-quality, non-synthetic tokens. The model can be run through Docker, HuggingFace, and vLLM, and the release includes intermediate checkpoints every 1 trillion tokens, enabling flexible fine-tuning. The reported benchmarks claim it slightly surpasses Qwen3 235B on MMLU, though some have raised concerns about benchmark selection and about verifying the synthetic-data claim. The release is notable for its truly open-source licensing and its lack of synthetic data, sparking community optimism for support in frameworks such as llama.cpp and mlx.
not much happened today
qwen3-14b qwen3-32b qwen3-235b phi-4-reasoning o3-mini command-a gemini-2.5-pro o4-mini olm-o2-1b o3 alibaba together-ai scaling01 microsoft deepseek cohere google epoch-ai-research inception-labs openai allenai quantization fine-tuning reinforcement-learning benchmarking video-generation diffusion-models model-performance model-evaluation model-release text-generation cline _philschmid iscienceluvr alexalbert__ _lewtun teortaxestex sarahookr reach_vb
The Qwen team released quantized versions of its Qwen3 models, including the 14B, 32B, and 235B sizes, with Qwen3-235B showing promising coding capabilities. Microsoft launched Phi-4-reasoning, a 14B-parameter model distilled from OpenAI's o3-mini that emphasizes supervised fine-tuning and reinforcement learning and outperforms larger models on some benchmarks. Cohere's Command A leads SQL performance on Bird Bench. Google introduced the TRAJAN eval for temporal consistency in video generation and updated the Gemini OpenAI compatibility layer. Inception Labs launched a diffusion LLM API claiming 5x speed improvements over autoregressive models. Community rankings show OpenAI's o3 model debuting strongly in web app-building tasks. Other releases include AllenAI's OLMo2 1B and additional Phi 4 variants. Key takeaways quoted in the issue: "Qwen3-235B shows promise for coding" and "Phi-4-reasoning tech report emphasizes SFT gains".
not much happened today
phi-4 phi-4-mini-reasoning qwen3-235b qwen3-moe-235b qwen3-moe-30b qwen3-dense-32b qwen3-dense-14b qwen3-dense-8b qwen3-dense-4b qwen3-dense-0.6b qwen2.5-omni-3b deepseek-prover-v2 llama llama-guard-4 prompt-guard-2 mimo-7b microsoft anthropic cursor alibaba togethercompute deepseek meta-ai-fair xiaomi openrouterai cohere reasoning model-fine-tuning model-evaluation benchmarking model-popularity open-source math model-scaling model-filtering jailbreak-prevention cline reach_vb vipulved akhaliq omarsar0 zhs05232838 huajian_xin mervenoyann karpathy random_walker sarahookr blancheminerva clefourrier
Microsoft released Phi-4-reasoning, a fine-tuned 14B reasoning model that lands slightly behind QwQ and is limited by data transparency and token efficiency issues. Anthropic introduced remote MCP server support and a 45-minute Research mode in Claude. Cursor published a model popularity list. Alibaba launched Qwen3-235B and other Qwen3 variants, highlighting budget-friendly coding and reasoning capabilities, with availability on the Together AI API. Microsoft also released Phi-4-Mini-Reasoning, with reported benchmark results on AIME 2025 and OmniMath. DeepSeek announced DeepSeek-Prover V2 with state-of-the-art math problem solving, scaling to 671B parameters. Meta AI's Llama models hit 1.2 billion downloads, with new Llama Guard 4 and Prompt Guard 2 releases for input/output filtering and jailbreak prevention. Xiaomi released the open-source reasoning model MiMo-7B, trained on 25 trillion tokens. Discussions of AI model evaluation highlighted issues with the LMArena leaderboard, data access biases favoring proprietary models, and challenges in maintaining fair benchmarking, with suggestions for alternatives such as OpenRouterAI rankings. Noted concerns included "LMArena slop and biased" and "61.3% of all data going to proprietary model providers".