Person: "prince_canuma"
not much happened today
glm-4.6v glm-4.6v-flash jina-vlm-2b hugging-face zhipu-ai jina-ai google-deepmind axiomprover fine-tuning multimodality model-optimization long-context mechanistic-interpretability formal-methods sequence-architectures reinforcement-learning lioronai akshay_pachaar _akhaliq ben_burtenshaw vllm_project prince_canuma zenmuxai eliebakouch theturingpost axiommathai neelnanda5 sarahookr
Claude Code Skills gains attention with a published talk and Hugging Face's new "skill", which enables one-line fine-tuning pipelines for models from ~0.5B to 70B parameters, supports SFT, DPO, and GRPO, and costs as little as ~$0.30 for small runs. Zhipu AI launches the multimodal models GLM-4.6V (106B-parameter MoE) and GLM-4.6V-Flash (9B dense), featuring 128k context and native multimodal function calling, with the Flash variant free and API pricing detailed. Jina AI releases Jina-VLM (2B), a compact multilingual VLM that excels at diagrams and documents with top benchmark scores. At NeurIPS 2025, research highlights include Google's post-Transformer sequence architectures (Moneta, Yaad, Memora), which show up to 20% gains in long-context retrieval; AxiomProver's autonomous Lean system, which solves 9 of 12 Putnam 2025 problems rapidly; and mechanistic interpretability advances discussed by Chris Olah, with an emphasis on scalable tooling.
Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1
qwen-3 qwen3-235b-a22b qwen3-30b-a3b deepseek-r1 o1 o3-mini grok-3 gemini-2.5-pro alibaba google-deepmind deepseek mistral-ai mixture-of-experts reinforcement-learning benchmarking model-release model-architecture long-context multi-agent-systems inference dataset-release awnihannun prince_canuma actuallyisaak oriolvinyalsml iscienceluvr reach_vb teortaxestex omarsar0
Alibaba has released Qwen 3, a range of models that includes two MoE variants, Qwen3-235B-A22B and Qwen3-30B-A3B, which demonstrate competitive performance against top models like DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. The models introduce an "enable_thinking=True" mode with advanced soft switching for inference scaling. The release is notable for its Apache 2.0 license and broad inference platform support, including MCP. Dataset improvements and multi-stage RL post-training contribute to the performance gains. Meanwhile, Gemini 2.5 Pro from Google DeepMind shows strong coding and long-context reasoning capabilities, and DeepSeek R2 is anticipated soon. Twitter discussions highlight Qwen3's fine-grained MoE architecture, large context window, and multi-agent system applications.
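The MoE model names above encode two sizes: in Qwen3-235B-A22B, "235B" is the total parameter count and "A22B" is the number of parameters activated per token. As a rough illustration of what that means for inference cost, the sketch below parses such a name and computes the active fraction. The helper `parse_moe_name` is hypothetical, written only for this example, and it assumes the `<family>-<total>B-A<active>B` naming pattern holds.

```python
import re

# Hypothetical helper (not part of any Qwen tooling): assumes names follow
# the pattern "<family>-<total>B-A<active>B", e.g. "Qwen3-235B-A22B".
_MOE_NAME = re.compile(
    r"(?P<family>.+)-(?P<total>\d+(?:\.\d+)?)B-A(?P<active>\d+(?:\.\d+)?)B",
    re.IGNORECASE,
)


def parse_moe_name(name: str) -> dict:
    """Return total/active parameter counts (in billions) and their ratio."""
    match = _MOE_NAME.fullmatch(name)
    if match is None:
        # Dense models such as "Qwen3-0.6B" carry no "A<active>B" suffix.
        raise ValueError(f"not an MoE-style model name: {name!r}")
    total = float(match["total"])
    active = float(match["active"])
    return {
        "family": match["family"],
        "total_b": total,
        "active_b": active,
        "active_fraction": active / total,
    }


if __name__ == "__main__":
    for model in ("Qwen3-235B-A22B", "Qwen3-30B-A3B"):
        info = parse_moe_name(model)
        print(
            f"{model}: {info['active_b']:g}B of {info['total_b']:g}B "
            f"parameters active per token ({info['active_fraction']:.1%})"
        )
```

Both variants activate roughly a tenth of their weights per token, which is why per-token compute is closer to that of a much smaller dense model even though the full weights still have to be held in memory.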