Person: "dorialexander"
not much happened today
DeepSeek released a new paper on mHC (Manifold-Constrained Hyper-Connections), advancing residual-path design as a key scaling lever in neural networks. The approach constrains residual mixing matrices to the Birkhoff polytope to improve stability and performance, at only about 6.7% training overhead. The release also includes systems-level optimizations such as fused kernels and activation recomputation, a frontier-lab blend of mathematics and kernel engineering. Separately, discussions around long-horizon agents emphasize context management as the bottleneck, introducing Recursive Language Models (RLMs), which manage context dynamically rather than relying on ever-larger context windows. Together, this work signals a shift in architectural design and efficiency for base-model training and agent development.
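The Birkhoff polytope is the set of doubly stochastic matrices (nonnegative entries, rows and columns each summing to 1). The paper's exact parameterization isn't reproduced here, but a standard way to map an unconstrained matrix onto (approximately) this set is Sinkhorn-Knopp normalization, sketched below as an illustrative assumption, not DeepSeek's implementation:

```python
import numpy as np

def sinkhorn(logits, n_iters=200):
    """Approximately project a real matrix onto the Birkhoff polytope
    (doubly stochastic matrices) by alternating row and column
    normalization (Sinkhorn-Knopp)."""
    M = np.exp(logits - logits.max())      # positive entries, numerically stable
    for _ in range(n_iters):
        M /= M.sum(axis=1, keepdims=True)  # rows sum to 1
        M /= M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

rng = np.random.default_rng(0)
W = sinkhorn(rng.normal(size=(4, 4)))
print(np.allclose(W.sum(axis=1), 1, atol=1e-5))  # rows ~ stochastic
print(np.allclose(W.sum(axis=0), 1, atol=1e-5))  # cols ~ stochastic
```

Because every doubly stochastic matrix is a convex combination of permutation matrices, a residual mixing matrix constrained this way can only redistribute the residual stream, not amplify it, which is the intuition behind the claimed stability benefit.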
not much happened today
qwen-image-2512, ax-k1, k-exaone, sk-telecom, lg, upstage, naver, alibaba, unsloth, replicate, mixture-of-experts, model-release, quantization, open-source-models, image-generation, model-integration, model-benchmarking, compute-costs, dataset-curation, eliebakouch, clementdelangue, dorialexander, rising_sayak, _akhaliq, ostrisai, ivanfioravanti, yupp_ai
South Korea's Ministry of Science launched a coordinated program with five companies to develop sovereign foundation models from scratch, featuring large-scale MoE architectures such as SK Telecom's A.X-K1 (519B total / 33B active parameters) and LG's K-EXAONE (236B total / 23B active), with a total first-round budget of roughly $140M. The initiative contrasts with EU approaches by concentrating funding on fewer stakeholders and explicitly budgeting for data. Meanwhile, Alibaba's Qwen-Image-2512 has emerged as a leading open-source image-generation model, rapidly integrated into toolchains including AI-Toolkit and local inference paths with quantization support, and hosted on platforms such as Replicate. The model has undergone extensive blind testing, with over 10,000 rounds on AI Arena, underscoring its ecosystem adoption.
Qwen-Image: SOTA text rendering + 4o-imagegen-level Editing Open Weights MMDiT
qwen-image, mmdit, gemini-2.5, o3-pro, seedprover, glm-4.5, xbai-o4, hunyuan, alibaba, google-deepmind, openai, bytedance, kaggle, tencent, bilingual-text-rendering, image-generation, image-editing, synthetic-data, reasoning, math-theorem-proving, benchmarking, instruction-following, model-efficiency, open-weight-models, model-transparency, competitive-evaluation, swyx, demishassabis, tulseedoshi, mparakhin, teortaxestex, cgeorgiaw, dorialexander, steph_palazzolo, corbtt, synthwavedd, epochairesearch
Alibaba surprised with the release of Qwen-Image, a 20B MMDiT model excelling at bilingual text rendering and graphic poster creation, with open weights and demos available. Google DeepMind launched Gemini 2.5 Deep Think to Ultra subscribers, showing significant reasoning improvements and benchmark gains (+11.2% AIME, +13.2% HLE, +13.4% LiveCodeBench), rivaling OpenAI's o3 Pro. ByteDance's SeedProver achieved state-of-the-art math theorem-proving results, surpassing DeepMind's AlphaGeometry2. OpenAI is developing a "universal verifier" aimed at transferring gains from math and coding. Competitive reasoning benchmarks and game arenas from Google and Kaggle highlight a meta-shift in reasoning-model efficiency, comparable to the original Transformer leap. Other open-weight models gaining momentum include GLM-4.5, XBai o4, and Tencent's Hunyuan, with a focus on efficient training. "Qwen is all you need."