Company: "z-ai"
Qwen3.5-397B-A17B: the smallest Open-Opus-class model, and a very efficient one
qwen3.5-397b-a17b qwen3.5-plus qwen3-max qwen3-vl kimi alibaba openai deepseek z-ai minimax unsloth ollama vllm native-multimodality spatial-intelligence sparse-moe long-context model-quantization model-architecture model-deployment inference-optimization apache-2.0-license pete_steinberger justinlin610
Alibaba released Qwen3.5-397B-A17B, an open-weight model featuring native multimodality, spatial intelligence, and a hybrid linear-attention + sparse-MoE architecture supporting 201 languages and context windows up to 256K tokens. The model improves on previous releases such as Qwen3-Max and Qwen3-VL; with 17B of its 397B parameters active per token, its sparsity ratio is about 4.3%. Community discussion highlighted the Gated Delta Networks enabling efficient inference despite the large model size (~800GB in BF16), with successful local runs on Apple Silicon using quantization. The hosted API version, Qwen3.5-Plus, supports 1M context and integrates search and code-interpreter features. This release follows other Chinese labs such as Z.ai, MiniMax, and Kimi in refreshing their large models. The model is licensed under Apache-2.0 and is expected to be the last major release before DeepSeek v4. The news also notes Pete Steinberger joining OpenAI.
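A back-of-the-envelope check of the figures quoted above (the 4.3% sparsity ratio and the ~800GB BF16 footprint); the bytes-per-parameter values for the quantized variants are illustrative assumptions, not measured numbers.

```python
TOTAL_PARAMS = 397e9   # "397B" total parameters
ACTIVE_PARAMS = 17e9   # "A17B": parameters active per token

# Sparsity ratio: active / total parameters.
sparsity = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"sparsity ratio: {sparsity:.1%}")  # -> 4.3%

# Approximate weight storage at different precisions.
# 8-bit and 4-bit figures are rough assumptions (weights only, no overhead).
for name, bytes_per_param in [("BF16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gb = TOTAL_PARAMS * bytes_per_param / 1e9
    print(f"{name:>5}: ~{gb:,.0f} GB")  # BF16 -> ~794 GB, matching "~800GB BF16"
```

The 4-bit row suggests why quantized local runs on high-memory Apple Silicon are feasible at all: weights alone drop from roughly 800GB to roughly 200GB.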
not much happened today
claude-3-7-sonnet gpt-4-1 gemini-3 qwen3-vl-embedding qwen3-vl-reranker glm-4-7 falcon-h1r-7b jamba2 stanford google google-deepmind alibaba z-ai tii ai21-labs huggingface copyright-extraction multimodality multilinguality retrieval-augmented-generation model-architecture mixture-of-experts model-quantization reasoning inference kernel-engineering memory-optimization enterprise-ai sundarpichai justinlin610
A Stanford paper reveals that Claude 3.7 Sonnet memorized 95.8% of Harry Potter 1, highlighting copyright-extraction risks relative to GPT-4.1. Google AI Studio sponsors TailwindCSS amid OSS funding debates. Google and Sundar Pichai launch Gemini 3 features in Gmail, including AI Overviews and natural-language search with user controls. Alibaba Qwen releases Qwen3-VL-Embedding and Qwen3-VL-Reranker, together a multimodal, multilingual retrieval stack supporting text, images, and video with quantization and instruction customization, achieving strong benchmark results. Z.ai goes public on HKEX as GLM-4.7 leads the Artificial Analysis Intelligence Index v4.0 with gains in reasoning, coding, and agentic use, built on a large-scale MoE architecture under an MIT license. Falcon-H1R-7B from TII targets efficient reasoning in smaller models, scoring 16 on the Intelligence Index. AI21 Labs introduces Jamba2, a memory-efficient enterprise model with a hybrid SSM-Transformer architecture and Apache 2.0 license, available via SaaS and Hugging Face. vLLM shows throughput improvements from inference and kernel-engineering work. "Embeddings should be multimodal by default," notes Justin Lin.
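An embedding-plus-reranker pair like Qwen3-VL-Embedding/Reranker is typically used as a two-stage pipeline: cheap dense retrieval over the whole corpus, then a joint (query, document) reranking pass over the shortlist. A minimal sketch of that pattern follows; the `embed` and `rerank_score` stubs stand in for the actual models, whose real APIs are not shown in the source.

```python
import hashlib
import numpy as np

DIM = 64

def embed(text: str) -> np.ndarray:
    """Stub embedder: deterministic hashed bag-of-words projection.
    In practice this would call the embedding model (which for the
    Qwen3-VL stack also accepts images and video)."""
    vec = np.zeros(DIM)
    for tok in text.lower().split():
        seed = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "little")
        vec += np.random.default_rng(seed).standard_normal(DIM)
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def rerank_score(query: str, doc: str) -> float:
    """Stub reranker: token-overlap score. A real reranker scores the
    (query, doc) pair jointly rather than comparing fixed vectors."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

docs = [
    "a photo of a cat on a sofa",
    "video tutorial on kernel engineering",
    "multilingual retrieval with images and text",
]
query = "retrieval over images and text"

# Stage 1: cosine-similarity shortlist (vectors are unit-normalized).
doc_vecs = np.stack([embed(d) for d in docs])
sims = doc_vecs @ embed(query)
shortlist = [docs[i] for i in np.argsort(-sims)[:2]]

# Stage 2: rerank only the shortlist with the (stubbed) cross-scorer.
best = max(shortlist, key=lambda d: rerank_score(query, d))
print(best)
```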
GLM-4.5: Deeper, Headier, & better than Kimi/Qwen/DeepSeek (SOTA China LLM?)
glm-4.5-355b-a32b glm-4.5-air-106b-a12b qwen3-coder claude-4-opus grok-4 o3 gpt-4.1 gpt-5 kimi-k2 claude-sonnet-4 z-ai alibaba huggingface openai reinforcement-learning token-efficiency model-optimization open-source-models agentic-ai coding model-training lupantech teortaxestex mervenoyann _lewtun scaling01 cline
Z.ai (Zhipu AI) released the GLM-4.5-355B-A32B and GLM-4.5-Air-106B-A12B open-weights models, claiming state-of-the-art performance competitive with Claude 4 Opus, Grok 4, and OpenAI's o3. The models emphasize token efficiency and efficient reinforcement-learning training, validating the Muon optimizer at scale. Alibaba Qwen introduced Group Sequence Policy Optimization (GSPO), a new reinforcement-learning algorithm powering the Qwen3 model suite and now integrated into Hugging Face's TRL library. Speculation surrounds the mystery models "summit" and "zenith" as potential GPT-5 variants built on the GPT-4.1 architecture. Qwen3-Coder shows strong coding-benchmark results, rivaling Claude Sonnet 4 and Kimi K2. The rise of powerful Chinese open-source models like GLM-4.5, Wan-2.2, and Qwen3-Coder contrasts with a slowdown from Western labs such as OpenAI.
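GSPO's key departure from token-level methods like GRPO is a sequence-level, length-normalized importance ratio that is clipped once per response. A minimal NumPy sketch of that objective, assuming GRPO-style group-normalized rewards; variable names and the clipping range are illustrative, not taken from the paper or TRL.

```python
import numpy as np

def gspo_loss(logp_new, logp_old, rewards, eps=0.04):
    """Sequence-level clipped policy loss in the spirit of GSPO.

    logp_new, logp_old: per-token log-probs for each of the G responses
    in a group, under the current and behavior policies.
    rewards: one scalar reward per response.
    eps: clipping range (illustrative value).
    """
    # Group-relative advantages, as in GRPO.
    r = np.asarray(rewards, dtype=float)
    adv = (r - r.mean()) / (r.std() + 1e-8)

    losses = []
    for lp_new, lp_old, a in zip(logp_new, logp_old, adv):
        lp_new, lp_old = np.asarray(lp_new), np.asarray(lp_old)
        # Sequence-level importance ratio, length-normalized:
        # s = (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)
        s = np.exp((lp_new.sum() - lp_old.sum()) / len(lp_new))
        clipped = np.clip(s, 1 - eps, 1 + eps)
        losses.append(-min(s * a, clipped * a))  # negated PPO-style objective
    return float(np.mean(losses))

# Toy group of two responses with three tokens each.
loss = gspo_loss(
    logp_new=[[-0.9, -1.1, -0.8], [-1.4, -1.3, -1.6]],
    logp_old=[[-1.0, -1.0, -1.0], [-1.2, -1.2, -1.2]],
    rewards=[1.0, 0.0],
)
print(loss)
```

Because the ratio is computed once per sequence rather than per token, clipping acts on whole responses, which is the stability argument made for GSPO in large MoE training.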