Topic: "adaptive-position-encoding"
not much happened today
glm-4.7-flash glm-4.7 glm-4.5 qwen3-vl qwen meta-ai-fair carnegie-mellon sakana-ai zhipu-ai transformer-memory model-architecture mixture-of-experts adaptive-position-encoding long-context model-compression inference-optimization local-inference model-deployment benchmarking coding agentic-ai
AI News for 1/16/2026-1/19/2026 covers new architectures for scaling Transformer memory and context. STEM, from Carnegie Mellon and Meta AI, replaces part of the FFN with a token-indexed embedding lookup, enabling CPU offload and asynchronous prefetch (sketched below). RePo, from Sakana AI, introduces adaptive positional reordering to improve robustness on noisy and long-range contexts. On the release side, Zhipu AI's GLM-4.7-Flash is a 30B-class MLA + small-MoE model optimized for coding and agentic tasks, noted for strong benchmark performance and for a compression narrative from larger models down to smaller ones. Inference and deployment updates include mlx-lm 0.30.3 adding support for GLM-4.7-Flash, with efficient 4-bit performance on laptops. The report's practical takeaways center on static sparsity, adaptive ordering, and the resurgence of small, fast models for interactive use: "Sparse capacity doesn't have to mean MoE routers + expert parallelism; static sparsity can be systems-friendly."
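To make the static-sparsity point concrete, here is a minimal PyTorch sketch of an FFN block where part of the capacity is a token-indexed table kept in host RAM. The class name, dimensions, and `prefetch` API are illustrative assumptions, not STEM's actual implementation; the property it demonstrates is that the lookup index depends only on token ids, so rows can be fetched before the layer runs.

```python
import torch
import torch.nn as nn

class StemStyleFFN(nn.Module):
    """Toy sketch (not STEM's real code): a slimmed-down dense FFN plus a
    token-indexed embedding lookup. Because the lookup index is the token
    id, known before the forward pass (static sparsity), the needed rows
    can live in host RAM and be copied to the accelerator asynchronously
    while earlier layers run."""

    def __init__(self, d_model: int = 1024, d_ff: int = 2048, vocab_size: int = 50_000):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Large per-token table kept in CPU memory (never moved to GPU).
        self.cpu_table = torch.randn(vocab_size, d_model)

    def prefetch(self, token_ids: torch.Tensor, device) -> torch.Tensor:
        # Gather rows on CPU; stage in pinned memory when CUDA is present
        # so the host-to-device copy can be truly asynchronous.
        rows = self.cpu_table[token_ids.cpu()]
        if torch.cuda.is_available():
            rows = rows.pin_memory()
        return rows.to(device, non_blocking=True)

    def forward(self, x: torch.Tensor, prefetched_rows: torch.Tensor) -> torch.Tensor:
        return self.ffn(x) + prefetched_rows


# Usage: prefetch can be issued as soon as token ids are known.
layer = StemStyleFFN()
ids = torch.randint(0, 50_000, (2, 16))
x = torch.randn(2, 16, 1024)
rows = layer.prefetch(ids, device="cpu")  # use "cuda" or "mps" on real hardware
out = layer(x, rows)                      # shape (2, 16, 1024)
```

Because no router is involved, there is no load balancing or expert-parallel communication; the only cost is a predictable host-to-device copy that can overlap with compute, which is the systems-friendly angle the quote above alludes to.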
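In the same spirit, a toy sketch of adaptive positional reordering: tokens judged relevant receive compact, order-preserving position ids, while filler tokens are pushed to distant positions. The relevance scores and the reassignment rule here are placeholder assumptions for illustration, not RePo's actual algorithm.

```python
import torch

def adaptive_position_ids(relevance: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Toy sketch in the spirit of adaptive positional reordering: instead
    of fixed positions 0..T-1, relevant tokens get contiguous ids and noisy
    filler tokens are exiled to far-away positions. How relevance is scored
    and how ids are reassigned are assumptions, not RePo's method."""
    T = relevance.shape[-1]
    k = max(1, int(T * keep_ratio))
    keep = torch.zeros(T, dtype=torch.bool)
    keep[relevance.topk(k).indices] = True
    pos = torch.empty(T, dtype=torch.long)
    pos[keep] = torch.arange(k)               # relevant: compact, order-preserving ids
    pos[~keep] = torch.arange(T - k) + 2 * T  # filler: pushed outside the local window
    return pos

rel = torch.randn(12)                # stand-in relevance scores
pos = adaptive_position_ids(rel)     # feed as position_ids to a RoPE/absolute-PE layer
print(pos)
```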
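For the deployment note, running GLM-4.7-Flash locally through mlx-lm follows the library's standard load/generate workflow. The Hugging Face repo id below is hypothetical; check the mlx-community org for the actual 4-bit conversion.

```python
from mlx_lm import load, generate

# Hypothetical repo id; look up the real 4-bit conversion on mlx-community.
model, tokenizer = load("mlx-community/GLM-4.7-Flash-4bit")

messages = [{"role": "user", "content": "Write a Python function that deduplicates a list."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# 4-bit weights keep the footprint small enough for laptop-class unified memory.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```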