
Company: "mlx"

not much happened today

kimi-k2.6 qwen-3.6-max-preview moonshot alibaba vllm openrouter cloudflare baseten mlx nous-research opencode ollama mixture-of-experts multimodality int4-quantization long-context agentic-coding multi-agent-systems model-orchestration memory-consolidation llm-driven-replanning dynamic-context-injection

Moonshot's Kimi K2.6 is a major open-weight 1T-parameter MoE model with 32B active parameters, 384 experts, MLA attention, a 256K context window, native multimodality, and INT4 quantization. It launched with day-0 support on platforms including vLLM, OpenRouter, and Cloudflare Workers AI, among others, and reports state-of-the-art benchmark results such as 54.0 on HLE with tools, 58.6 on SWE-Bench Pro, and 93.2 on Math Vision with Python. The model excels in long-horizon execution, with over 4,000 tool calls, 12+ hour continuous runs, and 300 parallel sub-agents. Meanwhile, Alibaba's Qwen3.6-Max-Preview offers enhanced agentic coding, improved world knowledge, and improved instruction following, with notable performance on AIME 2026 #15 and a ranking in Code Arena. Hermes Agent is rapidly expanding its ecosystem, surpassing 100K GitHub stars and integrating with tools like Ollama and Copilot CLI, while pioneering multi-agent orchestration techniques such as stateless ephemeral units, LLM-driven replanning, and dynamic context injection. These developments highlight the competitive momentum of Chinese open and semi-open labs in coding and agent models.

not much happened today

minimax-m2.1 glm-4.7 gemini-3-pro claude-3-sonnet vl-jepa minimax-ai vllm-project exolabs mlx apple openai open-source mixture-of-experts local-inference quantization inference-quality multimodality non-autoregressive-models video-processing reinforcement-learning self-play agentic-rl parallel-computing model-deployment ylecun awnihannun alexocheema edwardsun0909 johannes_hage

MiniMax M2.1 launches as an open-source Mixture-of-Experts (MoE) model for agents and coding, with ~10B active / ~230B total parameters. It claims to outperform Gemini 3 Pro and Claude Sonnet 4.5 and supports quantized local inference, including on Apple Silicon M3 Ultra. GLM 4.7 demonstrates local scaling on Mac Studios with 2× 512GB M3 Ultra hardware, highlighting system-level challenges like bandwidth and parallelism. The concept of inference quality is emphasized as a key factor affecting output variance across deployments. Yann LeCun's VL-JEPA proposes a non-generative, non-autoregressive multimodal model that operates in latent space for efficient real-time video processing with fewer parameters and decoding operations. Advances in agentic reinforcement learning for coding include self-play methods in which agents inject and fix bugs autonomously, enabling self-improvement without human labeling, and large-scale RL infrastructure involving massive parallel code generation and execution sandboxes.

not much happened today

glm-4.5 glm-4.5-air qwen3-coder qwen3-235b kimi-k2 grok-imagine wan-2.2 smollm3 figure-01 figure-02 vitpose++ chatgpt zhipu-ai alibaba moonshot-ai x-ai figure openai runway mlx ollama deeplearningai model-releases model-performance moe image-generation video-generation pose-estimation robotics training-code-release interactive-learning in-context-learning yuchenj_uw corbtt reach_vb ollama deeplearningai gdb sama c_valenzuelab adcock_brett skalskip92 loubnabenallal1 hojonathanho ostrisai

Chinese AI labs have released powerful open-source models like GLM-4.5 and GLM-4.5-Air from Zhipu AI, Qwen3 Coder and Qwen3-235B from Alibaba, and Kimi K2 from Moonshot AI, highlighting a surge in permissively licensed models. Zhipu AI's GLM-4.5 is a 355B-parameter MoE model competitive with Claude 4 Opus and Gemini 2.5 Pro. Alibaba's Qwen3 Coder shows strong code generation performance with a low edit failure rate, while Moonshot AI's Kimi K2 is a 1-trillion-parameter MoE model with leading results on benchmarks like LiveCodeBench. In video and image generation, xAI launched Grok Imagine, and Wan2.2 impressed with innovative image-to-video generation. Robotics advances include Figure's Figure-01 and Figure-02 humanoid robots and ViTPose++ for pose estimation in basketball analysis. SmolLM3 training and evaluation code was fully released under Apache 2.0. OpenAI introduced Study Mode in ChatGPT to enhance interactive learning, and Runway rolled out Runway Aleph, a new in-context video model for multi-task visual generation. The community sees a cost to steering clear of these Chinese open-source models: "Orgs avoiding these models are at a significant competitive disadvantage," as @corbtt noted.

not much happened today

codex claude-4-opus claude-4-sonnet gemini-2.5-pro gemini-2.5 qwen-2.5-vl qwen-3 playdiffusion openai anthropic google perplexity-ai bing playai suno hugging-face langchain-ai qwen mlx assemblyai llamacloud fine-tuning model-benchmarking text-to-video agentic-ai retrieval-augmented-generation open-source-models speech-editing audio-processing text-to-speech ultra-low-latency multimodality public-notebooks sama gdb kevinweil lmarena_ai epochairesearch reach_vb wightmanr deeplearningai mervenoyann awnihannun jordirib1 aravsrinivas omarsar0 lioronai jerryjliu0 nerdai tonywu_71 _akhaliq clementdelangue _mfelfel

OpenAI rolled out Codex to ChatGPT Plus users with internet access and fine-grained controls, and improved memory features for free users. Anthropic's Claude 4 Opus and Sonnet models lead coding benchmarks, while Google's Gemini 2.5 Pro and Flash models gain recognition with new audio capabilities. Qwen 2.5-VL and Qwen 3 quantizations are noted for versatility and support. Bing Video Creator launched globally, enabling text-to-video generation, and Perplexity Labs sees increased demand for travel search. New agentic AI tools and RAG innovations include LlamaCloud and FedRAG. Open-source releases include Holo-1 for web navigation and PlayAI's PlayDiffusion for speech editing. Audio and multimodal advances feature Suno's music editing upgrades, Google's native TTS in 24+ languages, and Universal Streaming's ultra-low-latency speech-to-text. Google NotebookLM now supports public notebooks. Two notes from the coverage: "Codex's internet access brings tradeoffs, with explicit warnings about risk" and "Gemini 2.5 Pro is cited as a daily driver by users".

© 2026 • AINews

