Person: "eliebakouch"
not much happened today
qwen-image-2512 ax-k1 k-exaone sk-telecom lg upstage naver alibaba unsloth replicate mixture-of-experts model-release quantization open-source-models image-generation model-integration model-benchmarking compute-costs dataset-curation eliebakouch clementdelangue dorialexander rising_sayak _akhaliq ostrisai ivanfioravanti yupp_ai
South Korea's Ministry of Science launched a coordinated program with five companies to develop sovereign foundation models from scratch, featuring large-scale MoE architectures such as SK Telecom's A.X-K1 (519B total / 33B active) and LG's K-EXAONE (236B total / 23B active), with a total first-round budget of ~$140M. The initiative contrasts with EU approaches by concentrating funding on fewer stakeholders and explicitly budgeting for data. Meanwhile, Alibaba's Qwen-Image-2512 emerges as a leading open-source image generation model: it has been rapidly integrated into toolchains including AI-Toolkit and local inference paths with quantization support, is hosted on platforms like Replicate, and has already gone through over 10,000 rounds of blind testing on AI Arena, underscoring broad ecosystem adoption.
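Both Korean flagships are sparse MoE designs, so per-token compute tracks the active parameter count rather than the total. A quick back-of-the-envelope check using the figures above (the parameter counts come from the summary; everything else is illustrative):

```python
# Active-parameter ratios for the quoted MoE configs; only the routed-in
# experts (plus shared layers) execute for each token.
models = {
    "SK Telecom A.X-K1": (519e9, 33e9),   # (total, active) from the summary
    "LG K-EXAONE":       (236e9, 23e9),
}

for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
# A.X-K1 runs ~6.4% of its weights per token, K-EXAONE ~9.7%.
```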
not much happened today
glm-4.7 glm-4.6 minimax-m2.1 gemma-3 gemma-scope-2 google-deepmind valsai minimax-ai ollama trae alibaba sophont prime-intellect interpretability sparse-autoencoders agent-workflows model-benchmarking medical-evaluation multi-agent-systems model-performance model-optimization reinforcement-learning tool-use function-calling context-windows ivanfioravanti awnihannun deedydas cline omarsar0 adonis_singh eliebakouch teortaxestex ibragim_bad callum_mcdougall neelnanda5
GLM-4.7 and MiniMax M2.1 open-weight releases highlight day-0 ecosystem support, coding throughput, and agent workflows: GLM-4.7 posts a +9.5% improvement over GLM-4.6, while MiniMax M2.1 is positioned as an OSS Claude-like MoE model with 230B total parameters and a 200K context. Gemma Scope 2 from google-deepmind introduces sparse autoencoders and transcoders for interpretability across the Gemma 3 models, aiming to provide shared infrastructure for safety and debugging work. The Medmarks v0.1 open medical evaluation suite and leaderboard also launch, addressing the need for open medical benchmarking across 15+ environments and engaging clinicians and researchers.
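The sparse autoencoders Gemma Scope 2 ships are, at their core, overcomplete dictionaries trained to reconstruct model activations through a sparse bottleneck. A toy sketch of the mechanism (randomly initialized here; the real release provides trained weights per Gemma 3 layer):

```python
import numpy as np

# Toy sparse autoencoder: expand activations into a wide, mostly-inactive
# feature basis, then reconstruct. Dimensions are illustrative only.
d_model, d_sae = 64, 512
rng = np.random.default_rng(0)
W_enc = rng.normal(0, 0.05, (d_model, d_sae))
W_dec = rng.normal(0, 0.05, (d_sae, d_model))
b_enc = np.zeros(d_sae)

def sae(act: np.ndarray):
    features = np.maximum(act @ W_enc + b_enc, 0.0)  # sparse feature codes
    recon = features @ W_dec                         # reconstructed activation
    return features, recon

features, recon = sae(rng.normal(size=d_model))
print(f"{(features > 0).mean():.1%} of features active")  # the sparsity knob
```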
not much happened today
glm-4.7 mimo-v2-flash z-image-turbo kling-2.6-motion-control zhipu-ai xiaomi google langchain huggingface openrouter artificial-analysis vllm-project coding complex-reasoning tool-use mixture-of-experts cost-efficiency open-weight-models text-to-image video-models memory-persistence agent-frameworks interactive-user-interfaces model-deployment mervenoyann eliebakouch omarsar0 osanseviero dair_ai
Zhipu AI's GLM-4.7 release marks a significant improvement in coding, complex reasoning, and tool use, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. Xiaomi's MiMo-V2-Flash is highlighted as a practical, cost-efficient mixture-of-experts model optimized for deployment. In the open-weight text-to-image race, Z-Image Turbo leads at just 6B parameters under an Apache-2.0 license. Video model advances focus on control and long-form consistency, exemplified by Kling 2.6 Motion Control and research like MemFlow's adaptive memory retrieval. In agent frameworks, Google's A2UI protocol introduces agent-driven UI generation, while studies find that mixing multiple agent frameworks is common, with failures concentrated in logic, termination, and tool interaction. LangChain emphasizes persistent memory patterns for production agents.
OpenAI GPT Image-1.5 claims to beat Nano Banana Pro, #1 across all Arenas, but completely fails Vibe Checks
gpt-image-1.5 nano-banana-pro mimo-v2-flash deepseek-v3.2 openai gemini xiaomi lmsys deepseek openrouter image-generation instruction-following benchmarking model-efficiency long-context multi-token-prediction hybrid-attention model-optimization inference-speed agentic-workflows model-architecture model-quantization fuli_luo eliebakouch
OpenAI released its new image model GPT Image 1.5, featuring precise image editing, better instruction following, improved text and markdown rendering, and generation up to 4× faster. Despite topping multiple leaderboards, including LMArena (1277), Design Arena (1344), and AA Arena (1272), user feedback across Twitter, Reddit, and Discord is largely negative in comparison with Gemini's Nano Banana Pro. Xiaomi introduced MiMo-V2-Flash, a 309B MoE model optimized for inference efficiency with a 256K context window, achieving state-of-the-art scores on SWE-Bench; the model uses Hybrid Sliding Window Attention and multi-token prediction for significant speed and efficiency gains. Launching into the middle of Nano Banana Pro's momentum colors user sentiment and underlines how leaderboard wins can diverge from vibe checks.
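The sliding-window half of that attention scheme is easy to picture as a mask: each query sees only a fixed number of recent keys, which is what keeps a 256K context affordable. A minimal sketch of that mask (how MiMo-V2-Flash actually interleaves windowed and full-attention layers is not detailed in this summary):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal mask where each query attends to at most `window` recent keys."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    return (j <= i) & (i - j < window)

# Per-layer attention cost drops from O(n^2) to O(n * window),
# which is the lever that makes very long contexts tractable.
print(sliding_window_mask(6, 3).astype(int))
```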
MCP -> Agentic AI Foundation, Mistral Devstral 2
devstral-2 devstral-small-2 sonnet-4.3 deepseek-v3.2 qwen3-vl openai anthropic block mistral-ai alibaba linux-foundation deepseek agentic-ai coding-models reinforcement-learning model-performance model-optimization open-weights cli-tools multi-file-code-automation data-decontamination moe reward-models rl-stability guillaumelample b_roziere qtnx_ charliermarsh omarsar0 eliebakouch justinwaugh cwolferesearch pan
The Agentic AI Foundation launches under the Linux Foundation, a collaborative milestone uniting projects from Anthropic (including MCP), OpenAI, and Block. Mistral released Devstral 2, a 123B-parameter coding model with open weights, offering a cost-effective alternative to Sonnet 4.3 and competitive performance against DeepSeek v3.2; the new Mistral Vibe CLI supports agentic coding workflows with rapid ecosystem integration. Alibaba introduced Soft Adaptive Policy Optimization (SAPO) for reinforcement learning tuning, improving stability and performance of Qwen3-VL across multiple tasks. Research highlights include the importance of data decontamination in RL and ongoing discussions on MoE RL stability and reward-hacking mitigation.
not much happened today
glm-4.6v glm-4.6v-flash jina-vlm-2b hugging-face zhipu-ai jina-ai google-deepmind axiomprover fine-tuning multimodality model-optimization long-context mechanistic-interpretability formal-methods sequence-architectures reinforcement-learning lioronai akshay_pachaar _akhaliq ben_burtenshaw vllm_project prince_canuma zenmuxai eliebakouch theturingpost axiommathai neelnanda5 sarahookr
Claude Code Skills gains attention via a published talk, and Hugging Face ships a new "skill" that turns fine-tuning into a one-line pipeline for models from ~0.5B to 70B parameters, supporting SFT, DPO, and GRPO at costs as low as ~$0.30 for small runs. Zhipu AI launches the multimodal models GLM-4.6V (106B-parameter MoE) and GLM-4.6V-Flash (9B dense), featuring 128k context and native multimodal function calling, with a free Flash tier and published API pricing. Jina AI releases Jina-VLM (2B), a compact multilingual VLM excelling at diagrams and documents with top benchmark scores. At NeurIPS 2025, research highlights include Google's post-Transformer sequence architectures (Moneta, Yaad, Memora) showing up to 20% gains in long-context retrieval, AxiomProver's autonomous Lean system rapidly solving 9/12 Putnam 2025 problems, and mechanistic interpretability advances discussed by Chris Olah, who emphasized scalable tooling.
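The Hugging Face skill itself isn't reproduced here, but the "one-line fine-tuning" idea closely mirrors what TRL already exposes; a minimal sketch under that assumption, with placeholder model and dataset names:

```python
from datasets import load_dataset
from trl import SFTTrainer

# Supervised fine-tuning reduced to (nearly) one line: TRL picks sensible
# defaults for tokenizer, data formatting, and training arguments.
dataset = load_dataset("trl-lib/Capybara", split="train")
trainer = SFTTrainer(model="Qwen/Qwen2.5-0.5B", train_dataset=dataset)
trainer.train()
# Swapping SFTTrainer for DPOTrainer or GRPOTrainer (plus a preference or
# reward setup) covers the DPO/GRPO paths the skill advertises.
```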
Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves Pytorch
kimi-k2-thinking gemini moonshot-ai google apple vllm_project arena baseten yupp_ai mixture-of-experts quantization int4 context-window agentic-ai benchmarking model-deployment inference-acceleration api performance-optimization eliebakouch nrehiew_ andrew_n_carr ofirpress artificialanlys sundarpichai akhaliq
Moonshot AI launched Kimi K2 Thinking, a 1-trillion-parameter mixture-of-experts (MoE) model with 32 billion active parameters, a 256K context window, and native INT4 quantization-aware training. It achieves state-of-the-art results on benchmarks like HLE (44.9%) and BrowseComp (60.2%) and sustains agentic tool use across 200-300 sequential tool calls. The model ships with vLLM support and OpenAI-compatible APIs and is available on platforms like Arena, Baseten, and Yupp; early user reports note some API instability under launch load. Meanwhile, Google announced the TPU v7 (Ironwood) with a 10× peak performance improvement over TPU v5p, aimed at training and agentic inference for models like Gemini, and Apple added support for M5 Neural Accelerators in llama.cpp for inference acceleration.
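Native INT4 means the weights live in 4 bits end to end rather than being squeezed down after training. K2 Thinking's actual QAT recipe isn't public in this summary, but the basic round-trip any INT4 scheme has to survive looks like this:

```python
import numpy as np

def int4_quantize(w: np.ndarray):
    """Symmetric INT4: map weights onto the integer range [-8, 7]."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def int4_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)
q, s = int4_quantize(w)
print(w)
print(int4_dequantize(q, s))  # ~4x smaller than fp16, at some rounding cost;
# quantization-aware training teaches the model to tolerate that rounding.
```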
MiniMax M2 230BA10B — 8% of Claude Sonnet's price, ~2x faster, new SOTA open model
minimax-m2 hailuo-ai huggingface baseten vllm modelscope openrouter cline sparse-moe model-benchmarking model-architecture instruction-following tool-use api-pricing model-deployment performance-evaluation full-attention qk-norm gqa rope reach_vb artificialanlys akhaliq eliebakouch grad62304977 yifan_zhang_ zpysky1125
MiniMax M2, an open-weight sparse MoE model by Hailuo AI, launches with 230B total parameters and 10B active, offering performance near frontier closed models and ranking #5 overall on the Artificial Analysis Intelligence Index v3.0. It targets coding and agent tasks, is MIT-licensed, and is available via API at competitive pricing. The architecture uses full attention, QK-Norm, GQA, partial RoPE, and sigmoid routing, with day-0 support in vLLM and deployment on platforms like Hugging Face and Baseten. Despite verbose outputs and the absence of a tech report, it marks a significant win for open models.
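Of that architecture list, sigmoid routing is the least standard piece: each expert is scored independently with a sigmoid instead of competing through a softmax, and the gates of the selected experts are renormalized afterward. A rough sketch of the idea (M2's exact router, including any bias or load-balancing terms, is an assumption here):

```python
import numpy as np

def sigmoid_router(logits: np.ndarray, k: int):
    """Score experts independently, keep the top-k, renormalize their gates."""
    scores = 1.0 / (1.0 + np.exp(-logits))   # per-expert affinity in (0, 1)
    top = np.argsort(scores)[-k:]            # indices of the k best experts
    gates = scores[top] / scores[top].sum()  # mixture weights for this token
    return top, gates

rng = np.random.default_rng(0)
experts, gates = sigmoid_router(rng.normal(size=16), k=2)
print(experts, gates)
```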
DeepSeek-OCR finds vision models can decode 10x more efficiently with ~97% accuracy of text-only, 33/200k pages/day/A100
deepseek-ocr deepseek3b-moe-a570m veo-3.1 deepseek-ai google-deepmind krea ocr vision multimodality model-compression long-context model-architecture video-generation autoregressive-models model-efficiency precision-editing karpathy teortaxestex reach_vb _akhaliq eliebakouch vikhyatk demishassabis
As ICCV 2025 begins, DeepSeek releases DeepSeek-OCR, a novel 3B MoE vision-language model (~570M active) that compresses long text into visual context with high accuracy and efficiency, challenging traditional tokenization approaches. The model achieves ~97% decoding precision at <10× compression, processes up to ~33M pages/day on 20 A100-40G nodes, and outperforms benchmarks like GOT-OCR2.0. Discussion centers on the potential for unlimited context windows and tokenization-free inputs, with commentary from @karpathy, @teortaxesTex, and others. In video generation, google-deepmind's Veo 3.1 leads community benchmarks with advanced precision editing and scene blending, while Krea open-sources a 14B autoregressive video model enabling realtime long-form generation at ~11 FPS on a single B200 GPU.
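The two throughput figures, "200k pages/day/A100" in the headline and "~33M pages/day on 20 A100-40G nodes" here, are consistent if each node carries eight GPUs, the standard A100 node size; the 8-GPU-per-node figure is an assumption:

```python
# Reconciling the headline per-GPU rate with the cluster-level figure.
pages_per_gpu_per_day = 200_000            # "200k pages/day/A100"
nodes, gpus_per_node = 20, 8               # 8x A100-40G per node is assumed
total = pages_per_gpu_per_day * nodes * gpus_per_node
print(f"{total / 1e6:.0f}M pages/day")     # 32M, matching the ~33M claim
```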
not much happened today
gpt-5 grok-code-fast-1 claude-sonnet glm-4.5 longcat-flash-chat fastvlm mobileclip2 internvl3.5 openai x-ai zhipu-ai meituan apple model-architecture moe adaptive-compute inference-speed model-training cost-efficiency coding developer-tools open-inference on-device-ai vision gdb martin_casado yanndubs elonmusk cline vikhyatk dzhng quixiai tim_dettmers casper_hansen_ reach_vb eliebakouch teortaxestex youjiacheng
OpenAI integrates GPT-5 into Xcode 26 with improved coding latency, though some UX trade-offs are noted. xAI's Grok Code Fast 1 gains momentum, surpassing Claude Sonnet in usage and drawing praise for fast debugging. Zhipu's GLM-4.5 offers a cost-effective coding plan with strong performance against Claude Sonnet 4. Meituan releases LongCat-Flash-Chat, a 560B-parameter MoE model with adaptive compute and detailed technical documentation. Apple debuts the on-device vision-language models FastVLM and MobileCLIP2; InternVL3.5 also sees a release.
not much happened today
grok-2 grok-2.5 vibevoice-1.5b motif-2.6b gpt-5 qwen-code xai-org microsoft motif-technology alibaba huggingface langchain-ai mixture-of-experts model-scaling model-architecture text-to-speech fine-tuning training-data optimization reinforcement-learning agentic-ai tool-use model-training model-release api software-development model-quantization elonmusk clementdelangue rasbt quanquangu akhaliq eliebakouch gdb ericmitchellai ivanfioravanti deanwball giffmana omarsar0 corbtt
xAI released open weights for Grok-2 and Grok-2.5 with a novel MoE residual architecture and μP scaling, sparking community excitement and licensing concerns. Microsoft open-sourced VibeVoice-1.5B, a multi-speaker long-form TTS model with streaming support and a 7B variant forthcoming. Motif Technology published a detailed report on Motif-2.6B, highlighting Differential Attention, PolyNorm, and extensive finetuning, trained on AMD MI250 GPUs. In coding tools, momentum builds around GPT-5-backed workflows, with developers favoring it over Claude Code. Alibaba released Qwen-Code v0.0.8 with deep VS Code integration and MCP CLI enhancements. The MCP ecosystem advances with LiveMCP-101 stress tests, the universal MCP server "Rube," and LangGraph Platform's rollout of revision queueing and ART integration for RL training of agents.
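μP (Maximal Update Parametrization) is the detail worth unpacking in the Grok-2 report: hyperparameters are tuned once on a narrow proxy model, then transferred to the full width by rescaling. A hypothetical sketch of the best-known rule, the ~1/width learning-rate scaling for hidden matrices under Adam (xAI's exact parametrization isn't spelled out in this summary):

```python
# muP-style LR transfer: tune once at a small base width, rescale for width.
base_width, base_hidden_lr = 256, 3e-4     # tuned on a cheap proxy model

def mup_hidden_lr(width: int) -> float:
    # Wider layers take proportionally smaller per-step updates, keeping
    # feature-learning dynamics roughly width-invariant under Adam.
    return base_hidden_lr * base_width / width

for width in (256, 1024, 8192):
    print(f"width={width}: hidden lr = {mup_hidden_lr(width):.2e}")
```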
Mary Meeker is so back: BOND Capital AI Trends report
qwen-3-8b anthropic hugging-face deepseek attention-mechanisms inference arithmetic-intensity transformers model-optimization interpretability model-quantization training tri_dao fleetwood___ teortaxestex awnihannun lateinteraction neelnanda5 eliebakouch _akhaliq
Mary Meeker returns with a comprehensive 340-slide report on the state of AI, covering accelerating tech cycles, compute growth, comparisons of ChatGPT to early Google and other iconic tech products, plus enterprise traction and the valuations of major AI companies. On Twitter, @tri_dao discusses an "ideal" inference architecture featuring attention variants like GTA, GLA, and DeepSeek MLA with high arithmetic intensity (~256), improving efficiency and model quality. Other highlights include a 4-bit DWQ release of the DeepSeek-R1 Qwen3-8B distill on Hugging Face, AnthropicAI's open-source interpretability tools for LLMs, and discussions of transformer training and abstractions by various researchers.
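The ~256 figure is about decode-time FLOPs per byte of KV cache read: sharing one KV head among many query heads multiplies the work done per byte loaded, lifting attention out of the memory-bound regime. A simplified model of that arithmetic (not @tri_dao's exact accounting):

```python
def decode_intensity(n_q_heads: int, n_kv_heads: int,
                     bytes_per_elem: int = 2) -> float:
    """Rough FLOPs per byte of KV read during single-token decode."""
    group = n_q_heads // n_kv_heads          # query heads sharing each KV head
    flops_per_elem = 2 * 2 * group           # ~2 FLOPs each for QK^T and AV
    return flops_per_elem / bytes_per_elem   # fp16/bf16 KV: 2 bytes/element

print(decode_intensity(32, 32))    # MHA: 2 FLOPs/byte, heavily memory-bound
print(decode_intensity(32, 4))     # typical GQA: 16 FLOPs/byte
print(decode_intensity(128, 1))    # MLA-like full sharing: 256 FLOPs/byte
```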
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
deepseek-native-sparse-attention r1-1776 paligemma-2-mix muse baichuan-m1-14b stripedhyena-2 huggingface deepseek perplexity-ai google-deepmind microsoft baichuan stripedhyena gpu-training scaling multimodality vision model-training foundation-models medical-llm genome-modeling robotic-manipulation interactive-content eliebakouch nouamanetazi lvwerra thom-wolf proftomyeh alex-wang aravsrinivas _akhaliq _philschmid mervenoyann reach_vb arankomatsuzaki maximelabonne
Huggingface released "The Ultra-Scale Playbook: Training LLMs on GPU Clusters," an interactive blogpost distilling 4,000 scaling experiments on up to 512 GPUs into detailed guidance on modern GPU training strategies. DeepSeek introduced Native Sparse Attention (NSA), drawing significant community attention, while Perplexity AI launched R1-1776, an uncensored, unbiased version of DeepSeek's R1. Google DeepMind unveiled PaliGemma 2 Mix, a multi-task vision-language model available in 3B, 10B, and 28B sizes. Microsoft introduced Muse, a generative AI model trained on the game Bleeding Edge ("making your own gaming experience is coming sooner than you'd think," as one comment put it), and presented Magma, a foundation model for multimodal AI agents excelling at UI navigation and robotic manipulation. Baichuan-M1-14B was announced as a state-of-the-art medical LLM trained on 20T tokens, and a fully open-source 40B genome modeling model built on the StripedHyena 2 architecture was also released.