Person: "eliebakouch"
not much happened today
gpt-5.5 gpt-5.4 opus-4.7 mimo-v2.5-pro mimo-v2.5 kimi-k2.6 codex copilot openai microsoft google amazon github xiaomi openai-devs vllm_project kimi-moonshot model-distribution cloud-computing benchmarking usage-based-billing model-orchestration open-source large-context-models agent-scaling coding model-training fp8 attention-mechanisms multi-agent-systems sama scaling01 kimmonismus ajassy simonw htihle arena gdb hangsiin eliebakouch _luofuli teortaxestex
OpenAI loosens its Azure exclusivity, allowing distribution across Google TPU, AWS Trainium, and Bedrock, with commitments through 2032 and revenue share through 2030. GPT-5.5 shows improved benchmarks but is not uniformly dominant, ranking variably across coding, document, math, and vision tasks. GitHub's Copilot shifts to usage-based billing starting June 1, reflecting increased runtime costs. OpenAI open-sourced Symphony, an orchestration layer for issue tracking and Codex agents. Xiaomi released MiMo-V2.5 and MiMo-V2.5-Pro, large-context models with up to 1M-token windows trained on trillions of tokens, emphasizing complex agent and omni-modal capabilities. Kimi K2.6 leads OpenRouter's leaderboard, noted for coding and long-horizon agent capabilities with large-scale sub-agent coordination.
not much happened today
qwen3.6-27b qwen3.5-397b-a17b privacy-filter mimo-v2.5-pro mimo-v2.5 gemini-3.1-pro gemini-3.1-flash-image alibaba openai xiaomi google google-deepmind vllm_project unsloth ggml ollama arena nous-research open-models multimodality vision tokenization pii-detection privacy enterprise-ai agentic-ai benchmarking long-context model-deployment hardware-optimization model-integration software-engineering alibaba_qwen clementdelangue altryne eliebakouch mervenoyann xiaomimo sundarpichai scaling01
Alibaba released Qwen3.6-27B, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench. It supports native vision-language reasoning over images and video, with immediate ecosystem support from vLLM, Unsloth, ggml, and Ollama. OpenAI open-sourced a practical Privacy Filter model for PII detection and masking, a 1.5B parameter token-classification model with a 128k context window aimed at enterprise redaction tasks. Xiaomi announced MiMo-V2.5-Pro and MiMo-V2.5 models, emphasizing software engineering advances, long-horizon agents, and large context windows (up to 1M tokens), with strong benchmark results and integrations with Hermes and Nous. At Google Cloud Next, Google and Google DeepMind unveiled 8th-gen TPUs (TPU 8t for training and TPU 8i for inference) with claims of scaling to a million TPUs in a cluster, and launched the Gemini Enterprise Agent Platform evolving Vertex AI with Agent Studio and access to 200+ models including Gemini 3.1 Pro and Gemini 3.1 Flash Image. This marks a significant vertical integration of hardware, models, and enterprise tooling.
not much happened today
kimi-linear-48b codex gpt-5.4 claude-code moonshot openai assemblyai langchain attention-mechanisms model-architecture inference-speed agent-feedback agent-skills multi-agent-systems knowledge-transfer cli-tools coding-agents model-deployment kimi_moonshot elonmusk yuchenj_uw nathancgy4 eliebakouch tokenbender behrouz_ali cloneofsimo fidjissimo sama gdb andrewyng itsafiz simplifyinai
Moonshot's Attention Residuals paper introduced an input-dependent attention mechanism over prior layers with a 1.25x compute advantage and less than 2% inference latency overhead, validated on Kimi Linear 48B total / 3B active. The paper sparked debate on novelty versus prior art like DeepCrossAttention and Google's earlier work, highlighting tensions in idea novelty, citation quality, and frontier-scale validation. OpenAI's Codex showed strong momentum with over 2M weekly active users, nearly 4x growth YTD, and GPT-5.4 hitting 5T tokens/day and a $1B annualized run-rate. Codex added subagents supporting multi-agent coding workflows. Infrastructure for coding agents matured with tools like Context Hub / chub supporting agent feedback loops, AssemblyAI's skill for Claude Code and Codex, and automated skill extraction from GitHub repos yielding 40% knowledge-transfer gains. LangChain launched LangGraph CLI and open-sourced Deep Agents, recreating top coding agent workflows with planning, filesystem ops, shell access, and sub-agents.
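The summary gives only the high-level idea of Attention Residuals. Assuming it amounts to an input-dependent weighted combination of prior-layer outputs fed back into the current layer (a reading of the summary, not the paper's actual formulation; names like `mix_prior_layers` and `W_gate` are hypothetical), a minimal sketch:

```python
import numpy as np

def mix_prior_layers(x, prior_outputs, W_gate):
    """Mix prior-layer outputs with weights that depend on the current input x."""
    H = np.stack(prior_outputs)        # (n_layers, d_model): one row per prior layer
    logits = H @ (W_gate @ x)          # score each prior layer against the input
    w = np.exp(logits - logits.max())
    w /= w.sum()                       # softmax over layers, not over tokens
    return w @ H                       # input-dependent residual for this layer

rng = np.random.default_rng(1)
d = 16
x = rng.normal(size=d)                              # current layer input
priors = [rng.normal(size=d) for _ in range(4)]     # outputs of 4 earlier layers
W_gate = rng.normal(0, 0.1, (d, d))                 # learned gating projection
mixed = mix_prior_layers(x, priors, W_gate)
```

The gate is tiny relative to an attention layer, which is consistent with the reported sub-2% latency overhead.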
not much happened today
qwen-image-2512 ax-k1 k-exaone sk-telecom lg upstage naver alibaba unsloth replicate mixture-of-experts model-release quantization open-source-models image-generation model-integration model-benchmarking compute-costs dataset-curation eliebakouch clementdelangue dorialexander rising_sayak _akhaliq ostrisai ivanfioravanti yupp_ai
South Korea's Ministry of Science launched a coordinated program with 5 companies to develop sovereign foundation models from scratch, featuring large-scale MoE architectures like SK Telecom A.X-K1 (519B total / 33B active) and LG K-EXAONE (236B MoE / 23B active), with a total first-round budget of ~$140M. This initiative contrasts with EU approaches by focusing funding on fewer stakeholders and explicitly budgeting for data. Meanwhile, Alibaba's Qwen-Image-2512 emerges as a leading open-source image generation model, rapidly integrated into various toolchains including AI-Toolkit and local inference paths with quantization support, and hosted on platforms like Replicate. The model has undergone extensive blind testing with over 10,000 rounds on AI Arena, highlighting its ecosystem adoption.
not much happened today
glm-4.7 glm-4.6 minimax-m2.1 gemma-3 gemma-scope-2 google-deepmind valsai minimax-ai ollama trae alibaba sophont prime-intellect interpretability sparse-autoencoders agent-workflows model-benchmarking medical-evaluation multi-agent-systems model-performance model-optimization reinforcement-learning tool-use function-calling context-windows ivanfioravanti awnihannun deedydas cline omarsar0 adonis_singh eliebakouch teortaxestex ibragim_bad callum_mcdougall neelnanda5
GLM-4.7 and MiniMax M2.1 open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an OSS Claude-like MoE model with 230B total parameters and 200K context. Gemma Scope 2 from Google DeepMind introduces sparse autoencoders and transcoders for interpretability across Gemma 3 models, aiming to provide shared infrastructure for safety and debugging. The Medmarks v0.1 open medical evaluation suite and leaderboard launch addresses the need for open medical benchmarking across 15+ environments, engaging clinicians and researchers.
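Setting Gemma Scope 2's actual tooling aside, the core object it ships, a sparse autoencoder, is simple to sketch: a model activation is encoded into a much wider, mostly-zero feature vector and linearly decoded back. A generic sketch (dimensions and initialization are illustrative, not Gemma Scope's):

```python
import numpy as np

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector into a sparse latent, then reconstruct it."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU yields a sparse feature vector
    x_hat = f @ W_dec + b_dec                # linear decode back to activation space
    return f, x_hat

rng = np.random.default_rng(0)
d_model, d_sae = 8, 64                       # latent is much wider than the stream
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = -0.05 * np.ones(d_sae)               # negative bias encourages sparsity
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

x = rng.normal(size=d_model)                 # stand-in for a residual-stream activation
f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
```

Interpretability work then inspects which inputs activate each latent feature; the decoder rows give each feature's direction in activation space.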
not much happened today
glm-4.7 mimo-v2-flash z-image-turbo kling-2.6-motion-control zhipu-ai xiaomi google langchain huggingface openrouter artificial-analysis vllm-project coding complex-reasoning tool-use mixture-of-experts cost-efficiency open-weight-models text-to-image video-models memory-persistence agent-frameworks interactive-user-interfaces model-deployment mervenoyann eliebakouch omarsar0 osanseviero dair_ai
Zhipu AI's GLM-4.7 release marks a significant improvement in coding, complex reasoning, and tool use, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. Xiaomi's MiMo-V2-Flash is highlighted as a practical, cost-efficient mixture-of-experts model optimized for deployment. The open-weight text-to-image competition sees Z-Image Turbo leading with 6B parameters under Apache-2.0 license. Video model advances focus on control and long-form consistency, exemplified by Kling 2.6 Motion Control and research like MemFlow's adaptive memory retrieval. In agent frameworks, Google's A2UI protocol introduces agent-driven UI generation, while studies reveal that mixing multiple agent frameworks is common, with challenges in logic, termination, and tool interaction. LangChain emphasizes persistent memory patterns for production agents.
OpenAI GPT Image-1.5 claims to beat Nano Banana Pro, #1 across all Arenas, but completely fails Vibe Checks
gpt-image-1.5 nano-banana-pro mimo-v2-flash deepseek-v3.2 openai gemini xiaomi lmsys deepseek openrouter image-generation instruction-following benchmarking model-efficiency long-context multi-token-prediction hybrid-attention model-optimization inference-speed agentic-workflows model-architecture model-quantization fuli_luo eliebakouch
OpenAI released its new image model GPT Image 1.5, featuring precise image editing, better instruction following, improved text and markdown rendering, and generation up to 4x faster. Despite topping multiple leaderboards like LMArena (1277), Design Arena (1344), and AA Arena (1272), user feedback from Twitter, Reddit, and Discord communities is largely negative compared to Nano Banana Pro by Gemini. Xiaomi introduced MiMo-V2-Flash, a 309B MoE model optimized for inference efficiency with a 256K context window, achieving state-of-the-art scores on SWE-Bench. The model uses Hybrid Sliding Window Attention and multi-token prediction, offering significant speedups and efficiency improvements. The timing of OpenAI's launch amid competition from Gemini's Nano Banana Pro weighed on user sentiment, highlighting the gap between leaderboard rankings and real-world impressions.
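The summary credits MiMo-V2-Flash's efficiency partly to Hybrid Sliding Window Attention; the exact hybrid layer pattern isn't given, but the sliding-window half reduces each query's key set from the whole prefix to a fixed-size window. A minimal sketch of the causal sliding-window mask (window size is illustrative):

```python
import numpy as np

def sliding_window_mask(seq_len, window):
    """Boolean mask: query i may attend only to keys j in [i - window + 1, i]."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)   # causal AND within the window

mask = sliding_window_mask(seq_len=8, window=3)
# Each row has at most `window` True entries, so per-token attention cost is
# O(window) rather than O(seq_len) -- the win grows with a 256K context.
```

Hybrid designs typically interleave such layers with full-attention layers so long-range information can still propagate.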
MCP -> Agentic AI Foundation, Mistral Devstral 2
devstral-2 devstral-small-2 sonnet-4.3 deepseek-v3.2 qwen3-vl openai anthropic block mistral-ai alibaba linux-foundation deepseek agentic-ai coding-models reinforcement-learning model-performance model-optimization open-weights cli-tools multi-file-code-automation data-decontamination moe reward-models rl-stability guillaumelample b_roziere qtnx_ charliermarsh omarsar0 eliebakouch justinwaugh cwolferesearch pan
The agentic AI ecosystem reached a significant collaborative milestone with the launch of the Agentic AI Foundation under the Linux Foundation, uniting projects from Anthropic, OpenAI, and Block. Mistral released Devstral 2, a coding model with 123B parameters and open weights, offering a cost-effective alternative to Sonnet 4.3 and competitive performance against DeepSeek v3.2. The new Mistral Vibe CLI supports agentic coding workflows with rapid ecosystem integration. Alibaba introduced Soft Adaptive Policy Optimization (SAPO) for reinforcement learning tuning, improving stability and performance in Qwen3-VL across multiple tasks. Research highlights include the importance of data decontamination in RL and ongoing discussions on MoE RL stability and reward hacking mitigation.
not much happened today
glm-4.6v glm-4.6v-flash jina-vlm-2b hugging-face zhipu-ai jina-ai google-deepmind axiomprover fine-tuning multimodality model-optimization long-context mechanistic-interpretability formal-methods sequence-architectures reinforcement-learning lioronai akshay_pachaar _akhaliq ben_burtenshaw vllm_project prince_canuma zenmuxai eliebakouch theturingpost axiommathai neelnanda5 sarahookr
Claude Code Skills gains attention with a published talk and Hugging Face's new "skill" enabling one-line fine-tuning pipelines for models from ~0.5B to 70B parameters, supporting SFT, DPO, and GRPO, costing as low as ~$0.30 for small runs. Zhipu AI launches multimodal models GLM-4.6V (106B params MoE) and GLM-4.6V-Flash (9B dense), featuring 128k context and native multimodal function calling, with a free Flash variant and detailed API pricing. Jina AI releases Jina-VLM (2B), a compact multilingual VLM excelling in diagrams and documents with top benchmark scores. At NeurIPS 2025, research highlights include Google's post-Transformer sequence architectures (Moneta, Yaad, Memora) showing up to 20% gains in long-context retrieval, AxiomProver's autonomous Lean system solving 9/12 Putnam 2025 problems rapidly, and mechanistic interpretability advances discussed by Chris Olah emphasizing scalable tooling.
Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves PyTorch
kimi-k2-thinking gemini moonshot-ai google apple vllm_project arena baseten yupp_ai mixture-of-experts quantization int4 context-window agentic-ai benchmarking model-deployment inference-acceleration api performance-optimization eliebakouch nrehiew_ andrew_n_carr ofirpress artificialanlys sundarpichai akhaliq
Moonshot AI launched Kimi K2 Thinking, a 1 trillion parameter mixture-of-experts (MoE) model with 32 billion active parameters, a 256K context window, and native INT4 quantization-aware training. It achieves state-of-the-art results on benchmarks like HLE (44.9%), BrowseComp (60.2%), and agentic tool use with 200-300 sequential tool calls. The model is deployed with vLLM support and OpenAI-compatible APIs, available on platforms like Arena, Baseten, and Yupp. Early user reports note some API instability under launch load. Meanwhile, Google announced the TPU v7 (Ironwood) with a 10x peak performance improvement over TPU v5p, aimed at training and agentic inference for models like Gemini. Apple added support for M5 Neural Accelerators in llama.cpp for inference acceleration.
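Moonshot's exact INT4 scheme isn't described here, but quantization-aware training conventionally inserts a quantize-dequantize ("fake quant") step into the forward pass so the network learns to tolerate the INT4 grid. A sketch of symmetric per-tensor INT4 fake quantization (an assumption, not Kimi's published recipe):

```python
import numpy as np

def fake_quant_int4(w):
    """Simulate symmetric INT4 (16 levels, codes in [-8, 7]) on a weight tensor."""
    scale = np.abs(w).max() / 7.0            # map the largest magnitude onto the grid
    q = np.clip(np.round(w / scale), -8, 7)  # integer codes a real INT4 kernel stores
    return q * scale                         # dequantized values used in the forward pass

w = np.array([0.31, -0.7, 0.05, 0.9, -0.12])
w_q = fake_quant_int4(w)
# per-element error is bounded by scale / 2 (half a quantization step)
```

During backprop a straight-through estimator treats the round as identity, so gradients flow to the underlying full-precision weights.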
MiniMax M2 230BA10B: 8% of Claude Sonnet's price, ~2x faster, new SOTA open model
minimax-m2 hailuo-ai huggingface baseten vllm modelscope openrouter cline sparse-moe model-benchmarking model-architecture instruction-following tool-use api-pricing model-deployment performance-evaluation full-attention qk-norm gqa rope reach_vb artificialanlys akhaliq eliebakouch grad62304977 yifan_zhang_ zpysky1125
MiniMax M2, an open-weight sparse MoE model by Hailuo AI, launches with ~200-230B parameters and 10B active parameters, offering strong performance near frontier closed models and ranking #5 overall on the Artificial Analysis Intelligence Index v3.0. It supports coding and agent tasks, is licensed under MIT, and is available via API at competitive pricing. The architecture uses full attention, QK-Norm, GQA, partial RoPE, and sigmoid routing, with day-0 support in vLLM and deployment on platforms like Hugging Face and Baseten. Despite verbosity and no tech report, it marks a significant win for open models.
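Among the architecture choices listed, sigmoid routing is the least standard: each expert is scored by an independent sigmoid rather than a competing softmax, so one expert's score doesn't suppress another's. A sketch of how such a router might pick experts (top-k selection and renormalization are assumptions; the source gives no details):

```python
import numpy as np

def sigmoid_route(logits, k):
    """Score experts independently with a sigmoid, keep top-k, renormalize gates."""
    scores = 1.0 / (1.0 + np.exp(-logits))   # unlike softmax, scores don't compete
    top = np.argsort(scores)[-k:]            # indices of the k chosen experts
    gates = scores[top] / scores[top].sum()  # combine weights for the chosen experts
    return top, gates

logits = np.array([-1.2, 0.4, 2.0, -0.3, 1.1, 0.7])  # one router logit per expert
experts, gates = sigmoid_route(logits, k=2)
```

With 10B of ~230B parameters active, the router fires only a small fraction of experts per token, which is where the cost advantage over dense models comes from.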
DeepSeek-OCR finds vision models can decode text ~10x more efficiently with ~97% of text-only accuracy, at ~200k pages/day/A100 (~33M/day cluster-wide)
deepseek-ocr deepseek3b-moe-a570m veo-3.1 deepseek-ai google-deepmind krea ocr vision multimodality model-compression long-context model-architecture video-generation autoregressive-models model-efficiency precision-editing karpathy teortaxestex reach_vb _akhaliq eliebakouch vikhyatk demishassabis
As ICCV 2025 begins, DeepSeek releases a novel DeepSeek-OCR 3B MoE vision-language model that compresses long text as visual context with high accuracy and efficiency, challenging traditional tokenization approaches. The model achieves ~97% decoding precision at <10x compression and processes up to ~33M pages/day on 20 A100-40G nodes, outperforming benchmarks like GOT-OCR2.0. Discussions highlight the potential for unlimited context windows and tokenization-free inputs, with contributions from @karpathy, @teortaxesTex, and others. In video generation, Google DeepMind's Veo 3.1 leads community benchmarks with advanced precision editing and scene blending, while Krea open-sources a 14B autoregressive video model enabling realtime long-form generation at ~11 FPS on a single B200 GPU.
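The quoted throughput is easy to sanity-check. Assuming 8 A100-40G GPUs per node and roughly 200k pages/day per GPU (per-GPU rate and GPUs-per-node count are assumptions consistent with the summary, not stated outright), 20 nodes land near the ~33M pages/day claim:

```python
nodes = 20
gpus_per_node = 8                 # assumed A100-40G count per node
pages_per_gpu_per_day = 200_000   # assumed per-GPU rate
total = nodes * gpus_per_node * pages_per_gpu_per_day
print(f"~{total / 1e6:.0f}M pages/day")  # ~32M, close to the quoted ~33M
```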
not much happened today
gpt-5 grok-code-fast-1 claude-sonnet glm-4.5 longcat-flash-chat fastvlm mobileclip2 internvl3.5 openai x-ai zhipu-ai meituan apple model-architecture moe adaptive-compute inference-speed model-training cost-efficiency coding developer-tools open-inference on-device-ai vision gdb martin_casado yanndubs elonmusk cline vikhyatk dzhng quixiai tim_dettmers casper_hansen_ reach_vb eliebakouch teortaxestex youjiacheng
OpenAI integrates GPT-5 into Xcode 26 with improved coding latency, though some UX trade-offs are noted. xAI's Grok Code Fast 1 gains momentum, surpassing Claude Sonnet in usage and praised for fast debugging. Zhipu's GLM-4.5 offers a cost-effective coding plan with strong performance against Claude Sonnet 4. Meituan releases the LongCat-Flash-Chat, a 560B parameter MoE model with adaptive compute and detailed technical insights. Apple debuts on-device vision-language models FastVLM and MobileCLIP2 alongside InternVL3.5.
not much happened today
grok-2 grok-2.5 vibevoice-1.5b motif-2.6b gpt-5 qwen-code xai-org microsoft motif-technology alibaba huggingface langchain-ai mixture-of-experts model-scaling model-architecture text-to-speech fine-tuning training-data optimization reinforcement-learning agentic-ai tool-use model-training model-release api software-development model-quantization elonmusk clementdelangue rasbt quanquangu akhaliq eliebakouch gdb ericmitchellai ivanfioravanti deanwball giffmana omarsar0 corbtt
xAI released open weights for Grok-2 and Grok-2.5 with a novel MoE residual architecture and μP scaling, sparking community excitement and licensing concerns. Microsoft open-sourced VibeVoice-1.5B, a multi-speaker long-form TTS model with streaming support and a 7B variant forthcoming. Motif Technology published a detailed report on Motif-2.6B, highlighting Differential Attention, PolyNorm, and extensive finetuning, trained on AMD MI250 GPUs. In coding tools, momentum builds around GPT-5-backed workflows, with developers favoring it over Claude Code. Alibaba released Qwen-Code v0.0.8 with deep VS Code integration and MCP CLI enhancements. The MCP ecosystem advances with LiveMCP-101 stress tests, the universal MCP server "Rube," and LangGraph Platform's rollout of revision queueing and ART integration for RL training of agents.
Mary Meeker is so back: BOND Capital AI Trends report
qwen-3-8b anthropic hugging-face deepseek attention-mechanisms inference arithmetic-intensity transformers model-optimization interpretability model-quantization training tri_dao fleetwood___ teortaxestex awnihannun lateinteraction neelnanda5 eliebakouch _akhaliq
Mary Meeker returns with a comprehensive 340-slide report on the state of AI, highlighting accelerating tech cycles, compute growth, and comparisons of ChatGPT to early Google and other iconic tech products. The report also covers enterprise traction and valuation of major AI companies. On Twitter, @tri_dao discusses an "ideal" inference architecture featuring attention variants like GTA, GLA, and DeepSeek MLA with high arithmetic intensity (~256), improving efficiency and model quality. Other highlights include the release of 4-bit DWQ of DSR1 Qwen3 8B on Hugging Face, AnthropicAI's open-source interpretability tools for LLMs, and discussions on transformer training and abstractions by various researchers.
The Ultra-Scale Playbook: Training LLMs on GPU Clusters
deepseek-native-sparse-attention r1-1776 paligemma-2-mix muse baichuan-m1-14b stripedhyena-2 huggingface deepseek perplexity-ai google-deepmind microsoft baichuan stripedhyena gpu-training scaling multimodality vision model-training foundation-models medical-llm genome-modeling robotic-manipulation interactive-content eliebakouch nouamanetazi lvwerra thom-wolf proftomyeh alex-wang aravsrinivas _akhaliq _philschmid mervenoyann reach_vb arankomatsuzaki maximelabonne
Hugging Face released "The Ultra-Scale Playbook: Training LLMs on GPU Clusters," an interactive blogpost based on 4000 scaling experiments on up to 512 GPUs, providing detailed insights into modern GPU training strategies. DeepSeek introduced the Native Sparse Attention (NSA) model, gaining significant community attention, while Perplexity AI launched R1-1776, an uncensored and unbiased version of DeepSeek's R1 model. Google DeepMind unveiled PaliGemma 2 Mix, a multi-task vision-language model available in 3B, 10B, and 28B sizes. Microsoft introduced Muse, a generative AI model trained on the game Bleeding Edge, and presented Magma, a foundation model for multimodal AI agents excelling in UI navigation and robotic manipulation. Baichuan-M1-14B was announced as a state-of-the-art medical LLM trained on 20T tokens, and a fully open-source 40B genome modeling model using the StripedHyena 2 architecture was also released. As one commentator noted of Muse, "Making your own gaming experience is coming sooner than you'd think."