All tags
Company: "ibm"
not much happened today
kimi-k2 qwen3-next nemotron-nano-2 granite-4.0 gpt-4.5 copilot codex vllm perplexity-ai ibm anthropic graphiti claude cursor-ai microsoft mixture-of-experts model-integration cloud-computing hybrid-models benchmarking agent-systems memory-persistence semantic-search code-retrieval context-length-optimization tool-use evaluation-frameworks software-development scaling01 cedric_chee aravsrinivas omarsar0 _avichawla pierceboggan jo_parkhurst jyangballin ofirpress ml_angelopoulos
Kimi-K2 Reasoner, a massive 1.2-trillion-parameter MoE model, has been integrated into vLLM, with SGLang support coming soon. Perplexity AI released research on cloud-portable trillion-parameter MoE kernels optimized for AWS EFA, with potential integration into vLLM. IBM's vLLM team formalized support for hybrid dense/sparse-expert models, covering models such as Qwen3-Next, Nemotron Nano 2, and Granite 4.0. Kimi-K2 reportedly scores 77% on GPQA Diamond versus 71.4% for GPT-4.5, though this figure is unverified.
Anthropic published a guide on building efficient tool-heavy agent systems with MCP patterns, cutting context token usage by roughly 98.7%. Graphiti MCP demonstrated shared, persistent agent memory across apps such as Claude Desktop and Cursor. VS Code introduced an "Agent sessions" view that unifies agent management, including Copilot and Codex. Cursor AI improved coding accuracy via semantic search and code-retrieval embeddings. New evaluation frameworks such as CodeClash and LMArena assess agent and coding-model performance on realistic multi-round tasks and occupation-tagged leaderboards.
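The context savings described above come from loading tool definitions on demand instead of injecting every schema into the prompt up front. The toy sketch below illustrates the idea; the registry, schemas, and keyword-matching heuristic are all hypothetical, not Anthropic's actual MCP API.

```python
# Toy illustration of on-demand tool loading vs. injecting every tool
# schema into the prompt. Registry and schemas below are hypothetical.

TOOL_SCHEMAS = {
    "search_flights": "search_flights(origin, dest, date) -> list[Flight]",
    "book_hotel": "book_hotel(city, checkin, nights) -> Reservation",
    "get_weather": "get_weather(city) -> Forecast",
}

def naive_prompt(task: str) -> str:
    """Inject every tool schema up front -- this is what bloats context."""
    return "\n".join(TOOL_SCHEMAS.values()) + "\n" + task

def on_demand_prompt(task: str) -> str:
    """Include only tools whose name words overlap the task words."""
    words = set(task.lower().split())
    relevant = [schema for name, schema in TOOL_SCHEMAS.items()
                if words & set(name.split("_"))]
    return "\n".join(relevant) + "\n" + task

task = "check the weather in Paris"
print(len(naive_prompt(task)), "chars vs", len(on_demand_prompt(task)), "chars")
```

With a realistic registry of hundreds of tools, the savings scale accordingly, which is where headline figures like ~98.7% come from.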
not much happened today
kling-2.5-turbo sora-2 gemini-2.5-flash granite-4.0 qwen-3 qwen-image-2509 qwen3-vl-235b openai google ibm alibaba kling_ai synthesia ollama huggingface arena artificialanalysis tinker scaling01 video-generation instruction-following physics-simulation image-generation model-architecture mixture-of-experts context-windows token-efficiency fine-tuning lora cpu-training model-benchmarking api workflow-automation artificialanlys kling_ai altryne teortaxestex fofrai tim_dettmers sundarpichai officiallogank andrew_n_carr googleaidevs clementdelangue wzhao_nlp alibaba_qwen scaling01 ollama
Kling 2.5 Turbo leads in text-to-video and image-to-video generation with competitive pricing. OpenAI's Sora 2 shows strong instruction following but physics inconsistencies. Google's Gemini 2.5 Flash "Nano Banana" image generation is now generally available, with multi-image blending and flexible aspect ratios. IBM's Granite 4.0 introduces a hybrid Mamba/Transformer architecture with large context windows and strong token efficiency, outperforming some peers on the Intelligence Index. Qwen models receive updates including fine-tuning API support and improved vision capabilities. Tinker offers a flexible fine-tuning API supporting LoRA sharing and CPU-only training loops. Elsewhere, Synthesia 3.0 adds video agents.
Grok 4 Fast: xAI's distilled, 40% more token-efficient, 2M-context, 344 tok/s frontier model
grok-4-fast magistral-1.2 moondream-3 granite-docling-258m sail-vl2 xai meta-ai-fair mistral-ai ibm bytedance efficiency reasoning vision multimodality model-optimization model-deployment vision-encoders model-architecture model-training nearcyan aidangomez _akhaliq vikhyatk rohanpaul_ai
xAI announced Grok 4 Fast, a highly efficient model running at 344 tokens/second, with reasoning and non-reasoning modes and free trials on major platforms. Meta showcased its neural band and Ray-Ban Display in a live demo that hit hiccups but sparked discussion about live hardware demos and integration challenges. Meta is also developing a first-party "Horizon Engine" for AI rendering and released Quest-native Gaussian Splatting capture tech. New model releases include Mistral's Magistral 1.2, a compact multimodal vision-language model with improved benchmarks and local deployment; Moondream 3, a 9B-parameter MoE VLM focused on efficient visual reasoning; IBM's Granite-Docling-258M, a document VLM for layout-faithful PDF-to-HTML/Markdown conversion; and ByteDance's SAIL-VL2, a vision-language foundation model excelling at multimodal understanding and reasoning at the 2B and 8B parameter scales.
not much happened today
gemma-3-270m canary-1b parakeet-tdt-0.6b nemotron-nano-v2 qwen-image-edit dino-v3 nvidia alibaba tencent meta-ai-fair ibm datology synthetic-data multilingual-asr self-supervised-learning vision model-efficiency training-data data-augmentation model-speedup domain-transfer demishassabis adrgrondin rasbt reach_vb ctnzr clementdelangue natolambert _akhaliq itspaulai mervenoyann xenovacom tomaarsen pratyushmaini code_star leavittron k_schuerholt giffmana
Gemma 3 270M, an ultra-small model optimized for edge and mobile use, was released and is gaining adoption. NVIDIA launched two open multilingual ASR models, Canary 1B and Parakeet-TDT 0.6B, trained on 1 million hours of data with CC-BY licensing, plus the efficient Nemotron-Nano v2 9B model with significant speedups. Alibaba's Qwen-Image-Edit offers bilingual text editing and semantic image transformations. Tencent Hunyuan introduced a controllable game-world video generator trained on over 1 million gameplay recordings. Meta's DINOv3 presents a scalable self-supervised vision backbone with strong domain transfer. IBM quietly released efficient English embedding models under a commercial-friendly license. The BeyondWeb synthetic-data paper shows significant training-speed and performance gains over prior datasets, and an analysis of the HRM architecture suggests its gains stem largely from data augmentation and scaffolding rather than the novel architecture. Models and datasets are openly licensed and available on Hugging Face.
s1: Simple test-time scaling (and Kyutai Hibiki)
qwen-2.5-32b gemini-2.0-flash smollm2 granite-vision-3.1-2b google-deepmind qwen gemini hugging-face ibm deepseek reasoning fine-tuning scaling-laws open-source-models data-centric-training vision multilingual-models language-model-reasoning niklas-muennighoff
"Wait" is all you need introduces a novel reasoning model finetuned from Qwen 2.5 32B using just 1000 questions with reasoning traces distilled from Gemini 2.0 Flash Thinking, enabling controllable test-time compute by appending "Wait" to extend reasoning. Lead author Niklas Muennighoff, known for work on Bloom, StarCoder, and BIG-bench, highlights this method's efficiency and its reproduction of the famous o1 scaling chart. Additionally, Kyutai Moshi's Hibiki project demonstrates impressive offline French-English live translation on iPhone. Recent AI model releases include DeepSeek R1 and R3 open source models, potentially marking a major open-source milestone, Hugging Face's SmolLM2 emphasizing data-centric training for small LMs, and IBM's Granite-Vision-3.1-2B, a small vision-language model with strong performance. Key research papers spotlight LIMO for minimal demonstration reasoning achieving high accuracy on AIME and MATH benchmarks, and Token-Assisted Reasoning mixing latent and text tokens to improve language model reasoning.