Company: "vercel"

cosmos-3 nemotron-3-ultra minimax-m3 nvidia runway novita vercel cloudflare openclaude flowith omnimodal-models mixture-of-experts autoregressive-models diffusion-models structured-prompts fine-tuning open-weight-models multimodality agent-models benchmarking model-serving context-windows token-efficiency kimmonismus clementdelangue artificialanalysis scaling01 ctnzr caspar_br eliebakouch pbdtokenrouter rauchg gitlawb notjazii lostinlatencyx zhihufrontier

NVIDIA led open-source AI model releases with Cosmos 3, a comprehensive omnimodal world model unifying language, image, video, audio, and action using a Mixture-of-Transformers design, and Nemotron 3 Ultra, a 550B-parameter open-weight model noted for high serving speed and strong evaluation performance. The Cosmos Coalition was launched to foster an open ecosystem for physical AI world models. Meanwhile, MiniMax M3 debuted as a multimodal agent/coding model with 1M context and strong benchmark scores, gaining rapid ecosystem support from vendors like Novita and Vercel AI Gateway. However, MiniMax M3 showed inefficiencies, including high token consumption and verbose self-check loops. These developments highlight advances in open physical AI, multimodality, and agent models with significant community and infrastructure engagement.

Apr 15

not much happened today

OpenAI expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now open-source and supports execution via partner sandboxes, fostering a new ecosystem with integrations from Cloudflare, Modal, Vercel, and others. Cloudflare launched Project Think, a next-gen Agents SDK with durable execution and sandboxed code, alongside Agent Lee, a prompt-driven UI agent using sandboxed TypeScript, and introduced real-time voice pipelines and browser automation tools. Hermes Agent focuses on persistent skill formation by learning from completed workflows, positioning itself as a professional agent distinct from GUI-first assistants like OpenClaw. As one summary puts it, "Hermes autonomously backfills tracking data, updates cron jobs, and saves workflows as reusable skills."

Mar 18

MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model

minimax-m2.7 sonnet-4.6 glm-5 mimo-v2-pro mamba-3 qwen-3.5 kimi-k2.5 gpt-5.4-mini minimax xiaomi artificial-analysis ollama trae yupp openrouter vercel zo opencode kilocode cartesia self-evolving-agents reasoning cost-efficiency token-efficiency hybrid-architecture harness-engineering agent-harnesses skills memory-optimization architecture feedback-loops api inference execution-environment

MiniMax M2.7 is the headline model release, described as a "self-evolving agent" with strong performance metrics including 56.22% on SWE-Pro, 57.0% on Terminal Bench 2, and parity with Sonnet 4.6. It features recursive self-improvement in skills, memory, and architecture. Artificial Analysis places M2.7 on the cost/performance frontier with an Intelligence Index score of 50, matching GLM-5 (Reasoning) but at a fraction of the cost. Distribution is available via platforms like Ollama cloud and OpenRouter. Xiaomi’s MiMo-V2-Pro is noted as a serious Chinese API-only reasoning model with a score of 49 on the Intelligence Index and favorable token efficiency. Cartesia’s Mamba-3 is highlighted as an SSM optimized for inference-heavy use, with early reactions focusing on hybrid transformer architectures like Qwen3.5 and Kimi Linear. The report emphasizes a shift from prompting to harness engineering, where the execution environment and agent harnesses, including skills and MCP, are becoming key differentiators in AI system design. This includes discussions on tools, repo legibility, constraints, and feedback loops, with mentions of DSPy and GPT-5.4 mini as important components in this evolving landscape.

Feb 11

Z.ai GLM-5: New SOTA Open Weights LLM

glm-5 glm-4.5 kimi-k2.5 zhipu-ai openrouter modal deepinfra ollama qoder vercel deepseek-sparse-attention long-context model-scaling pretraining benchmarking office-productivity context-window model-deployment cost-efficiency

Zhipu AI launched GLM-5, an Opus-class model scaling from 355B to 744B parameters with DeepSeek Sparse Attention integration for cost-efficient long-context serving. GLM-5 achieves SOTA on BrowseComp and leads on Vending Bench 2, focusing on office productivity tasks and surpassing Kimi K2.5 on the GDPVal-AA benchmark. Despite broad availability on platforms like OpenRouter, Modal, DeepInfra, and Ollama Cloud, GLM-5 faces compute constraints impacting rollout and pricing. The model supports up to 200K context length and 128K max output tokens.

Sep 09, 2025

not much happened today

gpt-5 kimi-k2-0905 glm-4.5 qwen3-asr opus-4.1 cognition founders-fund lux-capital 8vc neo vercel claude groq alibaba huggingface meta-ai-fair google theturingpost algoperf coding-agents agent-architecture open-source model-evaluation multilingual-models speech-recognition model-optimization kv-cache quantization algorithmic-benchmarking video-generation context-windows swyx tim_dettmers

Cognition raised $400M at a $10.2B valuation to advance AI coding agents, with swyx joining to support the "Decade of Agents" thesis. Vercel launched an OSS "vibe coding platform" using a tuned GPT-5 agent loop. Claude Code emphasizes minimalism in agent loops for reliability. Kimi K2-0905 achieved 94% on coding evals and improved agentic capabilities with doubled context length. Alibaba released Qwen3-ASR, a multilingual transcription model with <8% WER. Meta introduced Set Block Decoding for 3-5× faster decoding without architectural changes. Innovations in KV cache compression and quantization include AutoRound, QuTLASS v0.1.0, and AlgoPerf v0.6. Google's Veo 3 video generation API went GA with significant price cuts and vertical video support.

Sep 08, 2025

Cognition's $10b Series C; Smol AI updates

kimi-k2-0905 qwen3-asr gpt-5 cognition vercel meta-ai-fair alibaba groq huggingface coding-agents agent-development open-source model-evaluation multilingual-models inference-optimization kv-cache-compression quantization algorithmic-benchmarking context-length model-performance swyx

Cognition raised $400M at a $10.2B valuation to advance AI coding agents, with swyx joining the company. Vercel launched an OSS coding platform using a tuned GPT-5 agent loop. The Kimi K2-0905 model achieved top coding eval scores and improved agentic capabilities with doubled context length. Alibaba released Qwen3-ASR, a multilingual transcription model with robust noise handling. Meta introduced Set Block Decoding for 3-5× faster decoding without architectural changes. Innovations in KV cache compression and quantization were highlighted, including AutoRound in SGLang and QuTLASS v0.1.0 for Blackwell GPUs. Algorithmic benchmarking tools like AlgoPerf v0.6 were updated for efficiency.

Jun 25, 2025

Context Engineering: Much More than Prompts

gemini-code openai langchain cognition google-deepmind vercel cloudflare openrouter context-engineering retrieval-augmented-generation tools state-management history-management prompt-engineering software-layer chatgpt-connectors api-integration karpathy walden_yan tobi_lutke hwchase17 rlancemartin kwindla dex_horthy

Context Engineering emerges as a significant trend in AI, highlighted by experts like Andrej Karpathy, Walden Yan from Cognition, and Tobi Lutke. It involves managing an LLM's context window with the right mix of prompts, retrieval, tools, and state to optimize performance, going beyond traditional prompt engineering. LangChain and its tool LangGraph are noted for advancing this approach. Additionally, OpenAI has launched ChatGPT connectors for platforms like Google Drive, Dropbox, SharePoint, and Box, enhancing context integration for Pro users. Other notable news includes the launch of Vercel Sandbox, Cloudflare Containers, the leak and release of Gemini Code by Google DeepMind, and fundraising efforts by OpenRouter.
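
As a rough illustration of the idea described above (choosing which prompts, retrieved text, tool descriptions, and state go into a limited context window), here is a minimal sketch. It is not taken from LangChain, LangGraph, or any cited source: the `ContextPart` structure, the `build_context` helper, the priority scheme, and the word-count token estimate are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ContextPart:
    """One candidate piece of context (hypothetical structure for illustration)."""
    kind: str       # e.g. "prompt", "retrieval", "tools", "state"
    text: str
    priority: int   # lower number = more important


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    # A real system would use the model's own tokenizer.
    return len(text.split())


def build_context(parts: list[ContextPart], budget: int) -> str:
    """Greedily keep the highest-priority parts that fit within the token budget."""
    chosen = []
    used = 0
    for part in sorted(parts, key=lambda p: p.priority):
        cost = estimate_tokens(part.text)
        if used + cost <= budget:
            chosen.append(part)
            used += cost
    # Emit the kept parts in priority order, each labeled by its kind.
    return "\n\n".join(f"[{p.kind}]\n{p.text}" for p in chosen)


parts = [
    ContextPart("prompt", "You are a helpful coding assistant.", 0),
    ContextPart("state", "User is editing main.py", 1),
    ContextPart("retrieval", "def add(a, b): return a + b", 2),
    ContextPart("tools", "run_tests: runs the project test suite", 3),
]
context = build_context(parts, budget=12)
print(context)
```

With a budget of 12 estimated tokens, only the prompt and state parts fit, so the retrieval and tools parts are dropped. A greedy priority rule is just one possible policy; summarizing or compacting lower-priority parts instead of dropping them is another.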

Jan 29, 2025

not much happened today

deepseek-r1 qwen-2.5 qwen-2.5-max deepseek-v3 deepseek-janus-pro gpt-4 nvidia anthropic openai deepseek huawei vercel bespoke-labs model-merging multimodality reinforcement-learning chain-of-thought gpu-optimization compute-infrastructure compression crypto-api image-generation saranormous zizhpan victormustar omarsar0 markchen90 sakanaailabs reach_vb madiator dain_mclau francoisfleuret garygodchaux arankomatsuzaki id_aa_carmack lavanyasant virattt

Huawei chips are highlighted in a diverse AI news roundup covering NVIDIA's stock rebound, new open music foundation models like Local Suno, and competitive AI models such as Qwen 2.5 Max and DeepSeek V3. The release of DeepSeek Janus Pro, a multimodal LLM with image generation capabilities, and advancements in reinforcement learning and chain-of-thought reasoning are noted. Discussions include GPU rebranding with NVIDIA's H6400 GPUs, data center innovations, and enterprise AI applications like crypto APIs in hedge funds. Other key highlights are "Deepseek R1's capabilities" and "Qwen 2.5 models added to applications."

Jan 04, 2025

not much happened today

prime gpt-4o qwen-32b olmo openai qwen cerebras-systems langchain vercel swaggo gin echo reasoning chain-of-thought math coding optimization performance image-processing software-development agent-frameworks version-control security robotics hardware-optimization medical-ai financial-ai architecture akhaliq jason-wei vikhyatk awnihannun arohan tom-doerr hendrikbgr jerryjliu0 adcock-brett shuchaobi stasbekman reach-vb virattt andrew-n-carr

A detailed tech report for Olmo 2 was released, covering full pre-, mid-, and post-training details for a frontier fully open model. PRIME, an open-source reasoning solution, achieved 26.7% pass@1, surpassing GPT-4o in benchmarks. Performance improvements include Qwen 32B (4-bit) generating at >40 tokens/sec on an M4 Max and libvips being 25x faster than Pillow for image resizing. New tools like Swaggo/swag for Swagger 2.0 documentation, Jujutsu (jj) Git-compatible VCS, and Portspoof security tool were introduced. Robotics advances include a weapon detection system with a meters-wide field of view and faster frame rates. Hardware benchmarks compared H100 and MI300x accelerators. Applications span medical error detection using PRIME and a financial AI agent integrating LangChain and Vercel AI SDK. Architectural insights suggest the need for breakthroughs similar to SSMs or RNNs.