Company: "cloudflare"

cosmos-3 nemotron-3-ultra minimax-m3 nvidia runway novita vercel cloudflare openclaude flowith omnimodal-models mixture-of-experts autoregressive-models diffusion-models structured-prompts fine-tuning open-weight-models multimodality agent-models benchmarking model-serving context-windows token-efficiency kimmonismus clementdelangue artificialanalysis scaling01 ctnzr caspar_br eliebakouch pbdtokenrouter rauchg gitlawb notjazii lostinlatencyx zhihufrontier

NVIDIA led open-source AI model releases with Cosmos 3, a comprehensive omnimodal world model unifying language, image, video, audio, and action using a Mixture-of-Transformers design, and Nemotron 3 Ultra, a 550B-parameter open-weight model noted for high serving speed and strong evaluation performance. The Cosmos Coalition was launched to foster an open ecosystem for physical AI world models. Meanwhile, MiniMax M3 debuted as a multimodal agent/coding model with a 1M-token context window and strong benchmark scores, gaining rapid ecosystem support from vendors like Novita and Vercel AI Gateway. However, MiniMax M3 showed some inefficiencies, such as high token consumption and verbose self-check loops. These developments highlight advances in open physical AI, multimodality, and agent models with significant community and infrastructure engagement.

Apr 20

not much happened today

kimi-k2.6 qwen-3.6-max-preview moonshot alibaba vllm openrouter cloudflare baseten mlx nous-research opencode ollama mixture-of-experts multimodality int4-quantization long-context agentic-coding multi-agent-systems model-orchestration memory-consolidation llm-driven-replanning dynamic-context-injection

Moonshot's Kimi K2.6 is a major open-weight 1T-parameter MoE model featuring 32B active parameters, 384 experts, MLA attention, 256K context window, native multimodality, and INT4 quantization. It supports day-0 integration with platforms like vLLM, OpenRouter, Cloudflare Workers AI, and others, showcasing state-of-the-art performance on benchmarks such as HLE w/ tools 54.0, SWE-Bench Pro 58.6, and Math Vision w/ python 93.2. The model excels in long-horizon execution with over 4,000 tool calls, 12+ hour continuous runs, and 300 parallel sub-agents. Meanwhile, Alibaba's Qwen3.6-Max-Preview previewed enhanced agentic coding, improved world knowledge, and instruction following, with notable performance on AIME 2026 #15 and ranking in Code Arena. Hermes Agent is rapidly expanding its ecosystem, surpassing 100K GitHub stars and integrating with tools like Ollama and Copilot CLI, while pioneering advanced multi-agent orchestration techniques such as stateless ephemeral units, LLM-driven replanning, and dynamic context injection. These developments highlight the competitive momentum of Chinese open and semi-open labs in coding and agent models.
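
As a rough back-of-the-envelope illustration of the figures above (1T total parameters, 32B active, INT4 quantization), the sketch below computes the fraction of parameters active per token and an approximate weight-only memory footprint. It assumes every weight is stored at exactly 4 bits and ignores quantization scales, KV cache, activations, and any layers kept at higher precision, so the real footprint will differ.

```python
# Back-of-the-envelope numbers for a 1T-parameter MoE with 32B active parameters.
# Assumption (not from the source): all weights are stored at exactly 4 bits.
total_params = 1_000_000_000_000   # 1T total parameters
active_params = 32_000_000_000     # 32B parameters active per token

# Fraction of the model that participates in each forward pass.
active_fraction = active_params / total_params

# Weight-only storage at 4 bits per parameter, in decimal gigabytes.
int4_bytes = total_params * 4 / 8
int4_gb = int4_bytes / 1e9

print(f"active fraction: {active_fraction:.1%}")
print(f"INT4 weights (approx.): {int4_gb:.0f} GB")
```

Under these assumptions, only a small share of the weights is exercised per token, which is the usual reason an MoE of this size can be served at a per-token compute cost closer to that of a much smaller dense model.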

Apr 15

not much happened today

OpenAI expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now open-source and supports execution via partner sandboxes, fostering a new ecosystem with integrations from Cloudflare, Modal, Vercel, and others. Cloudflare launched Project Think, a next-gen Agents SDK with durable execution and sandboxed code, alongside Agent Lee, a prompt-driven UI agent using sandboxed TypeScript, and introduced real-time voice pipelines and browser automation tools. Hermes Agent focuses on persistent skill formation by learning from completed workflows, positioning itself as a professional agent distinct from GUI-first assistants like OpenClaw. One quoted description highlights its workflow management capabilities: "Hermes autonomously backfills tracking data, updates cron jobs, and saves workflows as reusable skills."

Jul 01, 2025

not much happened today

chai-2 gemini-2.5-pro deepseek-r1-0528 meta scale-ai anthropic cloudflare grammarly superhuman chai-discovery atlassian notion slack commoncrawl hugging-face sakana-ai inference model-scaling collective-intelligence zero-shot-learning enterprise-deployment data-access science-funding open-source-llms alexandr_wang nat_friedman clementdelangue teortaxestex ylecun steph_palazzolo andersonbcdefg jeremyphoward reach_vb

Meta makes a major AI move by hiring Scale AI founder Alexandr Wang as Chief AI Officer and acquiring a 49% non-voting stake in Scale AI for $14.3 billion, doubling its valuation to about $28 billion. Chai Discovery announces Chai-2, a breakthrough model for zero-shot antibody discovery and optimization. The US government faces budget cuts threatening to eliminate a quarter million science research jobs by 2026. Data access restrictions intensify as companies like Atlassian, Notion, and Slack block web crawlers including Common Crawl, raising concerns about future public internet archives. Hugging Face shuts down HuggingChat after serving over a million users, marking a significant experiment in open-source LLMs. Sakana AI releases AB-MCTS, an inference-time scaling algorithm enabling multiple models like Gemini 2.5 Pro and DeepSeek-R1-0528 to cooperate and outperform individual models.

Jun 25, 2025

Context Engineering: Much More than Prompts

gemini-code openai langchain cognition google-deepmind vercel cloudflare openrouter context-engineering retrieval-augmented-generation tools state-management history-management prompt-engineering software-layer chatgpt-connectors api-integration karpathy walden_yan tobi_lutke hwchase17 rlancemartin kwindla dex_horthy

Context Engineering emerges as a significant trend in AI, highlighted by experts like Andrej Karpathy, Walden Yan from Cognition, and Tobi Lutke. It involves managing an LLM's context window with the right mix of prompts, retrieval, tools, and state to optimize performance, going beyond traditional prompt engineering. LangChain and its tool LangGraph are noted for advancing this approach. Additionally, OpenAI has launched ChatGPT connectors for platforms like Google Drive, Dropbox, SharePoint, and Box, enhancing context integration for Pro users. Other notable news includes the launch of Vercel Sandbox, Cloudflare Containers, the leak and release of Gemini Code by Google DeepMind, and fundraising efforts by OpenRouter.
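
To make the idea of balancing prompts, retrieval, tools, and state inside a fixed context window concrete, here is a minimal sketch. Everything in it is hypothetical and for illustration only: the `Piece` type, the `build_context` function, the priority scheme, and the whitespace-based token estimate are assumptions, not the method of LangChain, LangGraph, or any other library mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    kind: str      # "prompt", "retrieval", "tool", or "state"
    text: str
    priority: int  # lower number = more important


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_context(pieces: list[Piece], budget: int) -> list[Piece]:
    """Greedily keep the highest-priority pieces that fit within the token budget."""
    chosen = []
    used = 0
    for piece in sorted(pieces, key=lambda p: p.priority):
        cost = estimate_tokens(piece.text)
        if used + cost <= budget:
            chosen.append(piece)
            used += cost
    return chosen


pieces = [
    Piece("prompt", "You are a helpful coding assistant.", 0),
    Piece("state", "User is editing main.py in repo demo.", 1),
    Piece("tool", "search(query): look up documentation.", 2),
    Piece("retrieval", "word " * 50, 3),  # oversized retrieved document
]
selected = build_context(pieces, budget=20)
print([p.kind for p in selected])
```

With a budget of 20 estimated tokens, the oversized retrieved document is dropped while the prompt, state, and tool description are kept. A real system would use the model's tokenizer and would likely summarize or truncate rather than drop content outright.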

Feb 27, 2025

lots of small launches

gpt-4o claude-3.7-sonnet claude-3.7 claude-3.5-sonnet deepseek-r1 deepseek-v3 grok-3 openai anthropic amazon cloudflare perplexity-ai deepseek-ai togethercompute elevenlabs elicitorg inceptionailabs mistral-ai voice model-releases cuda gpu-optimization inference open-source api model-performance token-efficiency context-windows jit-compilation lmarena_ai alexalbert__ aravsrinivas reach_vb

GPT-4o Advanced Voice Preview is now available to free ChatGPT users, with higher daily limits for Plus and Pro users. Claude 3.7 Sonnet has achieved the top rank in WebDev Arena with improved token efficiency. DeepSeek-R1, with 671B parameters, benefits from the Together Inference platform optimizing NVIDIA Blackwell GPU usage, alongside the open-source DeepGEMM CUDA library delivering up to 2.7x speedups on Hopper GPUs. Perplexity launched a new Voice Mode and a Deep Research API. The upcoming Grok 3 API will support a 1M-token context window. Several companies, including Elicit, Amazon, Anthropic, Cloudflare, FLORA, ElevenLabs, and Inception Labs, announced new funding rounds, product launches, and model releases.

Dec 15, 2023

12/15/2023: Mixtral-Instruct beats Gemini Pro (and matches GPT3.5)

mixtral gemini-pro gpt-3.5 gpt-4.5 gpt-4 chatgpt lmsys openai deepseek cloudflare huggingface performance context-window prompt-engineering privacy local-gpu cloud-gpu code-generation model-comparison model-usage api-errors karpathy

Thanks to a Karpathy shoutout, LMSYS now has enough data to rank Mixtral and Gemini Pro. The discussion highlights the impressive performance of state-of-the-art open-source models that can run on laptops. In the OpenAI Discord, users compared AI tools like Perplexity and ChatGPT's browsing tool, favoring Perplexity for its superior data gathering, pricing, and usage limits. Users showed interest in AI's ability to convert large code files, with DeepSeek Coder recommended. Debates on the privacy implications of AI advancement and the challenges of running LLMs on local and cloud GPUs were prominent. Users reported issues with ChatGPT, including performance problems, loss of access to custom GPTs, and unauthorized access. Discussions also covered prompt engineering for large context windows and speculation about GPT-4.5 and future GPT-4 developments.