Person: "_avichawla"
not much happened today
kimi-k2 qwen3-next nemotron-nano-2 granite-4.0 gpt-4.5 copilot codex vllm perplexity-ai ibm anthropic graphiti claude cursor-ai microsoft mixture-of-experts model-integration cloud-computing hybrid-models benchmarking agent-systems memory-persistence semantic-search code-retrieval context-length-optimization tool-use evaluation-frameworks software-development scaling01 cedric_chee aravsrinivas omarsar0 _avichawla pierceboggan jo_parkhurst jyangballin ofirpress ml_angelopoulos
Kimi-K2 Reasoner has been integrated into vLLM and will soon be supported by SGLang, featuring a massive 1.2 trillion parameter MoE configuration. Perplexity AI released research on cloud-portable trillion-parameter MoE kernels optimized for AWS EFA, with potential integration into vLLM. IBM's vLLM team formalized hybrid dense and sparse expert models, supporting models like Qwen3-Next, Nemotron Nano 2, and Granite 4.0. Kimi-K2 reportedly scores 77% on GPQA Diamond, outperforming GPT-4.5 at 71.4%, though this is unverified.
Anthropic published a guide on building efficient tool-heavy agent systems with MCP patterns, cutting context tokens by ~98.7%. Graphiti MCP demonstrated shared memory across apps like Claude Desktop and Cursor for persistent agent memory. VS Code introduced an "Agent sessions" feature to unify agent management, including Copilot and Codex. Cursor AI improved coding accuracy via semantic search and code retrieval embeddings. New evaluation frameworks like CodeClash and LMArena assess agent and coding model performance through realistic multi-round tasks and occupation-tagged leaderboards.
not much happened today
trillium gemini-2.5-pro gemini-deepthink google huawei epoch-ai deutsche-telekom nvidia anthropic reka-ai weaviate deepmind energy-efficiency datacenters mcp context-engineering instruction-following embedding-models math-reasoning benchmarking code-execution sundarpichai yuchenj_uw teortaxestex epochairesearch scaling01 _avichawla rekaailabs anthropicai douwekiela omarsar0 nityeshaga goodside iscienceluvr lmthang
Google's Project Suncatcher prototypes scalable ML compute systems in orbit using solar energy with Trillium-generation TPUs surviving radiation, aiming for prototype satellites by 2027. China's 50% electricity subsidies for datacenters may offset chip efficiency gaps, with Huawei planning gigawatt-scale SuperPoDs for DeepSeek by 2027. Epoch launched an open data center tracking hub, and Deutsche Telekom and NVIDIA announced a $1.1B Munich facility with 10k GPUs. In agent stacks, MCP (Model Context Protocol) tools gain traction with implementations like LitServe, Claude Desktop, and Reka's MCP server for VS Code. Anthropic emphasizes efficient code execution with MCP. Context engineering shifts focus from prompt writing to model input prioritization, with reports and tools from Weaviate, Anthropic, and practitioners highlighting instruction-following rerankers and embedding approaches. DeepMind's IMO-Bench math reasoning suite shows Gemini DeepThink achieving high scores, with a ProofAutoGrader correlating strongly with human grading. Benchmarks and governance updates include new tasks and eval sharing in lighteval.
Anthropic raises $13B at $183B Series F
claude-code gpt-5 grok-4 claude sonnet-4 glm-4.5 deepseek-r1 anthropic mistral-ai x-ai salesforce galileo openpipe zhipu thudm enterprise-connectors agent-benchmarking reinforcement-learning inference-optimization memory-optimization cuda multi-token-prediction speculative-decoding tensor-offload performance-optimization real-time-guardrails cost-optimization swyx emilygsands _philschmid _lewtun omarsar0 _avichawla corbtt
Anthropic reached a $183B post-money valuation in its Series F funding by September 2025, with run-rate revenue growing from about $1B in January to over $5B by August 2025. Its Claude Code product saw >10x usage growth in three months and reached $500M run-rate revenue. The company serves over 300,000 business customers, with a nearly 7x increase in large accounts. Mistral AI launched Le Chat with 20+ MCP connectors integrating with major SaaS platforms and persistent memory features. Benchmarking updates highlight GPT-5 leading agent intelligence indices, with strong performances from xAI's Grok and Anthropic's Claude families. Reliability tooling and agent evaluation advances were shared by Galileo, OpenPipe, and others. Zhipu/THUDM open-sourced Slime v0.1.0, enhancing RL infrastructure behind GLM-4.5 with significant decoding speed improvements and advanced tensor offload techniques.