Model: "claude-4.5-haiku"
xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing
grok-4.1 gpt-5.1 claude-4.1-opus grok-4 gpt-5 grok-4.1-thinking gpt-5-pro claude-4.5-haiku xai openai google-deepmind sakana-ai anthropic microsoft mufg khosla nea lux-capital iqt model-performance creative-writing hallucination evaluation-datasets ensemble-models weather-forecasting funding efficiency anti-hallucination arc-agi model-scaling yanndubs gregkamradt philschmid willccbb
xAI launched Grok 4.1, which reached #1 on the LM Arena Text Leaderboard with an Elo score of 1483 and shows improvements in creative writing and hallucination reduction. OpenAI's GPT-5.1 "Thinking" demonstrates efficiency gains, with ~60% less "thinking" on easy queries, along with strong ARC-AGI performance. Google DeepMind released WeatherNext 2, an ensemble generative model for global weather forecasts that is 8× faster and more accurate, and that is integrated into multiple Google products. Sakana AI raised ¥20B ($135M) in Series B funding at a $2.63B valuation to focus on efficient AI for resource-constrained enterprise applications in Japan. New evaluations highlight tradeoffs between hallucination and knowledge accuracy across models, including Claude 4.1 Opus and other Anthropic models.
Claude Agent Skills - glorified AGENTS.md? or MCP killer?
claude-4.5-haiku claude chatgpt huggingchat-omni anthropic openai microsoft perplexity-ai huggingface groq cerebras togethercompute agent-skills document-processing long-context reasoning multi-model-routing memory-management voice vision simonwillison alexalbert__ mustafasuleyman yusuf_i_mehdi aravsrinivas
Anthropic achieves a rare feat with back-to-back AI news headlines, this time for Claude's new Skills: a novel way to build specialized agents using Markdown files, scripts, and metadata to handle tasks like creating and reading PDFs, Docs, and PPTs. Simon Willison calls this a "bigger deal than MCP," predicting a "Cambrian explosion in Skills." Meanwhile, Anthropic launches Claude 4.5 Haiku with strong reasoning and long-context capabilities at a competitive price. Other updates include OpenAI's ChatGPT memory management improvements, Windows 11 Copilot voice and vision features, and HuggingChat Omni, which routes across 115 open-source models from 15 providers. These developments highlight advances in agent skills, document processing, long-context reasoning, and multi-model routing.