All tags
Company: "lux-capital"
xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing
grok-4.1 gpt-5.1 claude-4.1-opus grok-4 gpt-5 grok-4.1-thinking gpt-5-pro claude-4.5-haiku xai openai google-deepmind sakana-ai anthropic microsoft mufg khosla nea lux-capital iqt model-performance creative-writing hallucination evaluation-datasets ensemble-models weather-forecasting funding efficiency anti-hallucination arc-agi model-scaling yanndubs gregkamradt philschmid willccbb
xAI launched Grok 4.1, achieving a #1 rank on the LM Arena Text Leaderboard with an Elo score of 1483, showing improvements in creative writing and anti-hallucination. OpenAI's GPT-5.1 "Thinking" demonstrates efficiency gains with ~60% less "thinking" on easy queries and strong ARC-AGI performance. Google DeepMind released WeatherNext 2, an ensemble generative model that is 8× faster and more accurate for global weather forecasts, integrated into multiple Google products. Sakana AI raised ¥20B ($135M) in Series B funding at a $2.63B valuation to focus on efficient AI for resource-constrained enterprise applications in Japan. New evaluations highlight tradeoffs between hallucination and knowledge accuracy across models including Claude 4.1 Opus and Anthropic models.
not much happened today
gpt-5 kimi-k2-0905 glm-4.5 qwen3-asr opus-4.1 cognition founders-fund lux-capital 8vc neo vercel claude groq alibaba huggingface meta-ai-fair google theturingpost algoperf coding-agents agent-architecture open-source model-evaluation multilingual-models speech-recognition model-optimization kv-cache quantization algorithmic-benchmarking video-generation context-windows swyx tim_dettmers
Cognition raised $400M at a $10.2B valuation to advance AI coding agents, with swyx joining to support the "Decade of Agents" thesis. Vercel launched an OSS "vibe coding platform" using a tuned GPT-5 agent loop. Claude Code emphasizes minimalism in agent loops for reliability. Kimi K2-0905 achieved 94% on coding evals and improved agentic capabilities with doubled context length. Alibaba released Qwen3-ASR, a multilingual transcription model with <8% WER. Meta introduced Set Block Decoding for 3-5× faster decoding without architectural changes. Innovations in KV cache compression and quantization include AutoRound, QuTLASS v0.1.0, and AlgoPerf v0.6. Google's Veo 3 video generation API went GA with significant price cuts and vertical video support.