All tags
Model: "qwen-3-8b"
Mary Meeker is so back: BOND Capital AI Trends report
qwen-3-8b anthropic hugging-face deepseek attention-mechanisms inference arithmetic-intensity transformers model-optimization interpretability model-quantization training tri_dao fleetwood___ teortaxestex awnihannun lateinteraction neelnanda5 eliebakouch _akhaliq
Mary Meeker returns with a comprehensive 340-slide report on the state of AI, highlighting accelerating tech cycles, compute growth, and comparisons of ChatGPT to early Google and other iconic tech products. The report also covers enterprise traction and valuation of major AI companies. On Twitter, @tri_dao discusses an "ideal" inference architecture featuring attention variants like GTA, GLA, and DeepSeek MLA with high arithmetic intensity (~256), improving efficiency and model quality. Other highlights include the release of 4-bit DWQ of DSR1 Qwen3 8B on Hugging Face, AnthropicAI's open-source interpretability tools for LLMs, and discussions on transformer training and abstractions by various researchers.
DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release
deepseek-r1-0528 gemini-2.5-pro qwen-3-8b qwen-3-235b deepseek-ai anthropic meta-ai-fair nvidia alibaba google-deepmind reinforcement-learning benchmarking model-performance open-weights reasoning quantization post-training model-comparison artificialanlys scaling01 cline reach_vb zizhpan andrewyng teortaxestex teknim1 lateinteraction abacaj cognitivecompai awnihannun
DeepSeek R1-0528 marks a significant upgrade, closing the gap with proprietary models like Gemini 2.5 Pro and surpassing benchmarks from Anthropic, Meta, NVIDIA, and Alibaba. This Chinese open-weights model leads in several AI benchmarks, driven by reinforcement learning post-training rather than architecture changes, and demonstrates increased reasoning token usage (23K tokens per question). The China-US AI race intensifies as Chinese labs accelerate innovation through transparency and open research culture. Key benchmarks include AIME 2024, LiveCodeBench, and GPQA Diamond.