Company: "thinking-machines"
not much happened today
nemotron-nano-2 gpt-oss-120b qwen3 llama-3 minimax-m2 glm-4.6-air gemini-2.5-flash gpt-5.1-mini tahoe-x1 vllm_project nvidia mistral-ai baseten huggingface thinking-machines deeplearningai pytorch arena yupp-ai zhipu-ai scaling01 stanford transformer-architecture model-optimization inference distributed-training multi-gpu-support performance-optimization agents observability model-evaluation reinforcement-learning model-provenance statistical-testing foundation-models cancer-biology model-fine-tuning swyx dvilasuero _lewtun clementdelangue zephyr_z9 skylermiao7 teortaxestex nalidoust
vLLM announced support for NVIDIA Nemotron Nano 2, featuring a hybrid Transformer–Mamba design and tunable "thinking budget" enabling up to 6× faster token generation. Mistral AI Studio launched a production platform for agents with deep observability. Baseten reported high throughput (650 TPS) for GPT-OSS 120B on NVIDIA hardware. Hugging Face InspectAI added inference provider integration for cross-provider evaluation. Thinking Machines Tinker abstracts distributed fine-tuning for open-weight LLMs like Qwen3 and Llama 3. In China, MiniMax M2 shows competitive performance with top models and is optimized for agents and coding, while Zhipu GLM-4.6-Air focuses on reliability and scaling for coding tasks. Rumors suggest Gemini 2.5 Flash may be a >500B parameter MoE model, and a possible GPT-5.1 mini reference appeared. Outside LLMs, Tahoe-x1 (3B) foundation model achieved SOTA in cancer cell biology benchmarks. Research from Stanford introduces a method to detect model provenance via training-order "palimpsest" with strong statistical guarantees.
Thinking Machines' Tinker: LoRA-based LLM fine-tuning API
qwen-235b-a22b sora-2 thinking-machines openai fine-tuning lora model-training api model-optimization distributed-training post-training-methods research-productivity video-generation content-moderation engagement-patterns karpathy lilianweng sama
Thinking Machines, which raised $2 billion without having shipped a product, has now launched its first one: Tinker, a managed service API for fine-tuning large and mixture-of-experts models such as Qwen-235B-A22B, using LoRA to keep training cost-efficient. The Tinker API exposes low-level primitives for post-training methods and is accompanied by an open-source library, the Tinker Cookbook. Influential AI figures such as Andrej Karpathy and Lilian Weng praised its design for reducing complexity and boosting research productivity. Meanwhile, OpenAI launched Sora 2, a video+audio model integrated into its consumer social app, which drove viral engagement and raised concerns about misuse and content moderation. Sam Altman emphasized the product's dual focus on delight and revenue alongside AGI research.
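For readers new to LoRA, the source of its cost savings can be sketched in a few lines of Python. This is a minimal illustration assuming the standard LoRA formulation (a frozen weight W plus a scaled low-rank update (alpha / r) · B A). The function names are hypothetical, and this is not Tinker's actual API.

```python
# Minimal LoRA sketch (illustrative only; not Tinker's API).
# A frozen weight W (d_out x d_in) is adapted by a trainable low-rank update
# B @ A, where B is d_out x r and A is r x d_in, with r << min(d_out, d_in).
# In standard LoRA, B is initialized to zeros so training starts from W exactly.


def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_forward(W, A, B, x, alpha, r):
    """Compute (W + (alpha / r) * B @ A) @ x without materializing B @ A."""
    base = matmul(W, x)                   # frozen pretrained path
    low_rank = matmul(B, matmul(A, x))    # trainable low-rank path
    scale = alpha / r
    return [[b + scale * l for b, l in zip(b_row, l_row)]
            for b_row, l_row in zip(base, low_rank)]


def trainable_params(d_out, d_in, r):
    """Trainable parameters for one weight matrix: (full fine-tuning, LoRA)."""
    return d_out * d_in, r * (d_out + d_in)


if __name__ == "__main__":
    full, lora = trainable_params(4096, 4096, 8)
    print(full, lora, full // lora)  # 16777216 65536 256
```

For a single 4096 x 4096 projection at rank 8, only about 0.4% of the parameters are trained, which is why LoRA makes fine-tuning very large models (including MoE models) comparatively cheap.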
X.ai Grok 3 and Mira Murati's Thinking Machines
grok-3 grok-3-mini gemini-2-pro gpt-4o o3-mini-high o1 deepseek-r1 anthropic openai thinking-machines benchmarking reasoning reinforcement-learning coding multimodality safety alignment research-publishing model-performance creative-ai mira-murati lmarena_ai karpathy omarsar0 ibab arankomatsuzaki iscienceluvr scaling01
Grok 3 has launched to mixed reactions but strong benchmark results, notably outperforming models such as Gemini 2 Pro and GPT-4o. The Grok-3 mini variant shows competitive and sometimes superior capabilities, especially in reasoning and coding, with reinforcement learning playing a key role. Mira Murati has publicly shared her post-OpenAI plan: founding the frontier lab Thinking Machines, which focuses on collaborative, personalizable AI, multimodality, and empirical safety and alignment research, an approach reminiscent of Anthropic's.