not much happened today
gpt-5.2 opus-4.5 gemini-3-pro gpt-5.1 olmo-3.1-32b qwen3-vl-235b openai allen_ai mistral-ai ollama lmstudio thinkymachines reinforcement-learning model-benchmarking long-context model-quantization model-optimization inference-speed sparsity fine-tuning vision sama scaling01 akhaliq artificialanlys lechmazur acerfur epochairesearch
GPT-5.2 shows mixed performance in public evaluations: it excels at agentic tasks but at a significantly higher cost (~$620 per run) than Opus 4.5 and GPT-5.1, is variable on reasoning and coding benchmarks, and posts some gains on long-context tasks. Extended "reasoning effort" settings notably shift its results, and aggregators rank Gemini 3 Pro above GPT-5.2 on task persistence. OpenAI released sparse-activation models, sparking debate over sparsity versus MoE architectures. Allen AI's Olmo 3.1 (32B) pushes the scale of open reinforcement learning with a substantial compute investment (~125k H100 hours). Mistral's Devstral-2 and llama.cpp improve local inference infrastructure with features like GGUF support and distributed speedups. The Tinker platform goes GA with vision input and fine-tuning support for Qwen3-VL-235B.
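The "reasoning effort" knob is an API parameter rather than a separate model, which is why sweeping it changes benchmark results so visibly. A minimal sketch of such a sweep with the OpenAI Python SDK follows; the `gpt-5.2` model string is a placeholder assumption, not a confirmed API identifier.

```python
# Sweep reasoning-effort settings, which the digest notes can notably
# change results. "gpt-5.2" is a placeholder model name here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "List the prime factors of 3599 and show your steps."

for effort in ("low", "medium", "high"):
    response = client.chat.completions.create(
        model="gpt-5.2",           # placeholder; use a real reasoning model id
        reasoning_effort=effort,   # trades latency/cost for answer quality
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"effort={effort}:", response.choices[0].message.content[:120])
```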
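For readers following the sparsity-vs-MoE debate, the toy sketch below contrasts the two ideas in plain PyTorch: activation sparsity keeps one large FFN and zeroes all but the top-k hidden activations per token, while MoE routes each token to a few small expert FFNs. This is illustrative only, not OpenAI's released architecture.

```python
# Toy contrast of the two ideas behind the sparsity-vs-MoE debate.
import torch
import torch.nn as nn

class TopKSparseFFN(nn.Module):
    """Dense FFN whose hidden activations are pruned to the top-k per token."""
    def __init__(self, d_model=64, d_hidden=256, k=32):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.k = k

    def forward(self, x):
        h = torch.relu(self.up(x))
        topk = torch.topk(h, self.k, dim=-1)
        sparse_h = torch.zeros_like(h).scatter(-1, topk.indices, topk.values)
        return self.down(sparse_h)  # only k of d_hidden activations are nonzero

class TopKMoE(nn.Module):
    """Token-choice MoE: a router sends each token to its top-k experts."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(), nn.Linear(128, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):
        weights = torch.softmax(self.router(x), dim=-1)  # (tokens, n_experts)
        topk = torch.topk(weights, self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):  # naive loop, kept for clarity
                mask = topk.indices[:, slot] == e
                if mask.any():
                    out[mask] += topk.values[mask, slot:slot+1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(4, 64)
print(TopKSparseFFN()(tokens).shape, TopKMoE()(tokens).shape)
```

Roughly, the trade-off the debate turns on: activation sparsity still stores the full dense weight matrices, while MoE only computes the chosen experts but pays for routing and load-balancing complexity.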
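On the local-inference side, running a GGUF checkpoint is a short script with the llama-cpp-python bindings to llama.cpp. The Devstral file name below is a placeholder for whatever quantized GGUF you have on disk.

```python
# Minimal local GGUF inference via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./devstral-2.Q4_K_M.gguf",  # placeholder GGUF path
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```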