Company: "allen_ai"
not much happened today
gpt-5.2 opus-4.5 gemini-3-pro gpt-5.1 olmo-3.1-32b qwen3-vl-235b openai allen_ai mistral-ai ollama lmstudio thinkymachines reinforcement-learning model-benchmarking long-context model-quantization model-optimization inference-speed sparsity fine-tuning vision sama scaling01 akhaliq artificialanlys lechmazur acerfur epochairesearch
GPT-5.2 shows mixed performance in public evaluations: it excels in agentic tasks, but at a significantly higher cost (~$620/run) than Opus 4.5 and GPT-5.1. Its results vary across reasoning and coding benchmarks, with some improvements on long-context tasks, and extended "reasoning effort" settings notably affect outcomes. Aggregators rank Gemini 3 Pro above GPT-5.2 in task persistence. OpenAI released sparse activation models, sparking debate on sparsity vs. MoE architectures. Allen AI's Olmo 3.1 (32B) advances open reinforcement learning scale with substantial compute investment (~125k H100 hours). Mistral's Devstral-2 and llama.cpp improve local inference infrastructure with new features like GGUF support and distributed speedups. The Tinker platform goes GA with vision input and fine-tuning support for Qwen3-VL-235B.
Vision Everywhere: Apple AIMv2 and Jina CLIP v2
aimv2-3b jina-clip-v2 tulu-3 llama-3-1 claude-3-5 llama-3-1-70b apple jina allen_ai autoregressive-objectives vision multilinguality multimodality image-generation model-training model-optimization reinforcement-learning fine-tuning model-benchmarking
Apple released AIMv2, a novel vision encoder pre-trained with autoregressive objectives that achieves 89.5% accuracy on ImageNet and integrates joint visual and textual objectives. Jina launched Jina CLIP v2, a multimodal embedding model that supports 89 languages and high-resolution images and uses Matryoshka embeddings to reduce dimensions by 94% with minimal accuracy loss. Allen AI introduced Tülu 3 models based on Llama 3.1 with 8B and 70B parameters, offering 2.5x faster inference and alignment via SFT, DPO, and RLVR methods, and competing with Claude 3.5 and Llama 3.1 70B. These developments highlight advances in autoregressive training, vision encoders, and multilingual multimodal embeddings.
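
The Matryoshka dimension reduction mentioned above can be illustrated with a minimal sketch. It assumes the usual Matryoshka convention of keeping the leading dimensions of an embedding and re-normalizing, and it assumes a 1024-dimensional full embedding truncated to 64 dimensions, which is one way to arrive at roughly 94%; the summary itself does not state these sizes. The function names are hypothetical and this is not Jina's API.

```python
import math

# Assumed sizes (not stated in the summary): full and truncated dimensions.
FULL_DIM = 1024
SMALL_DIM = 64


def truncate_embedding(vec, k):
    """Keep the first k dimensions and L2-renormalize the result."""
    head = list(vec[:k])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalize
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


# Fraction of dimensions dropped under the assumed sizes.
reduction = 1 - SMALL_DIM / FULL_DIM

# Toy 3-dimensional vector truncated to 2 dimensions.
small = truncate_embedding([3.0, 4.0, 12.0], 2)
```

Under these assumed sizes, `reduction` works out to 0.9375, which rounds to the 94% quoted. Whether similarity rankings survive truncation depends on the model having been trained with a Matryoshka-style objective; truncating an arbitrary embedding this way gives no such guarantee.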