Topic: "retrieval"
not much happened today
7m-tiny-recursive-model jamba-reasoning-3b qwen3-omni qwen-image-edit-2509 colbert-nano agentflow samsung lecuun ai21-labs alibaba coreweave weights-biases openpipe stanford recursive-reasoning density-estimation multimodality long-context retrieval serverless-reinforcement-learning agentic-systems model-efficiency reinforcement-learning transformers rasbt jm_alexia jiqizhixin randall_balestr corbtt shawnup _akhaliq
Samsung's 7M Tiny Recursive Model (TRM) achieves superior reasoning on ARC-AGI and Sudoku while using fewer layers and replacing self-attention with an MLP. LeCun's team introduces JEPA-SCORE, which enables density estimation from encoders without retraining. AI21 Labs releases Jamba Reasoning 3B, a fast hybrid SSM-Transformer model that supports contexts of up to 64K tokens. Alibaba's Qwen3 Omni/Omni Realtime offers a unified audio-video-text model with extensive language and speech support, outperforming Gemini 2.0 Flash on BigBench Audio. Alibaba also debuts Qwen Image Edit 2509, a top open-weight multi-image editing model. ColBERT Nano models demonstrate effective retrieval at micro-scale parameter counts. In reinforcement learning, CoreWeave, Weights & Biases, and OpenPipe launch serverless RL infrastructure that reduces costs and speeds up training. Stanford's AgentFlow presents an in-the-flow RL system with a 7B backbone that outperforms larger models on agentic tasks. This update highlights advances in recursive reasoning, density estimation, multimodal architectures, long-context modeling, retrieval, and serverless reinforcement learning.
not much happened today
claude-3-sonnet claude-3-opus gpt-5-codex grok-4-fast qwen-3-next gemini-2.5-pro sora-2-pro ray-3 kling-2.5 veo-3 modernvbert anthropic x-ai google google-labs openai arena epoch-ai mit luma akhaliq coding-agents cybersecurity api model-taxonomy model-ranking video-generation benchmarking multi-modal-generation retrieval image-text-retrieval finbarrtimbers gauravisnotme justinlin610 billpeeb apples_jimmy akhaliq
Anthropic announces a new CTO. Frontier coding agents see updates: Claude Sonnet 4.5 shows strong cybersecurity performance and a polished UX but trails GPT-5 Codex in coding capability. xAI's Grok Code Fast claims higher edit success at lower cost. Google's Jules coding agent launches a programmable API with CI/CD integration. Qwen clarifies its model taxonomy and API tiers. Vision/LM Arena rankings show tight competition among Claude Sonnet 4.5, Claude Opus 4.1, Gemini 2.5 Pro, and OpenAI's latest models. In video generation, Sora 2 Pro leads App Store rankings with rapid iteration and a new creator ecosystem; early tests show it answers GPQA-style questions at 55% accuracy versus GPT-5's 72%. Video Arena adds new models such as Luma's Ray 3 and Kling 2.5 for benchmarking. Ovi, a Veo-3-like multimodal video+audio generation model, is released. In retrieval, MIT's ModernVBERT offers efficient image-text retrieval. Notable quotes: "Claude Sonnet 4.5 is basically the same as Opus 4.1 for coding" and "Jules is a programmable team member."