Topic: "model-calibration"

Sep 13, 2025

mobilellm-r1 qwen3-next-80b-a3b gpt-5 meta-ai-fair huggingface alibaba openai reasoning model-efficiency hybrid-attention long-context benchmarking agent-evaluation hallucination-detection model-calibration inference-complexity model-pricing _akhaliq tacocohen pkirgis sayashk

Meta released MobileLLM-R1, a family of sub-1B-parameter reasoning models, on Hugging Face; the models were trained on 4.2T tokens and show strong math accuracy for their size. Alibaba introduced Qwen3-Next-80B-A3B, which features hybrid attention, a 256k context window, and improved long-horizon memory, and is priced competitively on Alibaba Cloud. Meta AI FAIR fixed a bug in SWE-Bench that affected agent evaluation. The LiveMCP-101 benchmark shows that frontier models such as GPT-5 underperform on complex tasks, and it catalogs their common failure modes. OpenAI highlights hallucination issues that it attributes to benchmark incentives and proposes calibration improvements. Community demos and tooling updates continue.
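
To make the point about benchmark incentives concrete, here is a minimal sketch of how a grading scheme can reward or discourage guessing. It is an illustration under stated assumptions, not OpenAI's actual proposal: the scoring rule (+1 for a correct answer, 0 for abstaining, a configurable penalty for a wrong answer) and the function names are hypothetical.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering under an assumed rule:
    +1 if correct, -wrong_penalty if wrong. Abstaining scores 0."""
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only when answering beats the abstain score of 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining score the same.
    Solves p - (1 - p) * penalty = 0 for p."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    # With no penalty for wrong answers, even a 30%-confident guess pays off.
    print(should_answer(0.3, wrong_penalty=0.0))
    # With a penalty of 1 point, the same guess is no longer worth making.
    print(should_answer(0.3, wrong_penalty=1.0))
    # A penalty of 3 points moves the break-even confidence to 75%.
    print(break_even_confidence(3.0))
```

Under this assumed rule, a benchmark with no penalty for wrong answers always favors guessing over abstaining, while a nonzero penalty makes abstention the better choice below the break-even confidence.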

