Topic: "post-training-quantization"
not much happened today
nomos-1 axiomprover devstral-2-small deepseek-v3.2 claude-code cursor-2.2 claude-opus-4.5 gpt-5 claude-sonnet-4.5 gemini-3-pro llama qwen mistral gemma nousresearch thinkymachines mistral-ai deepseek anthropic cursor microsoft langchain-ai openai gemini intel vllm_project danielhanchen math formal-reasoning agentic-systems asynchronous-execution multi-agent-systems observability benchmarking quantization post-training-quantization training-speedup kernel-optimization inference-efficiency
NousResearch's Nomos 1 is a 30B open math model achieving a top Putnam score with only ~3B active parameters, enabling consumer Mac inference. AxiomProver also posts top Putnam results using ThinkyMachines' RL stack. Mistral's Devstral 2 Small outperforms DeepSeek v3.2 in 71% of preferences with better speed and cost. Anthropic's Claude Code introduces asynchronous agent execution. Cursor 2.2 adds deep agent primitives like Debug and Plan Modes. VS Code launches unified agent chat sessions improving multi-agent workflows. LangChain releases "Polly" for agent observability. The Stirrup harness leads OpenAI GDPval benchmarks with Claude Opus 4.5, GPT-5, and Gemini 3 Pro following. Advances in quantization include vLLM integrating Intel's AutoRound PTQ for efficient serving. Unsloth achieves up to 3× training speedups with new kernels across Llama, Qwen, Mistral, and Gemma models. "Compositional reasoning + specialized post-training under constrained active params can rival frontier closed models on formal math."
Gemini Ultra is out, to mixed reviews
gemini-ultra gemini-advanced solar-10.7b openhermes-2.5-mistral-7b subformer billm google openai mistral-ai hugging-face multi-gpu-support training-data-contamination model-merging model-alignment listwise-preference-optimization high-performance-computing parameter-sharing post-training-quantization dataset-viewer gpu-scheduling fine-tuning vram-optimization
Google released Gemini Ultra through a paid tier, "Gemini Advanced with Ultra 1.0", following the retirement of the Bard brand. Reviewers called it "slightly faster/better than ChatGPT" but noted reasoning gaps. The Steam Deck was highlighted as a surprising AI workstation capable of running models like Solar 10.7B. Discussions in AI communities covered multi-GPU support in open-source Unsloth, training data contamination from OpenAI outputs, ethical concerns over model merging, and new alignment techniques like Listwise Preference Optimization (LiPO). The Mojo programming language was praised for high-performance computing. In research, the Subformer model uses sandwich-style parameter sharing and self-attentive embedding factorization (SAFE) for parameter efficiency, and BiLLM introduced 1-bit post-training quantization to reduce resource use. The OpenHermes dataset viewer tool was launched, and GPU scheduling with Slurm was discussed. Fine-tuning challenges and VRAM requirements for models like OpenHermes-2.5-Mistral-7B were also topics of interest.