Topic: "training-optimization"
not much happened today
gemini-3.1-pro gpt-5.5 opus-4.7-xhigh agent-moderncolbert google-deepmind lighton nous-research research-benchmarks math medical-benchmarks agentic-systems program-synthesis retrieval-augmentation training-optimization superoptimization scaling-laws training-efficiency gpu-optimization attention-mechanisms soohak polynoamial torchcompiled leloykun che_shr_cat jjitsev omarsar0
Research-level reasoning benchmarks are advancing: 439 new math problems contributed by 64 mathematicians, and Medmarks v1.0 expands medical evaluation to 30 benchmarks across 61 models. Google DeepMind's AI Co-Mathematician achieves 48% on FrontierMath Tier 4, while Gemini 3.1 Pro posts significant gains on physics benchmarks. GPT-5.5 high/xhigh outperforms Opus 4.7 xhigh on program-synthesis tasks. Retrieval benchmarks favor smaller models such as LightOn's 149M-parameter Agent-ModernColBERT. On the training-optimization side, SOAP/Muon-style updates reduce the number of training steps needed, and a Lean4-to-TileLang superoptimizer achieves a 1.8× speedup on A100 GPUs. Scaling laws are being reconsidered, with arguments for measuring data in bytes rather than tokens, and new training-time efficiency methods like Lighthouse Attention add subquadratic wrappers during training that can be removed before deployment.
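The Muon-style updates mentioned above replace a raw momentum step with an approximately orthogonalized one. Below is a minimal PyTorch sketch, assuming the quintic Newton-Schulz iteration and coefficients from the public Muon reference implementation; the function names and hyperparameters here are illustrative, not the benchmarked code.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5, eps: float = 1e-7):
    """Approximately orthogonalize a 2-D update matrix with a quintic
    Newton-Schulz iteration (the core trick in Muon-style updates)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients from the public Muon reference
    X = G / (G.norm() + eps)           # scale so singular values are <= 1
    transposed = X.size(0) > X.size(1)
    if transposed:                     # iterate on the "wide" orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_style_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    """One optimizer step: momentum on the raw gradient, then apply the
    orthogonalized update to the weight matrix."""
    momentum_buf.mul_(beta).add_(grad)
    param.data.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```

The intuition is that flattening the singular-value spectrum of each layer's update improves conditioning across layers, which is the mechanism usually credited for the reported reduction in training steps.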
RIP Latent Diffusion, Hello Hourglass Diffusion
gpt-4 latent-diffusion stable-diffusion meta-ai-fair openai hugging-face diffusion-models transformers image-generation model-efficiency fine-tuning quantization prompt-engineering roleplay training-optimization katherine-crowson lucidrains
Katherine Crowson, known for her work on Stable Diffusion, introduces a hierarchical pure-transformer backbone (the Hourglass Diffusion Transformer) for diffusion-based image generation that scales efficiently to megapixel resolutions with under 600 million parameters, improving on the original ~900M-parameter model. The architecture processes local and global image structure in separate stages, improving efficiency and resolution without a latent stage. Meanwhile, Meta's Self-Rewarding Language Models paper has inspired lucidrains to begin an implementation. Discord summaries highlight GPT-4's robustness against quantization tricks, discussions of open-source GPT-0 alternatives, challenges with DPO training on limited VRAM (with suggestions such as QLoRA and rmsprop), and efforts to improve roleplay-model consistency through fine-tuning and merging. Philosophical debates on AI sentience and GPT-4 customization for markdown and translation tasks were also noted.
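As a companion to the DPO-on-limited-VRAM thread, here is a hedged sketch of the usual QLoRA recipe: quantize the frozen base model to 4-bit and train only low-rank adapters against the preference objective. The model name, dataset, and hyperparameters are placeholders, and exact argument names vary across TRL releases; treat this as an outline of the approach, not the Discord participants' code.

```python
# Sketch: DPO fine-tuning under tight VRAM via 4-bit QLoRA (TRL + peft +
# bitsandbytes). Model and dataset names are illustrative placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"   # placeholder base model
bnb = BitsAndBytesConfig(                # 4-bit NF4 quantization: the "Q" in QLoRA
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Low-rank adapters are the only trainable weights; the 4-bit base stays
# frozen, which also lets TRL reuse it as the implicit DPO reference model.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         task_type="CAUSAL_LM")

# DPO expects rows with "prompt", "chosen", and "rejected" fields.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(
        output_dir="dpo-qlora-sketch",
        per_device_train_batch_size=1,   # tiny batch + accumulation for low VRAM
        gradient_accumulation_steps=8,
        beta=0.1,                        # strength of the preference penalty
    ),
    train_dataset=train_dataset,
    processing_class=tokenizer,          # `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```

The design choice that makes this fit in limited VRAM is that no separate full-precision reference model is loaded: with adapters disabled, the quantized base serves as the reference policy.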