Topic: "multi-turn-dialogue"
MiniMax-M2.5: SOTA coding, search, toolcalls, $1/hour
minimax-m2.5 glm-5 minimax-ai togethercompute huggingface intel wandb reinforcement-learning agent-based-models model-quantization benchmarking model-efficiency multi-turn-dialogue infrastructure-optimization cost-efficiency on-device-ai
MiniMax-M2.5 is now open source, built on an "agent-native" reinforcement learning framework called Forge that was trained across 200k+ RL environments for coding, tool use, and workflows. It posts strong benchmark scores, including 80.2% on SWE-Bench Verified, and emphasizes cost-efficiency, citing "$1 per hour at 100 tps" and solid on-device performance. The Forge RL system uses multi-level prefix caching and a high rollout compute share (~60%) to generate millions of trajectories daily. Independent reviews note improved stability and multi-turn viability but high token usage. The ecosystem adopted MiniMax-M2.5 rapidly, with quantized releases in 2-bit GGUF and INT4 formats. Meanwhile, Together markets GLM-5 as a leading open-source model for long-horizon agents, with 77.8% on SWE-Bench Verified and MoE efficiency via DeepSeek Sparse Attention.
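The prefix-caching idea behind Forge's rollout efficiency can be illustrated with a minimal sketch: when many sampled trajectories branch from the same prompt, the shared prefix's computation (KV state, mocked here as a string) is reused instead of recomputed. All names below are hypothetical, not MiniMax's actual implementation.

```python
# Minimal sketch of prefix caching for RL rollouts (hypothetical names).
# Trajectories that share a prompt prefix reuse its cached (mock) KV state.

class PrefixCache:
    def __init__(self):
        self._cache = {}   # token-prefix tuple -> cached state
        self.hits = 0
        self.misses = 0

    def longest_cached_prefix(self, tokens):
        # Walk back from the full sequence to the longest cached prefix.
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._cache:
                self.hits += 1
                return key, self._cache[key]
        self.misses += 1
        return (), None

    def store(self, tokens, state):
        self._cache[tuple(tokens)] = state


cache = PrefixCache()
prompt = [1, 2, 3, 4]               # shared system + task prompt
cache.store(prompt, "kv(prompt)")

# Two rollouts branching from the same prompt both hit the cached prefix.
rollout_a = prompt + [10, 11]
rollout_b = prompt + [20]
prefix, state = cache.longest_cached_prefix(rollout_a)
print(len(prefix), cache.hits)      # 4 1
```

"Multi-level" caching would extend this to several tiers (e.g. GPU memory, host memory, disk), but the reuse-by-shared-prefix principle is the same.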
LLaDA: Large Language Diffusion Models
llada-8b llama-3-8b step-video-t2v-30b step-audio-chat-132b llama-2-7b stepfun-ai scale-ai cambridge llamaindex diffusion-models text-generation multimodality video-generation voice-processing benchmarking instruction-following model-scaling gpu-usage long-context multi-turn-dialogue arankomatsuzaki _akhaliq omarsar0 iscienceluvr gallabytes maximelabonne reach_vb
LLaDA (Large Language Diffusion Model) 8B is a breakthrough diffusion-based language model that rivals LLaMA 3 8B despite training on roughly 7x fewer tokens (2 trillion) and 0.13 million H800 GPU hours. It takes a novel approach to text generation: rather than predicting tokens left to right, it predicts uniformly masked tokens in an iterative diffusion process, while still supporting multi-turn dialogue and instruction-following. Alongside it, StepFun AI released two major models: Step-Video-T2V 30B, a text-to-video model generating up to 204 frames with high coherence and motion quality, and Step-Audio-Chat 132B, a voice-to-voice model. Additionally, challenging multimodal benchmarks like Scale AI's EnigmaEval and Cambridge's ZeroBench, on which current frontier models score zero, underscore the difficulty of these tasks. The community also noted the return of diffusion models to language modeling, a previously speculative architecture now scaled successfully.
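The masked-diffusion generation loop can be sketched in toy form: start from a fully masked sequence, predict all masked tokens at once, then re-mask a shrinking fraction and refine over several steps. The predictor below is a stub that fills from a reference sequence; in LLaDA proper it is a trained Transformer, and the re-masking schedule follows the model's reverse diffusion process, not this simplified linear one.

```python
import random

# Toy sketch of masked-diffusion text generation in the spirit of LLaDA.
# `dummy_predict` is a stand-in for the trained mask predictor.

MASK = "<mask>"

def dummy_predict(sequence, reference):
    # Fill every masked position (a real model would predict these jointly).
    return [reference[i] if tok == MASK else tok
            for i, tok in enumerate(sequence)]

def diffusion_generate(length, reference, steps=4, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * length                     # fully masked start (t = 1)
    for step in range(steps):
        seq = dummy_predict(seq, reference)   # predict all masked tokens
        # Re-mask a shrinking fraction, moving from t = 1 toward t = 0.
        frac = 1.0 - (step + 1) / steps
        for i in rng.sample(range(length), int(frac * length)):
            seq[i] = MASK
    return seq

reference = ["the", "cat", "sat", "on", "mat"]
print(diffusion_generate(len(reference), reference))
# → ['the', 'cat', 'sat', 'on', 'mat']
```

Because every step predicts all masked positions in parallel, decoding cost scales with the number of refinement steps rather than the sequence length, which is one motivation for this family of models.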