Person: "gallabytes"
Chinese Models Launch - MiniMax-M1, Hailuo 2 "Kangaroo", Moonshot Kimi-Dev-72B
minimax-m1 hailuo-02 kimi-dev-72b deepseek-r1 ale-agent minimax-ai moonshot-ai deepseek bytedance anthropic langchain columbia-university sakana-ai openai microsoft multi-agent-systems attention-mechanisms coding optimization prompt-injection model-performance video-generation model-training task-automation jerryjliu0 hwchase17 omarsar0 gallabytes lateinteraction karpathy
MiniMax AI launched MiniMax-M1, a 456-billion-parameter open-weights LLM with a 1 million token input context and up to 80k tokens of output. It is built on efficient "lightning attention" and trained with a GRPO variant called CISPO. MiniMax AI also announced Hailuo 02 (0616), a video model similar to ByteDance's Seedance. Moonshot AI released Kimi-Dev-72B, a coding model that outperforms DeepSeek R1 on SWE-bench Verified. Discussions of multi-agent system design from Anthropic and LangChain highlighted improvements in task completion alongside challenges such as prompt injection attacks, as demonstrated by Karpathy and by Columbia University research. Sakana AI introduced ALE-Agent, a coding agent that ranked 21st in the AtCoder Heuristic Contest, which poses NP-hard optimization problems. There is unverified news about an acquisition involving OpenAI, Microsoft, and Windsurf.
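For readers unfamiliar with CISPO, here is a minimal sketch of its two ingredients as we understand them from the MiniMax-M1 report. It is not MiniMax's implementation. The first ingredient is GRPO-style advantages, normalized within a group of sampled responses. The second is per-token importance-sampling weights that are clipped and then held constant (stop-gradient), so every token, including clipped ones, still contributes a gradient through its log-probability. All function names and the `eps_low`/`eps_high` defaults are hypothetical. A real trainer would compute this surrogate with an autodiff framework rather than plain Python floats.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each response's reward against its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def cispo_token_weights(logp_new, logp_old, eps_low, eps_high):
    """Clipped importance-sampling ratios pi_new / pi_old, one per token.

    In training these weights are treated as constants (stop-gradient);
    only the log-probabilities they multiply carry gradient.
    """
    weights = []
    for ln, lo in zip(logp_new, logp_old):
        ratio = math.exp(ln - lo)
        weights.append(min(max(ratio, 1.0 - eps_low), 1.0 + eps_high))
    return weights


def cispo_surrogate(responses, rewards, eps_low=1.0, eps_high=0.2):
    """Value of the CISPO-style surrogate for one group of sampled responses.

    responses: list of (logp_new_tokens, logp_old_tokens) pairs, one per response.
    rewards:   one scalar reward per response.
    Hyperparameter defaults here are illustrative, not MiniMax's settings.
    """
    advs = group_advantages(rewards)
    total_tokens = sum(len(new) for new, _ in responses)
    obj = 0.0
    for (new, old), adv in zip(responses, advs):
        w = cispo_token_weights(new, old, eps_low, eps_high)
        # Each token's log-prob is scaled by its fixed clipped weight and the advantage.
        obj += sum(wi * adv * lp for wi, lp in zip(w, new))
    return obj / total_tokens  # normalize by total tokens in the group
```

The design point this illustrates is the contrast with PPO/GRPO clipping. In those methods, tokens whose ratio falls outside the trust region drop out of the gradient entirely. Here, only the weight is bounded, and the token keeps contributing.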
LLaDA: Large Language Diffusion Models
llada-8b llama-3-8b step-video-t2v-30b step-audio-chat-132b llama-2-7b stepfun-ai scale-ai cambridge llamaindex diffusion-models text-generation multimodality video-generation voice-processing benchmarking instruction-following model-scaling gpu-usage long-context multi-turn-dialogue arankomatsuzaki _akhaliq omarsar0 iscienceluvr gallabytes maximelabonne reach_vb
LLaDA (Large Language Diffusion Model) 8B is a breakthrough diffusion-based language model that rivals LLaMA 3 8B while training on 7x fewer tokens (2 trillion) and using 0.13 million H800 GPU hours. Rather than generating text left to right, it learns to predict tokens that have been masked uniformly at random in a diffusion process, and it supports multi-turn dialogue and instruction following. Alongside this, StepFun AI released two major models. Step-Video-T2V 30B is a text-to-video model that generates up to 204 frames with high coherence and motion quality, and Step-Audio-Chat 132B is a voice-to-voice model. Additionally, challenging multimodal benchmarks such as Scale AI's EnigmaEval and Cambridge's ZeroBench show current frontier models scoring zero, underscoring the difficulty of these tasks. The community also noted the return of diffusion models to language modeling, an architecture once considered speculative that has now been scaled successfully.
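Here is a minimal sketch of the masked-diffusion training objective, assuming the formulation in the LLaDA paper as we understand it. A noise level t is sampled, and each token is masked independently with probability t. The model's cross-entropy is then scored only on the masked positions and weighted by 1/t. The `MASK` token, the `predict_logprob` model hook, and the `t_min` floor are hypothetical. This is not LLaDA's training code.

```python
import math
import random

MASK = "<mask>"  # hypothetical placeholder for the mask token


def forward_mask(tokens, t, rng):
    """Forward process: mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_diffusion_loss(tokens, predict_logprob, rng, t_min=1e-3):
    """One-sample Monte Carlo estimate of the masked-token objective.

    predict_logprob(noisy, i, target) is a hypothetical model hook returning
    log p(target at position i | noisy sequence).
    """
    t = rng.uniform(t_min, 1.0)  # t_min avoids dividing by t = 0 (our choice)
    noisy = forward_mask(tokens, t, rng)
    nll = 0.0
    for i, (orig, cur) in enumerate(zip(tokens, noisy)):
        if cur == MASK:  # only masked positions are scored
            nll -= predict_logprob(noisy, i, orig)
    return nll / t  # 1/t weighting across noise levels


# Usage with a dummy model that assigns uniform probability over 8 tokens.
rng = random.Random(0)
uniform_model = lambda noisy, i, target: math.log(1 / 8)
loss = masked_diffusion_loss(["a", "b", "c", "d"], uniform_model, rng)
```

Generation runs the process in reverse. It starts from a fully masked response and fills in predicted tokens over several steps, which is what lets the model produce text without a fixed left-to-right order.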