Person: "denny_zhou"
DeepSeek v3: 671B fine-grained MoE trained for $5.5m USD of compute on 15T tokens
deepseek-v3 gpt-4o claude-3.5-sonnet llama-3 deepseek-ai hugging-face openai anthropic mixture-of-experts model-training model-optimization reinforcement-learning chain-of-thought multi-token-prediction synthetic-data model-distillation fine-tuning attention-mechanisms gpu-optimization nrehiew_ denny_zhou
DeepSeek-V3 has launched as a 671B-parameter MoE model trained on 14.8T tokens, outperforming GPT-4o and Claude-3.5-sonnet on benchmarks. It was trained with only 2.788M H800 GPU hours, far fewer than Llama-3's 30.8M GPU hours, showcasing major compute efficiency and cost reduction. The model is open-source and available via Hugging Face with API support. Innovations include native FP8 mixed precision training, Multi-Head Latent Attention scaling, distillation from synthetic reasoning data, pruning and healing for MoEs with up to 256 experts, and a new multi-token prediction objective enabling lookahead token planning. Research highlights also cover the OREO method and Natural Language Reinforcement Learning (NLRL) for multi-step reasoning and agent control.
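As a rough sanity check on the compute figures above, the short sketch below relates the quoted GPU hours to the headline cost. The GPU-hour counts come from the summary; the $2 per H800 GPU hour rental price is an assumption that the summary does not state, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the training-compute figures quoted above.
DEEPSEEK_V3_GPU_HOURS = 2.788e6   # H800 GPU hours, from the summary
LLAMA_3_GPU_HOURS = 30.8e6        # GPU hours, from the summary
ASSUMED_USD_PER_GPU_HOUR = 2.0    # ASSUMPTION: rental price, not given in the summary


def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Estimate compute cost as GPU hours times an hourly rental price."""
    return gpu_hours * usd_per_gpu_hour


cost = training_cost_usd(DEEPSEEK_V3_GPU_HOURS, ASSUMED_USD_PER_GPU_HOUR)
ratio = LLAMA_3_GPU_HOURS / DEEPSEEK_V3_GPU_HOURS

print(f"Estimated DeepSeek-V3 compute cost: ${cost / 1e6:.2f}M")
print(f"Llama-3 used roughly {ratio:.1f}x the GPU hours")
```

Under that assumed price, the estimate comes to about $5.58M, in line with the roughly $5.5m figure in the headline, and the GPU-hour gap to Llama-3 is about 11x. A different hourly price scales the cost estimate linearly but leaves the ratio unchanged.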
OpenAI Sora Turbo and Sora.com
sora-turbo o1 claude-3.5-sonnet claude-3.5 gemini llama-3-3-euryale-v2.3 mistral-large behemoth endurance-v1.1 openai google nvidia hugging-face mistral-ai text-to-video-generation quantum-computing coding-capabilities transformers algorithmic-innovation storytelling roleplay model-parameter-tuning anti-monopoly-investigation sama sundarpichai bindureddy denny_zhou nrehiew_
OpenAI launched Sora Turbo, enabling text-to-video generation for ChatGPT Plus and Pro users with monthly generation limits and regional restrictions in Europe and the UK. Google announced a quantum computing breakthrough with the development of the Willow chip, potentially enabling commercial quantum applications. Discussions of the o1 model's performance highlighted its lag behind Claude 3.5 Sonnet and Gemini in coding tasks, with calls for algorithmic innovation beyond transformer scaling. The Llama 3.3 Euryale v2.3 model was praised for storytelling and roleplay capabilities, with users suggesting parameter tuning to reduce creative liberties and repetition. Alternatives like Mistral-Large, Behemoth, and Endurance v1.1 were also noted. Additionally, Nvidia faces an anti-monopoly investigation in China. Memes and humor around GPU issues and embargo mishaps were popular on social media.
nothing much happened today
o1 chatgpt-4o llama-3-1-405b openai lmsys scale-ai cognition langchain qdrant rohanpaul_ai reinforcement-learning model-merging embedding-models toxicity-detection image-editing dependency-management automated-code-review visual-search benchmarking denny_zhou svpino alexandr_wang cwolferesearch rohanpaul_ai _akhaliq kylebrussell
OpenAI's o1 model faces skepticism about open-source replication due to its extreme restrictions and unique training advances like RL on CoT. ChatGPT-4o shows significant performance improvements across benchmarks. Llama-3.1-405b fp8 and bf16 versions perform similarly with cost benefits for fp8. A new open-source benchmark "Humanity's Last Exam" offers $500K in prizes to challenge LLMs. Model merging benefits from neural network sparsity and linear mode connectivity. Embedding-based toxic prompt detection achieves high accuracy with low compute. InstantDrag enables fast, optimization-free drag-based image editing. LangChain v0.3 releases with improved dependency management. Automated code review tool CodeRabbit adapts to team coding styles. Visual search advances integrate multimodal data for better product search. Experts predict AI will be default software by 2030.