Topic: "fp8-training"
Cohere Command A Reasoning beats GPT-OSS-120B and DeepSeek R1 0528
command-a-reasoning deepseek-v3.1 cohere deepseek intel huggingface baseten vllm-project chutes-ai anycoder agentic-ai hybrid-models long-context fp8-training mixture-of-experts benchmarking quantization reasoning coding-workflows model-pricing artificialanlys reach_vb scaling01 cline ben_burtenshaw haihaoshen jon_durbin _akhaliq willccbb teortaxestex
Cohere's Command A Reasoning model outperforms GPT-OSS in open deep research capabilities, emphasizing agentic use cases for 2025. DeepSeek-V3.1 introduces a hybrid reasoning architecture that toggles between reasoning and non-reasoning modes, optimized for agentic workflows and coding, with extensive long-context pretraining (~630B tokens for 32k context, ~209B for 128k), FP8 training, and a large MoE architecture (~37B active parameters). Benchmarks show competitive performance, with notable improvements on SWE-Bench and other reasoning tasks. The model is priced at $0.56/M input tokens and $1.68/M output tokens on the DeepSeek API and enjoys rapid ecosystem integration, including HF weights, INT4 quantization by Intel, and vLLM reasoning toggles. Community feedback highlights the hybrid design's pragmatic approach to agent and software-engineering workflows, though some note the lack of tool use in reasoning mode.
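As a rough illustration of how such a reasoning toggle is typically surfaced, here is a minimal sketch against a vLLM OpenAI-compatible server; the `thinking` chat-template kwarg, endpoint, and model name are assumptions for illustration, not a confirmed DeepSeek-V3.1 API.

```python
# Minimal sketch: toggling a hybrid reasoning/non-reasoning mode through a
# vLLM OpenAI-compatible server. The `thinking` chat-template kwarg and the
# local endpoint/model name are assumptions, not confirmed DeepSeek-V3.1 API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(prompt: str, reasoning: bool) -> str:
    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3.1",
        messages=[{"role": "user", "content": prompt}],
        # Hypothetical toggle: hybrid models often expose the mode switch as a
        # chat-template kwarg; the exact key may differ in practice.
        extra_body={"chat_template_kwargs": {"thinking": reasoning}},
    )
    return resp.choices[0].message.content

print(ask("Plan a 3-step refactor for a flaky test suite.", reasoning=True))
print(ask("Rename variable `foo` to `bar` in this diff.", reasoning=False))
```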
not much happened today
o3-mini o1-mini llama hunyuan-a13b ernie-4.5 ernie-4.5-21b-a3b qwen3-30b-a3b gemini-2.5-pro meta-ai-fair openai tencent microsoft baidu gemini superintelligence ai-talent job-market open-source-models multimodality mixture-of-experts quantization fp8-training model-benchmarking model-performance model-releases api model-optimization alexandr_wang shengjia_zhao jhyuxm ren_hongyu shuchaobi saranormous teortaxesTex mckbrando yuchenj_uw francoisfleuret quanquangu reach_vb philschmid
Meta has poached top AI talent from OpenAI, including Alexandr Wang joining as Chief AI Officer to work towards superintelligence, signaling a strong push for the next Llama model. The AI job market shows polarization with high demand and compensation for top-tier talent, while credentials like strong GitHub projects gain importance. The WizardLM team moved from Microsoft to Tencent to develop open-source models like Hunyuan-A13B, highlighting shifts in China's AI industry. Rumors suggest OpenAI will release a new open-source model in July, potentially surpassing existing ChatGPT models. Baidu open-sourced multiple variants of its ERNIE 4.5 model series, featuring advanced techniques like 2-bit quantization, MoE router orthogonalization loss, and FP8 training, with models ranging from 0.3B to 424B parameters. Gemini 2.5 Pro returned to the free tier of the Gemini API, enabling developers to explore its features.
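The MoE router orthogonalization loss mentioned for ERNIE 4.5 is only named at a high level here; a generic sketch of the idea (penalizing overlap between per-expert routing vectors so experts specialize) could look like the following, which is an illustration of the concept rather than Baidu's exact formulation.

```python
# Generic illustration of a MoE router orthogonalization penalty: push the
# per-expert routing vectors toward mutual orthogonality so experts specialize.
# Sketch of the general idea only, not ERNIE 4.5's actual loss.
import torch

def router_orthogonalization_loss(router_weight: torch.Tensor) -> torch.Tensor:
    """router_weight: (num_experts, hidden_dim) rows of the routing projection."""
    w = torch.nn.functional.normalize(router_weight, dim=-1)   # unit-norm rows
    gram = w @ w.t()                                           # (E, E) cosine similarities
    eye = torch.eye(w.size(0), device=w.device, dtype=w.dtype)
    return ((gram - eye) ** 2).sum() / w.size(0)               # penalize off-diagonal overlap

router = torch.nn.Linear(1024, 8, bias=False)   # hypothetical 8-expert router
aux_loss = router_orthogonalization_loss(router.weight)
```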
Llama 4's Controversial Weekend Release
llama-4 llama-3 llama-3-2 meta mixture-of-experts early-fusion attention-mechanisms fp8-training training-data benchmarking model-performance model-release multimodality open-models ahmad_al_dahle ylecun reach_vb yuchenj_uw
Meta released Llama 4, featuring two new medium-sized MoE open models and a promised 2-trillion-parameter "behemoth" model intended to be the largest open model ever. The release included advanced training techniques such as Chameleon-like early fusion with MetaCLIP, interleaved chunked attention without RoPE, native FP8 training, and training on up to 40 trillion tokens. Despite the hype, the release faced criticism for a lack of transparency compared to Llama 3, implementation issues, and poor performance on some benchmarks. Meta leadership, including Ahmad Al Dahle, denied allegations of training on test sets. The smallest model, Scout, at 109B parameters is too large for consumer GPUs, and the claimed 10-million-token context is disputed. Community response has been mixed, with some praising the openness and others pointing out discrepancies and quality concerns.
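Meta's FP8 training stack is not detailed in the release; on H100-class GPUs, native FP8 training is commonly expressed with NVIDIA Transformer Engine's fp8_autocast, roughly as in this sketch (illustrative of the mechanism, not Meta's actual recipe).

```python
# Sketch of native FP8 training on H100-class GPUs using NVIDIA Transformer Engine.
# Shows the general fp8_autocast mechanism; not Meta's undisclosed training recipe.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(4096, 4096, bias=True).cuda()
optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)              # matmul runs in FP8 with delayed scaling factors
loss = out.float().pow(2).mean()
loss.backward()
optimizer.step()
```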
FlashAttention 3, PaliGemma, OpenAI's 5 Levels to Superintelligence
flashattention-3 paligemma-3b gemma-2b numinamath-7b deepseekmath-7b codellama-34b wizardcoder-python-34b-v1.0 chatgpt-3.5 openai together-ai google hugging-face deepseek code-llama attention-mechanisms fp8-training vision prefix-lm superintelligence fine-tuning chain-of-thought tool-integrated-reasoning self-consistency-decoding python coding-capabilities elo-ratings ilya-sutskever lucas-giffman
FlashAttention-3 introduces fast and accurate attention optimized for H100 GPUs, advancing native FP8 training. PaliGemma, a versatile 3B Vision-Language Model (VLM) combining a SigLIP-So400m ViT encoder with the Gemma-2B language model, emphasizes a prefix-LM architecture for improved image-query interaction. OpenAI reveals a framework on levels of superintelligence, signaling progress toward Level 2 and highlighting internal safety disagreements. On Reddit, NuminaMath 7B, fine-tuned from DeepSeekMath-7B, wins the AI Math Olympiad by solving 29 problems using iterative supervised fine-tuning and tool-integrated reasoning. Open-source LLMs like CodeLlama-34b and WizardCoder-Python-34B-V1.0 are closing the coding performance gap with closed models such as ChatGPT-3.5.
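PaliGemma's prefix-LM setup (bidirectional attention over the image and prompt prefix, causal attention over the generated answer) can be pictured with a mask like the sketch below; the exact token layout is model-specific and not spelled out in this summary.

```python
# Sketch of a prefix-LM attention mask: prefix tokens (image + prompt) attend
# bidirectionally, suffix tokens (generated answer) attend causally to all
# earlier tokens. Illustrative only; PaliGemma's exact layout may differ.
import torch

def prefix_lm_mask(prefix_len: int, total_len: int) -> torch.Tensor:
    """Boolean mask of shape (total_len, total_len); True = attention allowed."""
    mask = torch.tril(torch.ones(total_len, total_len, dtype=torch.bool))
    mask[:prefix_len, :prefix_len] = True   # prefix tokens see each other fully
    return mask

# Example: 4 image/prompt tokens followed by 3 generated answer tokens.
print(prefix_lm_mask(prefix_len=4, total_len=7).int())
```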