All tags
Topic: "moe"
not much happened today
gemma-3n glm-4.1v-thinking deepseek-r1t2 mini-max-m1 o3 claude-4-opus claude-sonnet moe-72b meta scale-ai unslothai zhipu-ai deepseek huawei minimax-ai allenai sakana-ai-labs openai model-performance vision conv2d float16 training-loss open-source model-benchmarks moe load-balancing scientific-literature-evaluation code-generation adaptive-tree-search synthesis-benchmarks alexandr_wang natfriedman steph_palazzolo thegregyang teortaxes_tex denny_zhou agihippo danielhanchen osanseviero reach_vb scaling01 ndea
Meta has hired Scale AI CEO Alexandr Wang as its new Chief AI Officer, acquiring a 49% non-voting stake in Scale AI for $14.3 billion, doubling its valuation to ~$28 billion. This move is part of a major talent shuffle involving Meta, OpenAI, and Scale AI. Discussions include the impact on Yann LeCun's influence at Meta and potential responses from OpenAI. In model news, Gemma 3N faces technical issues like vision NaNs and FP16 overflows, with fixes from UnslothAI. Chinese open-source models like GLM-4.1V-Thinking by Zhipu AI and DeepSeek R1T2 show strong performance and speed improvements. Huawei open-sourced a 72B MoE model with a novel load balancing solution. The MiniMax-M1 hybrid MoE model leads math benchmarks on the Text Arena leaderboard. AllenAI launched SciArena for scientific literature evaluation, where o3 outperforms others. Research from Sakana AI Labs introduces AB-MCTS for code generation, improving synthesis benchmarks.
Zuck goes Superintelligence Founder Mode: $100M bonuses + $100M+ salaries + NFDG Buyout?
llama-4 maverick scout minimax-m1 afm-4.5b chatgpt midjourney-v1 meta-ai-fair openai deeplearning-ai essential-ai minimax arcee midjourney long-context multimodality model-release foundation-models dataset-release model-training video-generation enterprise-ai model-architecture moe prompt-optimization sama nat dan ashvaswani clementdelangue amit_sangani andrewyng _akhaliq
Meta AI is reportedly offering 8-9 figure signing bonuses and salaries to top AI talent, confirmed by Sam Altman. They are also targeting key figures like Nat and Dan from the AI Grant fund for strategic hires. Essential AI released the massive 24-trillion-token Essential-Web v1.0 dataset with rich metadata and a 12-category taxonomy. DeepLearning.AI and Meta AI launched a course on Llama 4, featuring new MoE models Maverick (400B) and Scout (109B) with context windows up to 10M tokens. MiniMax open-sourced MiniMax-M1, a long-context LLM with a 1M-token window, and introduced the Hailuo 02 video model. OpenAI rolled out "Record mode" for ChatGPT Pro, Enterprise, and Edu on macOS. Arcee launched the AFM-4.5B foundation model for enterprise. Midjourney released its V1 video model enabling image animation. These developments highlight major advances in model scale, long-context reasoning, multimodality, and enterprise AI applications.
not much happened today
hunyuan-turbos qwen3-235b-a22b o3 gpt-4.1-nano grok-3 gemini-2.5-pro seed1.5-vl kling-2.0 tencent openai bytedance meta-ai-fair nvidia deepseek benchmarking model-performance moe reasoning vision video-understanding vision-language multimodality model-evaluation model-optimization lmarena_ai artificialanlys gdb _jasonwei iScienceLuvr _akhaliq _philschmid teortaxesTex mervenoyann reach_vb
Tencent's Hunyuan-Turbos has risen to #8 on the LMArena leaderboard, showing strong performance across major categories and significant improvement since February. The Qwen3 model family, especially the Qwen3 235B-A22B (Reasoning) model, is noted for its intelligence and efficient parameter usage. OpenAI introduced HealthBench, a new health evaluation benchmark developed with input from over 250 physicians, where models like o3, GPT-4.1 nano, and Grok 3 showed strong results. ByteDance released Seed1.5-VL, a vision-language model with a 532M-parameter vision encoder and a 20B active parameter MoE LLM, achieving state-of-the-art results on 38 public benchmarks. In vision-language, Kling 2.0 leads image-to-video generation, and Gemini 2.5 Pro excels in video understanding with advanced multimodal capabilities. Meta's Vision-Language-Action framework and updates on VLMs for 2025 were also highlighted.
LlamaCon: Meta AI gets into the Llama API platform business
llama-4 qwen3 qwen3-235b-a22b qwen3-30b-a3b qwen3-4b qwen2-5-72b-instruct o3-mini meta-ai-fair cerebras groq alibaba vllm ollama llamaindex hugging-face llama-cpp model-release fine-tuning reinforcement-learning moe multilingual-models model-optimization model-deployment coding benchmarking apache-license reach_vb huybery teortaxestex awnihannun thezachmueller
Meta celebrated progress in the Llama ecosystem at LlamaCon, launching an AI Developer platform with finetuning and fast inference powered by Cerebras and Groq hardware, though it remains waitlisted. Meanwhile, Alibaba released the Qwen3 family of large language models, including two MoE models and six dense models ranging from 0.6B to 235B parameters, with the flagship Qwen3-235B-A22B achieving competitive benchmark results and supporting 119 languages and dialects. The Qwen3 models are optimized for coding and agentic capabilities, are Apache 2.0 licensed, and have broad deployment support including local usage with tools like vLLM, Ollama, and llama.cpp. Community feedback highlights Qwen3's scalable performance and superiority over models like OpenAI's o3-mini.
not much happened today
claude-3.7-sonnet claude-3.7 deepseek-r1 o3-mini deepseek-v3 gemini-2.0-pro gpt-4o qwen2.5-coder-32b-instruct anthropic perplexity-ai amazon google-cloud deepseek_ai coding reasoning model-benchmarking agentic-workflows context-window model-performance open-source moe model-training communication-libraries fp8 nvlink rdma cli-tools skirano omarsar0 reach_vb artificialanlys terryyuezhuo _akhaliq _philschmid catherineols goodside danielhanchen
Claude 3.7 Sonnet demonstrates exceptional coding and reasoning capabilities, outperforming models like DeepSeek R1, O3-mini, and GPT-4o on benchmarks such as SciCode and LiveCodeBench. It is available on platforms including Perplexity Pro, Anthropic, Amazon Bedrock, and Google Cloud, with pricing at $3/$15 per million tokens. Key features include a 64k token thinking mode, 200k context window, and the CLI-based coding assistant Claude Code. Meanwhile, DeepSeek released DeepEP, an open-source communication library optimized for MoE model training and inference with support for NVLink, RDMA, and FP8. These updates highlight advancements in coding AI and efficient model training infrastructure.
not much happened today
o1-preview o1-mini qwen-2.5 gpt-4o deepseek-v2.5 gpt-4-turbo-2024-04-09 grin llama-3-1-405b veo kat openai qwen deepseek-ai microsoft kyutai-labs perplexity-ai together-ai meta-ai-fair google-deepmind hugging-face google anthropic benchmarking math coding instruction-following model-merging model-expressiveness moe voice voice-models generative-video competition open-source model-deployment ai-agents hyung-won-chung noam-brown bindureddy akhaliq karpathy aravsrinivas fchollet cwolferesearch philschmid labenz ylecun
OpenAI's o1-preview and o1-mini models lead benchmarks in Math, Hard Prompts, and Coding. Qwen 2.5 72B model shows strong performance close to GPT-4o. DeepSeek-V2.5 tops Chinese LLMs, rivaling GPT-4-Turbo-2024-04-09. Microsoft's GRIN MoE achieves good results with 6.6B active parameters. Moshi voice model from Kyutai Labs runs locally on Apple Silicon Macs. Perplexity app introduces voice mode with push-to-talk. LlamaCoder by Together.ai uses Llama 3.1 405B for app generation. Google DeepMind's Veo is a new generative video model for YouTube Shorts. The 2024 ARC-AGI competition increases prize money and plans a university tour. A survey on model merging covers 50+ papers for LLM alignment. The Kolmogorov–Arnold Transformer (KAT) paper proposes replacing MLP layers with KAN layers for better expressiveness. Hugging Face Hub integrates with Google Cloud Vertex AI Model Garden for easier open-source model deployment. Agent.ai is introduced as a professional network for AI agents. "Touching grass is all you need."
1/10/2024: All the best papers for AI Engineers
chatgpt gpt-4 dall-e-3 stable-diffusion deepseek-moe openai deepseek-ai prompt-engineering model-release rate-limiting ethics image-generation moe collaborative-workspaces data-privacy abdubs darthgustav
OpenAI launched the GPT Store featuring over 3 million custom versions of ChatGPT accessible to Plus, Team, and Enterprise users, with weekly highlights of impactful GPTs like AllTrails. The new ChatGPT Team plan offers advanced models including GPT-4 and DALL·E 3, alongside collaborative tools and enhanced data privacy. Discussions around AI-generated imagery favored DALL·E and Stable Diffusion, while users faced rate limit challenges and debated the GPT Store's SEO and categorization. Ethical considerations in prompt engineering were raised with a three-layer framework called 'The Sieve'. Additionally, DeepSeek-MoE was noted for its range of Mixture of Experts (MoE) model sizes. "The Sieve," a three-layer ethical framework for AI, was highlighted in prompt engineering discussions.