Company: "tencent"
Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params
kimi-k2 kimi-k2-1t deepseek-v3 grok-4 devstral-2507 gpt-4.1 sonnet-4 moonshot-ai alibaba tencent deepseek x-ai mistral-ai weights-biases hugging-face mixture-of-experts model-training model-optimization optimizer benchmarking long-context model-performance open-weights model-release yuchenj_uw andrew_n_carr scaling01 novita_labs teknium1 aravsrinivas mparakhin simonw
Moonshot AI has released Kimi K2, a 1 trillion parameter Mixture-of-Experts model trained on 15.5 trillion tokens using the new MuonClip optimizer, achieving state-of-the-art results on benchmarks like SWE-Bench Verified (65.8%) and TAU2 (58.4%). This model is competitive with GPT-4.1 and Sonnet 4 on non-thinking tasks and is available under an MIT license. Meanwhile, xAI announced Grok-4, noted for its "LEAST censored frontier model" status and strong long-context performance but criticized for rushed post-training. Mistral AI updated its Devstral 2507 models with improved performance and cost efficiency. The community is excited about the potential of the MuonClip optimizer, which may surpass the long-standing AdamW optimizer in machine learning.
SmolLM3: the SOTA 3B reasoning open source LLM
smollm3-3b olmo-3 grok-4 claude-4 claude-4.1 gemini-nano hunyuan-a13b gemini-2.5 gemma-3n qwen2.5-vl-3b huggingface allenai openai anthropic google-deepmind mistral-ai tencent gemini alibaba open-source small-language-models model-releases model-performance benchmarking multimodality context-windows precision-fp8 api batch-processing model-scaling model-architecture licensing ocr elonmusk mervenoyann skirano amandaaskell clementdelangue loubnabenallal1 awnihannun swyx artificialanlys officiallogank osanseviero cognitivecompai aravsrinivas
HuggingFace released SmolLM3-3B, a fully open-source small reasoning model with open pretraining code and data, marking a high point in open source models until Olmo 3 arrives. Grok 4 was launched with mixed reactions, while concerns about Claude 4 nerfs and an imminent Claude 4.1 surfaced. Gemini Nano is now shipping in Chrome 137+, enabling local LLM access for 3.7 billion users. Tencent introduced Hunyuan-A13B, an 80B parameter model with a 256K context window running on a single H200 GPU. The Gemini API added a batch mode with 50% discounts on 2.5 models. MatFormer Lab launched tools for custom-sized Gemma 3n models. Open source OCR models like Nanonets-OCR-s and ChatDOC/OCRFlux-3B derived from Qwen2.5-VL-3B were highlighted, with licensing discussions involving Alibaba.
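As a rough sanity check on the single-GPU claim for Hunyuan-A13B, the sketch below estimates weights-only memory for an 80B parameter model at a few precisions. The 141 GB figure is the published HBM capacity of one H200. The bytes-per-parameter values are standard, but which precision the reported deployment used is an assumption here, and KV cache, activations, and runtime overhead are ignored.

```python
# Weights-only memory estimate (illustrative; ignores KV cache and activations).
H200_MEMORY_GB = 141  # published HBM capacity of a single NVIDIA H200

# Storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Memory needed to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(params_billion: float, precision: str,
                    gpu_gb: float = H200_MEMORY_GB) -> bool:
    """True if the weights alone fit in the given GPU memory."""
    return weight_memory_gb(params_billion, precision) <= gpu_gb


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(80, precision)
        print(f"{precision}: {gb:.0f} GB, fits on one H200: "
              f"{fits_on_one_gpu(80, precision)}")
```

Under these assumptions, 16-bit weights alone (160 GB) would exceed one H200, while 8-bit or 4-bit weights leave headroom for the KV cache that a 256K context window requires.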
not much happened today
o3-mini o1-mini llama hunyuan-a13b ernie-4.5 ernie-4.5-21b-a3b qwen3-30b-a3b gemini-2.5-pro meta-ai-fair openai tencent microsoft baidu gemini superintelligence ai-talent job-market open-source-models multimodality mixture-of-experts quantization fp8-training model-benchmarking model-performance model-releases api model-optimization alexandr_wang shengjia_zhao jhyuxm ren_hongyu shuchaobi saranormous teortaxesTex mckbrando yuchenj_uw francoisfleuret quanquangu reach_vb philschmid
Meta has poached top AI talent from OpenAI and brought on Alexandr Wang as Chief AI Officer to work towards superintelligence, signaling a strong push for the next Llama model. The AI job market is polarizing: top-tier talent commands high demand and compensation, while credentials like strong GitHub projects gain importance. The WizardLM team moved from Microsoft to Tencent to develop open-source models like Hunyuan-A13B, highlighting shifts in China's AI industry. Rumors suggest OpenAI will release a new open-source model in July, potentially surpassing existing ChatGPT models. Baidu open-sourced multiple variants of its ERNIE 4.5 model series, featuring advanced techniques like 2-bit quantization, MoE router orthogonalization loss, and FP8 training, with models ranging from 0.3B to 424B parameters. Gemini 2.5 Pro returned to the free tier of the Gemini API, enabling developers to explore its features.
not much happened today
gemma-3n hunyuan-a13b flux-1-kontext-dev mercury fineweb2 qwen-vlo o3-mini o4-mini google-deepmind tencent black-forest-labs inception-ai qwen kyutai-labs openai langchain langgraph hugging-face ollama unslothai nvidia amd multimodality mixture-of-experts context-windows tool-use coding image-generation diffusion-models dataset-release multilinguality speech-to-text api prompt-engineering agent-frameworks open-source model-release demishassabis reach_vb tri_dao osanseviero simonw clementdelangue swyx hwchase17 sydneyrunkle
Google released Gemma 3n, a multimodal model for edge devices available in 2B and 4B parameter versions, with support across major frameworks like Transformers and llama.cpp. Tencent open-sourced Hunyuan-A13B, a Mixture-of-Experts (MoE) model with 80B total parameters and a 256K context window, optimized for tool calling and coding. Black Forest Labs released FLUX.1 Kontext [dev], an open image AI model gaining rapid Hugging Face adoption. Inception AI Labs launched Mercury, the first commercial-scale diffusion LLM for chat. The FineWeb2 multilingual pre-training dataset paper was released, analyzing data quality impacts. The Qwen team released Qwen-VLo, a unified visual understanding and generation model. Kyutai Labs released a top-ranked open-source speech-to-text model running on Macs and iPhones. OpenAI introduced the Deep Research API with o3/o4-mini models and open-sourced its prompt rewriter methodology, which was integrated into LangChain and LangGraph. The open-source Gemini CLI gained over 30,000 GitHub stars as an AI terminal agent.
minor ai followups: MultiAgents, Meta-SSI-Scale, Karpathy, AI Engineer
gpt-4o afm-4.5b gemma qwen stt-1b-en_fr stt-2.6b-en hunyuan-3d-2.1 openai meta-ai-fair scale-ai huggingface tencent arcee-ai ai-safety alignment ai-regulation memory-optimization scalable-oversight speech-recognition 3d-generation foundation-models sama polynoamial neelnanda5 teortaxestex yoshua_bengio zachtratar ryanpgreenblatt reach_vb arankomatsuzaki code_star
OpenAI released a paper revealing how training models like GPT-4o on insecure code can cause broad misalignment, drawing reactions from experts like @sama and @polynoamial. California's AI regulation efforts were highlighted by @Yoshua_Bengio emphasizing transparency and whistleblower protections. The term "context rot" was coined to describe LLM conversation degradation, with systems like Embra using CRM-like memory for robustness. Scalable oversight research aiming to improve human control over smarter AIs was discussed by @RyanPGreenblatt. New model releases include Kyutai's speech-to-text models capable of 400 real-time streams on a single H100 GPU, Tencent's Hunyuan 3D 2.1 as the first open-source production-ready PBR 3D generative model, and Arcee's AFM-4.5B foundation model family targeting enterprise use, competitive with Gemma and Qwen.
not much happened today
hunyuan-turbos qwen3-235b-a22b o3 gpt-4.1-nano grok-3 gemini-2.5-pro seed1.5-vl kling-2.0 tencent openai bytedance meta-ai-fair nvidia deepseek benchmarking model-performance moe reasoning vision video-understanding vision-language multimodality model-evaluation model-optimization lmarena_ai artificialanlys gdb _jasonwei iScienceLuvr _akhaliq _philschmid teortaxesTex mervenoyann reach_vb
Tencent's Hunyuan-Turbos has risen to #8 on the LMArena leaderboard, showing strong performance across major categories and significant improvement since February. The Qwen3 model family, especially the Qwen3 235B-A22B (Reasoning) model, is noted for its intelligence and efficient parameter usage. OpenAI introduced HealthBench, a new health evaluation benchmark developed with input from over 250 physicians, where models like o3, GPT-4.1 nano, and Grok 3 showed strong results. ByteDance released Seed1.5-VL, a vision-language model with a 532M-parameter vision encoder and a 20B active parameter MoE LLM, achieving state-of-the-art results on 38 public benchmarks. In vision-language, Kling 2.0 leads image-to-video generation, and Gemini 2.5 Pro excels in video understanding with advanced multimodal capabilities. Meta's Vision-Language-Action framework and updates on VLMs for 2025 were also highlighted.
Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
claude-3.5-haiku llama-3-1 llama-3-2 mlx-lm tencent anthropic meta-ai-fair togethercompute llamaindex mixture-of-experts synthetic-data model-scaling model-architecture model-optimization kv-cache-quantization react fine-tuning scaling-laws model-efficiency model-deployment multimodality
Tencent released a notable >300B parameter MoE model pretrained on 7T tokens, including 1.5T synthetic data generated via Evol-Instruct. The model introduces novel techniques like "recycle routing" and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and it avoids China-sensitive queries. Meanwhile, Anthropic launched Claude 3.5 Haiku, now available on multiple platforms, praised for intelligence and speed but criticized for a 10x price increase. Meta opened Llama AI to the U.S. defense sector, and a Llama Impact Hackathon offers a $15K prize for projects using Llama 3.1 & 3.2 Vision. LlamaIndex released a React chat UI component with Tailwind CSS and LLM backend integrations. The MLX LM model advances text generation speed and efficiency with KV cache quantization.
not much happened today
claudette llama-3-1 yi-lightning gpt-4o claude-3.5-sonnet answer-ai tencent notebooklm motherduck perplexity dropbox openai meta-ai-fair yi-ai zyphra-ai anthropic langchain openai synthetic-data fine-tuning sql audio-processing on-device-ai dataset-release transformer llm-reasoning ai-safety code-generation ai-pricing ai-job-market fchollet aravsrinivas svpino swyx
Answer.ai launched fastdata, a synthetic data generation library using claudette and Tencent's Billion Persona paper. NotebookLM became customizable, and Motherduck introduced notable LLMs in SQL implementations. Perplexity and Dropbox announced competitors to Glean. OpenAI unveiled audio chat completions priced at 24 cents per minute. Meta AI released Llama 3.1, powering Lenovo AI Now's on-device agent. Yi-Lightning model ranked #6 globally, surpassing GPT-4o. Zyphra AI released the large Zyda-2 dataset with 5 trillion tokens. François Chollet clarified transformer architecture as set-processing, not sequence-processing. Research suggests memorization aids LLM reasoning. Anthropic updated its Responsible Scaling Policy for AI safety. Tools like Perplexity Finance, Open Canvas by LangChain, and AlphaCodium code generation tool were highlighted. Approximately $500 million was raised for AI agent startups, with ongoing discussions on AI's job market impact. Combining prompt caching with the Batches API can yield a 95% discount on Claude 3.5 Sonnet tokens.
a quiet weekend
o1 datagemma aloha demostart firefly-ai-video-model pixtral-12b gamegen-o openai google-deepmind adobe mistral-ai tencent supermaven 11x cohere anthropic latent-space-university stanford microsoft mila notre-dame reinforcement-learning chain-of-thought reasoning robotics diffusion-models multimodality video-generation model-training reflection-tuning mathematical-reasoning model-benchmarking fine-tuning george-hotz terence-tao adcock_brett rohanpaul_ai bindureddy fchollet philschmid
OpenAI released the new o1 model, leveraging reinforcement learning and chain-of-thought prompting to excel in reasoning benchmarks, achieving an IQ-like score of 120. Google DeepMind introduced DataGemma to reduce hallucinations by connecting LLMs with real-world data, and unveiled ALOHA and DemoStart for robot dexterity using diffusion methods. Adobe previewed its Firefly AI Video Model with text-to-video and generative extend features. Mistral launched the multimodal Pixtral 12B model, and Tencent presented the GameGen-O open-world video game generation model. Several research papers from Stanford, OpenAI, Microsoft, Mila, and Notre Dame focus on advanced reasoning, self-verification, and reflection tuning techniques. Experts like Terence Tao and George Hotz have shared mixed but optimistic views on o1's capabilities. Seed funding rounds include Supermaven ($12M) and 11x ($24M).
Microsoft AgentInstruct + Orca 3
mistral-7b orca-2.5 microsoft-research apple tencent hugging-face synthetic-data fine-tuning instruction-following transformers model-performance hallucination-detection dataset-quality flashattention mixture-of-experts philschmid sama bindureddy rohanpaul_ai zachtratar dair_ai
Microsoft Research released AgentInstruct, the third paper in its Orca series, introducing a generative teaching pipeline that produces 25.8 million synthetic instructions to fine-tune mistral-7b, achieving significant performance gains: +40% AGIEval, +19% MMLU, +54% GSM8K, +38% BBH, +45% AlpacaEval, and a 31.34% reduction in hallucinations. This synthetic data approach follows the success of FineWeb and Apple's Rephrasing research in improving dataset quality. Additionally, Tencent claims to have generated 1 billion diverse personas for synthetic data. On AI Twitter, notable discussions included a shooting incident at a Trump rally and recent ML research highlights such as FlashAttention-3, RankRAG, and Mixture of A Million Experts.