Frozen AI News archive

Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data

**Tencent** released a notable >300B-parameter MoE model pretrained on **7T tokens**, including **1.5T tokens of synthetic data** generated via **Evol-Instruct**. The model introduces novel techniques like "recycle routing" and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and the model avoids China-sensitive queries. Meanwhile, **Anthropic** launched **Claude 3.5 Haiku**, now available on multiple platforms, praised for intelligence and speed but criticized for a **10x price increase**. **Meta** opened **Llama AI** to the U.S. defense sector, and a **Llama Impact Hackathon** offers a **$15K prize** for projects using **Llama 3.1 & 3.2 Vision**. **LlamaIndex** released a React chat UI component with Tailwind CSS and LLM backend integrations. **MLX LM** adds KV cache quantization, improving text generation speed and memory efficiency.

Canonical issue URL

AI News for 11/4/2024-11/5/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (217 channels, and 3533 messages) for you. Estimated reading time saved (at 200wpm): 364 minutes. You can now tag @smol_ai for AINews discussions!

We tend to apply a high bar to Chinese models, especially from previously unknown teams. But Tencent's release today (HuggingFace, paper here, HN comments) is notable in its claims versus known SOTA open-weights models:

(image: benchmark comparison table vs. SOTA open-weights models)

Remarkably for a >300B-param model (MoE notwithstanding), it is very data-efficient, being pretrained on "only" 7T tokens (DeepSeek-V2 was 8T, Llama3 was 15T), with 1.5T of them being synthetic data generated via Evol-Instruct, which the WizardLM team did not miss:

(images: reactions from the WizardLM team)

The paper offers decent research detail on some novel approaches they explored, including "recycle routing":

(image: recycle routing diagram)
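
As we read it, recycle routing addresses the tokens that capacity-limited MoE routing would normally drop: when a token's chosen expert is already full, the token is re-assigned at random to an expert with spare capacity instead of being discarded. A toy sketch of that idea, using simplified top-1 routing (the paper's actual implementation differs):

```python
import random

def route_with_recycling(token_expert_choices, num_experts, capacity):
    """Capacity-limited top-1 routing where overflow tokens are recycled.

    Standard capacity-based routing drops a token once its expert is full;
    recycle routing instead re-assigns it at random to an expert with
    spare capacity, so the token still gets processed.
    """
    load = [0] * num_experts
    assignment = {}
    overflow = []
    for token_id, expert in enumerate(token_expert_choices):
        if load[expert] < capacity:
            load[expert] += 1
            assignment[token_id] = expert
        else:
            overflow.append(token_id)  # would be dropped without recycling
    for token_id in overflow:
        spare = [e for e in range(num_experts) if load[e] < capacity]
        if not spare:
            break  # every expert is full; token is truly dropped
        expert = random.choice(spare)
        load[expert] += 1
        assignment[token_id] = expert
    return assignment
```

With two experts of capacity 2 and router choices `[0, 0, 0, 1]`, the third token overflows expert 0 and gets recycled to expert 1 rather than dropped.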

and expert-specific learning rates:

(image: expert-specific learning rate schedule)
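
The motivation for expert-specific learning rates: the shared expert sees every token while each routed expert sees only a fraction, so their effective batch sizes differ. A hedged sketch applying the common square-root LR/batch-size scaling heuristic (the paper's exact rule may differ):

```python
import math

def expert_learning_rates(base_lr, tokens_per_expert, total_tokens):
    """Scale each expert's LR by the square root of its token share.

    The shared expert processes all tokens and trains at base_lr; a routed
    expert seeing only a fraction f of tokens gets base_lr * sqrt(f),
    following the sqrt LR/batch-size scaling heuristic.
    """
    return [base_lr * math.sqrt(t / total_tokens) for t in tokens_per_expert]
```

For example, with four equally loaded experts each seeing 25% of tokens, each gets half the shared expert's learning rate.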

They even investigate and offer a compute-efficient scaling law for MoE active params:

(image: MoE active-parameter scaling law)
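
The relevant compute quantity for an MoE is the active parameter count, since only the routed-in experts do work per token. A back-of-envelope Chinchilla-style estimate, C ≈ 6·N·D with N the active params (the ~52B active figure for Hunyuan-Large is from its release materials; treat the numbers as approximate):

```python
def train_flops(active_params: float, tokens: float) -> float:
    """Chinchilla-style training-compute estimate: FLOPs ~ 6 * N * D.

    For an MoE, N is the *active* parameter count per token, not the
    total parameter count, since inactive experts do no work.
    """
    return 6 * active_params * tokens

# Hunyuan-Large-scale numbers: ~52B active params, 7T pretraining tokens.
flops = train_flops(52e9, 7e12)  # ~2.2e24 FLOPs
```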

The story isn't wholly positive: the custom license forbids use in the EU and by companies with >100M MAU, and of course the model won't answer China-sensitive questions. Vibe checks aren't in yet (we don't see anyone hosting an easy public endpoint), but nobody is exactly shouting from the rooftops about it. Still, it is a nice piece of research for this model class.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Releases and Updates

AI Tools and Infrastructure

AI Research and Benchmarks

AI Industry Events and Hackathons

AI Pricing and Market Reactions

Memes and Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Tencent's Hunyuan-Large: A Game Changer in Open Source Models

Theme 2. Tensor Parallelism Enhances Llama Models: Benchmark Insights

Theme 3. Competitive Advances in Coding Models: Qwen2.5-Coder Analysis

Theme 4. New AI Tools: Voice Cloning and Speculative Decoding Techniques

Other AI Subreddit Recap

/r/machinelearning, /r/openai, /r/stablediffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

Autonomous Systems & Safety

AI Security & Vulnerabilities

3D Avatar Generation & Rendering

Industry Movements & Corporate AI

AI Image Generation Critique

Memes & Humor


AI Discord Recap

A summary of Summaries of Summaries by O1-preview

Theme 1. AI Giants Drop Mega Models: The New Heavyweights


Theme 2. Defense, Meet AI: LLMs Enlist in National Security


Theme 3. Open Data Bonanza: Datasets Set to Supercharge AI


Theme 4. Users Rage Against the Machines: AI Tools Under Fire


Theme 5. AI Optimization Takes Center Stage: Speed and Efficiency



PART 1: High-level Discord summaries

HuggingFace Discord


Perplexity AI Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


Eleuther Discord


Unsloth AI (Daniel Han) Discord


LM Studio Discord


Latent Space Discord


Notebook LM Discord


Stability.ai (Stable Diffusion) Discord


Nous Research AI Discord


Interconnects (Nathan Lambert) Discord


OpenAI Discord


LlamaIndex Discord


Cohere Discord


OpenInterpreter Discord


Modular (Mojo 🔥) Discord


DSPy Discord


OpenAccess AI Collective (axolotl) Discord


Torchtune Discord


tinygrad (George Hotz) Discord


LLM Agents (Berkeley MOOC) Discord


Mozilla AI Discord


Gorilla LLM (Berkeley Function Calling) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

HuggingFace ▷ #general (1094 messages🔥🔥🔥):

  • AI Model Integration
  • Temperature Settings in LLMs
  • Phonons and Material Science
  • Speculative Decoding
  • Digital Ethnographic Research

Links mentioned:


HuggingFace ▷ #today-im-learning (3 messages):

  • FastBert Tokenizer
  • AutoTokenizer Comparison

HuggingFace ▷ #cool-finds (3 messages):

  • ShellCheck
  • Open Trusted Data Initiative
  • Largest multilingual dataset
  • Aud2Stm2Mdi

Links mentioned:


HuggingFace ▷ #i-made-this (29 messages🔥):

  • Computer Vision Model Quantization
  • Docker Learning Series
  • Music Bot Development
  • Text2Text Model for Summarization
  • Community Feedback Implementation

Links mentioned:


HuggingFace ▷ #reading-group (1 messages):

west_ryder: 😝


HuggingFace ▷ #computer-vision (3 messages):

  • HuggingMod
  • New Microsoft Models

HuggingFace ▷ #NLP (3 messages):

  • Building RAG
  • Chroma Vector Store Issues
  • OpenAI Embeddings
  • Code References

HuggingFace ▷ #diffusion-discussions (1 messages):

  • Diffusion with Categorical Inputs
  • New architectures in Diffusion Models

Perplexity AI ▷ #announcements (1 messages):

  • U.S. Presidential race tracking
  • Election hub

Perplexity AI ▷ #general (364 messages🔥🔥):

  • Opus Removal
  • Perplexity Pro Features
  • Model Comparisons
  • Perplexity Bugs
  • User Feedback

Links mentioned:


Perplexity AI ▷ #sharing (20 messages🔥):

  • Siberian Craters
  • Chemistry Rule Debunked
  • Human Brain on a Chip
  • Nvidia's Market Moves
  • AI's Upcoming Changes

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (1 messages):

canarywolfs: Same here. Filled it a long ago. Even filled it again but nothing...🙁


OpenRouter (Alex Atallah) ▷ #announcements (3 messages):

  • Claude 3.5 Haiku
  • Free Llama 3.2 models
  • PDF functionality in Chatroom
  • Sporadic timeout investigation
  • Predicted output for latency

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (340 messages🔥🔥):

  • Hermes model status
  • Pricing concerns with AI models
  • User experiences with OpenRouter
  • Rate limits and credits
  • Model recommendations for specific use cases

Links mentioned:


OpenRouter (Alex Atallah) ▷ #beta-feedback (4 messages):

  • Custom Provider Beta Keys
  • Accessing BYOK Feature
  • Advantages of Custom Keys

aider (Paul Gauthier) ▷ #announcements (6 messages):

  • Aider v0.62.0
  • Claude 3.5 Haiku Performance
  • ChatGPT/Claude Integration

Links mentioned:


aider (Paul Gauthier) ▷ #general (171 messages🔥🔥):

  • AI Model Comparisons
  • Benchmarking Aider
  • Aider Updates
  • Coding with AI
  • AI Forum and Subreddit Recommendations

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (74 messages🔥🔥):

  • Aider Configuration
  • DeepSeek Model Issues
  • Using Claude Haiku
  • Benchmarking Models

Links mentioned:


Eleuther ▷ #general (5 messages):

  • Local Tests Failures
  • Transformers Bug

Eleuther ▷ #research (130 messages🔥🔥):

  • Running Reading Groups
  • Model Training Techniques
  • Optimization Strategies
  • Logits and Probability Distributions
  • Implementation of Dualizers

Links mentioned:


Eleuther ▷ #interpretability-general (1 messages):

  • W2S Model Files

Link mentioned: GitHub - EleutherAI/w2s: Contribute to EleutherAI/w2s development by creating an account on GitHub.


Eleuther ▷ #lm-thunderdome (10 messages🔥):

  • Control attempts in evaluation
  • LLM Robustness Evaluation PR
  • Inference hanging issue
  • NCCL out of memory error
  • Batch size adjustments

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (100 messages🔥🔥):

  • Python 3.11 Performance
  • Qwen 2.5 Model Support
  • Fine-Tuning LLMs
  • Training Methodologies
  • Unsloth Library Issues

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (12 messages🔥):

  • NVIDIA GeForce RTX Team Survey
  • Spam and Self Promotion Issues
  • Chat Community Dynamics

Link mentioned: 10 Minute Meeting - Asli Sabanci: Hi there!As the NVIDIA GeForce RTX team, we're seeking input from community’s AI enthusiasts to guide the future product direction and roadmap. We'd love to meet some of you with low / no codi...


Unsloth AI (Daniel Han) ▷ #help (25 messages🔥):

  • Fine-tuning on Wikipedia data
  • Model Inference Issues
  • Saving Fine-tuned Models
  • Formatting Training Data
  • Qwen Model Performance

Unsloth AI (Daniel Han) ▷ #community-collaboration (1 messages):

  • mtbench evaluation
  • Hugging Face metrics

LM Studio ▷ #general (67 messages🔥🔥):

  • LM Studio Usage
  • Model Adaptability
  • Server Log Access
  • Portable LM Studio
  • Model Performance Comparison

Links mentioned:


LM Studio ▷ #hardware-discussion (53 messages🔥):

  • Windows Scheduler Performance
  • GPU vs CPU Optimization
  • LLM Context Handling
  • Laptop Cooling Techniques
  • Memory Bandwidth Limitations

Links mentioned:


Latent Space ▷ #ai-general-chat (85 messages🔥🔥):

  • Hume App Launch
  • OpenAI Predicted Outputs
  • Supermemory AI Tool
  • Hunyuan-Large Model Release
  • Defense Llama Announcement

Links mentioned:


Notebook LM Discord ▷ #use-cases (37 messages🔥):

  • YouTube Video Discussions
  • Copyright Concerns for Podcasts
  • Notebook LM Functionalities
  • Vendor Database Management
  • Deepfake Technology

Links mentioned:


Notebook LM Discord ▷ #general (48 messages🔥):

  • NotebookLM Features
  • Language and Localization Issues
  • User Experience Enhancements
  • Collaboration and Sharing Limitations
  • Audio Overview and Podcast Generation

Link mentioned: Culture and Capitalism: The Triumph of Distributism with John Medaille: John Medaille is a former elected official, business owner, and currently is a professor of theology and business ethics join us for a talk on distributism a...


Stability.ai (Stable Diffusion) ▷ #general-chat (71 messages🔥🔥):

  • SWarmUI Installation
  • Cloud Hosting for Stable Diffusion
  • Civitai Models and LoRas
  • Animatediff Tutorials
  • ComfyUI and Video AI Support

Links mentioned:


Nous Research AI ▷ #general (59 messages🔥🔥):

  • Hermes 2.5 Dataset Concerns
  • Closed Source LLMs Discussion
  • Future AI Models and Data Quality
  • TEE Twitter Recovery Updates
  • Open Source Dataset Plans

Link mentioned: Reddit - Dive into anything: no description found


Nous Research AI ▷ #research-papers (1 messages):

adjectiveallison: https://arxiv.org/abs/2411.00715v1

Looks fascinating


Nous Research AI ▷ #interesting-links (3 messages):

  • OmniParser
  • Hertz-Dev
  • Communication Protocols for LLM Agents
  • Agora Protocol

Links mentioned:



Interconnects (Nathan Lambert) ▷ #events (1 messages):

  • NeurIPS sponsorship
  • Dinner at NeurIPS

Interconnects (Nathan Lambert) ▷ #news (19 messages🔥):

  • Inference costs pressure
  • Long context breakthroughs
  • Tencent's Model Release
  • Scale AI's Defense LLM
  • Unique Annotation Needs

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (8 messages🔥):

  • LLM performance drift
  • Prompt classifiers
  • Evaluation pipelines
  • ChatGPT tracking
  • Data quality for models

Interconnects (Nathan Lambert) ▷ #ml-drama (3 messages):

  • Internal GPU drama
  • V100s access

Interconnects (Nathan Lambert) ▷ #random (19 messages🔥):

  • Subscriber verification
  • Tulu 3 project
  • Transformer architecture insights
  • Classmate engagement in AI discussions
  • Discord applications for verification

Link mentioned: Tweet from Felix Hill (@FelixHill84): In a 96-layer transformer like ChatGPT, thanks to skip connections, the 10th layer can interact directly with the first layer. This means that if the 10th layer is sufficiently high up the start to ...


Interconnects (Nathan Lambert) ▷ #memes (10 messages🔥):

  • YOLOv3 Paper
  • Claude's System Prompt Critique
  • AI Writing Code
  • OpenAI CEO Discussion
  • Political Reactions to Biden

Links mentioned:


OpenAI ▷ #ai-discussions (21 messages🔥):

  • GPT-4o Rollout
  • OpenAI and AGI
  • Text Extraction Tools Feedback
  • Election Season Model Releases
  • Investment in AI Development

OpenAI ▷ #gpt-4-discussions (14 messages🔥):

  • GPT-5 announcement
  • Issues with Premium accounts
  • Custom GPT configuration
  • Hallucinations in summarization
  • Human oversight in AI workflows

OpenAI ▷ #prompt-engineering (4 messages):

  • Perfect Prompts
  • Using Summaries for Context
  • Model Interaction

OpenAI ▷ #api-discussions (4 messages):

  • Effective Prompting Strategy
  • Summary for Context

LlamaIndex ▷ #blog (4 messages):

  • LlamaIndex chat-ui
  • Advanced report generation
  • NVIDIA competition

Link mentioned: NVIDIA and LlamaIndex Developer Contest: Stand a chance to win cash prizes, a GeForce RTX GPU, and more.


LlamaIndex ▷ #general (38 messages🔥):

  • LlamaIndex PR Review
  • LlamaParse Capabilities
  • Multi-Modal Integration with Cohere
  • ReAct Agent System Prompts
  • Annotations and Citations in LlamaIndex

Links mentioned:


Cohere ▷ #discussions (10 messages🔥):

  • Connectors Issues
  • Search Functionality

Cohere ▷ #questions (13 messages🔥):

  • Cohere API trial fine-tuning
  • Issues with connectors
  • Re-creating prompt tuner on Wordpress
  • Using embed model in software testing
  • GCP Marketplace billing questions

Link mentioned: Login | Cohere: Login for access to advanced Large Language Models and NLP tools through one easy-to-use API.


Cohere ▷ #api-discussions (7 messages):

  • API 500 errors
  • Fine-tuned classify model issues
  • Playground model functionality
  • Troubleshooting assistance

OpenInterpreter ▷ #general (19 messages🔥):

  • Upcoming House Party Announcement
  • Integration with Microsoft's Omniparser
  • Claude's Computer Use Integration
  • Standards for Agents
  • Haiku Performance in OpenInterpreter

OpenInterpreter ▷ #O1 (1 messages):

zer0blanks.: https://www.tiktok.com/t/ZTFckAFHR/


OpenInterpreter ▷ #ai-content (1 messages):

  • Tool Use Package
  • New AI Tools
  • GitHub Repository
  • AI Time Management

Links mentioned:


Modular (Mojo 🔥) ▷ #general (1 messages):

  • Community Meeting
  • Submit Questions
  • Project Proposals

Link mentioned: Modular Community Q&A: no description found


Modular (Mojo 🔥) ▷ #mojo (14 messages🔥):

  • Mojo effect system
  • Matrix multiplication errors
  • Matmul kernel performance
  • Bounds checking in Mojo
  • Stack allocation for C_buffer

Links mentioned:


DSPy ▷ #show-and-tell (1 messages):

  • Election Candidate Research Tool

Link mentioned: GitHub - tkellogg/election2024: A script for researching candidates: A script for researching candidates. Contribute to tkellogg/election2024 development by creating an account on GitHub.


DSPy ▷ #general (12 messages🔥):

  • Optimization for Few-Shot Learning
  • VLM support performance
  • Issue with Long Input Handling
  • DSPy Library Usage

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (7 messages):

  • Distributed Training of LLMs
  • Kubernetes for Fault Tolerance
  • Pretraining LLMs
  • Axolotl Resources
  • Meta Llama 3.1 Model

Link mentioned: Fine Tuning Llama 3.1 405B with Axolotl on a Lambda 1-Click Cluster: Personalizing SOTA Open Source AI


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (4 messages):

  • Zero1 Performance
  • Zero2 Issues
  • StreamingDataset PR
  • Code Debugging

OpenAccess AI Collective (axolotl) ▷ #other-llms (1 messages):

  • Firefly Model
  • Mistral Small 22B
  • Creative Writing Tools
  • Content Sensitivity

Link mentioned: invisietch/MiS-Firefly-v0.1-22B · Hugging Face: no description found


Torchtune ▷ #dev (4 messages):

  • DistiLLM Teacher Probability Discussion
  • KD-div vs Cross-Entropy Clarification

Link mentioned: Issues · jongwooko/distillm: Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) - Issues · jongwooko/distillm


Torchtune ▷ #papers (1 messages):

  • TPO
  • VinePPO
  • Reasoning and Alignment

tinygrad (George Hotz) ▷ #general (1 messages):

  • TokenFormer port to tinygrad

Link mentioned: GitHub - kroggen/tokenformer-minimal at tinygrad: Minimal implementation of TokenFormer for inference and learning - GitHub - kroggen/tokenformer-minimal at tinygrad


tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):

  • Dependency resolution in views
  • Hailo reverse engineering
  • Kernel consistency in tinygrad

LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):

  • Lecture 9
  • Project GR00T
  • Jim Fan
  • GEAR at NVIDIA
  • Course Resources

Link mentioned: CS 194/294-196 (LLM Agents) - Lecture 9, Jim Fan: no description found


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (1 messages):

koppu0729: great talk Jim


Mozilla AI ▷ #announcements (1 messages):

  • FOSDEM 2025
  • Mozilla DevRoom
  • Call for Volunteers
  • Talk Proposals

Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 messages):

  • Benchmarking retrieval-based approaches
  • Function calling definitions
  • Test category functions





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}