Frozen AI News archive

not much happened today

**Google DeepMind** released **EmbeddingGemma (308M)**, a small multilingual embedding model optimized for on-device retrieval-augmented generation and semantic search, supporting over 100 languages and running efficiently with quantization and EdgeTPU latency under 15ms. **Jina AI** introduced new code-focused embedding models (0.5B/1.5B) with GGUF quantization, achieving state-of-the-art retrieval across multiple languages and tasks. **LightOn** demonstrated large-scale retrieval training without distillation using contrastive training on billions of passages. **Hugging Face** released the **FineVision** dataset with 17.3M images and 9.5B answer tokens for vision-language model training, showing significant benchmark improvements. The **MiniCPM-V 4.5 (8B)** multimodal model reported surpassing **GPT-4o** and **Gemini-2.0 Pro** on OpenCompass benchmarks with innovative video token compression. Microsoft’s **VibeVoice TTS** and Stanford’s Mixture-of-Contexts video generation also featured. Additionally, a Stanford study benchmarked optimizers like Muon, Soap, Mars, and Sophia, finding diminishing speedups over AdamW at larger scales but advantages at smaller scales. The new ChatGPT branching feature was noted for its simplicity and popularity. *"Everyone's a decacorn now."*

everyone's a decacorn now.

AI News for 9/4/2025-9/5/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (186 channels, and 4350 messages) for you. Estimated reading time saved (at 200wpm): 324 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Congrats to Sierra on becoming the latest ~~Decagon~~ I mean, Decacorn.

Also, the new ChatGPT branching feature was remarkably popular given the probable ~100 LOC it took to implement (with the Responses API).
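For a sense of why branching can be that small, here is a minimal sketch of a conversation tree in plain Python. It is an illustration under assumptions, not how ChatGPT implements the feature: the `Turn` and `ConversationTree` names and their methods are hypothetical, and nothing here calls the Responses API. The idea is that every turn stores a pointer to its parent, so a "branch" is just a second child under the same turn.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Turn:
    """One message in the conversation; parent_id links it to the prior turn."""
    id: int
    role: str
    text: str
    parent_id: Optional[int] = None


@dataclass
class ConversationTree:
    """Hypothetical store: turns form a tree, and a branch is just a leaf id."""
    turns: Dict[int, Turn] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def add(self, role: str, text: str, parent_id: Optional[int] = None) -> int:
        """Append a turn under parent_id (None starts a new root)."""
        if parent_id is not None and parent_id not in self.turns:
            raise KeyError(f"unknown parent turn {parent_id}")
        turn_id = next(self._ids)
        self.turns[turn_id] = Turn(turn_id, role, text, parent_id)
        return turn_id

    def history(self, leaf_id: int) -> List[Turn]:
        """Walk parent links from a leaf back to the root, oldest first."""
        path = []
        current: Optional[int] = leaf_id
        while current is not None:
            turn = self.turns[current]
            path.append(turn)
            current = turn.parent_id
        return list(reversed(path))


tree = ConversationTree()
q1 = tree.add("user", "Suggest a name for my app.")
a1 = tree.add("assistant", "How about 'Forkly'?", parent_id=q1)
# Two follow-ups share the same parent, so each becomes its own branch.
main = tree.add("user", "Make it more formal.", parent_id=a1)
branch = tree.add("user", "Make it more playful.", parent_id=a1)

main_texts = [t.text for t in tree.history(main)]
branch_texts = [t.text for t in tree.history(branch)]
```

Both branches share the first two turns and differ only in the last one, so the context sent to a model for either branch is simply `history(leaf_id)`.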


AI Twitter Recap

Embeddings on-device and retrieval stack updates

Vision-language data and multimodal models

Optimizers, internal metrics, and training recipes

Agent systems, runtimes, and tooling

Product rollouts and ecosystem

Top tweets (by engagement)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Microsoft VibeVoice Repo Takedown & ComfyUI Integration

2. EmbeddingGemma 300M Launch + HF Science AMA/FineVision

3. Local AI Ops: 5070 Ti Super VRAM Rigs & Ollama Exposure PSA

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Nano Banana & Veo3 Visual Gen Demos and Workflows

2. Meta Superintelligence, Sutskever ‘breakthrough’ and GPT‑6 Rumors

3. AI Hallucination in Court + ChatGPT Community Experiments


AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. Low-Bit Training, Triton Changes, and GPU Perf Playbook

2. Agent Tooling Goes Real: ACK-Lab Wallets, DSPy Momentum

3. Multimodal & On-Device: smolVLM2, LFM2, EmbeddingGemma

4. Hardware Shakeups: Huawei Ternary Compute and AI SSD, Builder GPU Choices


Discord: High level Discord summaries

Perplexity AI Discord


LMArena Discord


Eleuther Discord


Cursor Community Discord


Nous Research AI Discord


OpenRouter Discord


HuggingFace Discord


Yannic Kilcher Discord


LM Studio Discord


GPU MODE Discord


DSPy Discord


Moonshot AI (Kimi K-2) Discord


OpenAI Discord


Modular (Mojo 🔥) Discord


Manus.im Discord Discord


tinygrad (George Hotz) Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1200 messages🔥🔥🔥):

Comet Browser, Perplexity AI Pro, Model Selection, User Support


Perplexity AI ▷ #sharing (7 messages):

Shareable Threads on Perplexity AI, Perplexity AI Browser Claims


Perplexity AI ▷ #pplx-api (3 messages):

Pro Account Issue, New Endpoint, Contact Support


LMArena ▷ #general (586 messages🔥🔥🔥):

LM Arena Outages, Web Scraping, LM Arena Models, Qwen3, Image generation Aspect Ratio


Eleuther ▷ #general (189 messages🔥🔥):

Typing Protocol vs Mixin Classes, Mech Interp Research, Hierarchical Nature of HRM, OOD Iteration Extrapolation, Error Correction in UTs


Eleuther ▷ #research (50 messages🔥):

Entropy rate of natural languages, Continual Learning, QK-Norm Optimizer, Curriculum Learning, mup implementations


Eleuther ▷ #multimodal-general (5 messages):

Multimodal Common Pile, Audio/Music Datasets, Ethical concerns with Speech and Images, Openly Licensed Music Dataset


Cursor Community ▷ #general (196 messages🔥🔥):

GPT-5 vs Claude 4, Cursor Slow Performance, VSCode extension for Cursor, Subagents in Cursor, Token Usage and Cost


Nous Research AI ▷ #general (114 messages🔥🔥):

N8N, AO3, Huawei ternary logic compute, ack-lab, Photonic chips


Nous Research AI ▷ #ask-about-llms (1 message):

Hermes 4 Limitations, Model Hallucinations


Nous Research AI ▷ #research-papers (3 messages):

Fine-tuning Auto-Regressive Models, BOS Token Usage in LLMs, MCQ Classifier Training


Nous Research AI ▷ #interesting-links (2 messages):

PotatoLM, FineVision


OpenRouter ▷ #announcements (1 message):

toven: The promotional free period for Gemini 2.5 Flash Image has now ended.


OpenRouter ▷ #general (108 messages🔥🔥):

Gemini 2.5 Flash Image Restrictions, DeepInfra's Gemini 2.5 Pricing, OpenRouter API Key Exposure, Kimi K2 Model, Prompt Caching Benefits


OpenRouter ▷ #discussion (4 messages):

Deepseek AI Agent, R2 never


HuggingFace ▷ #general (105 messages🔥🔥):

Ollama debacles, Quantized Model Deployment, Fine-tuning Vision Models, Liquid Foundation Models (LFM2), Discord bot vision integration


HuggingFace ▷ #i-made-this (1 message):

tonic_1: https://huggingface.co/posts/Tonic/941120780247130


HuggingFace ▷ #agents-course (1 message):

marc_28459: Beginning the agents course today! Hello from Philadelphia everyone!


Yannic Kilcher ▷ #general (90 messages🔥🔥):

Kickstarter governance, Continual learning, True Online Learning, Adaptive Resonance Theory (ART), i.i.d. sampling vs online learning


Yannic Kilcher ▷ #paper-discussion (2 messages):

Unitary Transforms, SVD Matrix Decomposition


Yannic Kilcher ▷ #ml-news (9 messages🔥):

Huawei AI SSD, Computational Storage, EmbeddingGemma, SD card FPGA redneck AI


LM Studio ▷ #general (46 messages🔥):

LM Studio efficiency, 70B model loading issues, Qwen-30-a3b recommendation, Agent tool with sub-agent support, Comet browser review


LM Studio ▷ #hardware-discussion (44 messages🔥):

Mi50 vs 3090, 3090 vs 7900 XTX, GPT-OSS Performance, Old Nvidia Cards


GPU MODE ▷ #general (1 message):

Expert Parallelism, Kimi K2 Paper, All-to-all latency, Bandwidth Optimization


GPU MODE ▷ #triton (1 message):

Meetup Video, Whitney Tsang, Triton Channel


GPU MODE ▷ #cuda (5 messages):

Shared Memory Addressing, fp4 and fp8 packing, Modal GPU Glossary


GPU MODE ▷ #jobs (1 message):

Ailinia, ML Engineer


GPU MODE ▷ #beginner (5 messages):

Resume feedback for RTL/digital logic design roles


GPU MODE ▷ #torchao (1 message):

torchao v0.13.0, QAT improvements, NVFP4 and FP8 QAT, MXFP8 pretraining speedups, axolotl integration


GPU MODE ▷ #🍿 (1 message):

LLM Generated Kernels, Nano GPT, PyTorch Ops


GPU MODE ▷ #submissions (22 messages🔥):

MI300x8 Leaderboard Updates, AMD all2all benchmarks, µs performance achieved


GPU MODE ▷ #amd-competition (12 messages🔥):

MoE config limits, Random seed PR impact on num_tokens, Max comm bdw impact on pipeline design, Debugging unspecified bugs, Hyperparameter settings visibility


GPU MODE ▷ #cutlass (2 messages):

cutlass_profiler, H100, CUTLASS_NVCC_ARCHS, CUTLASS_LIBRARY_KERNELS, CUTLASS_LIBRARY_OPERATIONS


GPU MODE ▷ #low-bit-training (18 messages🔥):

torch.compile reduce-overhead, sequence packing using flash_attn, MXFP8 dot product in Triton, GemLite, torchao's FP8 transformation


DSPy ▷ #papers (1 message):

DSPy Hallucinations, HallBayes


DSPy ▷ #general (48 messages🔥):

DSPy's Opinionated Paradigm, GEPA Optimizer, MIPROv2 Example, Prompt Optimization


Moonshot AI (Kimi K-2) ▷ #general-chat (47 messages🔥):

Twitter account suspension, Pricing plans for Kimi AI, PPTX Slides with Kimi, CCP affiliations and Moonshot AI, Kimi K2 temperature


OpenAI ▷ #ai-discussions (29 messages🔥):

AI Agents vs Workflows, Chinese AI Development, AI Safety, Free AI Options, Llama 3.2


OpenAI ▷ #gpt-4-discussions (1 message):

smirsonianahmadi10100: Hello


OpenAI ▷ #prompt-engineering (3 messages):

Token IDs, GPT-5, Custom Settings


OpenAI ▷ #api-discussions (3 messages):

Token IDs, Custom Settings, GPT-5


Modular (Mojo 🔥) ▷ #mojo (21 messages🔥):

Networking libraries in stdlib, AI inference over network, HTTP in AI clusters, DPDK and Mojo, Lightbug limitations


Modular (Mojo 🔥) ▷ #max (1 message):

Shape Recompilation, Dynamic Tensors


Manus.im Discord ▷ #general (4 messages):

Scheduled task errors, Support ticket updates


tinygrad (George Hotz) ▷ #announcements (1 message):

Tinybox Pricing, Tinybox New Colors, Tinybox Act Fast