Frozen AI News archive

Titans: Learning to Memorize at Test Time

**Google** released a new paper on "Neural Memory" integrating persistent memory directly into transformer architectures at test time, showing promising long-context utilization. **MiniMax-01**, highlighted by @omarsar0, features a **4 million token context window** with **456B parameters** and **32 experts**, outperforming **GPT-4o** and **Claude-3.5-Sonnet**. **InternLM3-8B-Instruct** is an open-source model trained on **4 trillion tokens** with state-of-the-art results. **Transformer²** introduces self-adaptive LLMs that dynamically adjust weights for continuous adaptation. Advances in AI security highlight the need for **agent authentication**, **prompt injection** defenses, and **zero-trust architectures**. Tools like **Micro Diffusion** enable budget-friendly diffusion model training, while **LeagueGraph** and **Agent Recipes** support open-source social media agents.


AI News for 1/14/2025-1/15/2025. We checked 7 subreddits, 433 Twitters and 32 Discords (219 channels, and 2812 messages) for you. Estimated reading time saved (at 200wpm): 327 minutes. You can now tag @smol_ai for AINews discussions!

Lots of people are buzzing about the latest Google paper, hailed by yappers as "Transformers 2.0" (arxiv, tweet):


It seems to fold persistent memory right into the architecture at "test time" rather than keeping it outside (the paper describes three variants: memory as context, as a gate, or as a layer).


The paper notably uses a surprisal measure to update its memory, and models forgetting by weight decay.
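To make the surprise-plus-forgetting idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: it assumes a linear associative memory `M` (the paper uses a neural memory module), takes "surprise" to be the gradient of a recall loss `||M k - v||^2`, and adds a momentum term over past surprise. The function names (`surprise_grad`, `memory_step`) and the values of `alpha`, `eta`, and `theta` are illustrative assumptions.

```python
# Simplified sketch: a *linear* associative memory M updated at test time.
# All names and hyperparameter values are illustrative assumptions.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def surprise_grad(M, k, v):
    """Gradient of the recall loss ||M k - v||^2 with respect to M."""
    err = [p - t for p, t in zip(matvec(M, k), v)]
    return [[2.0 * e_i * k_j for k_j in k] for e_i in err]

def memory_step(M, S, k, v, alpha=0.01, eta=0.5, theta=0.1):
    """One test-time update of the memory.

    S accumulates momentum over past surprise; alpha is a weight-decay
    (forgetting) rate applied to the memory itself.
    """
    G = surprise_grad(M, k, v)
    S = [[eta * s - theta * g for s, g in zip(s_row, g_row)]
         for s_row, g_row in zip(S, G)]
    M = [[(1.0 - alpha) * m + s for m, s in zip(m_row, s_row)]
         for m_row, s_row in zip(M, S)]
    return M, S

def recall_error(M, k, v):
    """Squared error between the recalled value M k and the target v."""
    return sum((p - t) ** 2 for p, t in zip(matvec(M, k), v))

d = 2
M = [[0.0] * d for _ in range(d)]   # empty memory
S = [[0.0] * d for _ in range(d)]   # no accumulated surprise yet
k, v = [1.0, 0.0], [0.5, -1.0]      # one key/value association to memorize

before = recall_error(M, k, v)
for _ in range(50):
    M, S = memory_step(M, S, k, v)
after = recall_error(M, k, v)
print(before, after)
```

In this toy setup, repeated exposure to the same key/value pair drives the recall error down, while the `alpha` decay keeps the memory from matching the target exactly. That gap is the price of forgetting: old associations fade unless they keep producing surprise.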

The net result shows very promising context utilization over long contexts.



{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Models and Scaling

AI Applications and Tools

AI Security and Ethical Concerns

AI in Education and Hiring

AI Integration in Software Engineering

Politics and AI Regulations

Memes/Humor


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. InternLM3-8B outperforms Llama3.1-8B and Qwen2.5-7B

Theme 2. OpenRouter gets new features and community-driven improvements

Theme 3. Kiln as an Open Source Alternative to Google AI Studio Gains Traction

Theme 4. OuteTTS 0.3 introduces new 1B and 500M language models

Theme 5. 405B MiniMax MoE: Breakthrough in context length and efficiency

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Transformer²: Enhancing Real-Time LLM Adaptability

Theme 2. Deep Learning Revolutionizing Predictive Healthcare


AI Discord Recap

A summary of Summaries of Summaries by o1-preview-2024-09-12

Theme 1: AI Model Performance Stumbles Across Platforms

Theme 2: New AI Models Break Context Barriers

Theme 3: Legal Woes Hit AI Datasets and Developers

Theme 4: Advancements and Debates in AI Training

Theme 5: Industry Moves Shake Up the AI Landscape


PART 1: High level Discord summaries

Cursor IDE Discord


Perplexity AI Discord


Codeium (Windsurf) Discord


Unsloth AI (Daniel Han) Discord


Stability.ai (Stable Diffusion) Discord


Interconnects (Nathan Lambert) Discord


Nous Research AI Discord


Stackblitz (Bolt.new) Discord


Notebook LM Discord Discord


Cohere Discord


OpenRouter (Alex Atallah) Discord


OpenAI Discord


LM Studio Discord


aider (Paul Gauthier) Discord


Modular (Mojo 🔥) Discord


Eleuther Discord


Latent Space Discord


GPU MODE Discord


LlamaIndex Discord


OpenInterpreter Discord


LAION Discord


AI21 Labs (Jamba) Discord


MLOps @Chipro Discord


Nomic.ai (GPT4All) Discord


DSPy Discord


The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Axolotl AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ▷ #general (568 messages🔥🔥🔥):

Cursor performance issues, Slow requests, Deployment methods, Model comparisons, Sora AI usage



Perplexity AI ▷ #general (274 messages🔥🔥):

Perplexity outages, Performance of AI models, Integration with IDEs, Citations issues, User experiences with AI models



Perplexity AI ▷ #sharing (7 messages):

iPhone Air Rumors, JavaScript Trademark Battle, US AI-Export Rules, Perplexity AI Apps Confusion


Perplexity AI ▷ #pplx-api (1 message):

llama-3.1-sonar-large-128k-online output speed


Codeium (Windsurf) ▷ #announcements (2 messages):

Windsurf Command Tutorial, Discord Challenge Winners, Student Discount Pricing, Windsurf Editor Launch, Codeium vs GitHub Copilot



Codeium (Windsurf) ▷ #discussion (88 messages🔥🔥):

Codeium Telemetry Issue, Codeium Subscription Plans, Student Discounts, Remote Repository Utilization, Codeium Installation Problems



Codeium (Windsurf) ▷ #windsurf (191 messages🔥🔥):

Windsurf IDE issues, Discounts and Pricing, User Experiences with Cascade, C# variable type analysis, Integration of AI models



Unsloth AI (Daniel Han) ▷ #general (113 messages🔥🔥):

GPU Usage in Unsloth, Fine-tuning Misconceptions, Model Training in Notebooks, Collaboration with Kaggle, Using Unsloth for Web Scraping



Unsloth AI (Daniel Han) ▷ #off-topic (16 messages🔥):

QA training techniques, Model performance issues, Fine-tuning models, MLX framework, Ollama compatibility



Unsloth AI (Daniel Han) ▷ #help (141 messages🔥🔥):

Training Issues with Phi-4, Fine-tuning Llamas, Using WSL for AI Development, Dynamic Quantization in Models, Conda Installation on Windows



Unsloth AI (Daniel Han) ▷ #research (9 messages🔥):

Grokking phenomenon, LLM training methods, Security conference submissions, Research papers and resources, Grokking video sequel



Stability.ai (Stable Diffusion) ▷ #general-chat (206 messages🔥🔥):

AI Image Generation, Fake Cryptocurrency Launches, Model Metrics, ComfyUI and Stable Diffusion, Sharing Generated Images



Interconnects (Nathan Lambert) ▷ #events (3 messages):

Agent Identity Hackathon, Mixer Event, Xeno Grant



Interconnects (Nathan Lambert) ▷ #news (90 messages🔥🔥):

Model Performance Issues, Program Synthesis Focus, Cerebras Yield Solutions, Contextual AI Platform Launch, LLM Language Understanding



Interconnects (Nathan Lambert) ▷ #ml-drama (19 messages🔥):

ReaderLM-2 Capabilities, MATH Dataset DMCA Issue, AMD GPU Sponsorship Ideas, Tensorwave MI300X Launch, AoPS Exclusivity Concerns



Interconnects (Nathan Lambert) ▷ #random (2 messages):

Non-reasoning models, GPT-4o


Interconnects (Nathan Lambert) ▷ #cv (7 messages):

Multimodal Visualization-of-Thought (MVoT), Chain-of-Thought (CoT) prompting, Mind's Eye paradigm, Simulation in AI reasoning, Grounding language models



Interconnects (Nathan Lambert) ▷ #reads (47 messages🔥):

AI Relationships, Population Decline Implications, Automation of Jobs, AI and Social Movements, Challenges with LLMs



Interconnects (Nathan Lambert) ▷ #posts (10 messages🔥):

Voiceover Techniques, Meta Ray-Bans, Project Aria, Content Creation, User Experiences

Link mentioned: Introducing Project Aria, from Meta: Project Aria is a research program from Meta, to help build the future responsibly. Project Aria unlocks new possibilities of how we connect with and experience the world.


Interconnects (Nathan Lambert) ▷ #policy (2 messages):

TSMC and Samsung, US-China chip flow restrictions



Nous Research AI ▷ #general (125 messages🔥🔥):

Claude's Persona, Fine-Tuning Techniques, LLM Dataset Quality, Nous Research Funding, Hackathon Event



Nous Research AI ▷ #ask-about-llms (25 messages🔥):

Gemini for data extraction, Grokking paper insights, Ortho Grad and GrokAdamW merger, Stablemax function issues



Nous Research AI ▷ #interesting-links (7 messages):

Grokking phenomenon, Grokfast optimizer, Orthograd optimizer, Coconut GitHub project



Stackblitz (Bolt.new) ▷ #announcements (1 message):

Bolt update, Project title editing

Link mentioned: Tweet from StackBlitz (@stackblitz): 📢 Fresh Bolt update: You can change the project title now — making it easier to find on the projects list!


Stackblitz (Bolt.new) ▷ #prompting (14 messages🔥):

GA4 API integration issues, Firebase data loading technique, Context file creation for Bolt, Token usage optimization, Chat history snapshot system

Link mentioned: feat: restoring project from snapshot on reload by thecodacus · Pull Request #444 · stackblitz-labs/bolt.diy: Add Chat History Snapshot System. Overview: This PR introduces a snapshot system for chat history, allowing the restoration of previous chat states along with their associated file system state. This...


Stackblitz (Bolt.new) ▷ #discussions (135 messages🔥🔥):

Token Usage Issues, Bolt Features and Updates, Database Integration with Supabase, Chat Session Management, Error Handling in Bolt



Notebook LM Discord ▷ #use-cases (16 messages🔥):

AI-generated scripts, Worldbuilding assistance, Podcast tone issues, Forum-based information extraction, Novel author discussions



Notebook LM Discord ▷ #general (88 messages🔥🔥):

NotebookLM Plus Features, API and Bulk Sync Features, Using YouTube as a Source, Limitations of Text Uploads, Scraping Websites for Content



Cohere ▷ #discussions (34 messages🔥):

Discord bot API usage, Inefficiency in project, LLM interface and APIs, Payment issues with production key, Learning resources for programming


Cohere ▷ #questions (25 messages🔥):

Context length of 128k tokens, Cohere API key limits, Rerank model performance decline, Audio noise reduction recommendations, Updating Command R models



Cohere ▷ #api-discussions (10 messages🔥):

Cohere Client Initialization Error, Payment Method Issues for Production Key, Use of Cohere ClientV2


Cohere ▷ #cmd-r-bot (13 messages🔥):

Cohere Bot Interactions, Counting to 10, Searching Documentation


Cohere ▷ #cohere-toolkit (6 messages):

Moderator Recruitment, Community Contributions


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Mistral coding model, Minimax-01 release



OpenRouter (Alex Atallah) ▷ #general (80 messages🔥🔥):

DeepSeek API issues, Token limit inconsistencies, Provider performance, Model removal, Prompt caching



OpenAI ▷ #ai-discussions (62 messages🔥🔥):

AI and Neural Feedback Loop, ISO 42001 Certification for AI, AI Limitations in Conversation Context, ChatGPT's API and Performance Discrepancies, Generative AI Hype and Market Reality



OpenAI ▷ #gpt-4-discussions (6 messages):

Custom GPT uploads, GPT-4o Tasks, Canvas feature on web, Model limitations


OpenAI ▷ #prompt-engineering (5 messages):

Assistants referencing docs, API related questions, Humanization prompt, Prompt engineering, Scripting support


OpenAI ▷ #api-discussions (5 messages):

Assistants referencing docs, API concerns in Playground, Humanization prompt for AI


LM Studio ▷ #general (66 messages🔥🔥):

Model Fine-Tuning Techniques, User and Assistant Modes, Context Window and Memory Usage, Model Loading Issues, Image Analysis with Models


LM Studio ▷ #hardware-discussion (11 messages🔥):

Comparative Inference Speeds, GPU Parallelization Limitations, Power Efficiency of GPUs, Model Layer Distribution


aider (Paul Gauthier) ▷ #general (36 messages🔥):

DeepSeek performance issues, Aider with code editing, Using GPUs for model execution, AI content clickbait discussions, GitHub bots integration



aider (Paul Gauthier) ▷ #questions-and-tips (22 messages🔥):

Repository-map size concerns, Agentic tools for code exploration, Aider API call logging, Git commit issues with Aider, o1-preview API performance

Link mentioned: docs: Add architect mode section to edit errors troubleshooting guide by golergka · Pull Request #2877 · Aider-AI/aider: no description found


aider (Paul Gauthier) ▷ #links (2 messages):

Repomix, AI-friendly codebase packaging, Repopack, API call optimization

Link mentioned: Tweet from Repomix: Pack your codebase into AI-friendly formats


Modular (Mojo 🔥) ▷ #general (4 messages):

Docs font weight update, User feedback on readability


Modular (Mojo 🔥) ▷ #mojo (41 messages🔥):

Function Support in Mojo, Zed Preview Extension, SIMD Performance, Recursive Types in Mojo



Eleuther ▷ #general (2 messages):

Fateful Ping, AI Ranking


Eleuther ▷ #research (29 messages🔥):

Critical Tokens in LLMs, VinePPO and CoT Trajectories, NanoGPT Speedrun Record, TruthfulQA Dataset Weaknesses, Human Annotation Issues in Datasets



Eleuther ▷ #scaling-laws (5 messages):

Loss vs Compute Plot, Induction Head Behavior, Circuit Interoperability, Pythia Models Training


Eleuther ▷ #lm-thunderdome (3 messages):

MATH Dataset DMCA, AoPS Disclosure

Link mentioned: Tweet from Tom Adamczewski (@tmkadamcz): Hendrycks MATH has just been hit with a DMCA takedown notice. The dataset is currently disabled. https://huggingface.co/datasets/hendrycks/competition_math/discussions/5


Eleuther ▷ #gpt-neox-dev (6 messages):

NeoX model conversion, Learning ranks attribute, Intermediate size configuration, Layer masking issues, Zero stages incompatibility



Latent Space ▷ #ai-general-chat (40 messages🔥):

Cursor AI funding, Transformer² adaptive models, AI tutoring impact in Nigeria, OpenBMB MiniCPM-o 2.6 model, Synthetic data generation with Curator



Latent Space ▷ #ai-announcements (1 message):

NVIDIA Cosmos, CES presentation, LLM Paper Club

Link mentioned: LLM Paper Club (NVIDIA Cosmos) · Zoom · Luma: Ethan He is back to share the latest from Nvidia CES: the Cosmos World Foundation Models: https://github.com/NVIDIA/Cosmos. We need YOU to volunteer to do…


GPU MODE ▷ #triton (6 messages):

Triton dependency on Torch, cuBLAS equivalent for Triton, Triton pointer type error

Link mentioned: Redirecting to https://triton-lang.org/main/index.html: no description found


GPU MODE ▷ #cuda (3 messages):

RTX 50x Blackwell cards, Hopper TMA features


GPU MODE ▷ #torch (2 messages):

Batching Tensors to GPU, Torch Compiler Material


GPU MODE ▷ #algorithms (2 messages):

MiniMax-01, Lightning Attention Architecture, Open-Source Model Release, Ultra-Long Context Processing, Cost-Effective AI Solutions

Link mentioned: Tweet from MiniMax (official) (@MiniMax__AI): MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era. We are thrilled to introduce our latest open-source models: the foundational language model MiniMax-Text-01 and the visua...


GPU MODE ▷ #cool-links (3 messages):

Training with bfloat16, GPU animation insights, GPT-3 architecture access

Link mentioned: Tweet from Fleetwood (@fleetwood___): Everyone working with GPUs needs to internalise this. A100 version of @vrushankdes's animation


GPU MODE ▷ #beginner (2 messages):

LLM inference with single GPU, LLM inference with multiple GPUs, VRAM requirements for serving multiple requests, Batch processing requests, Parallelism strategies for multi-GPU setups


GPU MODE ▷ #metal (2 messages):

MPS Kernel Profiling, Debugging GPU Trace, Metal Compiler Flags


GPU MODE ▷ #self-promotion (1 message):

Thunder Compute, Cloud GPU Pricing, Y Combinator, Instance Management Tool

Link mentioned: Thunder Compute: Low-cost GPUs for anything AI/ML: Train, fine-tune, and deploy models on Thunder Compute. Get started with $20/month of free credit.


GPU MODE ▷ #🍿 (12 messages🔥):

Modal Registry for Popcorn Bot, GPU Type Handling, Creating Modal Functions, Using nvidia-smi for GPU Capability, Discord Leaderboard Implementation

Link mentioned: What utility/binary can I call to determine an nVIDIA GPU's Compute Capability?: Suppose I have a system with a single GPU installed, and suppose I've also installed a recent version of CUDA. I want to determine what's the compute capability of my GPU. If I coul...


GPU MODE ▷ #thunderkittens (6 messages):

Onboarding Documentation, Kernel Options, Linking Resources, Manual Creation

Link mentioned: TK onboarding: This document specifies how to get started on programming kernels using TK. Please feel free to leave comments on this document for areas of improvement / missing information.


LlamaIndex ▷ #blog (3 messages):

RAG applications with LlamaParse, Improving knowledge graphs with LlamaIndex workflows, LlamaIndex and Vellum AI partnership


LlamaIndex ▷ #general (23 messages🔥):

XHTML to PDF conversion utilities, Choosing a vector database, Workflow design with HITL steps, Difference between agents and workflows, User sign-up issues



OpenInterpreter ▷ #general (12 messages🔥):

OpenInterpreter 1.0 capabilities, Bora's Law on intelligence, Python convenience functions for OI, Limitations of command line tools, AGI development approaches

Link mentioned: Bora's Law: Intelligence Scales With Constraints, Not Compute: This is a working paper exploring an emerging principle in artificial intelligence development.


LAION ▷ #general (4 messages):

AI Copyright Law Overview, Hyper-Explainable Networks, Inference-Time Credit Assignment


AI21 Labs (Jamba) ▷ #general-chat (3 messages):

P2B crypto platform, AI21 Labs' stance on crypto, Community guidelines on crypto discussions


MLOps @Chipro ▷ #events (1 message):

AI Agents Workshop, 2025 AI Trends, No-code AI Development

Link mentioned: Create Your First AI Agent of 2025 · Zoom · Luma: AI Agents are the talk of 2025! From personal assistants to business analysts, these digital teammates are taking over every industry. The best part? Creating…


MLOps @Chipro ▷ #general-ml (1 message):

heathcliff_ca: Cost is another big reason to stick with what works


Nomic.ai (GPT4All) ▷ #general (2 messages):

Qwen 2.5 fine-tuning, Llama 3.2 character limits, TV pilot script analysis

Link mentioned: Compare Llama 3.2 3B vs Llama 3 8B Instruct - Pricing, Benchmarks, and More: Compare pricing, benchmarks, model overview and more between Llama 3.2 3B and Llama 3 8B Instruct. In depth comparison of Llama 3.2 3B vs Llama 3 8B Instruct.


DSPy ▷ #general (1 message):

Ambient Agent Implementation, DSPy Examples








{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}