Frozen AI News archive

Not much happened this Pi Day

**DeepMind** announces **SIMA**, a generalist AI agent that follows natural-language instructions across diverse 3D environments and video games, advancing embodied AI agents. **Anthropic** releases **Claude 3 Haiku**, its fastest and most affordable model, now available via API and on Perplexity. New research explores language model scaling laws and over-training, and introduces **Branch-Train-MiX (BTX)** for efficient training of large language models using mixture-of-experts. Predictions suggest software engineering jobs will grow to **30-35 million** within five years, aided by AI coding assistants; **Cohere's Command-R** focuses on retrieval-augmented generation and tool use. The **EU AI Act** is approved, mandating transparency about training data for GPAI systems. Privacy-preserving in-context learning with differential privacy is highlighted as promising work. Memes humorously cover AI software engineers and notable figures like **Andrej Karpathy**.

It's the anniversary of GPT-4, but no GPT-5 for you today. Join @elonmusk in checking out the latest Latent Space pod with Suno AI:

https://www.youtube.com/watch?v=gYXjn-V7AEw&feature=youtu.be

(Also, we missed highlighting the Figure 01 launch yesterday, which in retrospect we'd rank slightly above DeepMind's SIMA in impressiveness/near-term importance.)


Table of Contents

[TOC]


PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

AI Agents and Environments

  1. DeepMind announces SIMA, a generalist AI agent that can follow natural language instructions in a broad range of 3D environments and video games, marking an important step towards agents that can tackle complex tasks requiring planning and sub-tasks. (537,888 impressions)

  2. DeepMind's SIMA agent demonstrates the ability to follow natural language instructions to carry out tasks across a wide array of game worlds, similar to how a human would play. This is an exciting development in embodied AI agents. (178,835 impressions)

  3. The SIMA research focuses on developing embodied AI agents that can translate abstract language into useful actions, using video games as safe, accessible testing environments rather than optimizing for high scores. (24,983 impressions)

Large Language Models and Scaling

  1. Anthropic introduces Claude 3 Haiku, their fastest and most affordable model, now available in the API and on Perplexity for Claude Pro subscribers. (299,766 impressions)

  2. Language models scale reliably with over-training and on downstream tasks. A new paper explores gaps in LM scaling laws, providing insights into over-training and linking model perplexity to downstream performance. (10,589 impressions)

  3. Branch-Train-MiX (BTX) is a new approach for training large language models more efficiently by mixing expert LLMs into a Mixture-of-Experts LLM. It is shown to be more efficient than training a larger generalist LLM or several separate specialized LLMs. (11,042 impressions)
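
BTX's merge step is straightforward to picture: the feed-forward layers of each trained branch become experts behind a learned router, which is trained after the merge. Below is a minimal PyTorch sketch of such a layer; the module shapes, top-k routing, and training details are illustrative simplifications, not the paper's exact recipe (BTX also averages the non-FFN weights across branches, which is omitted here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFFN(nn.Module):
    """Feed-forward layers from several domain-expert branches, mixed behind a router."""

    def __init__(self, expert_ffns, d_model, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(expert_ffns)           # FFNs copied from each branch
        self.router = nn.Linear(d_model, len(expert_ffns))  # trained after the merge
        self.top_k = top_k

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # route each token to top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Toy merge: three identically-shaped "domain expert" FFNs, as produced by branching.
d = 16
experts = [nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d)) for _ in range(3)]
print(MoEFFN(experts, d_model=d)(torch.randn(5, d)).shape)  # torch.Size([5, 16])
```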

AI Coding Assistants and Software Engineering

  1. @fchollet predicts there will be more software engineers in five years than today, estimating growth from 26-27M today to 30-35M in 5 years. He argues that making it easier to code has historically led to more coding jobs. (188,949 impressions)

  2. Cohere's Command-R model focuses on retrieval augmented generation (RAG) and tool use - two key skills for building LLM applications. It addresses issues in scaling proof-of-concept LLM apps to production; a generic retrieval sketch follows this list. (2,297 impressions)

  3. A perspective that AI will enable more software engineers, and that fancy demos are causing overreaction. Most AI coding solutions will likely have limited scope and need human supervision. (15,308 impressions)
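
To make the RAG half concrete, here is a minimal, self-contained retrieval sketch. The bag-of-words "embedder" and the toy corpus are stand-ins for a real embedding model and vector store; nothing here is Cohere's API.

```python
import numpy as np

# Toy corpus; a real system would embed document chunks with an embedding model.
docs = [
    "Command-R supports a 128k token context window.",
    "Retrieval augmented generation grounds answers in retrieved documents.",
    "Tool use lets a model call external functions mid-task.",
]
vocab = sorted({w for d in docs for w in d.lower().split()})

def embed(text: str) -> np.ndarray:
    """Unit-normalized bag-of-words vector (toy stand-in for a learned embedding)."""
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in vocab:
            v[vocab.index(w)] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    sims = np.array([q @ embed(d) for d in docs])
    return [docs[i] for i in sims.argsort()[::-1][:k]]

# Ground the model's answer in the retrieved chunks.
query = "How does RAG improve answers?"
context = "\n".join(retrieve(query))
print(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
```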

AI Safety and Regulation

  1. The EU AI Act has been approved by Parliament, representing big and largely positive AI news. (11,126 impressions)

  2. Key requirements in the AI Act include that GPAI systems must publish "detailed summaries of the content used for training." (1,759 impressions)

  3. A paper on "Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation" is highlighted as promising work in light of the AI Act's approval. The paper proposes using a pre-trained LLM to generate differentially private synthetic examples from private datasets. (79 impressions)
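
As a pointer to the underlying guarantee: a mechanism M is (ε, δ)-differentially private if, for all neighboring datasets D, D' differing in one record and every output set S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

The paper's generator spends this budget while sampling synthetic demonstrations, so the few-shot examples placed in the prompt are synthetic rather than drawn verbatim from the private dataset.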

Memes and Humor

  1. A meme jokes that an "AI software engineer" capable of automating everything would be sold as a product rather than used to dominate the market. (586,645 impressions)

  2. A humorous tweet imagines Andrej Karpathy leaving Tesla because he suggested changing a learning rate constant from 0.086 to 0.0855541. (270 impressions)

  3. A meme suggests that people waiting for GPT-5 to drop will be disappointed again. (1,378 impressions)

Other Notable Topics


PART 0: Summary of Summaries of Summaries

Since Claude 3 Haiku was released recently, we're adding it to this summary run so you can compare it against our custom GPT summarizer (all of these are distinct from the smol model running the Part 1/2 summaries). We'll keep running these side by side a little longer while we build the AINews platform for a better UX. We've noticed that the same prompts produce consistently different output across the 3 Claude models; we'll tweak prompts in tomorrow's iteration to get Haiku at least behaving.

Claude 3 Haiku (3B?)

Claude 3 Sonnet (14B?)

  1. New AI Model Releases and Capabilities:

    • Cerebras unveils CS-3 AI accelerator capable of training up to 24 trillion parameter models on a single chip, with 4 trillion transistors and 125 petaflops of compute power.
    • Anthropic releases Claude 3 Haiku, a fast and cost-efficient model available on OpenRouter, running at ~120 tokens/s and 4 million prompt tokens per dollar.
    • Cohere's Command-R model with 128k token context window is now on OpenRouter, at 2 million prompt tokens per dollar.
    • DeepMind's SIMA is a new generalist AI agent that can understand natural language instructions in video game environments, though its technical details lack transparency.
  2. AI Safety and Vulnerability Concerns:

    • A new paper on ComPromptMized reveals prompt injection attacks on AI models like Gemini Pro, ChatGPT 4.0, and LLaVA, highlighting vulnerabilities in GenAI-powered applications.
    • Discussions around OpenAI's security incident and the implications for AI security.
    • Debates on the EU's new AI legislation requiring disclosure of AI-generated content and avoiding illegal outputs, with concerns over practicality and impact on open source models.
  3. Advances in Multimodal AI and Interpretability:

  4. Synthetic Data Generation for LLM Pretraining and Finetuning:

    • Discussions on using synthetic data as an alternative to human annotations for pretraining, instruction-tuning, and preference-tuning of language models, with potential quality advantages.
    • AI News newsletter provides valuable summaries of AI discords and top Twitter accounts, recommended by experts like Soumith Chintala and Andrej Karpathy.
    • Debates on whether fine-tuning can impart new knowledge to models, and the efficiency of fine-tuning for style transfer versus knowledge acquisition.

Claude 3 Opus (>220B?)

ChatGPT (GPT4T)


PART 1: High level Discord summaries

Nous Research AI Discord Summary


Perplexity AI Discord Summary


LM Studio Discord Summary

Tackling LM Studio Outside the UI Box: Users examined running LM Studio's API services on a home network without the user interface, focusing on server mode and localhost connections. llama.cpp was highlighted as a viable option, since it runs on AVX-only CPUs (no AVX2 required) and operates independently of the LM Studio UI, per its GitHub repository.

LM Studio Limitations Spur Creative Workarounds: Among LM Studio's constraints is the inability to launch services or connect to the internet programmatically; users creatively employed batch files and PowerShell scripts to automate starting the LM Studio inference server, showcasing the community's resourcefulness.
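
Once the inference server is up, it speaks an OpenAI-compatible HTTP API; below is a minimal standard-library sketch that waits for the server and sends one chat request. The localhost:1234 default and the model name are assumptions about a typical setup, not something LM Studio guarantees.

```python
import json
import time
import urllib.request

BASE = "http://localhost:1234/v1"  # LM Studio's default server address (an assumption)

def server_up() -> bool:
    """True once the inference server answers on the models endpoint."""
    try:
        urllib.request.urlopen(f"{BASE}/models", timeout=2)
        return True
    except OSError:
        return False

# Poll until the server (launched by a batch file / scheduled task) responds.
while not server_up():
    time.sleep(2)

payload = {
    "model": "local-model",  # LM Studio serves whichever model is currently loaded
    "messages": [{"role": "user", "content": "Say hello."}],
}
req = urllib.request.Request(
    f"{BASE}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```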

Mighty Models Extended and Examined: The Nous-Yarn-Mistral-7b-128k model extends its context window to 128k tokens using the YaRN ("Yet another RoPE extensioN") method, alongside discussions about model perplexity and humorous disappointment with the "Yet Another..." naming convention. Some also shared format-specific obstacles, such as llama.cpp incompatibility with the Command-R 35B v1.0 GGUF format.
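
For context on what YaRN-style extension is doing (this is the simpler baseline YaRN improves on, not YaRN itself): plain position interpolation stretches RoPE to a longer window by dividing each rotary frequency by the extension factor,

```latex
\theta_i' = \frac{\theta_i}{s}, \qquad s = \frac{L_{\text{new}}}{L_{\text{train}}}
```

YaRN instead interpolates different frequency bands by different amounts and rescales the attention temperature, which is how windows like 128k are reached with comparatively little additional training.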

ROCM Round-Up: Real-world experiences with ROCm support in LM Studio were shared, including troubleshooting steps like using AMD's cleanup tool and avoiding PRO drivers. Vision models proved challenging, and it was advised to choose Nvidia GPUs over AMD for image generation projects. Additionally, a user found that disabling the iGPU on a Gigabyte motherboard in BIOS settings enabled better usage of their RX 7900 XT with ROCm.

Hardware Conversation Heats Up: The cost of SLI/NVLink sparked debates, complemented by discussions on working around macOS's minimum VRAM requirements, strategizing PC hardware upgrades, and balancing multi-model deployments in LM Studio. Separate dialogues covered selecting the right dual-purpose monitor, with an inclination towards OLED screens despite burn-in risks, and a preference for high refresh rates to match top-tier graphics cards like the Nvidia 4090.


Latent Space Discord Summary


Unsloth AI (Daniel Han) Discord Summary

Visualizing Token Probabilities: Discussions indicated a need for visualizing token probability in sentences, with suggestions on using lm_head's output and softmax. However, there seems to be a lack of specific plugins for this visualization.
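
A minimal sketch of that suggestion, using GPT-2 via transformers purely as a stand-in model (any causal LM works): take the lm_head logits, softmax them, and print the probability the model assigned to each token that actually followed. Plotting or color-coding these values is left to the reader.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("The quick brown fox jumps over the lazy dog", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits         # (1, seq_len, vocab): the lm_head output

probs = torch.softmax(logits, dim=-1)  # per-position next-token distributions

# Probability assigned to each token that actually came next.
for pos in range(ids.shape[1] - 1):
    nxt = ids[0, pos + 1].item()
    print(f"{tok.decode([nxt])!r}: {probs[0, pos, nxt].item():.4f}")
```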

AI's Fast-Paced Progress: Conversations were buzzing about the rapid development in AI, with anticipation for Elon Musk's Grok model and chatter about OpenAI's authenticity.

Unsloth AI Battles Colab Woes: Fixes for Google Colab's PyTorch update issues were shared by Unsloth AI, along with a command list to help users rectify these problems themselves. Unsloth AI's compatibility was clarified, noting that it doesn't support multi-GPU or GGUF formatted models for fine-tuning yet, but it can handle 4-bit quantization for single-GPU setups.

Data Preparation Discussion: An active conversation recommended the creation of an FAQ for data preparation, suggesting a more automated approach could be beneficial.

Sophia Optimizer Sparks Interest: A new optimizer, Sophia, proposed for reducing language model training time and cost, caught the attention of the community. While untested, there's optimism it could replace existing optimizers effectively (Sophia Optimizer Paper).
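
For reference, the update rule the paper proposes (as we read it): with m_t an exponential moving average of gradients and h_t an EMA of a diagonal Hessian estimate, the step is clipped elementwise,

```latex
\theta_{t+1} = \theta_t - \eta \cdot \operatorname{clip}\!\left(\frac{m_t}{\max(\gamma\, h_t,\ \epsilon)},\ 1\right)
```

so flat directions get Newton-like steps while the clip bounds worst-case movement.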


OpenAI Discord Summary


Eleuther Discord Summary

DeepMind Debuts Generalist Gaming AI: DeepMind introduces SIMA, exhibiting natural-language proficiency in varied gaming settings, but the research community flags insufficient technical detail. Critics are wary of the metrics used to validate the agent's effectiveness, debating the definition of game expertise and AI's broader implications in competitive gaming scenarios, particularly within unpredictable multi-agent systems like battle-royale (BR) games.

Research Paper Paywalls Provoke Ire: Accessibility to cutting-edge AI research is hampered by publisher paywalls, sparking discussions around innovative neural network training dynamics and the integration of diverse network architectures. Concerns also arise about the consequences of watermarking AI-generated content, potentially limiting its practicality.

Interpretability Library for Multimodal Models Launched: A new multimodal mechanism interpretability library garners interest for collaboration, while discussions delve into the complexities of model agnosticism and language-dependent dynamics in multilingual transformers. The exploration of tokenization bias in bilingual models is highlighted along with a vector-DB-lookup method for deeper insights into model latent representations.
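
The vector-DB-lookup idea is in the family of the logit lens; below is a hedged sketch of that flavor of probe, where the layer index and cosine scoring are our illustrative choices, not necessarily the thread's exact method. An intermediate hidden state is decoded by nearest-neighbor search against the model's output embedding matrix.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

ids = tok("Paris is the capital of", return_tensors="pt").input_ids
with torch.no_grad():
    hidden = model(ids).hidden_states         # embeddings + one tensor per layer

E = model.get_output_embeddings().weight      # (vocab, d_model): the "vector DB"
h = model.transformer.ln_f(hidden[6][0, -1])  # mid-layer state at the last position

# Nearest neighbors of the latent state in output-embedding space.
sims = torch.cosine_similarity(h.unsqueeze(0), E)
print(tok.convert_ids_to_tokens(sims.topk(5).indices.tolist()))
```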

Language Models Enter the Thunderdome: The LM evaluation harness community is experimenting with learning rate cooldowns for benchmark improvement. They face challenges in adding logits due to recent API changes aimed at security, spurring discourse on adapting tasks for generative models and testing different checkpoints for model performance.

Megatron Meets NeoX: A GitHub pull request sparks a debate about the potential benefits of aligning GPT-NeoX more closely with the upstream Megatron for Transformer Engine integration. Community feedback is solicited to weigh the advantages of this strategy against code divergence.


LAION Discord Summary


HuggingFace Discord Summary


OpenRouter (Alex Atallah) Discord Summary


LlamaIndex Discord Summary

LlamaParse Triumphs in Document Parsing: LlamaParse elevates document parsing with its ability to handle images, tables, charts, and follow natural language instructions, promising remarkable performance improvements as seen on Twitter.

Safeguard Data with Presidio: LlamaIndex shines a spotlight on Presidio, Microsoft's open-source tool to identify and anonymize PII, reinforcing the significant role of data protection highlighted in this tweet.
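
A minimal sketch of the Presidio flow being highlighted, using its standard analyzer-then-anonymizer pipeline (the sample text is ours):

```python
# pip install presidio-analyzer presidio-anonymizer (plus a spaCy model, e.g. en_core_web_lg)
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact Jane Doe at jane.doe@example.com or +1-202-555-0175."

# Detect PII spans (person names, emails, phone numbers, ...).
results = AnalyzerEngine().analyze(text=text, language="en")

# Replace each detected span with its entity type, e.g. <PERSON>, <EMAIL_ADDRESS>.
print(AnonymizerEngine().anonymize(text=text, analyzer_results=results).text)
```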

RAG Stumbles with Finance Presentations: When it comes to finance PowerPoint presentations, RAG has difficulty due to format complexities, necessitating improved methods for text positioning and parsing, detailed in this tweet.

Azure Storage Anomalies Baffle Users: Users grappling with the Azure AI Search Index report discrepancies between the reported storage size (3 MB) and a vector index size of 0, despite following the AzureAISearchIndexDemo guide.

Developer Dilemmas in #general: Engineers encountered multiple roadblocks, from warnings with OpenAIPydanticProgram (solvable by installing llama-index-program-openai) to puzzling npx create-llama errors and slow response times with OpenAIAssistantAgent; switching to streaming and the resolution of recent OpenAI API performance issues may alleviate the lag.
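
On the OpenAIPydanticProgram point, the fix above is just a package install; for orientation, here is a sketch of the structured-output usage that import enables. The schema and prompt are illustrative, and an OPENAI_API_KEY is assumed to be set.

```python
# pip install llama-index-program-openai
from pydantic import BaseModel
from llama_index.program.openai import OpenAIPydanticProgram

class Album(BaseModel):
    name: str
    artist: str
    year: int

program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str="Extract the album mentioned in: {text}",
)

album = program(text="OK Computer by Radiohead came out in 1997.")
print(album.name, album.artist, album.year)
```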


OpenAccess AI Collective (axolotl) Discord Summary


LangChain AI Discord Summary

LangChain 0.2 on Fast-Track Due to Vulnerabilities: An expedited release of langchain 0.2 is underway, addressing CVEs by separating from langchain-community. The process is detailed on GitHub, seeking community input to meet user requirements.

LangChain Challenges and Innovations: Users discussed various LangChain issues including AgentExecutor bugs, advantages of AI agents, and evaluating AI agent behaviors with still-developing benchmarks. One inquiry focused on how to integrate variables like tools = [cat_tool] into Langsmith Hub prompt templates. For more guidance, users were referred to the LangChain evaluation guides.

Exciting Collaborations and Demos Spotlighted:

Tutorial Central for #golang and #llm Fans:


Interconnects (Nathan Lambert) Discord Summary


CUDA MODE Discord Summary


DiscoResearch Discord Summary


LLM Perf Enthusiasts AI Discord Summary


Datasette - LLM (@SimonW) Discord Summary


Alignment Lab AI Discord Summary


Skunkworks AI Discord Summary


AI Engineer Foundation Discord Summary


PART 2: Detailed by-Channel summaries and links

Nous Research AI ▷ #ctx-length-research (4 messages):


Nous Research AI ▷ #off-topic (30 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (8 messages🔥):

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):


Nous Research AI ▷ #general (556 messages🔥🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (115 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #bittensor-finetune-subnet (27 messages🔥):

Links mentioned:


Perplexity AI ▷ #announcements (2 messages):


Perplexity AI ▷ #general (487 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (15 messages🔥):

Links mentioned:


Perplexity AI ▷ #pplx-api (13 messages🔥):

Links mentioned:

About "return_citations": no description found


LM Studio ▷ #💬-general (273 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (24 messages🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (2 messages):


LM Studio ▷ #🎛-hardware-discussion (115 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (3 messages):


LM Studio ▷ #amd-rocm-tech-preview (85 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-general-chat (108 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (10 messages🔥):

Links mentioned:


Latent Space ▷ #llm-paper-club-west (208 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (130 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #welcome (9 messages🔥):


Unsloth AI (Daniel Han) ▷ #random (5 messages):


Unsloth AI (Daniel Han) ▷ #help (73 messages🔥🔥):


Unsloth AI (Daniel Han) ▷ #suggestions (3 messages):

Links mentioned:

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training: Given the massive cost of language model pre-training, a non-trivial improvement of the optimization algorithm would lead to a material reduction on the time and cost of training. Adam and its variant...


OpenAI ▷ #ai-discussions (128 messages🔥🔥):

Links mentioned:

Microsoft Copilot | Microsoft AI: A new era of AI has arrived. Work more productively, boost efficiency, and find new growth opportunities with Copilot.


OpenAI ▷ #gpt-4-discussions (41 messages🔥):


OpenAI ▷ #prompt-engineering (11 messages🔥):


OpenAI ▷ #api-discussions (11 messages🔥):


Eleuther ▷ #general (94 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (51 messages🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (22 messages🔥):

Links mentioned:

llm-latent-language/nnsight.ipynb at main · Butanium/llm-latent-language: Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". - Butanium/llm-latent-language


Eleuther ▷ #lm-thunderdome (10 messages🔥):

Links mentioned:


Eleuther ▷ #multimodal-general (1 messages):

boneamputee: https://brianfitzgerald.xyz/prompt-augmentation/


Eleuther ▷ #gpt-neox-dev (1 messages):

Links mentioned:

Diffs to upstream megatron as a basis for discussion towards TE integration by tf-nv · Pull Request #1185 · EleutherAI/gpt-neox: Here's three commits: One with the full diff of GPT-NeoX's megatron folder with current upstream Megatron-LM. That's 256 files with ~60k lines. However most are completely new or deleted....


LAION ▷ #general (137 messages🔥🔥):

Links mentioned:


LAION ▷ #research (21 messages🔥):

Links mentioned:


HuggingFace ▷ #announcements (1 messages):

Links mentioned:


HuggingFace ▷ #general (76 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (7 messages):

Links mentioned:


HuggingFace ▷ #cool-finds (10 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (13 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (6 messages):


HuggingFace ▷ #core-announcements (2 messages):

Links mentioned:

Merge LoRAs


HuggingFace ▷ #diffusion-discussions (2 messages):

Links mentioned:

Improve generation quality with FreeU


HuggingFace ▷ #computer-vision (13 messages🔥):


HuggingFace ▷ #NLP (19 messages🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (3 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (7 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (129 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #blog (3 messages):


LlamaIndex ▷ #general (82 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (61 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (6 messages):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (7 messages):

Links mentioned:

Quickstart — vLLM


OpenAccess AI Collective (axolotl) ▷ #community-showcase (3 messages):


LangChain AI ▷ #announcements (1 messages):

Links mentioned:

RFC: Expedited langchain 0.2 release · langchain-ai/langchain · Discussion #19083: Context Currently langchain (the package) depends on langchain-community. This is done only for backwards compatibility with langchain versions that predate the split of langchain and langchain-com...


LangChain AI ▷ #general (64 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #langchain-templates (1 messages):


LangChain AI ▷ #share-your-work (8 messages🔥):

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (6 messages):


Interconnects (Nathan Lambert) ▷ #other-papers (2 messages):


Interconnects (Nathan Lambert) ▷ #ml-questions (3 messages):


Interconnects (Nathan Lambert) ▷ #ml-drama (2 messages):


Interconnects (Nathan Lambert) ▷ #random (54 messages🔥):

Links mentioned:


CUDA MODE ▷ #general (8 messages🔥):

Links mentioned:

NumPy vs BLAS: Losing 90% of Throughput: Downloaded over 5 Billion times, NumPy is the most popular library for numerical computing in Python. It wraps low-level HPC libraries like BLAS and LAPACK, providing a high-level interface for matrix...


CUDA MODE ▷ #triton (6 messages):

Links mentioned:


CUDA MODE ▷ #cuda (10 messages🔥):

Links mentioned:


CUDA MODE ▷ #jobs (1 messages):


CUDA MODE ▷ #beginner (2 messages):


CUDA MODE ▷ #pmpp-book (4 messages):


CUDA MODE ▷ #ring-attention (10 messages🔥):

Links mentioned:


CUDA MODE ▷ #off-topic (8 messages🔥):

Links mentioned:


DiscoResearch ▷ #disco_judge (1 messages):


DiscoResearch ▷ #general (8 messages🔥):

Links mentioned:


DiscoResearch ▷ #benchmark_dev (1 messages):

Links mentioned:

GitHub - EQ-bench/EQ-Bench at creative_writing: A benchmark for emotional intelligence in large language models


DiscoResearch ▷ #embedding_dev (3 messages):


DiscoResearch ▷ #discolm_german (2 messages):


LLM Perf Enthusiasts AI ▷ #claude (12 messages🔥):


Datasette - LLM (@SimonW) ▷ #ai (6 messages):

Links mentioned:

ComPromptMized: Stav Cohen, Technion - Israel Institute of Technology


Alignment Lab AI ▷ #looking-for-collabs (2 messages):

Links mentioned:


Alignment Lab AI ▷ #general-chat (1 messages):


Skunkworks AI ▷ #off-topic (2 messages):

Links mentioned:


AI Engineer Foundation ▷ #general (1 messages):

Links mentioned:

Blog


AI Engineer Foundation ▷ #events (1 messages):

Links mentioned:

Notion – The all-in-one workspace for your notes, tasks, wikis, and databases: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team.