Frozen AI News archive

DeepMind SIMA: one AI, 9 games, 600 tasks, vision+language ONLY

**DeepMind SIMA** is a generalist AI agent for 3D virtual environments, evaluated on **600 tasks** across **9 games** using only screengrabs and natural-language instructions; it achieves **34%** success versus humans' **60%**. The model uses a multimodal Transformer architecture. **Andrej Karpathy** outlines a progression of AI autonomy in software engineering, while **Aravind Srinivas** praises Cognition Labs' AI agent demo. **François Chollet** expresses skepticism about fully automating software engineering. **Yann LeCun** suggests moving away from generative models and reinforcement learning on the path to human-level AI. Meta's **Llama-3** training infrastructure with **24k H100 cluster pods** is shared by **Soumith Chintala** and **Yann LeCun**. **Deepgram's Aura** offers low-latency speech APIs, and **Modal Labs' Devin AI** demonstrates document navigation and interaction with ComfyUI. Memes and humor circulate in the AI community.


DeepMind SIMA is the news of the day: it takes a step beyond specialist AI systems developed for Minecraft or Dota 2 toward something more general. DeepMind collaborated with game studios to evaluate its abilities on 600 short (<10-second) skills across 9 different games, from No Man's Sky to Hydroneer to Goat Simulator.


The key constraint here is that SIMA works only from screengrabs + natural language instructions - no special APIs involved. The technical report offers a little more detail: the classic multimodal Transformer you'd expect, with Google's flavor of components.


The 600 tasks are hard - humans only solve 60% of them, while SIMA hits 34%.
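To make the constraint concrete, here is a minimal, hypothetical sketch of the agent interface described above: raw pixels plus an instruction in, keyboard/mouse actions out. All class and field names are illustrative, not DeepMind's; the stand-in policy exists only so the loop runs.

```python
# Hypothetical sketch of a SIMA-style interface: inputs are only screen
# pixels and a natural-language instruction; outputs are only keyboard
# and mouse actions. Names are illustrative, not from the tech report.
from dataclasses import dataclass

@dataclass
class Observation:
    pixels: bytes          # raw screengrab, e.g. an RGB frame
    instruction: str       # natural-language task, e.g. "chop down a tree"

@dataclass
class Action:
    keys: list[str]        # keyboard keys pressed this step
    mouse_dx: int          # relative mouse movement
    mouse_dy: int

class SimaLikeAgent:
    """Stand-in policy: the real agent runs a multimodal Transformer
    over the (pixels, instruction) history instead of keyword matching."""
    def act(self, obs: Observation) -> Action:
        if "forward" in obs.instruction:
            return Action(keys=["w"], mouse_dx=0, mouse_dy=0)
        return Action(keys=[], mouse_dx=0, mouse_dy=0)

agent = SimaLikeAgent()
a = agent.act(Observation(pixels=b"", instruction="walk forward"))
print(a.keys)  # ['w']
```

The point of the sketch is the signature, not the policy: no game state, no privileged API, just what a human player would see and be told.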


Table of Contents

[TOC]


PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

Automating Software Engineering

Large Language Models and AI Architectures

AI Agents and Demos

AI Infrastructure and Training

Memes and Humor


PART 0: Summary of Summaries of Summaries

Since Claude 3 Haiku was released recently, we're adding it to this summary run for you to compare. We'll keep running these side by side for a little while longer as we build the AINews platform for a better UX.

Claude 3 Haiku (3B?)

Claude 3 Sonnet (14B?)

1) Introducing Autonomous AI Software Engineers like Devin:

2) Advancements in Large Language Models and AI Hardware:

3) Open-Source AI Model Releases and Benchmarking Efforts:

4) Frameworks and Tools for Efficient Large Model Training:

Claude 3 Opus (>220B?)

ChatGPT (GPT4T)


PART 1: High level Discord summaries

Nous Research AI Discord Summary

Planting AI in the Game Development Field: An AI-driven endeavor to recreate Plants vs. Zombies using Claude 3 was showcased, piquing interest with its application to Python game development — watch the creativity unfold here.

Function Calling Is the New Black: The recently released Hermes 2 Pro 7B model shows significant advancements with a 91% Function Calling Accuracy and specialized prompts enabling advanced function calling. The model’s prowess is acknowledged by its enthusiastic uptake for integration with tools like llama.cpp and the Vercel AI SDK, seeking a new blend of structured JSON outputs GitHub - Hermes Function Calling.
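The structured-JSON round trip mentioned above can be sketched as follows. This is a hedged illustration, not Hermes' exact spec: it assumes the model wraps its tool invocation in `<tool_call>` tags as JSON, and the "model output" below is hand-written rather than generated.

```python
# Hedged sketch of a function-calling round trip: the model emits a JSON
# tool invocation inside <tool_call> tags, which the caller parses and
# dispatches. The fake_output string stands in for real model output.
import json
import re

def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}  # registry of callable tools

def dispatch(model_output: str) -> str:
    """Extract the JSON payload from a <tool_call> block and call the tool."""
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>",
                      model_output, re.DOTALL)
    if match is None:
        return model_output  # plain-text answer, no tool needed
    call = json.loads(match.group(1))
    return TOOLS[call["name"]](**call["arguments"])

fake_output = '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
print(dispatch(fake_output))  # Sunny in Paris
```

The 91% figure above is measured against exactly this kind of parse-and-dispatch harness: either the emitted JSON names a real tool with valid arguments, or the call fails.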

The AI Chip Shape Debate: What's the optimal design for the new Cerebras CS-3, proclaimed the world's fastest AI accelerator? Discussions swirl around its square chip design, while the system boasts readiness to train a colossal 24-trillion-parameter model on a single device, presenting a leap in AI compute technology Cerebras.

The AI Rules Are Changing: The EU's new AI Act stirs the pot for AI companies by outlawing certain AI practices and demanding energy-consumption disclosures. Meanwhile, anticipation brews for open-source models focused on long-context chatbots, as referenced in the Sparse Distributed Associative Memory repository.

Cognition Introduces a New Player to the Field: Enter Devin, an AI software engineer claiming a new success benchmark in addressing GitHub issues autonomously, demonstrating capabilities to navigate a shell, code editor, and web browser — a glimpse of the future at Cognition Labs Devin.


Latent Space Discord Summary


Perplexity AI Discord Summary


Unsloth AI (Daniel Han) Discord Summary


LM Studio Discord Summary

LaTeX Rendering Sparks Engineered Excitement: Discussions highlighted a desire for LM Studio to support LaTeX in markdown, as seen in a GitHub blog post, with an eye on improving math problem interfaces. Members pondered the incorporation of swipe-style blackboards for visual math inputs, demonstrating a playful tone surrounding serious technical aspirations.

GPU Performance Unearthed: A shared YouTube performance test fueled talks about the benefits and technical considerations of using dual GPU configurations for large language models (LLMs), including setups with dual RTX 4060 Ti 16GB GPUs. Some members noted more than two-fold efficiency gains while others shared tips for optimal configurations, even as they humorously exchanged views about high-priced NVLINK bridges and alternatives.

Better Together or Alone? Dual GPU Configs vs. Single: The effectiveness of running LLMs on a single GPU versus dual setups was scrutinized by engineers sharing personal testing outcomes. Discussions ranged from configuration tweaks for GPUs with mismatched VRAM to exploring the feasibility of powering multiple high-end GPUs.
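One of the mismatched-VRAM tweaks discussed above amounts to splitting a model's layers across cards in proportion to each card's memory. Here is an illustrative helper for that arithmetic; the function name and the 60-layer example are hypothetical, not from the thread.

```python
# Illustrative helper for the mismatched-VRAM case: place a model's layers
# on each GPU in proportion to that card's available VRAM.
def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Return how many layers to place on each GPU, proportional to VRAM."""
    total = sum(vram_gb)
    counts = [int(n_layers * v / total) for v in vram_gb]
    counts[0] += n_layers - sum(counts)  # give the rounding remainder to GPU 0
    return counts

# e.g. a 60-layer model on a 16 GB card paired with an 8 GB card:
print(split_layers(60, [16.0, 8.0]))  # [40, 20]
```

This mirrors what tools like llama.cpp expose as a tensor-split ratio; the remainder handling just guarantees every layer lands somewhere.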

Upgraded RAM for a Smarter Tomorrow: RAM upgrade considerations to run larger LLM models, such as upgrading to 128GB of RAM, were weighed against the need for more VRAM. Community members offered insights on hardware configurations and performance modifications, including tips for running concurrent instances of LM Studio and enhancing GPU acceleration with AMD's ROCm beta.
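The RAM-vs-VRAM trade-off above comes down to simple arithmetic: how many bytes the weights alone occupy at a given quantization. The sketch below is a lower bound only, since real usage adds KV cache and runtime overhead; the 70B example is illustrative.

```python
# Back-of-the-envelope weight footprint at a given quantization level.
# Treat this as a lower bound: KV cache and runtime overhead come on top.
def weights_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in (decimal) GB."""
    return round(n_params_billion * 1e9 * bits_per_weight / 8 / 1e9, 1)

# A 70B model at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weights_gb(70, bits)} GB")  # 140.0 / 70.0 / 35.0 GB
```

By this estimate a 4-bit 70B model fits in 128 GB of system RAM with room to spare, which is exactly why the upgrade question keeps coming up for CPU-offloaded inference.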

Kibosh on iGPU to Enhance Main GPU's Power: Users in the amd-rocm-tech-preview thread found that disabling the iGPU could resolve offloading issues. They exchanged strategies and suggestions for optimizing ROCm beta with AMD GPUs, from installing specific driver combos like Adrenalin 24.1.1 + HIP-SDK to cleaning cached directories for better model loading in LM Studio.

AVX Beta Buzzes Quietly: In the 🧪-beta-releases-chat, there was a brief touch on version updates to AVX beta and minimal conversation about the quality of unspecified subjects, with an expressed opinion that they "aren't any good".


OpenAI Discord Summary


Eleuther Discord Summary


LAION Discord Summary


LlamaIndex Discord Summary


OpenAccess AI Collective (axolotl) Discord Summary

Axolotl Embraces DoRA for Low-Bit Quantization: DoRA (Weight-Decomposed Low-Rank Adaptation) support for 4-bit and 8-bit quantized models has been successfully merged, promising performance improvements, although it is limited to linear layers and carries notable overhead. Interested engineers can dive into the merge details on GitHub.
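The core DoRA idea can be shown numerically in a few lines: split each weight column into a magnitude and a direction, then adapt only the direction with a low-rank update. This is a toy-sized sketch of the math, not Axolotl's implementation; all shapes and names are illustrative.

```python
# Minimal numerical sketch of DoRA (Weight-Decomposed Low-Rank Adaptation):
# per-column magnitude times a low-rank-adapted, normalized direction.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 4, 6, 2

W0 = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
m = np.linalg.norm(W0, axis=0)           # learnable per-column magnitude
B = np.zeros((d_out, r))                 # low-rank factors, init so B @ A = 0
A = rng.normal(size=(r, d_in))

V = W0 + B @ A                           # adapted direction (pre-normalization)
W = m * V / np.linalg.norm(V, axis=0)    # recombine: magnitude x unit direction

# With B @ A = 0 and m taken from W0, the merged weight reproduces W0 exactly.
print(np.allclose(W, W0))  # True
```

The init check at the end is the key property: training starts from the pretrained weights, and only `m`, `B`, and `A` are updated, which is why the merged overhead concentrates in the extra normalization over linear layers.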

Big Models, Little GPUs - Fuyou to the Rescue: The Fuyou framework has shown potential in allowing engineers to fine-tune behemoth models up to 175 billion parameters on standard consumer-grade GPUs such as the RTX 4090, sparking interest for those confined by hardware limitations. _akhaliq's tweet weighs in on the excitement, flaunting a 156 TFLOPS computation capability.
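Rough arithmetic shows why a 175B fine-tune needs offloading at all. The accounting below assumes fp16 weights and gradients plus fp32 Adam moments; that is a common mixed-precision layout used for illustration, not Fuyou's exact bookkeeping.

```python
# Why fine-tuning a 175B model overwhelms a 24 GB RTX 4090 without
# offloading. Assumes fp16 weights/grads + two fp32 Adam moments
# (illustrative accounting, not Fuyou's exact layout).
def training_state_gb(n_params_b: float) -> dict:
    p = n_params_b * 1e9
    return {
        "weights_fp16": p * 2 / 1e9,
        "grads_fp16": p * 2 / 1e9,
        "adam_moments_fp32": p * 8 / 1e9,  # two fp32 moments: 4 + 4 bytes each
    }

state = training_state_gb(175)
total = sum(state.values())
print(round(total))  # 2100 (GB of training state vs a 24 GB card)
```

At ~2 TB of optimizer state against 24 GB of VRAM, nearly everything has to live in CPU RAM and NVMe, which is the gap frameworks like Fuyou target.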

API Evolution in DeepSpeed: DeepSpeed introduced an API modification for setting modules as leaf nodes which could make it easier to work with MoE models, thereby potentially benefiting Axolotl's development plans. More information is provided in their GitHub PR.

Command-R Forges Ahead with 35B Parameters: The creation of Command-R by CohereForAI, an open-source 35 billion parameter model, opens new frontiers as it's optimized for a multitude of use cases and accessible on Huggingface.

Mistral Medium Outshines Mixtral: In community showcases, Mistral Medium is noted for outperforming Mixtral, delivering more concise and instruction-compliant outputs while generating more relevant citations, possibly indicating an advanced, closed-source version of Mixtral.


LangChain AI Discord Summary


OpenRouter (Alex Atallah) Discord Summary


Interconnects (Nathan Lambert) Discord Summary


CUDA MODE Discord Summary


DiscoResearch Discord Summary


LLM Perf Enthusiasts AI Discord Summary


Datasette - LLM (@SimonW) Discord Summary


Skunkworks AI Discord Summary


Alignment Lab AI Discord Summary


AI Engineer Foundation Discord Summary


PART 2: Detailed by-Channel summaries and links

Nous Research AI ▷ #off-topic (35 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (8 messages🔥):

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):


Nous Research AI ▷ #general (349 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (107 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-general-chat (127 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (7 messages):

Links mentioned:


Latent Space ▷ #llm-paper-club-west (207 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #general (311 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (12 messages🔥):

Links mentioned:

Midjourney bans Stability staff, Marilyn Monroe AI Debut, Vision Pro aids spine surgery: This episode explores the latest AI news, including a heated data scraping controversy between Midjourney and Stability AI, the innovative "Digital Marilyn" ...


Perplexity AI ▷ #pplx-api (16 messages🔥):

Links mentioned:

Chat Completions: no description found


Unsloth AI (Daniel Han) ▷ #general (224 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #welcome (9 messages🔥):


Unsloth AI (Daniel Han) ▷ #random (6 messages):

Links mentioned:

no title found: no description found


Unsloth AI (Daniel Han) ▷ #help (63 messages🔥🔥):

Links mentioned:

no title found: no description found


Unsloth AI (Daniel Han) ▷ #suggestions (17 messages🔥):

Links mentioned:


LM Studio ▷ #💬-general (117 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (37 messages🔥):

Links mentioned:


LM Studio ▷ #🎛-hardware-discussion (76 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (4 messages):


LM Studio ▷ #amd-rocm-tech-preview (73 messages🔥🔥):

Links mentioned:

👾 LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs


OpenAI ▷ #ai-discussions (136 messages🔥🔥):

Links mentioned:

notebooks/mistral-finetune-own-data.ipynb at main · brevdev/notebooks: Contribute to brevdev/notebooks development by creating an account on GitHub.


OpenAI ▷ #gpt-4-discussions (57 messages🔥🔥):


OpenAI ▷ #prompt-engineering (46 messages🔥):


OpenAI ▷ #api-discussions (46 messages🔥):


Eleuther ▷ #general (99 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (84 messages🔥🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (3 messages):

Links mentioned:

[PDF] Dissecting Language Models: Machine Unlearning via Selective Pruning | Semantic Scholar: An academic search engine that utilizes artificial intelligence methods to provide highly relevant results and novel tools to filter them with ease.


Eleuther ▷ #lm-thunderdome (8 messages🔥):

Links mentioned:


Eleuther ▷ #gpt-neox-dev (1 messages):

Links mentioned:

Diffs to upstream megatron as a basis for discussion towards TE integration by tf-nv · Pull Request #1185 · EleutherAI/gpt-neox: Here's three commits: One with the full diff of GPT-NeoX's megatron folder with current upstream Megatron-LM. That's 256 files with ~60k lines. However most are completely new or deleted....


LAION ▷ #general (123 messages🔥🔥):

Links mentioned:


LAION ▷ #research (21 messages🔥):

Links mentioned:


LAION ▷ #learning-ml (1 messages):

Links mentioned:

Download & stream 400M images + text - a Lightning Studio by thomasgridai: Use, explore, & create from scratch the LAION-400-MILLION images & captions dataset.


LlamaIndex ▷ #announcements (1 messages):

Links mentioned:

LlamaIndex Webinar: Long-Term, Self-Editing Memory with MemGPT · Zoom · Luma: Long-term memory for LLMs is an unsolved problem, and doing naive retrieval from a vector database doesn’t work. The recent iteration of MemGPT (Packer et al.) takes a big step in this...


LlamaIndex ▷ #blog (4 messages):

Links mentioned:

Local & open-source AI developer meetup (Paris) · Luma: Ollama and Friends are in Paris! Ollama and Friends will be hosting a local & open-source AI developer meetup on Thursday, March 21st at 6pm at Station F in Paris. Come gather with developers...


LlamaIndex ▷ #general (128 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (1 messages):

shure9200: I’m making huge database of recent llm papers https://shure-dev.github.io/


OpenAccess AI Collective (axolotl) ▷ #general (69 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (15 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages):


OpenAccess AI Collective (axolotl) ▷ #community-showcase (5 messages):


LangChain AI ▷ #general (82 messages🔥🔥):


Links mentioned:


LangChain AI ▷ #langchain-templates (1 messages):


LangChain AI ▷ #share-your-work (3 messages):

Links mentioned:


LangChain AI ▷ #tutorials (3 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Links mentioned:

Anthropic: Claude 3 Haiku (self-moderated) by anthropic | OpenRouter: This is a lower-latency version of Claude 3 Haiku, made available in collaboration with Anthropic, that is self-moderated: response moderation happens on the model&...


OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

Links mentioned:

Olympia | Better Than ChatGPT: Grow your business with affordable AI-powered consultants that are experts in business strategy, content development, marketing, programming, legal strategy and more.


OpenRouter (Alex Atallah) ▷ #general (54 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (9 messages🔥):


Interconnects (Nathan Lambert) ▷ #other-papers (2 messages):


Interconnects (Nathan Lambert) ▷ #random (42 messages🔥):

Links mentioned:


CUDA MODE ▷ #general (13 messages🔥):

Links mentioned:


CUDA MODE ▷ #cuda (6 messages):

Links mentioned:

Nsight Visual Studio Code Edition: CUDA development for NVIDIA platforms integrated into Microsoft Visual Studio Code


CUDA MODE ▷ #torch (2 messages):

Links mentioned:

[RFC] Plans for torchao · Issue #47 · pytorch-labs/ao: Summary Last year, we released pytorch-labs/torchao to provide acceleration of Generative AI models using native PyTorch techniques. Torchao added support for running quantization on GPUs, includin...


CUDA MODE ▷ #jobs (1 messages):


CUDA MODE ▷ #beginner (4 messages):


CUDA MODE ▷ #pmpp-book (2 messages):


CUDA MODE ▷ #ring-attention (13 messages🔥):

Links mentioned:


CUDA MODE ▷ #off-topic (5 messages):

Links mentioned:


DiscoResearch ▷ #disco_judge (1 messages):


DiscoResearch ▷ #general (15 messages🔥):

Links mentioned:


DiscoResearch ▷ #benchmark_dev (1 messages):

Links mentioned:

GitHub - EQ-bench/EQ-Bench at creative_writing: A benchmark for emotional intelligence in large language models - GitHub - EQ-bench/EQ-Bench at creative_writing


DiscoResearch ▷ #embedding_dev (2 messages):


DiscoResearch ▷ #discolm_german (3 messages):


LLM Perf Enthusiasts AI ▷ #general (13 messages🔥):

Links mentioned:

openai announces gpt-4.5 turbo - Bing: Bing's smart search makes it easy to quickly find what you're looking for, and rewards you.


LLM Perf Enthusiasts AI ▷ #claude (1 messages):

ldj: Elon will be mad prob if OpenAI steals starship thunder on the same day like that 😭


LLM Perf Enthusiasts AI ▷ #opensource (4 messages):


Datasette - LLM (@SimonW) ▷ #ai (6 messages):

Links mentioned:

ComPromptMized: Stav Cohen Technion - Israel Institute of Technology


Datasette - LLM (@SimonW) ▷ #llm (2 messages):

Links mentioned:

Use an llm to automagically generate meaningful git commit messages: I've transformed my git commit process by using an AI to automatically generate meaningful messages. This setup involves a nifty integration of the llm CLI and git hooks, saving me time. Now I can fuc...


Skunkworks AI ▷ #general (2 messages):


Skunkworks AI ▷ #off-topic (3 messages):

Links mentioned:


Alignment Lab AI ▷ #looking-for-collabs (1 messages):

Links mentioned:

Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems — LessWrong: Behold the dogit lens. Patch-level logit attribution is an emergent segmentation map. Join our Discord here. …


Alignment Lab AI ▷ #general-chat (1 messages):


AI Engineer Foundation ▷ #general (2 messages):

Links mentioned: