Frozen AI News archive

Andrew likes Agents

**Andrew Ng's The Batch writeup on Agents** highlighted the significant improvement in coding benchmark performance when using an iterative agent workflow, with **GPT-3.5** wrapped in an agent loop achieving up to **95.1%** correctness on HumanEval, surpassing **GPT-4** zero-shot at **67.0%**. The report also covers new developments in **Stable Diffusion** models like **Cyberrealistic_v40**, **Platypus XL**, and **SDXL Lightning** for Naruto-style image generation, alongside innovations in LoRA and upscaling techniques. Discussions on **local LLM deployment** and optimization focus on hardware setups and finetuning strategies for efficient inference and multi-user serving. Emad's departure from **Stability AI** and new **Sora** videos from **OpenAI** were also noted.


Andrew Ng's The Batch writeup on Agents made a splash across all platforms this weekend:

Devin’s splashy demo recently received a lot of social media buzz. My team has been closely following the evolution of AI that writes code. We analyzed results from a number of research teams, focusing on an algorithm’s ability to do well on the widely used HumanEval coding benchmark. You can see our findings in the diagram below.

GPT-3.5 (zero shot) was 48.1% correct. GPT-4 (zero shot) does better at 67.0%. However, the improvement from GPT-3.5 to GPT-4 is dwarfed by incorporating an iterative agent workflow. Indeed, wrapped in an agent loop, GPT-3.5 achieves up to 95.1%.

[Chart: HumanEval correctness by approach: GPT-3.5 zero-shot 48.1%; GPT-4 zero-shot 67.0%; GPT-3.5 with an iterative agent workflow up to 95.1%]
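To make the loop concrete, here is a minimal sketch of the reflect-and-retry pattern behind these numbers; `llm` and `run_tests` are hypothetical stand-ins for a completion API and a unit-test harness, not Ng's actual setup:

```python
# Minimal sketch of an iterative agent workflow for coding tasks.
# `llm(prompt) -> str` and `run_tests(code) -> (passed, report)` are
# hypothetical stand-ins, supplied by the caller.
def agent_loop(task: str, llm, run_tests, max_iters: int = 5) -> str:
    code = llm(f"Write a Python function for this task:\n{task}")
    for _ in range(max_iters):
        passed, report = run_tests(code)
        if passed:
            break
        # Reflection step: feed the failure back and ask for a revision.
        code = llm(
            f"Task:\n{task}\n\nPrevious attempt:\n{code}\n\n"
            f"It failed with:\n{report}\n\nRevise the code."
        )
    return code
```

Each extra iteration trades inference cost for correctness, which is exactly the gap between GPT-3.5 zero-shot (48.1%) and the agent loop (95.1%) in the chart.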

Nothing here is new to people who have studied the agents field, but Andrew's credibility and his agent framework (very close to Lilian Weng's, plus the recent metagame of multi-agent collaboration) sell it.

We published The Unbundling of ChatGPT today. Also, Emad stepped down from Stability AI, and more Sora videos are out; make sure to check out the Don Allen Stevenson III one.


Table of Contents

[TOC]


REDDIT

We've added more subreddits and are synthesizing topics across them. Comment crawling is still not implemented but coming along.

Stable Diffusion Models and Techniques

Local LLM Deployment and Optimization

Machine Learning Research and Techniques

AI Assistants and Applications

Memes and Humor

PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

Model Releases & Updates

Open Source Efforts & Challenges

Emerging Applications & Demos


PART 0: Summary of Summaries of Summaries


PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord


Unsloth AI (Daniel Han) Discord


Perplexity AI Discord


LM Studio Discord


OpenInterpreter Discord


LAION Discord


Nous Research AI Discord


OpenAI Discord


HuggingFace Discord

AI Art Prompt Guide Quest: Users are seeking advice on crafting prompts for AI-generated art, though specific resources weren't provided.

Blenderbot's Role-Play: Discussions highlight Blenderbot's ability to exhibit consistent character traits during interactions, in contrast to AI that acknowledges its non-human nature.

GPU Operation Showdown: A technical debate unfolded around the execution speed differences between multiplication and conditional checking on GPUs. Looking into 'iq's work was suggested for further insights.

Complex Creativity for ChatGPT: A user requested a linguistically diverse and creative prompt for ChatGPT, prompting another to exclaim over the prompt's complexity.

Optimizing GPU Inference: The community explored methods and libraries like TensorRT-LLM and ExLlamaV2 for optimizing large language model inference on GPUs, with suggestions for tools suited to simultaneous multi-user serving.
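As a library-agnostic illustration (not the TensorRT-LLM or ExLlamaV2 API), the core idea behind multi-user serving is to batch concurrent requests into a single GPU pass:

```python
# Sketch of request batching for multi-user LLM serving. `generate_batch`
# is a stand-in for whatever batched-inference call your engine exposes.
import asyncio

async def serve(queue: asyncio.Queue, generate_batch,
                max_batch: int = 8, window_ms: int = 10):
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]  # items are (prompt, future) pairs
        # Short window to let more concurrent requests join the batch.
        deadline = loop.time() + window_ms / 1000
        while len(batch) < max_batch and (t := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(queue.get(), t))
            except asyncio.TimeoutError:
                break
        outputs = generate_batch([p for p, _ in batch])  # one GPU pass
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)

# A request handler would enqueue work roughly like this:
#   fut = loop.create_future()
#   await queue.put((prompt, fut))
#   result = await fut
```

Real engines refine this with continuous batching and paged KV caches, but the queue-then-batch shape is the same.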

Rust's Rising Star: Conversations around converting the GLiNER model to Rust via the Candle library noted benefits including reduced dependencies and suitability for production, with GPU compatibility confirmed.

Efficient Coding with Federated Learning: An open-source GitHub project demonstrates an energy-efficient approach to federated learning for load forecasting.

Compiling the Stable Diffusion Compendium: A plethora of resources and guides for Stable Diffusion have been shared by community members, including civitai.com for comprehensive learning on Stable Diffusion.

Deck Out Your Memory – Diffusers Edition: An experimental tool for estimating the inference-time memory requirements of DiffusionPipeline has been released for feedback.
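The released tool is a Hugging Face Space, but the weights-only part of the estimate can be approximated in a few lines; this sketch assumes a standard diffusers checkpoint and ignores activation memory:

```python
# Rough component-wise weight memory for a DiffusionPipeline checkpoint.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
total = 0
for name, component in pipe.components.items():
    if isinstance(component, torch.nn.Module):
        size = sum(p.numel() * p.element_size() for p in component.parameters())
        total += size
        print(f"{name}: {size / 1e9:.2f} GB")
print(f"total (weights only; activations excluded): {total / 1e9:.2f} GB")
```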

SegGPT: The Contextual Segmentor: Introducing SegGPT on HuggingFace, a model with impressive one-shot segmentation that can be trained for various image-to-image tasks.

BLIP-2 Ups the Fusion Game: In vision-language model fusion, BLIP-2 has been recommended for connecting pre-trained image encoders with language models, further elaborated in the transformers documentation.
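A short usage sketch following the transformers documentation, assuming the Salesforce/blip2-opt-2.7b checkpoint and a CUDA device:

```python
# BLIP-2 visual question answering: a frozen image encoder is bridged to a
# language model through the Q-Former, so only light glue runs in between.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg")  # any local image
inputs = processor(images=image, text="Question: what is shown? Answer:",
                   return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```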

Embedding Precision with Quantization: Embedding Quantization for Sentence Transformers brings major search speed improvements without compromising retrieval accuracy.
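A minimal sketch of the API the post introduces, assuming a sentence-transformers version recent enough to ship `quantize_embeddings`:

```python
# Quantize float32 embeddings to int8 (4x smaller) or binary (32x smaller)
# for faster, cheaper vector search.
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["How do I bake bread?", "Sourdough starter tips"])

int8_embeddings = quantize_embeddings(embeddings, precision="int8")
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
print(embeddings.nbytes, int8_embeddings.nbytes, binary_embeddings.nbytes)
```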

Catering to the German Learners: A GPT-powered German language learning tool named Hans promises enhanced user experience for German learners and is available on the GPT Store.

All-MiniLM-L6-v2 Download Dilemma: A user looked for assistance in downloading and training the all-MiniLM-L6-v2 model, emphasizing the power of community support for model implementation.

Revolutionizing Decision-Making with Langchain: An article posits Langchain as a transformative approach to how language agents solve problems, available on Medium.

Diving Into Data's Importance: A shared arXiv paper argues that data quality may be the critical factor behind model capabilities, reminding us of the indispensable value of good data.

NEET/JEE Data Quest: A dataset of NEET/JEE exams is being sought for training MCQ answer generators, indicating the intersection of AI technology and educational resources.

AI on the Forefront: The Recurrent Neural Notes newsletter, available on Substack, discusses the potential limits of AI and offers nuanced insights on future AI capabilities.


LlamaIndex Discord

Twitter Sneak Peek on Human-LlamaIndex Workflow: A new template was introduced to streamline interactions between humans and LlamaIndex's agents, slated to reduce intrusiveness for users. The details and a preview were shared on Twitter.

Integrating Custom LLMs with LlamaIndex: Leonie Monigatti detailed the process of incorporating custom Language Models (LLMs) into LlamaIndex, with an explanation available on LinkedIn.

Guide to Building RAG Agent for PDFs: A tutorial by Ashish S. on creating a LlamaParse-powered RAG flow for PDF files was published and can be viewed in its entirety via this Tweet.

New LlamaIndex Python Documentation Released: LlamaIndex has updated its Python documentation to better showcase example notebooks, improve search, and clarify API layouts, as announced in a Twitter post.

LlamaIndex Community Tackles Integration and Documentation Challenges: Discussions in the community highlighted various integrations with Merlin API and LocalAI, an inquiry about the logic in LlamaIndex's evaluation process, conflicting documentation post v0.10 updates, requests for examples of multi-agent chatbots, and turning Python functions into LlamaIndex tools. Users exchanged resources, including several documentation links and GitHub code examples.
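On that last point, turning a plain Python function into a LlamaIndex tool is short in the post-v0.10 `llama_index.core` namespace; a minimal sketch:

```python
# Wrap a plain function as a tool; the docstring and type hints become the
# tool's description and schema for the agent.
from llama_index.core.tools import FunctionTool

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

tool = FunctionTool.from_defaults(fn=multiply)
# The tool can then be handed to an agent, e.g.:
#   from llama_index.core.agent import ReActAgent
#   agent = ReActAgent.from_tools([tool], llm=llm)
```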


Latent Space Discord


OpenAccess AI Collective (axolotl) Discord

GaLore Optimization Sparks Debate: The GaLore optimizer discussion highlighted its VRAM savings abilities but also raised the question of potential over-training due to "coarseness." Some engineers are eager to test GaLore out, especially in light of the new Mistral v0.2 Base Model release, which now has a 32k context window.
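For those eager to test it, a hedged sketch of GaLore usage based on the parameter-group interface shown in the galore-torch repository (argument names may differ by version):

```python
# GaLore projects gradients of 2-D (matrix) parameters into a low-rank
# subspace, shrinking optimizer-state VRAM; 1-D params stay on plain AdamW.
import torch
from galore_torch import GaLoreAdamW  # assumes the galore-torch package

model = torch.nn.Transformer(d_model=512, nhead=8)
galore_params = [p for p in model.parameters() if p.dim() == 2]
other_params = [p for p in model.parameters() if p.dim() != 2]
param_groups = [
    {"params": other_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-4)
```

The "coarseness" worry in the discussion maps to the `rank` and `update_proj_gap` choices: a lower rank saves more memory but approximates the gradient more aggressively.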

Fine-Tuning Large Language Models on a Budget: Technical discussions surfaced around fine-tuning a 7B model within 27 GB of memory, with a spotlight on a GitHub repository called torchtune that allows for efficient fine-tuning without Hugging Face dependencies. A specific pull request was recommended for reviewing full fine-tune methods requiring less than 16 GB of RAM.
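Some back-of-envelope arithmetic shows why those figures are plausible; this is a rough sketch under stated assumptions, not torchtune's actual accounting:

```python
# Why naive full fine-tuning of a 7B model blows past 27 GB while
# adapter-style training fits easily. The 40M adapter size is an assumption.
params = 7e9
weights_bf16 = params * 2 / 1e9        # 14 GB
grads_bf16 = params * 2 / 1e9          # 14 GB
adamw_fp32 = params * 4 * 2 / 1e9      # two fp32 moment tensors: 56 GB
print(f"naive full fine-tune: ~{weights_bf16 + grads_bf16 + adamw_fp32:.0f} GB"
      " before activations")           # ~84 GB

base_4bit = params * 0.5 / 1e9         # frozen 4-bit base weights: 3.5 GB
adapter = 40e6
adapter_states = adapter * (2 + 2 + 8) / 1e9  # weights + grads + moments
print(f"QLoRA-style: ~{base_4bit + adapter_states:.1f} GB plus activations")
```

Fitting a full fine-tune in 27 GB (or under 16 GB, as in the linked pull request) therefore requires tricks like paged or low-rank optimizer states, CPU offload, and activation checkpointing.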

TypeError Troubles and Help Channel Support: A member grappling with a TypeError in "examples/openllama-3b/qlora.yml" was directed to a specialized help channel (#1111279858136383509) for expertise in resolving it, exemplifying the collaborative environment that steers members toward specific resources for technical resolutions.

Medical Model Publishing Dilemma: The decision whether to publicly share a preprint of a medical model in the midst of journal review sparked a discussion on the trade-offs of early disclosure. The conversation underscores the importance of strategic research dissemination in the field.

Open Calls for Developer Recognition and Business Collaboration: CHAI announced prizes for LLM developers, encouraging community contributions, whereas businesses were invited to share their applications of Axolotl confidentially, alluding to the value of real-world use-case narratives in furthering AI technology.


OpenRouter (Alex Atallah) Discord


Eleuther Discord


CUDA MODE Discord

When Discord Fails, Meet Pushes Through: Technical difficulties during a GTC event led to the suggestion of defaulting to voice channels for future lectures, given screen-sharing issues on Discord stage channels. An unsatisfied member proposed switching to Google Meet in the future due to the instability of Discord streams.

CUDA-tious Profiling: For engineers delving into CUDA, a lecture on how to profile CUDA kernels in PyTorch was shared, complete with accompanying slides and a GitHub code repository. CUDA programming becomes a necessity when seeking performance gains where PyTorch's speed is insufficient.
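The lecture's starting point can be reproduced with the public torch.profiler API:

```python
# Profile a CUDA kernel launch from PyTorch and print the hottest ops.
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(4096, 4096, device="cuda")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             record_shapes=True) as prof:
    y = torch.mm(x, x)
    torch.cuda.synchronize()  # make sure the kernel is captured
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```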

Triton Tricky Tidbits: Discussions around Triton's performance issues were prominent, and members were warned that some Triton operations might be phased out in the future. Meanwhile, a new prototype folder in the torchao repository was proposed for collaboration on API design for efficient kernel usage as Triton support continues.

Sparsity Meets Decomposition Elegance: A novel approach to distributed sparse matrix multiplication was introduced in the Arrow Matrix Decomposition paper by researchers Lukas Gianinazzi and Alexandros Nikolaos Ziogas, with the implementation available on GitHub.

Blackwell GPUs Smile for the Camera: Members discussed the new Blackwell GPUs, highlighting a tweet with a humorous take on the GPUs' smiley face pattern. Speculation on the unseen NVIDIA Developer Discord server took place after a GitHub discussion about the CUTLASS library was brought up. The community also touched on data type standardization in deep learning, noting the absence of Google in recent standard consortiums and the lack of an IEEE standard for new floating point numbers.


Interconnects (Nathan Lambert) Discord

Mistral's New 7B Model Steals the Spotlight: Mistral AI casually dropped a new model, the Mistral 7B v0.2 Base, at the @cerebral_valley hackathon. The model details including fine-tuning guidance are available here, although no magnet links were provided for this release, as noted by @natolambert.

Shakeup at Stability AI: CEO Emad Mostaque resigned from Stability AI, hinting at his future focus on #DecentralizedAI. The community expressed mixed feelings about the impact and direction of his tenure, amidst discussions of internal struggles and the nature of Stability AI's contributions to AI academia.

Nemo Interoperability Seekers: Questions arose about converting and wrapping Nemo checkpoints for compatibility with Hugging Face, underscoring the technical challenges in machine learning model interoperability.

AI's Ethical Tightrope:

February's Big AI Chats: Illuminating interviews with Anthropic's CEO and Mistral's CEO have been drawing attention, such as this "Fireside Chat" and the discussion on Amodei's AI industry predictions here. Additionally, Latent Space's February recap, highlighting key AI developments, can be found here.


LangChain AI Discord


LLM Perf Enthusiasts AI Discord

Real Estate Matching Gone Awry: A discussion unfolded around a problem with GPT-4 Turbo misinterpreting property size requirements, with one property being suggested at 17,000 square feet despite a request for 2,000–4,000 square feet. A simple CSV-based database filter was recommended over a complex LLM, sparking a conversation about common missteps and linking to a resource by Jason Liu on the potential over-reliance on embedding search in LLMs.
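The recommended fix in sketch form, with hypothetical column names: a plain dataframe filter enforces the hard constraint exactly, which embedding search cannot guarantee:

```python
# Hard-filter listings by size before any LLM involvement.
import pandas as pd

listings = pd.read_csv("listings.csv")  # assumed columns: id, sqft, ...
candidates = listings[listings["sqft"].between(2_000, 4_000)]
# Only after this filter would an LLM rank or describe `candidates`.
```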

Frustrations with Token Limitations: Participants voiced frustration with Anthropic's rate limit of 1M tokens per day, considering a 200k context window to be insufficient. The Bedrock monthly fee model was discussed as a potential alternative, while a $500 scale plan from Anthropic was suggested as offering easier access for extensive use.

Seeking Superior Explainers: The community was asked for their top explainer resources on advanced LLM topics, with a specific call-out for high-quality, clear content on topics like RLHF, rather than a vast collection of blogs. Exa.ai was suggested as a beneficial resource for delving into LLM-related subjects.

Brief Cry for Coding Quality: In the #jobs channel, a user lamented, in a succinct and relatable one-liner, the difficulty of writing high-quality code.

GPT-3.5-0125 Takes the Lead: GPT-3.5-0125 was lauded for its significant performance improvements over previous models, as observed in a user's comparative tests, marking it as a notably capable iteration.


Alignment Lab AI Discord


Datasette - LLM (@SimonW) Discord


Skunkworks AI Discord


PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #general-chat (1195 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (1009 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (58 messages🔥🔥):

Link mentioned: GitHub - center-for-humans-and-machines/transformer-heads: Toolkit for attaching, training, saving and loading of new heads for transformer models: Toolkit for attaching, training, saving and loading of new heads for transformer models - center-for-humans-and-machines/transformer-heads


Unsloth AI (Daniel Han) ▷ #help (317 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (33 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (29 messages🔥):

Links mentioned:


Perplexity AI ▷ #general (892 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (38 messages🔥):


Perplexity AI ▷ #pplx-api (24 messages🔥):

Link mentioned: Chat Completions: no description found


LM Studio ▷ #💬-general (533 messages🔥🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (71 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (17 messages🔥):

Link mentioned: nisten/obsidian-3b-multimodal-q6-gguf · Hugging Face: no description found


LM Studio ▷ #📘-docs-and-tips (2 messages):

Links mentioned:


LM Studio ▷ #🎛-hardware-discussion (228 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (16 messages🔥):


LM Studio ▷ #autogen (48 messages🔥):


LM Studio ▷ #langchain (1 messages):

pradeep1148: https://www.youtube.com/watch?v=Nc5Yk0XXgP8


LM Studio ▷ #memgpt (4 messages):


LM Studio ▷ #avx-beta (3 messages):


LM Studio ▷ #amd-rocm-tech-preview (26 messages🔥):

Link mentioned: How to see names and values of environment variables in Windows 10: In this article, we will see how to view environment variables defined in Windows 10 and their values for the current user and the system variables.


LM Studio ▷ #open-interpreter (23 messages🔥):

Links mentioned:


OpenInterpreter ▷ #general (362 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (576 messages🔥🔥🔥):

Links mentioned:


OpenInterpreter ▷ #ai-content (11 messages🔥):

Links mentioned:


LAION ▷ #general (574 messages🔥🔥🔥):

Links mentioned:


LAION ▷ #research (92 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #off-topic (26 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (13 messages🔥):

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):

proprietary: @everyone https://twitter.com/NousResearch/status/1771735632035127594


Nous Research AI ▷ #general (469 messages🔥🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (24 messages🔥):

Link mentioned: Why does the FeedForward have three linear layer? · Issue #1004 · meta-llama/llama: I find that the FFN implementation has three linear layers. https://github.com/facebookresearch/llama/blob/ef351e9cd9496c579bf9f2bb036ef11bdc5ca3d2/llama/model.py#L337-L345 But in the paper "Atte...


Nous Research AI ▷ #project-obsidian (3 messages):


Nous Research AI ▷ #rag-dataset (19 messages🔥):

Links mentioned:


Nous Research AI ▷ #world-sim (2 messages):

Link mentioned: Everyone Get In Here Grim Patron GIF - Everyone Get In Here Grim Patron - Discover & Share GIFs: Click to view the GIF


OpenAI ▷ #annnouncements (1 messages):

Link mentioned: Sora: First Impressions: We have gained valuable feedback from the creative community, helping us to improve our model.


OpenAI ▷ #ai-discussions (264 messages🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (67 messages🔥🔥):


OpenAI ▷ #prompt-engineering (61 messages🔥🔥):

Links mentioned:


OpenAI ▷ #api-discussions (61 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #general (242 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (9 messages🔥):


HuggingFace ▷ #cool-finds (12 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (19 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (48 messages🔥):

Links mentioned:


HuggingFace ▷ #core-announcements (1 messages):

Link mentioned: Calculate the component-wise memory of DiffusionPipeline checkpoint · huggingface/diffusers · Discussion #7434: We shipped a Hugging Face Space that lets you calculate the memory requirements of a DiffusionPipeline checkpoint given a torch_dtype: https://huggingface.co/docs/diffusers/main/en/using-diffusers/...


HuggingFace ▷ #computer-vision (21 messages🔥):

Links mentioned:


HuggingFace ▷ #NLP (24 messages🔥):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (31 messages🔥):

Link mentioned: How to contribute to Diffusers 🧨: no description found


LlamaIndex ▷ #blog (8 messages🔥):


LlamaIndex ▷ #general (296 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-general-chat (164 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (5 messages):

Link mentioned: Tweet from swyx (@swyx): 🆕 The Unbundling of ChatGPT https://latent.space/p/feb-2024 A whole year has passed with ~0 growth in ChatGPT user numbers. Instead, users are exploring a whole host of verticalized players for ...


Latent Space ▷ #llm-paper-club-west (14 messages🔥):


Latent Space ▷ #ai-in-action-club (92 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (214 messages🔥🔥):

Links mentioned:

"Some highlights: 1. FSDP+QLoRA and DeepSpeed…": no description found
Fully Sharded Data Parallel: no description found
Tweet from Xiang Yue (@xiangyue96): @MistralAI just released their v0.2 Base😱. @WenhuChen and I quickly evaluated a few benchmarks using the OpenCompass evaluation package. It seems that the capability dropped a little bit on nearly al...
DeepSpeed: no description found
Chai Prize: Complete and win 3 days unlimited messages!
GitHub - mistralai-sf24/hackathon: Contribute to mistralai-sf24/hackathon development by creating an account on GitHub.
axolotl/examples/mistral/config.yml at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
trl/trl/trainer/dpo_trainer.py at 8534f0edf8608ad6bcbea9beefae380fa60ded77 · huggingface/trl: Train transformer language models with reinforcement learning. - huggingface/trl
Third-party benchmark · Issue #6 · jiaweizzhao/GaLore: Hello, thank you very much for such excellent work. We have conducted some experiments using Llama-Factory, and the results indicate that Galore can significantly reduce memory usage during full pa...

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (15 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (14 messages🔥):

Link mentioned: TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ · Hugging Face: no description found


OpenRouter (Alex Atallah) ▷ #announcements (8 messages🔥):

Link mentioned: Midnight Rose 70B by sophosympatheia | OpenRouter: A merge with a complex family tree, this model was crafted for roleplaying and storytelling. Midnight Rose is a successor to Rogue Rose and Aurora Nights and improves upon them both. It wants to produ...


OpenRouter (Alex Atallah) ▷ #general (208 messages🔥🔥):

Links mentioned:


Eleuther ▷ #general (64 messages🔥🔥):

Link mentioned: Ratatouille • Flashback GIF - Ratatouille Flashback Childhood - Discover & Share GIFs: Click to view the GIF


Eleuther ▷ #research (105 messages🔥🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (5 messages):

Links mentioned:


Eleuther ▷ #lm-thunderdome (26 messages🔥):

Links mentioned:


Eleuther ▷ #multimodal-general (3 messages):


CUDA MODE ▷ #general (9 messages🔥):

Link mentioned: Lecture 1 How to profile CUDA kernels in PyTorch: Slides: https://docs.google.com/presentation/d/110dnMW94LX1ySWxu9La17AVUxjgSaQDLOotFC3BZZD4/edit?usp=sharingCode: https://github.com/msaroufim/cudamodelecture1


CUDA MODE ▷ #triton (27 messages🔥):

Links mentioned:


CUDA MODE ▷ #cuda (7 messages):

Links mentioned:


CUDA MODE ▷ #algorithms (3 messages):

Links mentioned:


CUDA MODE ▷ #beginner (3 messages):


CUDA MODE ▷ #pmpp-book (12 messages🔥):


CUDA MODE ▷ #youtube-recordings (5 messages):

Link mentioned: no title found: no description found


CUDA MODE ▷ #ring-attention (21 messages🔥):

Links mentioned:


CUDA MODE ▷ #off-topic (15 messages🔥):

Links mentioned:


CUDA MODE ▷ #triton-puzzles (37 messages🔥):


Interconnects (Nathan Lambert) ▷ #news (11 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-questions (2 messages):


Interconnects (Nathan Lambert) ▷ #ml-drama (29 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (8 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #reads (19 messages🔥):

Links mentioned:


LangChain AI ▷ #general (42 messages🔥):

Links mentioned:


LangChain AI ▷ #langserve (1 messages):


LangChain AI ▷ #share-your-work (12 messages🔥):

Links mentioned:


LangChain AI ▷ #tutorials (5 messages):

Links mentioned:


LLM Perf Enthusiasts AI ▷ #general (21 messages🔥):

Link mentioned: RAG is more than just embedding search - Instructor: no description found


LLM Perf Enthusiasts AI ▷ #claude (5 messages):


LLM Perf Enthusiasts AI ▷ #resources (3 messages):


LLM Perf Enthusiasts AI ▷ #jobs (1 messages):

ibash: > write high quality code Damn.


LLM Perf Enthusiasts AI ▷ #openai (1 messages):


LLM Perf Enthusiasts AI ▷ #prompting (1 messages):

emrgnt_cmplxty: Basic prompting isn't getting it done for you?


Alignment Lab AI ▷ #looking-for-collabs (1 messages):


Alignment Lab AI ▷ #general-chat (8 messages🔥):

Link mentioned: Post-AGI Educational Reforms : no description found


Alignment Lab AI ▷ #looking-for-workers (1 messages):


Datasette - LLM (@SimonW) ▷ #llm (5 messages):

Link mentioned: GitHub - Nutlope/aicommits: A CLI that writes your git commit messages for you with AI: A CLI that writes your git commit messages for you with AI - Nutlope/aicommits


Skunkworks AI ▷ #off-topic (2 messages):

Link mentioned: Mr. Beast Meets Mistral: AI Created a Cookbook Based on His Wildest Stunts!: Today we create Beast CookbookThe "Beast Cookbook" idea is a fun and creative way to engage with Mr. Beast's content and generate an entertaining, fictional ...