Frozen AI News archive

not much happened today

The Reddit community /r/LocalLlama discusses **fine-tuning and training LLMs**, including tutorials and questions on training models with specific data like dictionaries and synthetic datasets with **25B+ tokens**. Users explore **retrieval-augmented generation (RAG)** challenges with models like **mistral-7b** and embedding generation for EEG brain activity. Discussions include **hardware optimization** for running **llama-2-70b** locally under budget constraints, and performance benchmarks for **qwen-1.5** models. There is interest in extending LLM capabilities, such as converting **llama-2-7b** into a vision-capable model like **llava** and improving model memory for longer context retention.
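
For readers new to the technique, here is a minimal sketch of the retrieval step behind these RAG discussions (the embedder, documents, and model names are illustrative assumptions, not taken from the threads):

```python
# Minimal RAG retrieval sketch (illustrative; not a specific setup from /r/LocalLlama).
from sentence_transformers import SentenceTransformer, util

docs = [
    "EEG measures the brain's electrical activity via scalp electrodes.",
    "Mistral-7B is a 7-billion-parameter open-weight language model.",
    "RAG prepends retrieved passages to the prompt before generation.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder
doc_emb = embedder.encode(docs, convert_to_tensor=True)

query = "How does retrieval-augmented generation work?"
query_emb = embedder.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity and keep the best hit.
best = util.semantic_search(query_emb, doc_emb, top_k=1)[0][0]
context = docs[best["corpus_id"]]

# The retrieved context is prepended to the prompt for a local model
# (e.g. mistral-7b behind any OpenAI-compatible server).
prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```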


We save you the most time when we can say an entire day's worth of news is skippable... and we like the dramatic irony should we be wrong!

Happy peaceful reading, or check out the new Adept episode on Latent Space. We grow our Reddit coverage next week.


Table of Contents

[TOC]


REDDIT

Just starting with /r/LocalLlama for now, and we'll be summarizing the comments soon; next we have r/machinelearning, r/openai, r/stablediffusion, and r/ArtificialIntelligence mapped out. Let us know if we're missing any major alpha-drop subreddits.

/r/LocalLlama

Fine-Tuning and Training LLMs:

Retrieval-Augmented Generation (RAG) and Embeddings:

Deploying and Optimizing LLMs:

Extending LLMs:

Applications and Use Cases:


PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

Open Source Models & Frameworks

Compute Trends & Hardware

Evolutionary Model Merging

Retrieval Augmented Generation (RAG)

Emerging Trends & Applications

Prompt Engineering as a Career


PART 0: Summary of Summaries of Summaries

We are concluding that Claude Opus is simply the best model for top-level summaries, so we're discontinuing the A/B/C tests (see archives for our struggles/record). We'll be exposing parallel runs for all 3 + more models (incl. Gemini 1.5!!), as this problem is topologically similar to the personalization app we'll be launching.


PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord


Unsloth AI (Daniel Han) Discord


OpenInterpreter Discord


LM Studio Discord

Hermes 2.5 Holds the Crown: After the addition of code instruction examples, Hermes 2.5 has shown superior performance over Hermes 2 across various benchmarks, with users discussing the impact of different models and configurations on LMStudio's performance.

Tackling LM Studio Quirks and Quibbles: Members report issues with LM Studio version 0.2.17, including symlinks failing to be recognized and errors stating "Model with key Mistral/Hermes... not found." Additionally, performance discussions include abnormal CPU usage and compatibility with AMD Rocm and RX 570 graphics cards.

AI Ethics and Security - A Hot Debate: The community delved into the ethics and security of AI through discussions about interacting with models in Hugging Face's 'Guardrails Arena', and security exploits allowing the interception of encrypted AI chatbot tokens (detailed explanation here).

Model Mastery and Multitasking: Users exchanged knowledge on optimizing the functionality of multimodal models in LM Studio, dealing with issues of VRAM limitations, and using multi-model setups to improve complex tasks. The conversation also included advice on models that facilitate "Full GPU Offload Possible" on personal machines with specific capacities.

AMD ROCm - Going for Stability or Stirring Up Storms?: The release of LM Studio's ROCm 0.2.17 Beta v3 generated mixed feedback, with members reporting issues related to ejecting models, GPU offloading, ZLUDA interference, and high CPU utilization. Despite these challenges, several reported stable performance on AMD GPUs, suggesting potential improvements in the latest ROCm beta.

Streamlining AI Workflows: Engineers recommended exploring the Instructor library for structured outputs in language-model workflows and shared successful integrations of specially fine-tuned versions of OpenChat with the dolphin mistral fine-tune to enhance language-modeling efficiency.
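
For the curious, a minimal sketch of an Instructor-style structured-output call (the UserInfo schema and prompt are invented for illustration; the patch-style API shown was current as of this issue):

```python
# Hedged sketch of structured outputs with Instructor; the schema is hypothetical.
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# instructor.patch wraps the client so completions can return validated pydantic objects.
client = instructor.patch(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserInfo,  # output is parsed and validated against this schema
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user.name, user.age)  # -> John Doe 30
```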


Perplexity AI Discord

The sources cited for technical reference included Inflection-2.5, Neuralink's first human trial patient insights, and Perplexity's nature as a possible Google Search wrapper according to Analytics India Magazine. The Perplexity documentation was noted for clarifying token counts.


LAION Discord


Nous Research AI Discord


OpenAccess AI Collective (axolotl) Discord


Latent Space Discord


LlamaIndex Discord

Sensitive Data Meets AI Safely: The LlamaIndex blog highlighted the risks of training LLM/RAG apps on sensitive data such as patient clinical reports and proposed using differential privacy to protect individual information, with insights shared via a blog post tweet.
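
The post's exact mechanism isn't reproduced here; as a generic illustration of the differential-privacy idea, the classic Laplace mechanism releases a statistic with noise scaled to its sensitivity over the privacy budget:

```python
# Generic Laplace-mechanism sketch (illustrative; not the specific scheme
# proposed in the LlamaIndex post).
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release a value with epsilon-differential privacy."""
    scale = sensitivity / epsilon  # noise grows as the privacy budget shrinks
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# e.g. a count query over patient records, where adding/removing one record
# changes the count by at most 1 (sensitivity = 1)
private_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)
print(private_count)
```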

Navarasa 2.0 Embraces Diversity: The blog introduced Navarasa 2.0, the upgraded Google Gemma 7B fine-tuned for 15 Indian languages, emphasizing the value of local language support in AI, highlighted through a release tweet.

UX Gets Smarter: A new UX template featured on LlamaIndex aims to enhance agent-human interactions by limiting agent requests for human input to necessary instances, with more information available in the associated tweet.

Integration Headaches!: Discord members discussed the complexities of integrating various tools with a chatbot and encountered issues like "BadRequestError," with documentation suggestions and troubleshooting advice shared in the heated conversation.

Documentation Drama: Users wrestled with accessing the LlamaIndex documentation amidst an update to MKDocs, shared links to the new documentation format, and offered clarification on a query pipeline DAG confusion detailed here.


Eleuther Discord

Quest for Compact Code Datasets: The CodeSearchNet corpus was considered as a pretraining dataset but ran into context-length issues; The MiniPile, a 1M-document corpus, was suggested instead for its diversity and compact size, suitable for pretraining with minimal performance loss.
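
A minimal sketch of streaming The MiniPile for a small pretraining run, assuming the copy published on the Hugging Face Hub as JeanKaddour/minipile:

```python
# Sketch: streaming The MiniPile (1M documents) for small-scale pretraining.
# The Hub ID "JeanKaddour/minipile" is an assumption; adjust if it differs.
from datasets import load_dataset

minipile = load_dataset("JeanKaddour/minipile", split="train", streaming=True)
for i, example in enumerate(minipile):
    print(example["text"][:200])  # each record is one raw text document
    if i == 2:
        break
```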

Under the Hood of Closed-Source Models: The community discussed the lack of access to log probabilities and tokenizers in closed-source models like Claude and Gemini, in contrast to platforms like OpenAI that readily provide them, and speculated that proprietary concerns motivate the restriction.
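
For contrast, a short sketch of the access OpenAI does expose on chat completions:

```python
# Requesting per-token log probabilities from the OpenAI chat API -- the kind
# of introspection the thread noted is missing from Claude and Gemini.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello."}],
    logprobs=True,    # return log probabilities for each sampled token
    top_logprobs=5,   # plus the 5 most likely alternatives per position
)
for tok in resp.choices[0].logprobs.content:
    print(tok.token, tok.logprob)
```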

Maximize Your Model's GPU Potential: Guidelines from a recent paper on maximizing GPU runtime performance for transformer models included hyperparameter tuning and efficient model shapes, potentially increasing throughput by up to 39%.
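
One concrete guideline in that vein is keeping matrix dimensions aligned to GPU tile sizes; a hedged sketch (the divisor of 64 is a common tensor-core rule of thumb, not a figure from the paper):

```python
# Heuristic shape alignment (illustrative; 64 is a common tensor-core
# alignment rule of thumb, not a number taken from the paper).
def pad_to_multiple(dim: int, multiple: int = 64) -> int:
    """Round a model dimension up so GEMMs map cleanly onto GPU tiles."""
    return ((dim + multiple - 1) // multiple) * multiple

vocab_size = 50257                   # e.g. GPT-2's vocabulary size
print(pad_to_multiple(vocab_size))   # 50304, the padded size many frameworks use
```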

AI Venturing into Biotechnology: An Ars Technica article on AI in antibody design sparked discussions, revealing both excitement for the promise of diffusion models and skepticism regarding their practical economic applications.

Easing the Debugging Headache: Participants faced issues using megatron-deepspeed with lm-eval 0.3.0 and proposed workarounds like loading an older version of cais/mmlu, which was still problematic due to the relocation of the auxiliary train split, as indicated by a Gist traceback.
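
The workaround amounts to pinning the dataset to an earlier repo commit; a sketch (the revision string is a hypothetical placeholder, not the actual commit from the thread):

```python
# Sketch: pinning cais/mmlu to an older repo revision to sidestep the
# auxiliary-train split relocation. The commit hash below is a placeholder.
from datasets import load_dataset

mmlu = load_dataset(
    "cais/mmlu",
    "abstract_algebra",
    revision="<older-commit-sha>",  # hypothetical; substitute the pre-move commit
)
print(mmlu)
```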


HuggingFace Discord

ASCII Art Gets a Dataset and Develops in Diffusion: Engineers shared excitement over ASCII art with the unveiling of an ASCII Art dataset, along with discussions on fine-tuning LLMs and diffusion models to generate ASCII art. A particular challenge is fine-tuning a language model to generate intricate designs, prompting a search for efficient training methods and the idea of an ASCII-adaptive diffusion model.

SMIT Brings Audio to Language Models: A new modality-integration tool named SMIT was introduced, making it easier to include audio in language models. A YouTube demonstration of SMIT applied to music generation models piqued interest in its potential applications. Meanwhile, Fluently-v4 was globally released, offering a single-model solution for multiple tasks.

1-bit LLMs Promise Efficiency: The paper on the 1-bit LLM BitNet b1.58 reported performance matching full-precision models while optimizing for cost-efficiency, which could spur the development of 1-bit-optimized hardware for LLMs.
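
The paper's core step is absmean quantization of weights to {-1, 0, +1}; a minimal sketch of that quantizer (an inference-time view only, not the full training recipe):

```python
# Sketch of BitNet b1.58's absmean weight quantization to {-1, 0, +1}.
import torch

def absmean_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    gamma = w.abs().mean()                             # per-tensor scale
    return (w / (gamma + eps)).round().clamp_(-1, 1)   # ternary weights

w = torch.randn(4, 4)
print(absmean_quantize(w))
```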

New Approaches and Tools in Various AI Domains: SegGPT's introduction adds to the toolset for image segmentation tasks, promising one-shot results. The UniProt project's 1024-dimensional embeddings are poised for retraining with Matryoshka embeddings for better searchability in protein databases. An in-depth exploration of obesity trends using data analysis sets a new precedent for health-related AI research.
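
A sketch of the Matryoshka search-time shortcut, truncating an embedding to a leading prefix and renormalizing (the dimensions here are illustrative, not UniProt's actual configuration):

```python
# Matryoshka-style truncation: keep only the first k dimensions of a 1024-d
# embedding and renormalize for cosine search (sizes are illustrative).
import numpy as np

def truncate_embedding(emb: np.ndarray, k: int = 256) -> np.ndarray:
    short = emb[:k]
    return short / np.linalg.norm(short)

full = np.random.randn(1024)
print(truncate_embedding(full).shape)  # (256,)
```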

Community Collaborations Flourish in Model Development and Federated Learning: Members sought collaborators on projects ranging from federated learning for load forecasting to deep code generation with the 6TB "The Stack" dataset and modernized topic modeling with BERTopic. Concerns over quantizing fine-tuned models and issues around Hugging Face's Trainer class were also discussed, reflecting a shared commitment to overcoming technical hurdles together.
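
For the federated-learning thread, the standard baseline is FedAvg; a toy sketch of size-weighted parameter averaging (not the actual load-forecasting project):

```python
# Minimal FedAvg sketch (toy illustration only).
import numpy as np

def fed_avg(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """Average client model parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

clients = [np.random.randn(3) for _ in range(4)]  # stand-ins for model params
sizes = [100, 250, 50, 200]                       # local dataset sizes
print(fed_avg(clients, sizes))
```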


OpenAI Discord


LangChain AI Discord


OpenRouter (Alex Atallah) Discord


CUDA MODE Discord


LLM Perf Enthusiasts AI Discord


Interconnects (Nathan Lambert) Discord


Alignment Lab AI Discord

Calling All Open Source Enthusiasts: A community member is seeking collaboration on the 01, a fully open source hardware device, and has shared details in a public tweet.


PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #general-chat (884 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (696 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (35 messages🔥):

Link mentioned: Quantization: no description found


Unsloth AI (Daniel Han) ▷ #help (92 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (30 messages🔥):

Link mentioned: Samantha Mistral Instruct 7b - Comprehensive Bulleted Notes: no description found


Unsloth AI (Daniel Han) ▷ #suggestions (14 messages🔥):

Link mentioned: GitHub - Lightning-AI/lightning-thunder: Source-to-source compiler for PyTorch. It makes PyTorch programs faster on single accelerators and distributed.


OpenInterpreter ▷ #general (254 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (286 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #ai-content (1 messages):

cyanidebyte: https://www.youtube.com/watch?v=Q_p82HtBqoc


LM Studio ▷ #💬-general (305 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (29 messages🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (26 messages🔥):


LM Studio ▷ #🎛-hardware-discussion (98 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (10 messages🔥):


LM Studio ▷ #autogen (10 messages🔥):


LM Studio ▷ #langchain (2 messages):

Links mentioned:


LM Studio ▷ #amd-rocm-tech-preview (23 messages🔥):



Perplexity AI ▷ #general (340 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (19 messages🔥):


Perplexity AI ▷ #pplx-api (26 messages🔥):

Link mentioned: Chat Completions: no description found


LAION ▷ #general (369 messages🔥🔥):

Links mentioned:


LAION ▷ #research (4 messages):


Nous Research AI ▷ #ctx-length-research (21 messages🔥):


Nous Research AI ▷ #off-topic (5 messages):

Link mentioned: Finetune MultiModal LLaVA: This video explains how to fine-tune the LLaVA model. https://wandb.ai/byyoung3/ml-news/reports/How-to-Fine...


Nous Research AI ▷ #interesting-links (23 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (126 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (21 messages🔥):

Link mentioned: GitHub - casper-hansen/AutoAWQ at striped_hyena: AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.


Nous Research AI ▷ #project-obsidian (3 messages):


Nous Research AI ▷ #rag-dataset (38 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (213 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (6 messages):

Link mentioned: FEAT / Optim: Add GaLore optimizer by younesbelkada · Pull Request #29588 · huggingface/transformers: Adds the GaLore optimizer from https://github.com/jiaweizzhao/GaLore; fixes #29512.


OpenAccess AI Collective (axolotl) ▷ #general-help (10 messages🔥):


Latent Space ▷ #ai-general-chat (120 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (4 messages):


Latent Space ▷ #llm-paper-club-west (10 messages🔥):


Latent Space ▷ #ai-in-action-club (92 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #blog (4 messages):


LlamaIndex ▷ #general (184 messages🔥🔥):

Links mentioned:


Eleuther ▷ #general (51 messages🔥):

Links mentioned:


Eleuther ▷ #research (59 messages🔥🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (73 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #announcements (1 messages):

Links mentioned:


HuggingFace ▷ #general (76 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

Links mentioned:


HuggingFace ▷ #cool-finds (10 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (23 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (2 messages):

Link mentioned: Deciphering Obesity Trends 📉: An In-depth EDA 📊: Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources


HuggingFace ▷ #computer-vision (4 messages):

Link mentioned: SegGPT: no description found


HuggingFace ▷ #NLP (33 messages🔥):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (28 messages🔥):

Links mentioned:


OpenAI ▷ #ai-discussions (40 messages🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (11 messages🔥):


OpenAI ▷ #prompt-engineering (41 messages🔥):


OpenAI ▷ #api-discussions (41 messages🔥):


LangChain AI ▷ #general (96 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #langserve (7 messages):

Links mentioned:


LangChain AI ▷ #share-your-work (5 messages):

Link mentioned: GitHub - alexmavr/promptsage: Promptsage is an LLM prompt builder, linter and sanitizer with built-in guardrails.


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):


OpenRouter (Alex Atallah) ▷ #general (53 messages🔥):

Links mentioned:


CUDA MODE ▷ #general (4 messages):


CUDA MODE ▷ #triton (1 messages):

Link mentioned: [WIP] Fused Adam Triton Kernels by jeromeku · Pull Request #29 · jiaweizzhao/GaLore: Various fused implementations of the Adam update step for Gradient Low-Rank Projection; an initial attempt at optimizing the update step of the GaLore Adam optimizer.


CUDA MODE ▷ #cuda (1 messages):

Link mentioned: GitHub - mlecauchois/micrograd-cuda


CUDA MODE ▷ #torch (3 messages):

Links mentioned:


CUDA MODE ▷ #algorithms (9 messages🔥):

Links mentioned:


CUDA MODE ▷ #suggestions (5 messages):

Link mentioned: Parallel Processing and Applied Mathematics: no description found


CUDA MODE ▷ #pmpp-book (2 messages):


CUDA MODE ▷ #off-topic (2 messages):

Links mentioned:


CUDA MODE ▷ #triton-puzzles (12 messages🔥):


LLM Perf Enthusiasts AI ▷ #general (21 messages🔥):

Link mentioned: RAG is more than just embedding search - Instructor: no description found


LLM Perf Enthusiasts AI ▷ #claude (5 messages):


LLM Perf Enthusiasts AI ▷ #jobs (1 messages):

ibash: > write high quality code Damn.


LLM Perf Enthusiasts AI ▷ #openai (1 messages):

jeffreyw128: lol wut


LLM Perf Enthusiasts AI ▷ #prompting (1 messages):

emrgnt_cmplxty: Basic prompting isn't getting it done for you?


Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (15 messages🔥):


Interconnects (Nathan Lambert) ▷ #ml-questions (6 messages):


Interconnects (Nathan Lambert) ▷ #random (5 messages):

Link mentioned: Tweet from Machine Learning Street Talk (@MLStreetTalk): We just dropped the show with @MinqiJiang and @MarcRigter and discuss the philosophy of whether it is possible, in principle and in practice to build a "generalist agent" in RL.


Alignment Lab AI ▷ #looking-for-collabs (1 messages):


Alignment Lab AI ▷ #general-chat (1 messages):

venadore: life lesson


Skunkworks AI ▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=21Tc92g15pM