Frozen AI News archive

1/16/2024: TIES-Merging

**TheBloke's Discord** community actively discusses **Mixture of Experts (MoE) models**, focusing on **random gate routing layers** for training and the challenges of immediate model use. There is a robust debate on **quantization methods**, comparing **GPTQ** and **EXL2 quants**, with EXL2 noted for faster execution on specialized hardware. A new model, **Nous Hermes 2**, based on **Mixtral 8x7B** and trained with **RLHF**, claims benchmark superiority but shows some inconsistencies. The **Frontier supercomputer** at Oak Ridge National Laboratory is highlighted for training a **trillion-parameter LLM** with **14TB RAM**, sparking discussions on open-sourcing government-funded AI research. Additionally, the application of **ghost attention** in the **academicat** model is explored, with mixed reactions from the community. *"Random gate layer is good for training but not for immediate use,"* and *"EXL2 might offer faster execution on specialized hardware,"* are key insights shared.


As highlighted in recent issues, model merging is top of mind for everyone. We featured Maxime Labonne's writeup two days ago, and the TIES paper is now making the rounds again.


Digging into the details, the results are encouraging but not conclusive.
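For intuition, here is a minimal pure-Python sketch of the three TIES-Merging steps (trim, elect sign, disjoint merge) applied to toy task vectors. The weights, keep fraction, and helper names are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of TIES-Merging on toy "task vectors"
# (finetuned weights minus base weights). All numbers are illustrative.

def ties_merge(base, task_vectors, keep_frac=0.5):
    n = len(base)
    # 1. Trim: keep only the largest-magnitude entries of each task vector.
    k = max(1, int(n * keep_frac))
    trimmed = []
    for tv in task_vectors:
        threshold = sorted((abs(x) for x in tv), reverse=True)[k - 1]
        trimmed.append([x if abs(x) >= threshold else 0.0 for x in tv])
    merged = []
    for i in range(n):
        # 2. Elect sign: per parameter, pick the sign with the larger total mass.
        pos = sum(tv[i] for tv in trimmed if tv[i] > 0)
        neg = sum(-tv[i] for tv in trimmed if tv[i] < 0)
        sign = 1.0 if pos >= neg else -1.0
        # 3. Disjoint merge: average only entries agreeing with the elected sign.
        agreeing = [tv[i] for tv in trimmed if tv[i] * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base[i] + delta)
    return merged

base = [0.0, 0.0, 0.0, 0.0]
tvs = [[0.9, -0.1, 0.5, 0.0],
       [0.8,  0.2, -0.6, 0.1]]
print(ties_merge(base, tvs))  # conflicting entries resolved by elected sign
```

Note how the third parameter keeps only the negative update: the elected sign wins and the disagreeing positive entry is dropped rather than averaged away.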


---

Table of Contents

[TOC]

TheBloke Discord Summary

MoE Model Mixology: Discussions circled around creating efficient MoE (Mixture of Experts) models, with experiments in random gate routing layers for training and the potential of merging top models from benchmarks. @sanjiwatsuki posited that while beneficial for training, random gate layers may not be ideal for immediate model usage.
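As a rough illustration of the routing idea under discussion, the sketch below assigns each token to an expert uniformly at random, which keeps expert load balanced during training without learning a router. The toy experts and function names are assumptions for illustration, not any particular MoE implementation.

```python
import random

# Hedged sketch of random gate routing in an MoE layer: tokens go to a
# uniformly random expert. Experts here are toy scalar functions.

def random_route(tokens, num_experts, seed=0):
    rng = random.Random(seed)
    # Assign each token an expert index uniformly at random.
    return [rng.randrange(num_experts) for _ in tokens]

def moe_forward(tokens, experts, assignment):
    # Apply each token's assigned expert.
    return [experts[e](t) for t, e in zip(tokens, assignment)]

experts = [lambda x: x + 1, lambda x: 2 * x]   # two toy experts
tokens = [1.0, 2.0, 3.0, 4.0]
assignment = random_route(tokens, num_experts=len(experts))
print(moe_forward(tokens, experts, assignment))
```

At inference time this gate still picks experts arbitrarily, which is exactly why the discussion concluded it is useful for training balance but not for immediate model use.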

Quantize with Caution: A robust debate ensued over the efficacy of various quantization methods, comparing GPTQ and EXL2 quants. The general consensus was that EXL2 might offer faster execution on specialized hardware, but the full scope of trade-offs requires further exploration.
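To make the underlying trade-off concrete, here is a generic round-to-nearest quantization sketch. This is not the GPTQ or EXL2 algorithm (both choose their quantization grids far more carefully); it only illustrates how mapping float weights to a small integer grid shrinks storage at the cost of rounding error.

```python
# Hedged sketch of symmetric round-to-nearest weight quantization.
# Weights and bit width are illustrative.

def quantize(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.12, -0.53, 0.70, -0.01]
q, scale = quantize(weights, bits=4)
recovered = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(q, round(max_err, 4))
```

The largest weight is represented exactly while small weights absorb most of the error, which is the kind of effect real quantizers spend their effort minimizing.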

The Narrative Behind Model Fine-Tuning: @superking__ flagged potential undisclosed complexities in finetuning Mixtral models, citing recurring issues across finetunes. Additionally, a frankenMoE model, presumably optimized and performing better in certain benchmarks, was mentioned as available at FrankenDPO-4x7B-bf16 on Hugging Face.

Training Anomalies and Alternatives: The perplexing occurrence of a model's loss dropping to near zero sparked discussions about possible exploitation of the reward function. Alternatives to Google Colab Pro for cost-effective fine-tuning were discussed, with vast.ai and runpod recommended as potential options.

Supercomputing in the Name of AI: The community was abuzz about Oak Ridge National Laboratory's Frontier supercomputer used to train a trillion-parameter LLM, stirring debates on the openness of government-funded AI research. Meanwhile, @kaltcit boasted about incorporating ghost attention within their 'academicat' model, eliciting both skepticism and curiosity from peers.

TheBloke Channel Summaries

▷ #general (1786 messages🔥🔥🔥):

Links mentioned:

▷ #characters-roleplay-stories (43 messages🔥):

Links mentioned:

▷ #training-and-fine-tuning (24 messages🔥):


Nous Research AI Discord Summary

Nous Research AI Channel Summaries

▷ #off-topic (266 messages🔥🔥):

Links mentioned:

▷ #interesting-links (378 messages🔥🔥):

Links mentioned:

▷ #announcements (1 message):

Links mentioned:

▷ #general (321 messages🔥🔥):

Links mentioned:

▷ #ask-about-llms (96 messages🔥🔥):

Links mentioned:

GitHub - huggingface/chat-ui: Open source codebase powering the HuggingChat app.


OpenAI Discord Summary

OpenAI Channel Summaries

▷ #ai-discussions (113 messages🔥🔥):

Links mentioned:

▷ #gpt-4-discussions (82 messages🔥🔥):

Links mentioned:

▷ #prompt-engineering (159 messages🔥🔥):

▷ #api-discussions (159 messages🔥🔥):


Mistral Discord Summary

Mistral Channel Summaries

▷ #general (75 messages🔥🔥):

Links mentioned:

▷ #models (3 messages):

▷ #deployment (2 messages):

▷ #finetuning (31 messages🔥):

Links mentioned:

TIES-Merging: Resolving Interference When Merging Models: Transfer learning - i.e., further fine-tuning a pre-trained model on a downstream task - can confer significant advantages, including improved downstream performance, faster convergence, and better sa...

▷ #random (1 message):

▷ #la-plateforme (74 messages🔥🔥):

Links mentioned:


Eleuther Discord Summary

Eleuther Channel Summaries

▷ #general (95 messages🔥🔥):

Links mentioned:

▷ #research (62 messages🔥🔥):

Links mentioned:

▷ #interpretability-general (4 messages):

▷ #gpt-neox-dev (16 messages🔥):

Links mentioned:


LM Studio Discord Summary

LM Studio Channel Summaries

▷ #💬-general (77 messages🔥🔥):

Links mentioned:

▷ #🤖-models-discussion-chat (59 messages🔥🔥):

Links mentioned:

▷ #🧠-feedback (2 messages):

▷ #🎛-hardware-discussion (6 messages):


HuggingFace Discord Summary

HuggingFace Discord Channel Summaries

▷ #general (62 messages🔥🔥):

Links mentioned:

Bingsu/adetailer at main

▷ #today-im-learning (3 messages):

Links mentioned:

GitHub - kuleshov/minillm: MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs.

▷ #cool-finds (2 messages):

Links mentioned:

GitHub - matthew-pisano/UniversalModels: An adapter between Huggingface transformers and several different APIs.

▷ #i-made-this (13 messages🔥):

Links mentioned:

▷ #reading-group (1 message):

annorita_anna: I would love to see this happen too!🤍

▷ #diffusion-discussions (5 messages):

Links mentioned:

Emu Edit: Precise Image Editing via Recognition and Generation Tasks

▷ #computer-vision (1 message):

▷ #NLP (12 messages🔥):

Links mentioned:

Gyazo Screen Video:



Perplexity AI Discord Summary

Perplexity AI Channel Summaries

▷ #general (88 messages🔥🔥):

Links mentioned:

▷ #sharing (4 messages):

Links mentioned:

Tweet from Aravind Srinivas (@AravSrinivas): Perplexity Android Users: Thanks for waiting patiently for the widget! Enjoy!

▷ #pplx-api (4 messages):


OpenAccess AI Collective (axolotl) Discord Summary

OpenAccess AI Collective (axolotl) Channel Summaries

▷ #general (16 messages🔥):

▷ #axolotl-dev (31 messages🔥):

Links mentioned:

▷ #general-help (15 messages🔥):

▷ #bots (4 messages):

▷ #runpod-help (3 messages):


LlamaIndex Discord Summary

LLMs Query Tables with Style: A new paper showcasing Language Models' abilities to query tabular data using textual and symbolic reasoning was highlighted, indicating the current state and potential of LLMs in this domain. Details and discussions can be found at this link and an accompanying image is available here.

Vector Search Goes Multi-Tenant: The complexities of implementing multi-tenancy in vector search, particularly in the context of private data and retrieval-augmented generation applications, were dissected in a recent blog post. Insights and the full content, as well as a visual aid, are available here and here, respectively.
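One common pattern behind such multi-tenant designs can be sketched as follows: every stored vector carries a tenant identifier that is filtered on before similarity ranking, so one tenant's retrieval queries can never surface another tenant's private documents. The data, field names, and helper functions below are illustrative assumptions, not the blog post's implementation.

```python
import math

# Hedged sketch of tenant-filtered vector search over an in-memory index.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(index, query, tenant_id, top_k=1):
    # Filter by tenant first, then rank by similarity within that partition.
    candidates = [doc for doc in index if doc["tenant_id"] == tenant_id]
    candidates.sort(key=lambda d: cosine(d["vec"], query), reverse=True)
    return [d["text"] for d in candidates[:top_k]]

index = [
    {"tenant_id": "acme",   "vec": [1.0, 0.0], "text": "acme roadmap"},
    {"tenant_id": "acme",   "vec": [0.0, 1.0], "text": "acme budget"},
    {"tenant_id": "globex", "vec": [1.0, 0.1], "text": "globex secrets"},
]
print(search(index, query=[0.9, 0.1], tenant_id="acme"))
```

Production systems push the same filter into the vector store itself (metadata filters or per-tenant namespaces) rather than scanning in Python, but the isolation guarantee is the same idea.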

Collaborate on LlamaIndex Publications: Openings for authors on the LlamaIndex blog were a hot topic, with members discussing whom to contact and how to get involved; @493606302971592747 was mentioned as a key contact. For those interested, an informative compatibility report to aid in selecting the appropriate LLM for local datasets was shared: LlamaIndex compatibility report link.

Data Storage Choices Clarified: LlamaIndex's data storage policy was clarified: embedding and response generation default to OpenAI, but storage is the user's choice, as no dedicated cloud is offered. Additionally, role assignment in GPT mimicking OpenAI's capabilities was touched upon, with SimpleChatEngine documentation provided for guidance.
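The role-assignment discussion boils down to the OpenAI-style chat message format, where each turn carries a role of "system", "user", or "assistant". The sketch below builds such a message list without calling any API; the helper name and sample content are illustrative assumptions.

```python
# Hedged sketch of role assignment in an OpenAI-style chat message list.
# No API is called; this only shows the message structure.

def build_messages(system_prompt, history, user_input):
    # The system turn sets persona/behavior; history alternates user/assistant.
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_input})
    return messages

msgs = build_messages(
    "You are a concise documentation assistant.",
    history=[("What is LlamaIndex?", "A framework for LLM data apps.")],
    user_input="Where is my data stored?",
)
print([m["role"] for m in msgs])
```

A chat engine wrapper then forwards this list to whichever LLM backend the user configured, which is where the storage choice above comes in.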

AI Propels Dynamic Databases and Data Querying: Enthusiasm was shown for a Chain-of-Table framework aimed at enhancing data interpretation through LlamaIndex, explained in detail in a Medium article. A Twitter post introduced a fluid database concept meant for AI agents that dynamically updates its schema; further information is available on GitHub. Querying capabilities integrating tables with LlamaIndex's technology were also discussed, with an illustrative Medium article on the procedure.
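The Chain-of-Table idea can be sketched as a sequence of simple table operations that progressively shrink the table until the answer is evident. Here the operation chain is hard-coded for illustration, whereas in the framework an LLM selects each step; the table data and operation names are assumptions.

```python
# Hedged sketch of the Chain-of-Table pattern: answer a question by
# applying a chain of table operations instead of reasoning in one shot.

table = [
    {"city": "Oslo",   "country": "Norway", "pop": 709000},
    {"city": "Bergen", "country": "Norway", "pop": 286000},
    {"city": "Malmo",  "country": "Sweden", "pop": 351000},
]

def filter_rows(rows, column, value):
    return [r for r in rows if r[column] == value]

def select_column(rows, column):
    return [r[column] for r in rows]

# Question: "Which Norwegian city is most populous?"
chain = [
    ("filter_rows", ("country", "Norway")),
    ("sort_desc", ("pop",)),
    ("select_column", ("city",)),
]
rows = table
for op, args in chain:
    if op == "filter_rows":
        rows = filter_rows(rows, *args)
    elif op == "sort_desc":
        rows = sorted(rows, key=lambda r: r[args[0]], reverse=True)
    elif op == "select_column":
        rows = select_column(rows, *args)
print(rows[0])
```

Each intermediate table is small enough to feed back to the model, which is what lets the approach scale to tables that would not fit in a single prompt.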

LlamaIndex Discord Channel Summaries

▷ #blog (2 messages):

▷ #general (48 messages🔥):

Links mentioned:

▷ #ai-discussion (5 messages):

Links mentioned:


DiscoResearch Discord Summary

DiscoResearch Channel Summaries

▷ #mixtral_implementation (21 messages🔥):

Links mentioned:

▷ #general (9 messages🔥):

Links mentioned:

▷ #embedding_dev (16 messages🔥):

Links mentioned:


Latent Space Discord Summary

Latent Space Channel Summaries

▷ #ai-general-chat (21 messages🔥):

Links mentioned:

▷ #llm-paper-club (1 message):


LangChain AI Discord Summary

LangChain AI Channel Summaries

▷ #general (15 messages🔥):

Links mentioned:

Loom | Free Screen & Video Recording Software: Use Loom to record quick videos of your screen and cam. Explain anything clearly and easily – and skip the meeting. An essential tool for hybrid workplaces.

▷ #langserve (1 message):

▷ #share-your-work (3 messages):

Links mentioned:


Skunkworks AI Discord Summary

Skunkworks AI Channel Summaries

▷ #general (16 messages🔥):

Links mentioned:

Tweet from Teknium (e/λ) (@Teknium1): It's finally time! Our Mixtral 8x7B model is up and available now! Nous-Hermes-2 Mixtral 8x7B comes in two variants, an SFT+DPO and SFT-Only, so you can try and see which works best for you! It&...

▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=KGqWqgloSfY


LAION Discord Summary

LAION Channel Summaries

▷ #general (3 messages):

Links mentioned:

Many AI Safety Orgs Have Tried to Criminalize Currently-Existing Open-Source AI

▷ #research (1 message):

mkaic: hf papers no update today, sadge


Datasette - LLM (@SimonW) Discord Summary

Only 1 channel had activity, so no need to summarize...


Alignment Lab AI Discord Summary

Only 1 channel had activity, so no need to summarize...

teknium: https://fxtwitter.com/Teknium1/status/1746990384738357731


YAIG (a16z Infra) Discord Summary

Only 1 channel had activity, so no need to summarize...

Links mentioned:

@aws-lambda-powertools/commons vs @cloudflare/kv-asset-handler vs aws-lambda vs miniflare vs netlify vs vercel vs wrangler | npm trends: Comparing trends for @aws-lambda-powertools/commons 1.17.0 which has 188,823 weekly downloads and unknown number of GitHub stars vs. @cloudflare/kv-asset-handler 0.3.0 which has 664,546 weekly downloa...