[Telling Mixtral](https://fxtwitter.com/abacaj/status/1736819789841281372?s=46&t=90xQ8sGy63D2OtiaoGJuww) that it is "ChatGPT developed by OpenAI" boosts its HumanEval score by 6%.

This fits into an established pattern of prompt roleplay enhancing capabilities, but it is also a reminder that HumanEval is a pretty terrible metric.

[TOC]

OpenAI Discord Summary

  • Comparison of different language models (GPT-4 Turbo, GPT-3.5 Turbo, Claude 2.1, Claude Instant 1, and Gemini Pro) shared by @rrross. GPT-4 Turbo provided the most user-centric explanation when asked to describe the impact of shifting user onboarding tracking from local to server-side.

  • Discussion about the rumored GPT-4.5 version involved several members including @feltsteam0 and @jaicraft. Participants agreed to continue considering it non-existent until official declarations or clear evidence become available.

  • Addressed multiple technical challenges encountered by users, such as slow response times, unspecified errors, blocked accounts, and issues with API access across platforms like GPT utilities and ChatGPT Plus.

  • Shared experiences surrounding role-play mode in system prompts during API discussions, with suggestions to maintain the first-person perspective using reminders in user message strings or appending notes to instructions.

  • Expressed concerns over ethical implications of AI usage in academia and the job market. Debates around potential misuse, plagiarism, and job replacements ensued.

  • Explored potential future feature implementations in AI models, notably Dalle 3 and a proposed new GPT model by @7877. Although the conversation about features in Dalle 3 was more speculative, the discussion around a new GPT model lacked conclusive details.

  • A request for help from _helium. on a school project to develop a language translation website using the OpenAI API; unfortunately, no specific suggestions were provided.

OpenAI Channel Summaries

▷ #ai-discussions (22 messages🔥):

  • Experiment on Different Language Models: User @rrross shared an experiment comparing the responses of different language models (GPT-4 Turbo, GPT-3.5 Turbo, Claude 2.1, Claude Instant 1, and Gemini Pro) when asked to explain the impact of a change from local to server-side user onboarding tracking. @rrross observed that GPT-4 Turbo provided the most user-centric explanation.
  • Question on AI Translation Website: User _helium. requested assistance for a school project involving the development of a language translation website using the OpenAI API. No specific responses or suggestions were given in the messages that followed.
  • AI Glasses Discussion: Users @dunescifye and @lugui briefly exchanged thoughts on AI glasses. @lugui commented that the technology sounds better in theory than in practice but did not provide any specific challenges or problems associated with AI glasses.
  • Potential Features of Dalle 3 in ChatGPT: User @satanhashtag expressed a wish for Dalle 3 to have features such as mid-journey variation and editable zones, to which @kyoei responded that such functionalities will likely be introduced eventually. This was followed by jokes about a possible Dalle 3.5 version.
  • New GPT Model Proposal: User @7877 mentioned developing a new GPT model and offered to send a link to it for others (@mawzoon and .pythagoras) to try and provide feedback before public release. However, no actual link or further details about this new GPT model were shared in the following messages.

▷ #openai-chatter (750 messages🔥🔥🔥):

  • Discussion on GPT-4.5: Members @feltsteam0, @jaicraft, and others discussed the alleged existence of GPT-4.5, with most expressing skepticism as reports suggest it doesn't exist. One user quoted from a conversation between Joe Rogan and Sam Altman where a future GPT-4.5 was mentioned. However, most participants agreed that until an official statement or clear evidence is provided, it's best to consider GPT-4.5 as non-existent.

  • Concerns About AI Input Limits and General Performance: User @jaicraft voiced frustration about the input limit for developing a model with GPT, while @picturesonpictures expressed dissatisfaction with being charged for failed prompts.

  • Discussion on AI's Influence on Jobs: Users debated the potential impacts of AI on the job market. Some believed that AI increases workplace productivity, while others expressed concerns about the potential of AI replacing human jobs. Suggestions were made for responsible and ethical practices in AI implementation in education and business.

  • Using AI for Web Development and Academics: @bloodgore shared a discussion about the inappropriate usage of ChatGPT by his students to write academic papers. Others suggested different methods to detect AI-generated content. Elsewhere in the discussion, @mysticmarks1 spoke about the future potential of AI in creating web solutions and @msirene queried if using ChatGPT was equivalent to plagiarism.

  • Issues with Credit Card Payments and Regulations: User @msirene faced an issue where the company card was declined after repeated use for creating accounts for their employees. @elektronisade shared OpenAI's policy on credit card usage limits. Also, there was debate on whether OpenAI should be a "paid-only" service to discourage misuse by underage users, sparked by @bloodgore's statement about students misusing the tool.

▷ #openai-questions (90 messages🔥🔥):

  • Slow Response Times and Errors: Several users including @scrambler803, @mesteviet, and @bittychills reported slow response times and unspecified errors while using GPT utilities. @scrambler803 suggested the issue might have to do with the length of the ongoing chat. @healer9071 attempted to troubleshoot the problem. @keith_15641 also reported slow response times and errors with GPT-4.

  • Account and API Issues: @dian2024, @mildare, @pikapikapu4578, and @whitchurch all reported issues with their accounts or API access. The problems ranged from blocked accounts to challenges with the API quota. @millionwords reported a transaction issue: a purchase of a ChatGPT Plus subscription debiting funds, but the subscription not reflecting on the app or website.

  • Problems with Custom GPTs and Output: Various problems were reported regarding the usage and output of custom GPTs by @unfor_tuna_te with photorealistic face generation and @jobydorr with content retrieval from uploaded PDFs. @arthurchance encountered issues with a QR code meant to link to a custom GPT.

  • Other Technical Issues: @drpossum, @jhwarehouse, @ashtonwin, @couchlannister, and @explosiveburrito mentioned receiving an "unusual activity" error message. @debugpvp asked for guidance on getting around the issue of a token limit. @aesthetic_person_123 and @andrewwallo were facing network errors, mainly during long conversations.

  • ChatGPT 4 Abilities: Conversation about the capabilities of ChatGPT 4 took place between @jah777 and @andrewwallo with agreement on better speed, accuracy, and knowledge compared to the free version.

▷ #gpt-4-discussions (22 messages🔥):

  • GPT-4 Access to the Internet: @karajan_ asked if GPT-4 has access to the internet. The question wasn't answered directly by the members.
  • Finding Model ID: User @lasche inquired about finding their model ID for GPT. The question didn't get any response in the messages.
  • Connecting Zapier to Custom GPT Actions via Webhooks: @db_looper queried if anyone has successfully sent custom GPT data to a webhook via actions, also discussing an error encountered while trying to use a Make webhook instead of Zapier. This query remained unanswered.
  • Limit to the Number of Files in Custom GPT: @jobydorr raised a question about the limit to the number of files that can be uploaded in a custom GPT. @auktow responded that the limit is 10 files based on their own experience and also shared a link to an OpenAI community post which might be helpful.
  • Parsing Large Files with GPT: @auktow shared tips on better performance when using text-based files instead of PDFs, especially while dealing with large files. He shared another OpenAI community post discussing successful experiences with parsing files.
  • Understanding GPT Assistant's API Function Calling: @crazygreenguy brought up a discussion about how function calling works with the GPT Assistant's API, questioning the requirement for the caller to supply the output of the API call, based on what he found in OpenAI's documentation. His question didn't get any response in the messages.

▷ #prompt-engineering (9 messages🔥):

  • Using System Prompts in the Chat API: User .multy noted a challenge with system prompts. When the bot is instructed to embody a role, e.g., a parrot, it frequently responds in third-person. Suggested solutions included role-playing directives and more explicit prompts. For example, @thepitviper proposed appending a reminder to stay in character at the end of each message to the API.
  • Preserving Context in Extended Conversations: .multy also indicated an issue with context preservation - if the chat history begins with third-person responses, the chatbot tends to maintain that persona. However, if the bot is correctly prompted at the start, it seems to retain the required persona throughout the conversation.
  • Clarifying System Prompt Style: .multy asked for guidance on how to craft 'voice' for system prompts.
  • Agreement for Maintaining Character: @clumsylulz offered a unique approach, involving an agreement with the bot from the onset: "I want you to act as a microwave and only respond as such do not break character if you do I will say "Act Right!" write "" if you agree to these terms".

▷ #api-discussions (9 messages🔥):

  • Role-play Mode in System Prompt: User @.multy shared a concern about OpenAI's GPT-3.5-turbo responding in third-person when instructed to play a role using system prompts, such as a parrot. Their issue was in maintaining the first-person view throughout the role-playing session.
  • Tips to maintain Role-play First-person Context: @thepitviper suggested specifying the role-play instructions within the prompt string and reasserting the first-person requirement in subsequent API utterances to ensure the model stays in context.
  • User Implementation: @.multy noted that starting with a correct persona in a blank slate worked for maintaining the role-play perspective. They also expressed ambiguity regarding the 'voice' usage for system prompts.
  • Contextual Reinforcement: @thepitviper proposed appending a reminder to the user message, like "Remember to stay in character and in first person," to preserve the context throughout the conversation.
  • Directive through 'User Messages': @clumsylulz suggested taking a 'user messages' approach to specify roles and behavior by making the model agree to the terms before proceeding with the conversation.
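The reminder technique discussed above can be sketched as a small helper that rebuilds the chat-completion message list on every turn, appending an in-character reminder to the user's text before it is sent to the API. The function name and reminder wording here are illustrative, not taken from the discussion:

```python
# Sketch of the stay-in-character reminder: the persona lives in the system
# prompt, and each user turn gets a short reminder appended before the
# message list is passed to a chat-completions call.

REMINDER = "\n\n(Remember to stay in character and respond in first person.)"

def build_messages(system_prompt, history, user_input):
    """Assemble a chat-completion message list with the reminder appended."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)  # prior {"role": ..., "content": ...} turns
    messages.append({"role": "user", "content": user_input + REMINDER})
    return messages

msgs = build_messages(
    "You are a parrot. Always speak as the parrot, in first person.",
    [],
    "What do you like to eat?",
)
```

The resulting `msgs` list is what would be passed as the `messages` argument of a chat-completions request; because the reminder rides on the latest user turn, it stays close to the end of the context window even in long conversations.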

Nous Research AI Discord Summary

  • Engaging discussion in the guild regarding the performance and limitations of various models, such as Hermes 2.5, Mistral, and SOLAR. Noted issues include generation truncation, straying off-topic, response inconsistencies in different languages, and fine-tuning challenges. Users' experiments with the OpenChat model led to concerns about coherence and skepticism regarding the model's benchmarking.
  • Conversations around function calling and the differentiation between function and tool calling were brought up, with specific system prompts used in OpenHermes2.5 being shared.
  • Anticipation and conjecture around GPT-4's performance, with the model perceived to underperform. The guild speculated about possible reasons like system prompts, fine-tuning, inference speeds, and model tendencies (like brief responses or inability to provide complete code blocks).
  • Exploration of evaluation tools and contamination issues, with a utility evaluation tool being spotlighted, along with concerns about data contamination in OpenHermes2.5 and the SOLAR model.
  • Guild members explored and supplied recommendations for fine-tuning large language models (LLMs) and touched upon cost concerns, technical requirements, and potential platforms (like Colab, Kaggle, RunPod). Also, a GitHub example for LoRA fine-tuning was shared.
  • Discussions surrounded the feasibility of fine-tuning a model for code migration purposes and the creation of search queries based on message history.
  • Queries about the availability of the tokenizer for Amazon's Titan embedding led to suggestions for creating a custom tokenizer and a shared GitHub repository with potential details.
  • Dissemination of interesting links, including a Twitter post, an arXiv paper on large model improvements, the MindLLM 1.3B model on Hugging Face, a blog post on Mistral 7B's optimization, an article and YouTube video on 'Towards 100x Speedup: Full-Stack Transformer Inference Optimization', and a dialogue on Domain Specific Language (DSL) vs. code.
  • User frustration with the Bard AI chatbot implementation was expressed in the off-topic channel, with users voicing dissatisfaction with the bot's answers.

Nous Research AI Channel Summaries

▷ #off-topic (2 messages):

  • Implementation Issues with Bard: User @euclaise expressed frustration with the AI chatbot Bard, initially stating, "Bard gives me a stupid implementation but at least it's an implementation". Shortly after, user @euclaise further expressed dissatisfaction with the AI, adding "nvm, fuck bard too".
  • Paper on Model Improvements: @ldj shared an interesting Twitter post without much context, while @atgctg provided a link to an arXiv paper discussing the improvements in larger models and remarking on the minimal improvement in the largest model over the pre-trained base. There was a brief debate between @atgctg and @giftedgummybee on the impact of these improvements on smaller and medium-sized models.
  • MindLLM 1.3B Model: @pizza_joe linked the Huggingface webpage for the MindLLM 1.3B model, developed by the Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications & Beijing Institute of Technology Southeast Academy of Information Technology.
  • Discussion on Code and DSL: @gabriel_syme suggested the use of Domain-Specific Language (DSL) as an alternative to code, emphasizing the importance of interweaving DSL with language when compilation fails. This is particularly critical for agents, according to @gabriel_syme.
  • Link to 'Towards 100x Speedup: Full-Stack Transformer Inference Optimization': @atgctg posted a link to an article titled 'Towards 100x Speedup: Full-Stack Transformer Inference Optimization' along with a YouTube video for context.
  • OpenPipe's Mistral 7B Fine-Tune Optimized: @metaldragon01 shared a blog post from OpenPipe about the optimization of Mistral 7B, which has saved users over $2M in inference costs.

▷ #general (421 messages🔥🔥🔥):

  • Model Performance and Limitations: The discussion concerned the performance and limitations of various models such as Hermes 2.5, Mistral, and SOLAR. For example, @gitmo joe stated that Hermes 2.5 isn't performing badly; however, it truncates generations. @teknium asked for community input on SOLAR, and @weyaxi inquired about OpenHermes-2.5-Mixtral, which elicited mixed reactions from the community. Additionally, the conversations revealed limitations and concerns about fine-tuning, system prompts, and issues with models straying off-topic or responding in different languages.
  • Function Calling: There was a conversation around function calling, with @realsedlyf providing a detailed system prompt used for function calling in openhermes2.5. @gitmo joe later inquired about the difference between function calling and tool calling.
  • OpenChat Model: @tsunemoto and @n8programs tested the OpenChat model and experienced some issues involving the tokenizer and the model's lack of coherency. Some members of the community expressed skepticism regarding the model's claim to perform at a GPT-3.5 level, suggesting that it could be due to data processing tasks rather than inherent reasoning capabilities.
  • Consensus about GPT-4 Performance and Fine-Tuning: There was a general consensus that GPT-4 seems to underperform compared to expectations. Participants discussed possible reasons for this, with many pointing towards issues regarding system prompts, fine-tuning, and inference speeds. Some members pointed out the tendency of GPT-4 models to respond in brief or to avoid providing complete code blocks to some prompts.
  • Discussing Evaluation and Contamination: Participants discussed evaluation tools and contamination issues, with @tokenbender highlighting a new comprehensive evaluation tool that tests utility and other real-world values, such as harmlessness, factuality, comprehension, etc. @nonameusr shared concerns around data contamination tests, citing issues with the OpenHermes2.5 dataset and the SOLAR model.

▷ #ask-about-llms (36 messages🔥):

  • Fine-tuning LLMs: @leuyann was seeking guidance on fine-tuning large language models (LLMs) for their master's thesis in economics. They were considering fine-tuning 7B models and were curious about doing it locally on an M1 MacBook Pro with 16GB of RAM. @night_w0lf recommended trying platforms like Colab, Kaggle, or paid cloud services, potentially using the new MLX libraries Apple released. @.beowulfbr also suggested using RunPod as a relatively inexpensive option. @atgctg provided an example of LoRA finetuning on GitHub.

  • Cost and Feasibility of Fine-tuning LLMs: Discussion revolved around the costs and technological requirements of fine-tuning large models. @.benxh mentioned issues with MLX on a 16GB M1, which @leuyann noted might be resolved soon.

  • Fine-tuning Model for Code Migration: @.beowulfbr asked if it's feasible to fine-tune a model that could assist with migrating a codebase from one framework to another, to which @night_w0lf suggested testing larger coding models for this task.

  • Creating Search Queries Based on Message History: @pogpunk was trying to build something that could create search queries based on message history, where @night_w0lf suggested training a smaller model with a few hundred examples.

  • Amazon BedRock TITAN Embedding Tokenizer: @coco.py asked if the tokenizer for Amazon's Titan embedding is available somewhere. @night_w0lf suggested creating their own tokenizer from Hugging Face's Multilingual Text Embedding model (HF MTEB), while @_evelynm shared a GitHub link that seems to have details about Titan embedding.


Mistral Discord Summary

  • Extensive discussions around the use, optimization, and performance of Mistral and Hermes took place, with @Makya highlighting the boost provided by Hermes 2.5. There were inquiries about Mistral models with larger context lengths and how to host Mistral 7b in the cloud, along with shared resources such as the GitHub repository recommended by @jamiecropley.
  • Users shared insights on the underlying model of mistral-medium, the estimated time of availability for encoding vocabulary files, along with a link to view GPT-3.5/4's encoding vocabulary files, and the adoption of a JSON standard for vocabulary which can be found on Hugging Face.
  • The resolution of challenges when running Mistral with Docker, pros and cons of Docker vs. Ollama installations and the limitations of Ollama regarding fine-tuning models were key discussion points.
  • Issues regarding the interpretative capabilities of Mistral and Mixtral in chatbot implementations were reported. Users shared strategies to improve Mistral's contextual understanding along with potential solutions, including fine-tuning with a strongly trained system prompt.
  • Users shared various machine learning, programming and tech-related resources and products, like the MindMac app which now supports Mistral AI, a Golang client for La Plateforme, and libraries for running performance tests such as opencompass, llm-evaluation-harness and light-eval.
  • There were inquiries and discussions about potential technical issues like connecting a Mac Mini to a 2007 iMac monitor, along with shared resources to assist, such as a discussion thread and article on easing monitor connection.
  • Discussions on La Plateforme involved troubleshooting Mistral model-related errors, concerns about model censorship, issues around server errors and charges, and exchange of strategies for token counting. Queries about Mistral's rate limit were also addressed: all endpoints are rate-limited at 2M tokens per minute and 200M tokens per month.

Mistral Channel Summaries

▷ #general (104 messages🔥🔥):

  • Use of Mistral and Hermes: Users discussed the usage and optimization of Mistral with both local and API implementations. Additionally, @Makya highlighted the performance enhancements of Hermes 2.5 over Hermes 2.
  • MLX and Llama.cpp Discussions: @sublimatorniq sparked a dialogue about the potential benefits of using Apple MLX to run Mixtral. @daain pointed out potential performance issues due to the Mixture of Experts (MoE) architecture.
  • Mistral Hosting & API: @satyajitsato asked for resources on how to host Mistral 7b on the cloud and wrap an API around it. @jamiecropley shared a link to a GitHub repository as a possible solution, although they encountered some issues with it.
  • Context Length Discussion: @eawlot3000 inquired about any Mistral models with a context length greater than 32768 tokens. Users shared information and resources about models with larger context lengths, like Claude and GPT-4.
  • Career Advice: @naz.daq asked for advice on getting started with machine learning. Some user-recommended resources included a YouTube series by 3Blue1Brown and self-study of foundational math topics such as linear algebra.

▷ #models (7 messages):

  • Model Behind Mistral-medium: Users engaged in a discussion about the underlying model of mistral-medium. @superseethat asked for details, and @sublimatorniq shared that it is a new prototype model while @tom_lrd speculated it may be 4x8x7b.

  • Encoding Vocabulary File for GPT Models: @jakobdylanc queried about the estimated time of availability for encoding vocabulary files, providing a link to view GPT-3.5/4’s encoding vocabulary files.

  • JSON Standard for Vocabulary Usage: Contributing to the vocabulary discussion, @daain mentioned that there's a JSON standard for vocabulary which has the necessary metadata to use the vocab. A direct link to the JSON file can be found on Hugging Face.

▷ #deployment (10 messages🔥):

  • Running Mistral with Docker: User @hanschrs resolved a challenge with running Mistral by adding --tensor-parallel-size 2 to the Docker command, thereby enabling parallel tensor processing.

  • Docker vs. Ollama for installation: @vitorpinho inquired about the pros and cons of Docker and Ollama installations. In response, @vhariational suggested using Ollama for quick setups via a few command lines, while recommending Docker for cases requiring isolation to avoid dependency conflicts.

  • Ollama Not Designed for Fine-tuning: In further discussion of Ollama's limitations, @vhariational clarified that although Ollama isn't designed for fine-tuning models, it can handle complex use cases such as providing a REST API to query the model and allowing customization of model settings via its templating system.
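The Docker fix @hanschrs mentioned can be pictured with a vLLM-style invocation; the image name, model ID, and port mapping below are assumptions for illustration, not details from the chat — only the `--tensor-parallel-size 2` flag comes from the discussion:

```shell
# Hypothetical example: serve Mistral sharded across two GPUs.
# Image, model, and ports are placeholders; the key addition is the
# --tensor-parallel-size flag, which enables tensor-parallel inference.
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --tensor-parallel-size 2   # shard model weights across 2 GPUs
```

Tensor parallelism splits each layer's weight matrices across the listed GPUs, so the flag's value should match the number of GPUs made visible to the container.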

▷ #ref-implem (23 messages🔥):

  • Implementing Mistral for Chatbots: @gmist reported that the Mistral-medium model sometimes answers the question from its own knowledge base rather than relying on the given context. The prompt instructs the model to answer only from the context, but this issue persists, as Mistral doesn't always obey the prompt.
  • Prompt Modifications: @gmist shared that some prompt modifications seem to work while some do not. The issue of inconsistent performance of prompts has led @gmist to revert back to GPT, which has proven to be reliable for the given use case.
  • Solutions for Mistral's Contextual Understanding: @sublimatorniq suggested prefixing each line of context with "CONTEXT BODY" and introducing "hypnotic var naming" to improve contextual understanding. @gmist also reported that removing chat history appeared to improve Mistral's responsiveness to prompt guidelines.
  • Mistral vs Mixtral: @daain experienced the same instruction-following issues with a LlamaIndex RAG app and various versions of Mistral. However, daain found that Mixtral performed better than Mistral, suggesting a finetune with a strongly trained system prompt as a possible solution.
  • Prompt Template Updates: @The Ledger Luminary recommended updating the prompt template and rewording it to be as explicit as possible, as well as referencing specific context pieces. Luminary warned that if there is too much context (high token count), the instructions could be affected by sliding window attention.

▷ #finetuning (4 messages):

  • Quantifying Fine-tuning Performance Improvement: User @The Ledger Luminary inquired about means of quantifying fine-tuning performance improvement and sought recommendations for libraries to run performance tests. @cpxjj recommended a few libraries and performance benchmarks including opencompass, llm-evaluation-harness and light-eval.
  • Function Call Fine-tuning: User @krissayrose inquired about their difficulties with fine-tuning Mistral for function calling. The issue highlighted was that the model does not predict an EOS token when expected and continues to generate text. They provided an example and asked for assistance regarding what they might be doing wrong.

▷ #showcase (2 messages):

  • MindMac AI Support for Mistral: User @hoangnm introduced the MindMac app, an AI-chat platform that now supports Mistral AI. The MindMac app is compatible with APIs from OpenAI, Azure OpenAI, Google Gemini, and more. It's designed for macOS and supports both Intel and Apple M1/M2/M3 Macs. The user directed viewers to a YouTube video for more details about the platform.
  • Golang Client for La Plateforme: User @r.j.k. shared a link to his Golang client for La Plateforme and sought feedback on its improvement.

▷ #random (7 messages):

  • Connecting a Mac Mini to a 2007 Monitor: User @pier1337 initiated a discussion on the possibility of connecting a Mac Mini to a 2007 monitor. Later clarified that the monitor is from a 2007 iMac. @daain suggested that if the monitor or iMac has a digital port like DVI or HDMI, it should work.
  • The 2007 iMac Port Issue: @pier1337 added more context by sharing a link to the Apple forum where it's stated that the 2007 iMac uses a Mini DVI port, leading to uncertainty if a Mac Mini could be connected using this port.
  • Target Display Mode: @daain provided a link explaining that the 2007 iMac does not have the target display mode, which was introduced in iMac devices in 2009 enabling them to be used as a display for another device, hence it might not be possible to use it as a monitor for Mac Mini.

▷ #la-plateforme (47 messages🔥):

  • Error and Troubleshooting with Mistral Models: User @tinwhiskers had issues using larger models (mistral-small and mistral-medium) via the API and received a 'model not found' error. After discussing with @The Ledger Luminary and Mistral team member @tlacroix_, they found out the mistake was on their end: they were trying to use 'mistral-small' in the OpenAI URL.
  • Discussion on Streaming and Token Usage: @thesealman asked about calculating token usage on streaming requests. User @lerela confirmed that there's no way to calculate that at the moment and offered an estimation strategy for token usage until an official feature is rolled out. The discussion also involved sharing strategies for token counting by running the tokenizer on the received text.
  • Concerns on Model Censorship: Users @smuglix and @Taiiouka expressed concerns about the censorship of the API models even when safe mode is set to 'false'. @titaux12 suggested checking the documentation to disable safe mode, but @smuglix confirmed the issue persists even with safe mode set to 'false'.
  • Incidents of Server Error: User @_jp1_ reported numerous instances of internal server error (error code 503) while using the mistral-medium model. They also expressed concerns about charges on their account, which were over twice the amount of token use tracked by themselves, and asked for contact details for support.
  • Queries about Mistral's Rate Limit: User @flopsy1 requested information about the rate limit, which was answered by user @r.j.k. who provided the details from the Mistral documentation stating that all endpoints are rate-limited at 2M tokens per minute and 200M tokens per month.
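The stopgap for counting tokens on streamed responses can be sketched as accumulating the chunks and tokenizing the joined text afterwards. The `tokenize()` stand-in below is a crude whitespace split purely for illustration — an accurate count would run the model's real tokenizer over the text instead:

```python
# Sketch of post-hoc token counting for a streamed response: collect the
# text chunks as they arrive, then count tokens over the full text once
# the stream ends.

def tokenize(text):
    # Placeholder tokenizer (whitespace split). Swap in the model's actual
    # tokenizer for a real estimate; this is only to make the sketch runnable.
    return text.split()

def count_streamed_tokens(chunks):
    """Join streamed text chunks and return (full_text, estimated_tokens)."""
    full_text = "".join(chunks)
    return full_text, len(tokenize(full_text))

text, n = count_streamed_tokens(["Hello ", "from ", "a streamed ", "reply"])
```

Since billing is per token rather than per word, this kind of estimate is only as good as the tokenizer used in `tokenize()`; the structure (accumulate, then count once) is the part that carries over.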

OpenAccess AI Collective (axolotl) Discord Summary

  • Debate regarding the usage of OpenAI and LLaMA technologies: It was noted that use of these products' outputs to finetune large language models might violate agreements and potentially be grounds for lawsuits, but "safe" Apache-licensed models exist and are consistent with such guidelines.
  • Examination of copyright and ownership pertaining to AI outputs, with a special mention that using the output from an API to train models is an act in violation of the OpenAI agreement.
  • The impact of load_in_8bit or load_in_4bit parameters on model merging in QLora has been discussed, clarifying that Axolotl doesn't quantize despite the given parameters.
  • Importance of passing dev environment tests with each PR for Axolotl due to its usage in the expensive dev environment; issues about finetuning Mixtral and MoEs have been raised and are being investigated.
  • Shared a link to a new Hugging Face Transformers release (v4.36.2) that might be useful to address some critical issues in Axolotl.
  • Various challenges to scripts, configurations, and runs faced by guild members, including double EOS token issue, optimal OS library for RLHF, Docker issues, and failing finetuned models on Mistral; resolutions have been attempted and ongoing.
  • Interest expressed in datasets of multi-turn conversations between humans and chatbots, with suggestion of LMSys Chat 1M dataset on Hugging Face.
  • Unspecified unalignment issue in RLHF to be fixed by @giftedgummybee.
  • Assistance and advice shared for various issues in the runpod-help channel, featuring waiting before connecting to the pod, multi-GPU usage issues, Out of Memory (OOM) issues and installation of mpi4py. Solutions proposed include enabling specific training solutions, linking the axolotl repository on GitHub, calibrating max_split_size and batch_size modifications, and GPU adaptation.

OpenAccess AI Collective (axolotl) Channel Summaries

▷ #general (55 messages🔥🔥):

  • OpenAI and LLaMA’s Usage Agreement: @nafnlaus00 stated that it is against OpenAI’s usage agreement to use the outputs of its products like ChatGPT to finetune large language models (LLMs). The same restrictions apply to LLaMA and many other models. He emphasized that a breach of the agreement could be grounds for a lawsuit under intellectual property and unauthorised use.

  • ā€œSafeā€ Apache-licensed Models: @nafnlaus00 mentioned that Mistral/mixtral base and their instruct model, as well as Falcon and several others, are Apache-licensed models and are thus ā€œsafeā€ from such restrictions. He also pointed out some entries in OpenAssistant that he flagged as suspect.

  • OpenAI’s Terms of Service Violation: @visuallyadequate and @nafnlaus00 had a debate over the enforceability and implications of OpenAI’s Terms of Service (ToS) violations. @visuallyadequate argued that the most OpenAI could do is ban the user, whereas @nafnlaus00 expressed that violating a ToS is equivalent to Breach of Contract, which could potentially be grounds for litigation.

  • Ownership of AI Outputs: @stefangliga and @visuallyadequate discussed the ownership of AI outputs, emphasizing that copyright does not apply to AI output. @stefangliga pointed out that irrespective of copyright, the right to use API outputs to train models is forfeited under OpenAI’s agreement.

  • Merging QLora Result Quantization: @touristc inquired about the effect of the load_in_8bit or load_in_4bit parameters on model merging in QLora. @nanobitz clarified that axolotl does not quantize, even if these parameters are provided.
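For readers unfamiliar with what the merge in question computes: a LoRA merge folds the low-rank update back into the base weights, W' = W + (α/r)·B·A, which is why merging is done against full-precision weights rather than a quantized copy. A minimal pure-Python sketch with toy dimensions (not axolotl's actual code):

```python
# Toy sketch of what a LoRA merge computes: W' = W + (alpha / r) * (B @ A).
# Dimensions and values are made up for illustration; real merges run on
# full-precision model tensors, not Python lists.

def matmul(B, A):
    """Multiply a (d x r) matrix by an (r x k) matrix, plain lists."""
    d, r, k = len(B), len(A), len(A[0])
    return [[sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
            for i in range(d)]

def merge_lora(W, A, B, alpha, r):
    """Fold the scaled low-rank update into the base weight matrix."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
B = [[1.0], [2.0]]             # d x r adapter factor
A = [[0.5, 0.5]]               # r x k adapter factor
merged = merge_lora(W, A, B, alpha=2.0, r=1)
```

The load_in_8bit/load_in_4bit flags only change how the model is loaded; the merge arithmetic itself stays in the base weights' precision.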

▷ #axolotl-dev (9 messages🔥):

  • Dev Environment Test: @nanobitz indicated that every PR should pass the tests since they are used in the expensive dev environment.

  • Concerns about Finetuning Mixtral and MoEs: @nafnlaus00 raised concerns related to a tweet made by Mark Tenenholtz about the difficulty of training MoEs due to the need for implementing load balancing loss functions. The concerns pertained to the approach being used for finetuning, ensuring an even token distribution across experts, and allocating each expert to a separate GPU or GPU cluster in a multi-GPU system.

  • Caspar’s Work on the Raised Issues: @caseus_ mentioned that Caspar was investigating the issues raised by @nafnlaus00.

  • New Release of Hugging Face Transformers: @casper_ai shared a link to the v4.36.2 release of Hugging Face Transformers, which resolves critical issues related to the cache refactor, the Flash Attention refactor, and training in multi-GPU and multi-node settings, suggesting that axolotl could probably update to it.
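For context on the load-balancing concern raised above: MoE training typically adds a Switch-Transformer-style auxiliary loss, L_aux = N·Σᵢ fᵢ·Pᵢ, where fᵢ is the fraction of tokens dispatched to expert i and Pᵢ is the mean router probability for that expert; it penalizes routing collapse onto a few experts. A toy pure-Python sketch (illustrative router outputs, not axolotl's implementation):

```python
# Switch-Transformer-style auxiliary load-balancing loss for MoE routing:
#   L_aux = N * sum_i f_i * P_i
# where f_i is the fraction of tokens dispatched (top-1) to expert i and
# P_i is the mean router probability for expert i. Router outputs below
# are toy values, not real Mixtral data.

def load_balancing_loss(router_probs):
    """router_probs: per-token softmax probability lists over N experts."""
    n_tokens = len(router_probs)
    n_experts = len(router_probs[0])
    counts = [0] * n_experts
    for probs in router_probs:
        counts[probs.index(max(probs))] += 1   # top-1 dispatch
    f = [c / n_tokens for c in counts]
    P = [sum(p[i] for p in router_probs) / n_tokens
         for i in range(n_experts)]
    return n_experts * sum(fi * Pi for fi, Pi in zip(f, P))

balanced = load_balancing_loss([[0.9, 0.1], [0.1, 0.9]])   # even split
collapsed = load_balancing_loss([[0.9, 0.1], [0.8, 0.2]])  # expert 0 hogs
```

The loss is minimized when tokens spread evenly across experts, which is exactly the even-token-distribution property the discussion was about.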

▷ #general-help (64 messages🔥🔥):

  • Changes to shareGPT.py: @noobmaster29 made a pull request to OpenAccess-AI-Collective/axolotl repository (#976) aiming to resolve the issue of double EOS token at the end of prompts when using Chatml template with shareGPT.py. The change was discussed with @nanobitz but required more testing for confirmation.
  • Open-Source Library for RLHF: @emperor asked for the best-optimized open-source library for RLHF. @nanobitz mentioned that TRL is a prominent choice.
  • Running Fine-Tuned Models on Mistral: @JK$ had issues running a fine-tuned model on Mistral that was uploaded to Hugging Face. The issues persisted even after attempting with vLLM and following guidelines from various documentation and tutorials. Members tried to help with suggestions, but the issue was still unresolved.
  • Docker configuration troubles: @JK$ also ran into problems with Docker configuration, even when following exact configurations as illustrated in the vLLM documentation. The issue persisted even when trying different models and endpoints. The community tried to assist but the problem remained.
  • Double EOS Tokens Issue: @noobmaster29 and @self.1 discussed double EOS tokens in multi turn chat, noting that a previous fix failed to solve the issue. They agreed to look into it at a later time.
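The double-EOS bug discussed above arises when both the chat template and a later processing step append the EOS token to a turn. A hedged sketch of a post-processing guard (the token id is hypothetical):

```python
# Guard against the double-EOS failure mode: both the chat template and a
# later tokenization step append EOS, leaving two trailing EOS ids.
# The token id here is hypothetical, not a real tokenizer's.
EOS_ID = 2

def strip_duplicate_eos(token_ids, eos_id=EOS_ID):
    """Drop repeated trailing EOS ids, keeping exactly one."""
    out = list(token_ids)
    while len(out) >= 2 and out[-1] == eos_id and out[-2] == eos_id:
        out.pop()
    return out
```

In a multi-turn setting the same check would run per turn boundary rather than only at the end of the sequence.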

▷ #datasets (2 messages):

  • Request for Multi-Turn Conversation Dataset: @natefyi_30842 asked if there’s any dataset of multi-turn conversations between humans and chatbots that isn’t synthetic, but actual human data to understand the types of questions posed.
  • Referral to LMSys Dataset: @natefyi_30842 suggested the LMSys Chat 1M dataset on Hugging Face as a potential resource, which is publicly accessible but requires sharing contact information for access.

▷ #rlhf (2 messages):

  • Unalignment Issue Fix: @giftedgummybee mentioned they believe they can fix an unspecified unalignment issue in a few days.

▷ #runpod-help (23 messages🔥):

  • Waiting before connecting to the pod: User @caseus_ pointed out that waiting approximately 2 minutes before connecting to the pod helps avoid issues with loading the mount point and missing axolotl install.
  • Issues with using multiple GPUs: @mr_morning encountered problems with Out Of Memory (OOM) errors while trying to fine-tune Yi using multiple RTX 4090 GPUs. Despite having two GPUs, the system recognises only one (num_machines:1). @visuallyadequate responded that the accelerate library should distribute the load on its own without the need for optimized multi GPU training solutions like deepspeed or fsdp, but the weights will still try to load onto each GPU which could be less desirable.
  • Enabling deepspeed and fsdp for multi GPU usage: @visuallyadequate suggested enabling deepspeed or fsdp for optimized multi GPU training in the relevant yaml file, and linked the axolotl repository on Github for detailed instructions. @noobmaster29 recommended using zero3 deepspeed considering the ongoing issues with fsdp.
  • Continuing OOM issues and adjustments: Despite various adjustments, including calibrating max_split_size and modifying the batch_size, @mr_morning continued to face OOM errors. In light of this, he considered swapping his current RTX 4090 GPUs for a GPU with larger memory capacity (48 GB).
  • Troubles with mpi4py installation: @mr_morning reported encountering issues while trying to install mpi4py and its dependencies on an RTX 6000 Ada GPU, which led to errors such as “Cannot link MPI programs. Check your configuration!!”
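Much of the OOM troubleshooting above reduces to batch arithmetic: the effective batch size is micro_batch × gradient_accumulation × num_gpus, so the micro batch can be shrunk (easing memory pressure) while accumulation is raised to compensate. A small illustrative sketch (names are not axolotl's exact config keys):

```python
# Effective batch size arithmetic behind the OOM adjustments:
#   effective = micro_batch * grad_accum * num_gpus
# Shrinking the micro batch eases memory pressure; raising gradient
# accumulation keeps the effective batch (and training dynamics) the same.

def effective_batch(micro_batch, grad_accum, num_gpus):
    return micro_batch * grad_accum * num_gpus

def rebalance(micro_batch, grad_accum, num_gpus, new_micro_batch):
    """grad_accum needed to keep the effective batch size unchanged."""
    target = effective_batch(micro_batch, grad_accum, num_gpus)
    per_step = new_micro_batch * num_gpus
    if target % per_step:
        raise ValueError("target batch not divisible by new micro batch")
    return target // per_step

# 2 GPUs, micro batch 8, accumulation 4 -> effective 64;
# halving the micro batch doubles the accumulation steps.
```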

DiscoResearch Discord Summary

  • An engaging discussion led by @_jp1_ and others on the beneficial usage of eval models, such as the Prometheus model, which can quickly evaluate categories like ‘grounding’, ‘style+format’, or adherence to prompt-specific guidelines. The official Prometheus implementation can be found on Hugging Face here.
  • The ongoing work of @_jp1_ on DiscoLM German and Disco Judge with prospective plans for the release of a repo for several use cases and possibly a Mixtral-based Disco Judge beta in the coming year was noted.
  • @rasdani introduced a new model, HALOs/Archangel, which is expected to appear in HF TRL soon, along with a link to the related report.
  • @_jp1_ shared an important update to the config.json of Mixtral clarifying that it never intended to support sliding window attention, pointing to a related TGI fix and PR.
  • The conversation steered towards PEFT vs NEFT for Disco Fine-tune, with @fernando.fernandes. questioning whether the last disco fine-tune used qlora + peft or neft. @_jp1_ confirmed that PEFT/QLora were used, NEFT being an extra training option rather than a direct alternative, and one that often yielded disappointing results.
  • @rasdani posted a blog by DeepMind highlighting the efficiency of LLMs in discovering answers to open problems in mathematical sciences when combined with function searches in computer code and proposed the potential of attempting this with an open-source LLM. Additionally, the wildcard part of the FunSearch code implementation on GitHub was shared for anyone interested in its further development.

DiscoResearch Channel Summaries

▷ #disco_judge (5 messages):

  • Prometheus Model: @_jp1_ emphasizes the beneficial usage of eval models, like the Prometheus model, for tasks hard to benchmark. They highlighted that while these models have an upper bound for ‘accuracy’, they can quickly evaluate additional categories such as ‘grounding’, style+format, or adherence to specific prompt specifications. They shared a use case of the Prometheus-based model for checking the quality and correctness of translated instruction data. The official Prometheus implementation can be found on Hugging Face.
  • DiscoLM German and Disco Judge: @_jp1_ mentioned that they are currently working on DiscoLM German and planning to release a repo for several use cases and probably a Mixtral-based beta of Disco Judge in the coming year.
  • HALOs / Archangel: @rasdani brings up a new model, HALOs/Archangel linked to a report and mentions it is soon coming to HF TRL.

▷ #mixtral_implementation (9 messages🔥):

  • Mixtral Config Update: @_jp1_ shared an update to the config.json of Mixtral and noted that it was never intended to support sliding window attention. They also linked to a related TGI fix here and PR here.

  • PEFT vs NEFT for Disco Fine-tune: @fernando.fernandes. asked if the last disco fine-tune used qlora + peft or neft. In response, @_jp1_ clarified that peft/qlora were used, as neft (noisy embedding vectors) wasn’t an alternative but rather an extra training option which generally delivered disappointing results.

  • Effectiveness of NEFT: @fernando.fernandes. also wondered if using NEFT could produce better results over Mixtral. This was dispelled by @_jp1_, who mentioned that it had nothing to do with Mixtral and that those who tried it with state-of-the-art parameters and standard regularization got underwhelming or identical results.
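As background to the NEFT exchange: NEFTune perturbs embedding vectors during training with uniform noise scaled by α/√(L·d), where L is the sequence length and d the embedding dimension, and it sits on top of PEFT/QLoRA rather than replacing them. A toy pure-Python sketch of the noise rule (real implementations perturb GPU tensors in the embedding forward pass):

```python
import math
import random

# NEFTune-style noisy embeddings: add uniform noise in [-1, 1] scaled by
# alpha / sqrt(L * d) to every embedding entry at training time.
# Toy dimensions and a seeded RNG, for illustration only.

def neftune_noise(embeddings, alpha, rng):
    L = len(embeddings)        # sequence length
    d = len(embeddings[0])     # embedding dimension
    scale = alpha / math.sqrt(L * d)
    return [[x + rng.uniform(-1.0, 1.0) * scale for x in row]
            for row in embeddings]

rng = random.Random(0)
emb = [[0.0] * 4 for _ in range(3)]   # L=3, d=4
noisy = neftune_noise(emb, alpha=5.0, rng=rng)
```

The √(L·d) denominator keeps the perturbation magnitude roughly constant as sequence length and model width grow.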

▷ #general (1 message):

  • FunSearch - Discoveries in Mathematical Sciences Using Large Language Models (LLMs): @rasdani shared a DeepMind blog post showcasing how Large Language Models (LLMs) are efficient in making discoveries in open problems in mathematical sciences when combined with the search for “functions” in computer code. They also suggested the potential of trying this with an open-source LLM.
  • FunSearch Code Implementation: @rasdani further shared the link to a specific part of the FunSearch code implementation on GitHub for anyone interested in contributing to its development.
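The core FunSearch loop, where a proposer suggests candidate functions, a deterministic evaluator scores them, and top scorers seed the next round, can be mocked without any LLM. A toy sketch with a hardcoded candidate pool standing in for the model (purely illustrative, not DeepMind's code):

```python
# Toy score-and-select loop in the spirit of FunSearch: a proposer offers
# candidate functions, a deterministic evaluator scores them, and the best
# survive. Real FunSearch uses an LLM as the proposer and feeds top scorers
# back into its prompt; here the candidate pool is hardcoded.

def evaluator(fn):
    """Score a candidate on a toy objective over fixed test inputs."""
    return sum(fn(x) for x in range(1, 6))

def search(candidates):
    best, best_score = None, float("-inf")
    for fn in candidates:
        score = evaluator(fn)
        if score > best_score:
            best, best_score = fn, score
    return best, best_score

candidates = [lambda x: x, lambda x: x * x, lambda x: 2 * x]
best_fn, best_score = search(candidates)
```

The key design point is that the evaluator is fixed, cheap, and objective, so the LLM's creativity is bounded by verifiable scoring, which is what makes the approach attractive for open mathematical problems.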

LangChain AI Discord Summary

  • Detailed discussion on the topic of code writing to improve a language model’s chain of knowledge, as supported by a research paper shared by @roger_alca.
  • Inquiries and error handling regarding PydanticOutputParser variables and ConfluenceLoader OAuth tokens, with users seeking advice on multi-object JSON output parsing and on the key requirements for the ConfluenceLoader, respectively.
  • Disclosure about direct communication between @banda_ki and another user through private messages, without further details provided.
  • Questions surrounding experience with LangChain Agent and a virtual database in SQL, posed by users @banda_ki and @alewe5 respectively, with no response yet.
  • @ssowonny suggested the use of third-party service PlugBear for integrating LangChain and LangServe applications with Slack, providing a detailed guide about the process in both #general and #share-your-work channels.
  • @appstormer_25583 mentioned the development of a Hanukkah recipe generator built using GPT but provided neither additional details nor a link to the project.
  • An article discussing the potential applications of LangChain for data analysis was shared by @andysingal.
  • A major update to the app-building tool Create, enabling real-time app building by typing the specification, was announced by @dhruv.xyz, with a link to the updated app provided.

LangChain AI Channel Summaries

▷ #general (9 messages🔥):

  • Chain of code: @roger_alca shared a link to a research paper discussing the use of code-writing to improve a language model’s chain of knowledge. Here is the research paper.
  • JSON output parser: @infinityexists. asked whether it is possible to define two different PydanticOutputParser variables for handling the two types of JSON object returned by an API, due to errors received when printing the received objects.
  • ConfluenceLoader OAuth Tokens: @night765 brought up a question regarding the ConfluenceLoader of Confluence using OAuth tokens. There was confusion over the number of keys required by the loader, as well as the dissimilarities between these keys and those required by the class AtlassianRestAPI.
  • Private messages: @banda_ki alerted a user to check their direct messages without disclosing any further information.
  • LangChain Agent and Virtual Database: Users @banda_ki and @alewe5 asked if anyone had experience working with a LangChain agent using custom tools and working with a virtual database in SQL respectively, but received no immediate responses.
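The two-schema parsing question above can also be handled without any framework by inspecting which keys are present and dispatching to the matching parser (the LangChain analogue would be trying two Pydantic output parsers in order). A stdlib-only sketch with hypothetical shapes:

```python
import json
from dataclasses import dataclass

# Dispatch an API response with one of two JSON shapes to the matching
# parser by inspecting its keys. Both shapes here are hypothetical.

@dataclass
class UserResult:
    name: str
    age: int

@dataclass
class ErrorResult:
    code: int
    message: str

def parse_response(raw: str):
    data = json.loads(raw)
    if {"name", "age"} <= data.keys():
        return UserResult(name=data["name"], age=data["age"])
    if {"code", "message"} <= data.keys():
        return ErrorResult(code=data["code"], message=data["message"])
    raise ValueError("unrecognized JSON shape")

ok = parse_response('{"name": "Ada", "age": 36}')
err = parse_response('{"code": 404, "message": "not found"}')
```

Routing on keys before validating avoids the errors seen when a single parser is forced onto both shapes.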

▷ #langserve (1 message):

  • Integrating LangChain+LangServe with Slack: User @ssowonny suggested a third-party service, PlugBear, for integrating LangChain and LangServe applications with Slack. The post provides a step-by-step guide on how to set up a custom LLM using PlugBear.

▷ #share-your-work (4 messages):

  • Hanukkah Recipe Generator GPT: @appstormer_25583 mentioned a Hanukkah recipe generator built using GPT, without additional detail or a link to explore the tool further.
  • LangChain for Data Analysis: @andysingal shared an article on AI Advances titled “Unlocking the Power of Language: How LangChain Transforms Data Analysis and More” written by Ankush k Singal. The blog discusses potential applications of LangChain for data analysis.
  • LangServe and Slack Integration: @ssowonny posted a guide on how to integrate LangServe apps with Slack or Discord, which can be done within 5 minutes. The tutorial is hosted on PlugBear.
  • Update on Create App Building Tool: @dhruv.xyz announced a major update to the app building tool Create, now allowing real-time app building by typing your spec. A link to the updated app is shared and feedback is sought on the new feature.

Latent Space Discord Summary

  • Discussion surrounding Artificial General Intelligence (AGI), with users pondering the state of AGI and sharing general sentiments about the world.
  • Progress of the Cursor tool into the GitHub PR workflow, indicating growing tool support for contributions to software projects.
  • Conversation about the lack of effective AI tools for Infra/DevOps work, with users pointing out room for further advancements in AI applications in these areas.
  • Cautionary advice concerning Mixtral’s beta-stage status was given by @swyxio, who met the Mixtral developer at the NeurIPS conference. Prompt Hack Link
  • Query about the accessibility methods for Mixtral, with users debating whether Anyscale calls are being used for access.

Latent Space Channel Summaries

▷ #ai-general-chat (7 messages):

  • AGI Feelings: User @spicychickensandwichdeluxe asked if people are feeling the AGI (Artificial General Intelligence), while @slono commented on the world being harsh.
  • Cursor’s Progress on GitHub PRs: User @guardiang mentioned that Cursor is gradually venturing into the GitHub PR (Pull Request) game, alongside looking at diffs.
  • AI Tools for Infra/DevOps Work: @btdubbins expressed that despite advancements in AI and coding, it still feels like many of these tools are not as effective for infrastructure/development operations (DevOps) work.
  • Caution about Mixtral’s Beta Stage: @swyxio noted that Aman, whom they met at the NeurIPS (Conference on Neural Information Processing Systems), is cautious about Mixtral, emphasizing that it is in beta stage at best. The same user shared a fun prompt hack of the day here that involves telling the AI it is GPT-5.
  • Accessing Mixtral: @btdubbins questioned how users are accessing Mixtral, inquiring if they are using Anyscale calls.

▷ #llm-paper-club (1 message):

eugeneyan: yeap, see you then!


Skunkworks AI Discord Summary

Only 1 channel had activity, so no need to summarize…

  • Skunkworks AI Development Updates: User @far_el provided some insight into the company’s current operations. They clarified that Skunkworks AI no longer builds in public and mentioned that they will be releasing models, software, and products soon.

Alignment Lab AI Discord Summary

Only 1 channel had activity, so no need to summarize…

  • AMD vs Nvidia GPU Performance Debate: @entropi shared an article from Tom’s Hardware discussing the performance difference between AMD’s Instinct MI300X and Nvidia’s H100 (Hopper) GPUs. AMD compared FP16 using vLLM (popular choice) against FP8 which works only with TensorRT-LLM.
  • Queries on Open Chat Model Fine-Tuning: User @beowulfbr inquired about any available guides, examples, or colabs for fine-tuning the new open chat model.

MLOps @Chipro Discord Summary

Only 1 channel had activity, so no need to summarize…

  • Recap of 2023’s Transformative Data Landscape: User @viv2668 discussed the 2023 Modern Data Stacks (MDS), aspirational Gen AI projects, and several controversies, centered mainly on innovations and trends in the data industry. A link to the full article was shared.