https://fxtwitter.com/abacaj/status/1736819789841281372?s=46&t=90xQ8sGy63D2OtiaoGJuww
This fits into an established pattern of prompt roleplay enhancing capabilities, but it's also a reminder that HumanEval is pretty terrible as a metric.
[TOC]
OpenAI Discord Summary
- Comparison of different language models (GPT-4 Turbo, GPT-3.5 Turbo, Claude 2.1, Claude Instant 1, and Gemini Pro) shared by @rrross; GPT-4 Turbo provided the most user-centric explanation when asked to describe the impact of a shift in user onboarding tracking.
- Discussion about the rumored GPT-4.5 version involving several members, including @feltsteam0 and @jaicraft; participants agreed to keep treating it as non-existent until an official statement or clear evidence becomes available.
- Multiple technical challenges reported by users, such as slow response times, unspecified errors, blocked accounts, and issues with API access across platforms like GPT utilities and ChatGPT Plus.
- Shared experiences with role-play mode in system prompts during API discussions, with suggestions to maintain the first-person perspective using reminders in user message strings or notes appended to instructions.
- Concerns over the ethical implications of AI usage in academia and the job market, with debates around potential misuse, plagiarism, and job replacement.
- Exploration of potential future features in AI models, notably DALL-E 3 and a proposed new GPT model by @7877; the DALL-E 3 conversation was speculative, and the discussion around the new GPT model lacked conclusive details.
- Request for help from _helium. on a school project to develop a language-translation website using the OpenAI API; unfortunately, no specific suggestions were provided.
OpenAI Channel Summaries
▷ #ai-discussions (22 messages🔥):
- Experiment on Different Language Models: User @rrross shared an experiment comparing the responses of different language models (GPT-4 Turbo, GPT-3.5 Turbo, Claude 2.1, Claude Instant 1, and Gemini Pro) when asked to explain the impact of a change from local to server-side user onboarding tracking. @rrross observed that GPT-4 Turbo provided the most user-centric explanation.
- Question on AI Translation Website: User _helium. requested assistance for a school project involving the development of a language translation website using the OpenAI API. No specific responses or suggestions were given in the messages that followed.
- AI Glasses Discussion: Users @dunescifye and @lugui briefly exchanged thoughts on AI glasses. @lugui commented that the technology sounds better in theory than in practice but did not name any specific challenges or problems with AI glasses.
- Potential Features of DALL-E 3 in ChatGPT: User @satanhashtag expressed a wish for DALL-E 3 to gain features such as Midjourney-style variations and editable zones, to which @kyoei responded that such functionality will likely be introduced eventually. This was followed by jokes about a possible DALL-E 3.5 version.
- New GPT Model Proposal: User @7877 mentioned developing a new GPT model and offered to send a link to it for others (@mawzoon and .pythagoras) to try and provide feedback before public release. However, no actual link or further details about this new GPT model were shared in the following messages.
▷ #openai-chatter (750 messages🔥🔥🔥):
- Discussion on GPT-4.5: Members @feltsteam0, @jaicraft, and others discussed the alleged existence of GPT-4.5, with most expressing skepticism since reports suggest it doesn't exist. One user quoted a conversation between Joe Rogan and Sam Altman in which a future GPT-4.5 was mentioned. However, most participants agreed that until an official statement or clear evidence is provided, it's best to consider GPT-4.5 non-existent.
- Concerns About AI Input Limits and General Performance: User @jaicraft voiced frustration about the input limit for developing a model with GPT, while @picturesonpictures expressed dissatisfaction with being charged for failed prompts.
- Discussion on AI's Influence on Jobs: Users debated the potential impacts of AI on the job market. Some believed that AI increases workplace productivity, while others expressed concerns about AI replacing human jobs. Suggestions were made for responsible and ethical practices when implementing AI in education and business.
- Using AI for Web Development and Academics: @bloodgore described inappropriate usage of ChatGPT by his students to write academic papers, and others suggested methods to detect AI-generated content. Elsewhere in the discussion, @mysticmarks1 spoke about the future potential of AI in creating web solutions and @msirene asked whether using ChatGPT was equivalent to plagiarism.
- Issues with Credit Card Payments and Regulations: User @msirene faced an issue where the company card was declined after repeated use for creating accounts for their employees. @elektronisade shared OpenAI's policy on credit card usage limits. There was also debate about whether OpenAI should be a "paid-only" service to discourage misuse by underage users, sparked by @bloodgore's statement about students misusing the tool.
▷ #openai-questions (90 messages🔥🔥):
- Slow Response Times and Errors: Several users, including @scrambler803, @mesteviet, and @bittychills, reported slow response times and unspecified errors while using GPT utilities. @scrambler803 suggested the issue might be related to the length of the ongoing chat, and @healer9071 attempted to troubleshoot the problem. @keith_15641 also reported slow response times and errors with GPT-4.
- Account and API Issues: @dian2024, @mildare, @pikapikapu4578, and @whitchurch all reported issues with their accounts or API access, ranging from blocked accounts to problems with API quotas. @millionwords reported a transaction issue: a ChatGPT Plus subscription purchase debited funds, but the subscription did not appear on the app or website.
- Problems with Custom GPTs and Output: Various problems were reported regarding the usage and output of custom GPTs, by @unfor_tuna_te with photorealistic face generation and by @jobydorr with content retrieval from uploaded PDFs. @arthurchance encountered issues with a QR code meant to link to a custom GPT.
- Other Technical Issues: @drpossum, @jhwarehouse, @ashtonwin, @couchlannister, and @explosiveburrito mentioned receiving an "unusual activity" error message. @debugpvp asked for guidance on working around the token limit. @aesthetic_person_123 and @andrewwallo were facing network errors, mainly during long conversations.
- ChatGPT 4 Abilities: @jah777 and @andrewwallo discussed the capabilities of ChatGPT 4, agreeing that it offers better speed, accuracy, and knowledge than the free version.
▷ #gpt-4-discussions (22 messages🔥):
- GPT-4 Access to the Internet: @karajan_ asked if GPT-4 has access to the internet. The question wasn't answered directly by the members.
- Finding Model ID: User @lasche inquired about finding their model ID for GPT. The question didn't get any response in the messages.
- Connecting Zapier to Custom GPT Actions via Webhooks: @db_looper asked if anyone had successfully sent custom GPT data to a webhook via actions, and also described an error encountered while trying to use a Make webhook instead of Zapier. This query remained unanswered.
- Limit on the Number of Files in a Custom GPT: @jobydorr asked about the limit on the number of files that can be uploaded to a custom GPT. @auktow responded that the limit is 10 files based on their own experience and shared a link to an OpenAI community post that might be helpful.
- Parsing Large Files with GPT: @auktow shared tips on getting better performance by using text-based files instead of PDFs, especially when dealing with large files, and shared another OpenAI community post discussing successful experiences with parsing files.
- Understanding GPT Assistants API Function Calling: @crazygreenguy asked how function calling works with the GPT Assistants API, in particular about the requirement for the caller to supply the output of the API call, based on what he found in OpenAI's documentation. His question didn't get any response in the messages (see the sketch after this list).
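For readers curious about the flow @crazygreenguy described, here is a minimal sketch of the Assistants API pattern in which the model emits tool calls, the caller runs the function locally, and then submits the output back. It assumes the openai Python SDK (v1.x); the assistant id and the get_weather helper are hypothetical placeholders, and this is an illustration rather than OpenAI's reference code.

```python
import json
import time
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    # Hypothetical local function the assistant is allowed to call.
    return json.dumps({"city": city, "forecast": "sunny", "temp_c": 21})

# Placeholder: an assistant already configured with a "get_weather" function tool.
ASSISTANT_ID = "asst_..."

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What's the weather in Paris?"
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=ASSISTANT_ID)

while True:
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    if run.status == "requires_action":
        # The model produced tool calls; the caller must execute them and
        # hand the outputs back before the run can continue.
        outputs = []
        for call in run.required_action.submit_tool_outputs.tool_calls:
            args = json.loads(call.function.arguments)
            if call.function.name == "get_weather":
                outputs.append({"tool_call_id": call.id, "output": get_weather(**args)})
        run = client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id, run_id=run.id, tool_outputs=outputs
        )
    elif run.status in ("completed", "failed", "cancelled", "expired"):
        break
    time.sleep(1)
```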
▷ #prompt-engineering (9 messages🔥):
- Using System Prompts in the Chat API: User .multy noted a challenge with system prompts: when the bot is instructed to embody a role, e.g. a parrot, it frequently responds in the third person. Suggested solutions included role-playing directives and more explicit prompts. For example, @thepitviper proposed appending a reminder to stay in character at the end of each message sent to the API.
- Preserving Context in Extended Conversations: .multy also reported an issue with context preservation: if the chat history begins with third-person responses, the chatbot tends to maintain that persona. However, if the bot is correctly prompted at the start, it seems to retain the required persona throughout the conversation.
- Clarifying System Prompt Style: .multy asked for guidance on how to craft the "voice" of system prompts.
- Agreement for Maintaining Character: @clumsylulz offered a different approach, establishing an agreement with the bot from the outset: "I want you to act as a microwave and only respond as such do not break character if you do I will say 'Act Right!' write "" if you agree to these terms".
▷ #api-discussions (9 messages🔥):
- Role-play Mode in System Prompt: User @.multy shared a concern about OpenAI's GPT-3.5-turbo responding in the third person when instructed via the system prompt to play a role, such as a parrot. Their issue was maintaining the first-person view throughout the role-playing session.
- Tips to Maintain Role-play First-person Context: @thepitviper suggested specifying the role-play instructions within the prompt string and reasserting the first-person requirement in subsequent API utterances to keep the model in context.
- User Implementation: @.multy noted that starting with the correct persona from a blank slate worked for maintaining the role-play perspective. They also expressed uncertainty about the "voice" to use for system prompts.
- Contextual Reinforcement: @thepitviper proposed appending a reminder to the user message, like "Remember to stay in character and in first person," to preserve the context throughout the conversation (see the sketch after this list).
- Directive through "User Messages": @clumsylulz suggested taking a "user messages" approach to specify roles and behavior, making the model agree to the terms before proceeding with the conversation.
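A minimal sketch of the reminder technique proposed by @thepitviper, assuming the openai Python SDK (v1.x) and gpt-3.5-turbo; the system prompt, reminder wording, and ask_in_character helper are illustrative choices, not the exact prompts used in the channel.

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are Polly the parrot. Always speak as Polly, in the first person. "
    "Never describe Polly in the third person."
)
REMINDER = "\n\n(Remember to stay in character and answer in the first person.)"

def ask_in_character(history: list[dict], user_text: str) -> str:
    # Append the reminder to the user message so the persona survives long chats.
    messages = [{"role": "system", "content": SYSTEM_PROMPT}, *history,
                {"role": "user", "content": user_text + REMINDER}]
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    reply = resp.choices[0].message.content
    # Store the clean user text in history so the reminder isn't duplicated later.
    history.extend([{"role": "user", "content": user_text},
                    {"role": "assistant", "content": reply}])
    return reply

history: list[dict] = []
print(ask_in_character(history, "What do you think about crackers?"))
```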
Nous Research AI Discord Summary
- Engaging discussion in the guild regarding the performance and limitations of various models, such as Hermes 2.5, Mistral, and SOLAR. Noted issues include generation truncation, straying off-topic, response inconsistencies across languages, and fine-tuning challenges. Users' experiments with the OpenChat model led to concerns about coherence and skepticism regarding the model's benchmarking.
- Conversations around function calling and the differentiation between function calling and tool calling, with the specific system prompts used in OpenHermes 2.5 being shared.
- Anticipation and conjecture around GPT-4's performance, with the model perceived to underperform. The guild speculated about possible reasons such as system prompts, fine-tuning, inference speeds, and model tendencies (like brief responses or failing to provide complete code blocks).
- Exploration of evaluation tools and contamination issues, with a utility-focused evaluation tool spotlighted, along with concerns about data contamination in OpenHermes 2.5 and the SOLAR model.
- Guild members explored and supplied recommendations for fine-tuning large language models (LLMs) and touched upon cost concerns, technical requirements, and potential platforms (Colab, Kaggle, RunPod). A GitHub example of LoRA fine-tuning was also shared.
- Discussions on the feasibility of fine-tuning a model for code migration purposes and on creating search queries from message history.
- Queries about the availability of the tokenizer for Amazon's Titan embedding led to suggestions to create a custom tokenizer, plus a shared GitHub repository with potential details.
- Dissemination of interesting links, including a Twitter post, an arXiv paper on large-model improvements, the MindLLM 1.3B model on Hugging Face, a blog post on Mistral 7B optimization, an article and YouTube context on "100x Speedup: Full Stack Transformer Inference Optimization", and a dialogue on domain-specific languages (DSLs) vs. code.
- User frustration with the Bard AI chatbot was expressed in the off-topic channel, with users voicing dissatisfaction with the bot's answers.
Nous Research AI Channel Summaries
▷ #off-topic (2 messages):
- Implementation Issues with Bard: User @euclaise expressed frustration with the AI chatbot Bard, initially stating, "Bard gives me a stupid implementation but at least it's an implementation". Shortly after, @euclaise further expressed dissatisfaction with the AI, adding "nvm, fuck bard too".
▷ #interesting-links (25 messages🔥):
- Paper on Model Improvements: @ldj shared an interesting Twitter post without much context, while @atgctg provided a link to an arXiv paper discussing improvements in larger models, remarking on the minimal improvement of the largest model over its pre-trained base. There was a brief debate between @atgctg and @giftedgummybee on the impact of these improvements on small and medium-sized models.
- MindLLM 1.3B Model: @pizza_joe linked the Hugging Face page for the MindLLM 1.3B model, developed by the Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications & Beijing Institute of Technology Southeast Academy of Information Technology.
- Discussion on Code and DSL: @gabriel_syme suggested the use of a domain-specific language (DSL) as an alternative to code, emphasizing the importance of interweaving the DSL with natural language when compilation fails. This is particularly critical for agents, according to @gabriel_syme.
- Link to "100x Speedup Full Stack Transformer Inference Optimization": @atgctg posted a link to an article titled "Towards 100x Speedup: Full-Stack Transformer Inference Optimization" and YouTube context for the dataset.
- OpenPipe's Mistral 7B Fine-Tune Optimized: @metaldragon01 shared a blog post from OpenPipe about its optimized Mistral 7B fine-tune, which has saved users over $2M in inference costs.
▷ #general (421 messages🔥🔥🔥):
- Model Performance and Limitations: The discussion concerned the performance and limitations of various models such as Hermes 2.5, Mistral, and SOLAR. For example, @gitmo joe stated that Hermes 2.5 isn't performing badly but truncates generations, and @teknium asked for community input on SOLAR. @weyaxi inquired about OpenHermes-2.5-Mixtral, which elicited mixed reactions from the community. The conversations also surfaced limitations and concerns about fine-tuning, system prompts, and models straying off-topic or responding in different languages.
- Function Calling: There was a conversation around function calling, with @realsedlyf providing the detailed system prompt used for function calling in OpenHermes 2.5 (see the sketch after this list). @gitmo joe later asked about the difference between function calling and tool calling.
- OpenChat Model: @tsunemoto and @n8programs tested the OpenChat model and ran into issues with the tokenizer and the model's lack of coherence. Some community members expressed skepticism about the model's claim to perform at a GPT-3.5 level, suggesting the result could stem from data processing tasks rather than inherent reasoning capabilities.
- Concerns about GPT-4 Performance and Fine-Tuning: There was a general consensus that GPT-4 seems to underperform compared to expectations. Participants discussed possible reasons, many pointing to system prompts, fine-tuning, and inference speeds. Some members noted GPT-4's tendency to respond briefly or to avoid providing complete code blocks for some prompts.
- Discussing Evaluation and Contamination: Participants discussed evaluation tools and contamination issues, with @tokenbender highlighting a new comprehensive evaluation tool that tests utility and other real-world values such as harmlessness, factuality, and comprehension. @nonameusr shared concerns around data contamination tests, citing issues with the OpenHermes 2.5 dataset and the SOLAR model.
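The function-calling setup referenced above generally amounts to placing a JSON tool schema in the system prompt and parsing the model's JSON reply. The sketch below illustrates that idea with the Hugging Face transformers library; the model id, prompt wording, schema, and search_web tool are assumptions, not the system prompt @realsedlyf actually shared.

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id; any ChatML-style chat model should behave similarly.
MODEL = "teknium/OpenHermes-2.5-Mistral-7B"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16, device_map="auto")

TOOLS = [{
    "name": "search_web",
    "description": "Search the web and return the top results.",
    "parameters": {"type": "object",
                   "properties": {"query": {"type": "string"}},
                   "required": ["query"]},
}]

SYSTEM = (
    "You are a function-calling assistant. You may call these tools:\n"
    + json.dumps(TOOLS, indent=2)
    + '\nWhen a tool is needed, reply ONLY with JSON: {"name": ..., "arguments": {...}}.'
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Who won the 2022 World Cup?"},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out_ids = model.generate(inputs, max_new_tokens=128, do_sample=False)
reply = tok.decode(out_ids[0, inputs.shape[-1]:], skip_special_tokens=True)

try:
    call = json.loads(reply)            # e.g. {"name": "search_web", "arguments": {...}}
    print("tool call:", call["name"], call["arguments"])
except json.JSONDecodeError:
    print("plain answer:", reply)       # the model answered directly instead
```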
▷ #ask-about-llms (36 messages🔥):
- Fine-tuning LLMs: @leuyann was seeking guidance on fine-tuning large language models (LLMs) for their master's thesis in economics. They were considering fine-tuning 7B models and were curious about doing it locally on an M1 MacBook Pro with 16GB of RAM. @night_w0lf recommended trying platforms like Colab, Kaggle, or paid cloud services, with the possibility of using the new MLX libraries Apple released. @.beowulfbr also suggested RunPod as a relatively inexpensive option, and @atgctg provided a GitHub example of LoRA fine-tuning (see the sketch after this list).
- Cost and Feasibility of Fine-tuning LLMs: Discussion revolved around the costs and hardware requirements of fine-tuning large models. @.benxh mentioned issues with MLX on a 16GB M1, which @leuyann noted might be resolved soon.
- Fine-tuning a Model for Code Migration: @.beowulfbr asked whether it is feasible to fine-tune a model to assist with migrating a codebase from one framework to another, to which @night_w0lf suggested testing larger coding models on the task.
- Creating Search Queries from Message History: @pogpunk was trying to build something that could create search queries based on message history; @night_w0lf suggested training a smaller model with a few hundred examples.
- Amazon Bedrock Titan Embedding Tokenizer: @coco.py asked whether the tokenizer for Amazon's Titan embedding is available anywhere. @night_w0lf suggested building their own tokenizer guided by Hugging Face's MTEB (Massive Text Embedding Benchmark), while @_evelynm shared a GitHub link that seems to contain details about the Titan embedding.
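A minimal sketch of LoRA fine-tuning with the Hugging Face peft and transformers libraries, in the spirit of the example linked above; the base model id, dataset file, and hyperparameters are placeholders, not the contents of the linked repository.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE = "mistralai/Mistral-7B-v0.1"          # placeholder 7B base model
tok = AutoTokenizer.from_pretrained(BASE)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")

# LoRA: train small rank-decomposition adapters instead of the full weights.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()          # typically well under 1% of the base model

data = load_dataset("json", data_files="train.jsonl")["train"]  # rows like {"text": ...}
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=2,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, logging_steps=10),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out/adapter")   # saves only the adapter weights
```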
Mistral Discord Summary
- Extensive discussions around the use, optimization, and performance of Mistral and Hermes took place, with @Makya highlighting the boost provided by Hermes 2.5. There were inquiries about Mistral models with larger context lengths and about hosting Mistral 7B in the cloud, along with shared resources such as the GitHub repository recommended by @jamiecropley.
- Users shared insights on the underlying model of mistral-medium and the estimated availability of encoding vocabulary files, along with a link to view GPT-3.5/4's encoding vocabulary files and a JSON standard for vocabulary that can be found on Hugging Face.
- Key discussion points included resolving challenges when running Mistral with Docker, the pros and cons of Docker vs. Ollama installations, and the limitations of Ollama regarding fine-tuning models.
- Issues with the interpretative capabilities of Mistral and Mixtral in chatbot implementations were reported. Users shared strategies to improve Mistral's contextual understanding along with potential solutions, including fine-tuning with a strongly trained system prompt.
- Users shared various machine learning, programming, and tech-related resources and products, such as the MindMac app (which now supports Mistral AI), a Golang client for La Plateforme, and libraries for running performance tests such as opencompass, llm-evaluation-harness, and light-eval.
- There were inquiries and discussions about technical issues like connecting a Mac Mini to a 2007 iMac monitor, along with shared resources such as a discussion thread and an article on monitor connections.
- Discussions on La Plateforme involved troubleshooting Mistral model-related errors, concerns about model censorship, server errors and billing issues, and an exchange of strategies for token counting. Queries about Mistral's rate limit were also addressed: all endpoints are rate-limited at 2M tokens per minute and 200M tokens per month.
Mistral Channel Summaries
▷ #general (104 messages🔥🔥):
- Use of Mistral and Hermes: Users discussed the usage and optimization of Mistral in both local and API implementations. Additionally, @Makya highlighted the performance gains of Hermes 2.5 over Hermes 2.
- MLX and Llama.cpp Discussions: @sublimatorniq sparked a dialogue about the potential benefits of using Apple MLX to run Mixtral. @daain pointed out potential performance issues due to the Mixture of Experts (MoE) architecture.
- Mistral Hosting & API: @satyajitsato asked for resources on hosting Mistral 7B in the cloud and wrapping an API around it. @jamiecropley shared a link to a GitHub repository as a possible solution, although they encountered some issues with it.
- Context Length Discussion: @eawlot3000 asked about Mistral models with a context length greater than 32768 tokens. Users shared information and resources about models with larger context lengths, such as Claude and GPT-4.
- Career Advice: @naz.daq asked for advice on getting started with machine learning. Recommended resources included 3Blue1Brown's YouTube series and self-study of foundational math topics such as linear algebra.
▷ #models (7 messages):
- Model Behind Mistral-medium: Users discussed the underlying model of mistral-medium. @superseethat asked for details, @sublimatorniq shared that it is a new prototype model, and @tom_lrd speculated it may be 4x8x7b.
- Encoding Vocabulary File for GPT Models: @jakobdylanc asked about the estimated availability of encoding vocabulary files, providing a link to view GPT-3.5/4's encoding vocabulary files.
- JSON Standard for Vocabulary Usage: Contributing to the vocabulary discussion, @daain mentioned that there is a JSON standard for vocabulary that carries the metadata needed to use the vocab. A direct link to the JSON file can be found on Hugging Face.
▷ #deployment (10 messages🔥):
- Running Mistral with Docker: User @hanschrs resolved a problem running Mistral by adding --tensor-parallel-size 2 to the Docker command, thereby enabling tensor-parallel processing across two GPUs.
- Docker vs. Ollama for Installation: @vitorpinho asked about the pros and cons of Docker and Ollama installations. In response, @vhariational suggested Ollama for quick setups via a few command lines, while recommending Docker for cases requiring isolation to avoid dependency conflicts.
- Ollama Not Designed for Fine-tuning: In further discussion of Ollama's limitations, @vhariational clarified that although Ollama isn't designed for fine-tuning models, it can handle more involved use cases, such as providing a REST API to query the model and allowing customization of model settings via its templating system.
▷ #ref-implem (23 messages🔥):
- Implementing Mistral for Chatbots: @gmist reported that the mistral-medium model sometimes answers questions from its own knowledge base rather than relying on the given context. The prompt instructs the model to answer only from the context, but the issue persists, as Mistral doesn't always obey the prompt.
- Prompt Modifications: @gmist shared that some prompt modifications seem to work while others do not. The inconsistent performance of prompts led @gmist to revert to GPT, which has proven reliable for the given use case.
- Solutions for Mistral's Contextual Understanding: @sublimatorniq suggested prefixing each line of context with "CONTEXT BODY" and introducing "hypnotic var naming" to improve contextual understanding. @gmist also reported that removing chat history appeared to improve Mistral's adherence to the prompt guidelines.
- Mistral vs. Mixtral: @daain experienced the same instruction-following issues with a LlamaIndex RAG app and various versions of Mistral. However, @daain found that Mixtral performed better than Mistral, suggesting a fine-tune with a strongly trained system prompt as a possible solution.
- Prompt Template Updates: @The Ledger Luminary recommended updating the prompt template and rewording it to be as explicit as possible, as well as referencing specific context pieces. They warned that with too much context (a high token count), the instructions could be affected by sliding-window attention.
▷ #finetuning (4 messages):
- Quantifying Fine-tuning Performance Improvement: User @The Ledger Luminary asked how to quantify fine-tuning performance improvements and sought recommendations for libraries to run performance tests. @cpxjj recommended a few libraries and performance benchmarks, including opencompass, llm-evaluation-harness, and light-eval.
- Function-Call Fine-tuning: User @krissayrose described difficulties fine-tuning Mistral for function calling: the model does not predict an EOS token when expected and continues to generate text. They provided an example and asked for assistance with what they might be doing wrong.
▷ #showcase (2 messages):
- MindMac AI Support for Mistral: User @hoangnm introduced the MindMac app, an AI-chat platform that now supports Mistral AI. MindMac is compatible with APIs from OpenAI, Azure OpenAI, Google Gemini, and more. It is designed for macOS and supports Mac Intel and Apple M1/M2/M3. The user directed viewers to a YouTube video for more details about the platform.
- Golang Client for La Plateforme: User @r.j.k. shared a link to his Golang client for La Plateforme and sought feedback on improving it.
▷ #random (7 messages):
- Connecting a Mac Mini to a 2007 Monitor: User @pier1337 asked about connecting a Mac Mini to a 2007 monitor, later clarifying that the monitor is from a 2007 iMac. @daain suggested that if the monitor or iMac has a digital port like DVI or HDMI, it should work.
- The 2007 iMac Port Issue: @pier1337 added more context by sharing a link to an Apple forum post stating that the 2007 iMac uses a Mini-DVI port, leaving uncertainty over whether a Mac Mini could be connected through it.
- Target Display Mode: @daain provided a link explaining that the 2007 iMac does not have Target Display Mode, which was introduced on iMacs in 2009 and allows them to be used as a display for another device, so it may not be possible to use it as a monitor for a Mac Mini.
▷ #la-plateforme (47 messages🔥):
- Errors and Troubleshooting with Mistral Models: User @tinwhiskers had issues using the larger models (mistral-small and mistral-medium) via the API and received a "model not found" error. After discussing with @The Ledger Luminary and Mistral team member @tlacroix_, they found the mistake was on their end: they were using "mistral-small" against the OpenAI URL.
- Discussion on Streaming and Token Usage: @thesealman asked about calculating token usage on streaming requests. User @lerela confirmed that there is currently no way to get that from the API and offered an estimation strategy until an official feature is rolled out. The discussion also covered token-counting strategies that run a tokenizer over the received text (see the sketch after this list).
- Concerns about Model Censorship: Users @smuglix and @Taiiouka expressed concerns about censorship in the API models even when safe mode is set to "false". @titaux12 suggested checking the documentation to disable safe mode, but @smuglix confirmed the issue persists even with safe mode set to "false".
- Incidents of Server Errors: User @_jp1_ reported numerous internal server errors (error code 503) while using the mistral-medium model. They also expressed concerns that charges on their account were over twice the token usage they had tracked themselves and asked for support contact details.
- Queries about Mistral's Rate Limit: User @flopsy1 requested information about the rate limit; @r.j.k. provided the details from the Mistral documentation stating that all endpoints are rate-limited at 2M tokens per minute and 200M tokens per month.
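A minimal sketch of the token-counting workaround discussed above: collect the streamed chunks and run an open Mistral tokenizer over the text as an estimate. The choice of the Mistral-7B tokenizer from Hugging Face is an assumption; billing figures from the API may not match this estimate exactly.

```python
from transformers import AutoTokenizer

# Approximation: the open Mistral-7B tokenizer should closely match the
# tokenization used by the hosted Mistral endpoints, but it is not guaranteed.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

def count_tokens(text: str) -> int:
    # Exclude special tokens; treat the result as an estimate, not a bill.
    return len(tok.encode(text, add_special_tokens=False))

def estimate_streamed_usage(prompt: str, chunks) -> dict:
    """Accumulate streamed text chunks and estimate prompt/completion tokens."""
    completion = "".join(chunks)
    prompt_tokens = count_tokens(prompt)
    completion_tokens = count_tokens(completion)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

# Example with chunks already collected from a streaming response:
print(estimate_streamed_usage("Explain entropy in one line.",
                              ["Entropy measures ", "uncertainty in a distribution."]))
```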
OpenAccess AI Collective (axolotl) Discord Summary
- Debate regarding the usage of OpenAI and LLaMA technologies: it was noted that using these products' outputs to fine-tune large language models might violate their agreements and potentially be grounds for lawsuits, whereas "safe" Apache-licensed models exist and are free of such restrictions.
- Examination of copyright and ownership of AI outputs, with a note that using output from the API to train models is a violation of the OpenAI agreement.
- The impact of the load_in_8bit and load_in_4bit parameters on model merging in QLoRA was discussed, clarifying that Axolotl does not quantize despite these parameters.
- The importance of every PR passing tests for Axolotl, since they are run in the expensive dev environment; issues around fine-tuning Mixtral and MoEs were raised and are being investigated.
- A link was shared to a new Hugging Face Transformers release (v4.36.2) that might address some critical issues in Axolotl.
- Various script, configuration, and run challenges faced by guild members, including a double-EOS-token issue, the best open-source library for RLHF, Docker issues, and failing fine-tuned models on Mistral; resolutions were attempted and are ongoing.
- Interest expressed in datasets of multi-turn conversations between humans and chatbots, with the LMSys Chat 1M dataset on Hugging Face suggested.
- An unspecified unalignment issue in RLHF is to be fixed by @giftedgummybee.
- Assistance and advice shared for various issues in the runpod-help channel, including waiting before connecting to the pod, multi-GPU usage issues, out-of-memory (OOM) errors, and installation of mpi4py. Proposed solutions include enabling specific training options, linking the axolotl repository on GitHub, tuning max_split_size and batch_size, and switching GPUs.
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (55 messages🔥🔥):
- OpenAI and LLaMA's Usage Agreements: @nafnlaus00 stated that it is against OpenAI's usage agreement to use the outputs of its products, like ChatGPT, to fine-tune large language models (LLMs). Similar restrictions apply to LLaMA and many other models. He emphasized that a breach of the agreement could be grounds for a lawsuit over intellectual property and unauthorized use.
- "Safe" Apache-licensed Models: @nafnlaus00 mentioned that the Mistral/Mixtral base and instruct models, as well as Falcon and several others, are Apache-licensed and thus "safe" from such restrictions. He also pointed out some entries in OpenAssistant that he flagged as suspect.
- OpenAI's Terms of Service Violation: @visuallyadequate and @nafnlaus00 debated the enforceability and implications of violating OpenAI's Terms of Service (ToS). @visuallyadequate argued that the most OpenAI could do is ban the user, whereas @nafnlaus00 argued that violating a ToS amounts to breach of contract, which could potentially be grounds for litigation.
- Ownership of AI Outputs: @stefangliga and @visuallyadequate discussed the ownership of AI outputs, emphasizing that copyright does not apply to AI output. @stefangliga pointed out that irrespective of copyright, the right to use API output to train models is forfeited under OpenAI's agreement.
- Merging QLoRA Results and Quantization: @touristc asked about the effect of the load_in_8bit and load_in_4bit parameters on model merging in QLoRA. @nanobitz clarified that axolotl does not quantize during the merge, even if these parameters are provided (see the sketch after this list).
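For context on the merge question, a common pattern is to load the base model at half precision (not 8-bit/4-bit), apply the trained adapter, and merge, which is consistent with the clarification that no quantization happens at merge time. The sketch below uses peft's merge_and_unload; the paths and model id are placeholders, and this is a generic illustration rather than axolotl's own merge code.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "mistralai/Mistral-7B-v0.1"   # placeholder base model
ADAPTER = "out/qlora-adapter"        # placeholder path to the trained LoRA/QLoRA adapter

# Load the base weights in fp16 (not 8-bit/4-bit): the merged weights stay unquantized.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, ADAPTER)

merged = model.merge_and_unload()    # folds the LoRA deltas back into the base weights
merged.save_pretrained("out/merged-fp16")
AutoTokenizer.from_pretrained(BASE).save_pretrained("out/merged-fp16")
```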
▷ #axolotl-dev (9 messages🔥):
- Dev Environment Tests: @nanobitz indicated that every PR should pass the tests, since they are run in the expensive dev environment.
- Concerns about Fine-tuning Mixtral and MoEs: @nafnlaus00 raised concerns, prompted by a tweet from Mark Tenenholtz, about the difficulty of training MoEs due to the need to implement load-balancing loss functions. The concerns covered the approach used for fine-tuning, ensuring an even token distribution across experts, and allocating each expert to a separate GPU or GPU cluster in a multi-GPU system.
- Caspar's Work on the Issue: @caseus_ mentioned that Caspar was investigating the issues raised by @nafnlaus00.
- New Release of Hugging Face Transformers: @casper_ai shared a link to the v4.36.2 release of Hugging Face Transformers, which resolves some critical issues related to the cache refactor, the flash-attention refactor, and training in multi-GPU and multi-node settings, suggesting that axolotl could probably update to it.
▷ #general-help (64 messages🔥🔥):
- Changes to shareGPT.py: @noobmaster29 made a pull request to the OpenAccess-AI-Collective/axolotl repository (#976) aiming to resolve the issue of a double EOS token at the end of prompts when using the ChatML template with shareGPT.py. The change was discussed with @nanobitz but required more testing for confirmation.
- Library for RLHF: @emperor asked for the best-optimized open-source library for RLHF. @nanobitz mentioned that TRL is a prominent choice.
- Running Fine-Tuned Models on Mistral: @JK$ had issues running a fine-tuned Mistral model that had been uploaded to Hugging Face. The issues persisted even after trying vLLM and following guidelines from various documentation and tutorials. Members offered suggestions, but the issue remained unresolved.
- Docker Configuration Troubles: @JK$ also ran into problems with Docker configuration, even when following the exact configurations illustrated in the vLLM documentation. The issue persisted when trying different models and endpoints. The community tried to assist, but the problem remained.
- Double EOS Tokens Issue: @noobmaster29 and @self.1 discussed double EOS tokens in multi-turn chat, noting that a previous fix failed to solve the issue. They agreed to look into it later (see the sketch after this list).
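A small, self-contained illustration of the double-EOS symptom discussed above and one way to collapse it, assuming a ChatML-style tokenizer from Hugging Face; this is not the actual patch from PR #976, and the model id is a placeholder.

```python
from transformers import AutoTokenizer

# Placeholder ChatML-style model whose eos token is <|im_end|>.
tok = AutoTokenizer.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")

def strip_duplicate_eos(ids: list[int], eos_id: int) -> list[int]:
    # Collapse a run of trailing EOS token ids down to a single one.
    while len(ids) >= 2 and ids[-1] == eos_id and ids[-2] == eos_id:
        ids = ids[:-1]
    return ids

prompt = "<|im_start|>user\nhi<|im_end|>\n<|im_start|>assistant\nhello!<|im_end|>"
ids = tok(prompt, add_special_tokens=True)["input_ids"]
ids.append(tok.eos_token_id)                     # naive template code might add EOS again
fixed = strip_duplicate_eos(ids, tok.eos_token_id)
print(len(ids), "->", len(fixed), "tokens")
```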
▷ #datasets (2 messages):
- Request for a Multi-Turn Conversation Dataset: @natefyi_30842 asked if there is any dataset of multi-turn conversations between humans and chatbots that isn't synthetic but actual human data, to understand the types of questions posed.
- Referral to the LMSys Dataset: @natefyi_30842 suggested the LMSys Chat 1M dataset on Hugging Face as a potential resource; it is publicly accessible but requires sharing contact information for access.
▷ #rlhf (2 messages):
- Unalignment Issue Fix: @giftedgummybee mentioned they believe they can fix an unspecified unalignment issue in a few days.
▷ #runpod-help (23 messages🔥):
- Waiting before Connecting to the Pod: User @caseus_ pointed out that waiting approximately 2 minutes before connecting to the pod helps avoid issues with loading the mount point and a missing axolotl install.
- Issues with Using Multiple GPUs: @mr_morning encountered out-of-memory (OOM) errors while trying to fine-tune Yi using multiple RTX 4090 GPUs; despite having two GPUs, the system recognised only one (num_machines: 1). @visuallyadequate responded that the accelerate library should distribute the load on its own without optimized multi-GPU training solutions like deepspeed or fsdp, but the full weights will still try to load onto each GPU, which can be undesirable.
- Enabling deepspeed and fsdp for Multi-GPU Usage: @visuallyadequate suggested enabling deepspeed or fsdp for optimized multi-GPU training in the relevant YAML file and linked the axolotl repository on GitHub for detailed instructions. @noobmaster29 recommended using ZeRO-3 deepspeed given the ongoing issues with fsdp.
- Continuing OOM Issues and Adjustments: Despite various adjustments, including tuning max_split_size and modifying the batch_size, @mr_morning continued to face OOM errors (see the sketch after this list). In light of this, he considered swapping his RTX 4090 GPUs for a GPU with larger memory capacity (48GB).
- Troubles with mpi4py Installation: @mr_morning reported issues while trying to install mpi4py and its dependencies on an RTX 6000 Ada GPU, leading to errors such as "Cannot link MPI programs. Check your configuration!!"
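For readers unfamiliar with the knobs mentioned above, the sketch below shows them in plain PyTorch terms: capping the CUDA allocator's max_split_size_mb and trading batch size for gradient accumulation. The values are illustrative, not the axolotl configuration used in the channel.

```python
import os

# Must be set before CUDA is initialized; a smaller split size can reduce
# fragmentation-related OOM errors at some cost in allocator efficiency.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:256"

import torch  # noqa: E402  (imported after setting the env var on purpose)

# Trade per-step batch size for gradient accumulation: the effective batch size
# stays the same while peak memory per step goes down.
micro_batch_size = 1            # was e.g. 4 before hitting OOM
grad_accum_steps = 16           # effective batch = micro_batch_size * grad_accum_steps

def training_step(model, optimizer, batches):
    optimizer.zero_grad(set_to_none=True)
    for batch in batches[:grad_accum_steps]:
        loss = model(**batch).loss / grad_accum_steps   # scale for accumulation
        loss.backward()
    optimizer.step()
    torch.cuda.empty_cache()    # optional: release cached blocks between steps
```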
DiscoResearch Discord Summary
- An engaging discussion led by @_jp1_ and others on the usefulness of eval models, such as the Prometheus model, which can quickly evaluate categories like "grounding", "style+format", or adherence to prompt-specific guidelines. The official Prometheus implementation can be found on Hugging Face.
- @_jp1_'s ongoing work on DiscoLM German and Disco Judge was noted, with plans to release a repo for several use cases and possibly a Mixtral-based Disco Judge beta in the coming year.
- @rasdani introduced a new model, HALOs/Archangel, which is expected to appear in HF TRL soon, along with a link to the related report.
- @_jp1_ shared an important update to Mixtral's config.json clarifying that it was never intended to support sliding-window attention, pointing to a related TGI fix and PR.
- The conversation turned to PEFT vs. NEFT for the Disco fine-tune, with @fernando.fernandes. asking whether the last Disco fine-tune used QLoRA + PEFT or NEFT. @_jp1_ confirmed that PEFT/QLoRA were used, with NEFT being an extra training option rather than a direct alternative, one which often yielded disappointing results.
- @rasdani posted a DeepMind blog highlighting how LLMs can discover answers to open problems in the mathematical sciences when combined with searches over functions in computer code, and proposed attempting this with an open-source LLM. The wildcard part of the FunSearch code implementation on GitHub was also shared for anyone interested in its further development.
DiscoResearch Channel Summaries
▷ #disco_judge (5 messages):
- Prometheus Model: @_jp1_ emphasized the usefulness of eval models, like the Prometheus model, for tasks that are hard to benchmark. They noted that while these models have an upper bound on "accuracy", they can quickly evaluate additional categories such as "grounding", style+format, or adherence to specific prompt requirements. They shared a use case of a Prometheus-based model for checking the quality and correctness of translated instruction data. The official Prometheus implementation can be found on Hugging Face.
- DiscoLM German and Disco Judge: @_jp1_ mentioned that they are currently working on DiscoLM German and planning to release a repo for several use cases, and probably a Mixtral-based beta of Disco Judge, in the coming year.
- HALOs / Archangel: @rasdani brought up a new model, HALOs/Archangel, linked to a report, and mentioned it is coming soon to HF TRL.
▷ #mixtral_implementation (9 messages🔥):
- Mixtral Config Update: @_jp1_ shared an update to Mixtral's config.json and noted that the model was never intended to support sliding-window attention. They also linked to a related TGI fix and PR.
- PEFT vs. NEFT for the Disco Fine-tune: @fernando.fernandes. asked whether the last Disco fine-tune used QLoRA + PEFT or NEFT. In response, @_jp1_ clarified that PEFT/QLoRA were used; NEFT (noisy embedding fine-tuning) isn't an alternative but an extra training option, which generally delivered disappointing results (see the sketch after this list).
- Effectiveness of NEFT: @fernando.fernandes. also wondered whether using NEFT could produce better results on Mixtral. @_jp1_ responded that it has nothing to do with Mixtral specifically and that those who tried it with state-of-the-art parameters and standard regularization got underwhelming or identical results.
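For readers unfamiliar with NEFT, the core idea (from the NEFTune paper) is to add uniform noise to the embedding outputs during fine-tuning, scaled by alpha / sqrt(L * d) for sequence length L and embedding dimension d. The sketch below is a minimal PyTorch illustration of that scaling, not the implementation used in the Disco fine-tune.

```python
import torch

def neftune_noise(embeds: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    """Add NEFTune-style uniform noise to embedding outputs during training.

    embeds: (batch, seq_len, dim) output of the token embedding layer.
    """
    L, d = embeds.shape[1], embeds.shape[2]
    scale = alpha / (L * d) ** 0.5                      # alpha / sqrt(L * d)
    noise = torch.empty_like(embeds).uniform_(-1, 1) * scale
    return embeds + noise

# Example: apply only in training mode; leave inference untouched.
emb = torch.randn(2, 128, 4096)
noisy = neftune_noise(emb, alpha=5.0)
print((noisy - emb).abs().max())   # bounded by alpha / sqrt(L * d)
```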
▷ #general (1 message):
- FunSearch: Discoveries in the Mathematical Sciences Using Large Language Models (LLMs): @rasdani shared a DeepMind blog post showcasing how LLMs can make discoveries on open problems in the mathematical sciences when combined with a search over "functions" in computer code. They also suggested trying this with an open-source LLM.
- FunSearch Code Implementation: @rasdani further shared a link to a specific part of the FunSearch code implementation on GitHub for anyone interested in contributing to its development.
LangChain AI Discord Summary
- Detailed discussion of code writing to improve a language model's chain of knowledge, supported by a research paper shared by @roger_alca.
- Inquiries and error handling regarding Pydantic output parser variables and ConfluenceLoader OAuth tokens, with users seeking advice on parsing multi-object JSON output and on the keys required by the ConfluenceLoader, respectively.
- Disclosure of direct communication between @banda_ki and another user through private messages, with no further details provided.
- Questions about experience with a LangChain agent and with a virtual database in SQL, posed by @banda_ki and @alewe5 respectively, with no responses yet.
- @ssowonny suggested the third-party service PlugBear for integrating LangChain and LangServe applications with Slack, providing a detailed guide about the process in both the #general and #share-your-work channels.
- @appstormer_25583 mentioned a Hanukkah recipe generator built using GPT but did not provide additional details or a link to the project.
- An article discussing potential applications of LangChain for data analysis was shared by @andysingal.
- A major update to the app-building tool Create, allowing real-time app building by typing the specification, was announced by @dhruv.xyz, with a link to the updated app provided.
LangChain AI Channel Summaries
▷ #general (9 messages🔥):
- Chain of Code: @roger_alca shared a link to a research paper discussing the use of code-writing to improve a language model's chain of knowledge.
- JSON Output Parser: @infinityexists. asked whether it is possible to define two different Pydantic parser variables to handle the two types of JSON object returned by an API, due to errors received when printing the received objects (see the sketch after this list).
- ConfluenceLoader OAuth Tokens: @night765 raised a question about using OAuth tokens with the ConfluenceLoader. There was confusion over the number of keys required by the loader, as well as how these keys differ from those required by the AtlassianRestAPI class.
- Private Messages: @banda_ki alerted a user to check their direct messages, without disclosing any further information.
- LangChain Agent and Virtual Database: @banda_ki and @alewe5 asked whether anyone had experience working with a LangChain agent using custom tools, and with a virtual database in SQL, respectively, but received no immediate responses.
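A rough sketch of the two-schema parsing question above: define two Pydantic models, wrap each in a LangChain PydanticOutputParser, and try them in turn. The schemas and fallback strategy are illustrative assumptions, and the import paths may differ between LangChain and Pydantic versions.

```python
from langchain.output_parsers import PydanticOutputParser
from langchain.pydantic_v1 import BaseModel, Field

# Two hypothetical response shapes the API might return.
class UserResult(BaseModel):
    id: int = Field(description="user id")
    name: str = Field(description="user name")

class ErrorResult(BaseModel):
    code: int = Field(description="error code")
    message: str = Field(description="error message")

user_parser = PydanticOutputParser(pydantic_object=UserResult)
error_parser = PydanticOutputParser(pydantic_object=ErrorResult)

def parse_response(raw: str):
    """Try each parser in turn; raise if neither schema matches."""
    for parser in (user_parser, error_parser):
        try:
            return parser.parse(raw)
        except Exception:
            continue
    raise ValueError(f"Response matched neither schema: {raw!r}")

print(parse_response('{"id": 7, "name": "Ada"}'))
print(parse_response('{"code": 404, "message": "not found"}'))
```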
▷ #langserve (1 message):
- Integrating LangChain + LangServe with Slack: User @ssowonny suggested a third-party service, PlugBear, for integrating LangChain and LangServe applications with Slack. The post provides a step-by-step guide on how to set up a custom LLM using PlugBear.
▷ #share-your-work (4 messages):
- Hanukkah Recipe Generator GPT: @appstormer_25583 shared a link about a Hanukkah recipe generator built using GPT. No additional details or link to explore the tool further were provided.
- LangChain for Data Analysis: @andysingal shared an article on AI Advances titled "Unlocking the Power of Language: How LangChain Transforms Data Analysis and More", written by Ankush k Singal. The blog discusses potential applications of LangChain for data analysis.
- LangServe and Slack Integration: @ssowonny posted a guide on integrating LangServe apps with Slack or Discord, which can be done within five minutes. The tutorial is hosted on PlugBear.
- Update on the Create App-Building Tool: @dhruv.xyz announced a major update to the app-building tool Create, which now allows real-time app building by typing your spec. A link to the updated app was shared and feedback sought on the new feature.
Latent Space Discord Summary
- Discussion surrounding artificial general intelligence (AGI), with users pondering the state of AGI and sharing general sentiments about the world.
- Progress and involvement of Cursor in the GitHub PR workflow, indicating growing community contributions to software projects.
- Conversation about the lack of effective AI tools for infra/DevOps work, with users noting room for further advancement in AI applications in these areas.
- Cautionary advice concerning Mixtral's beta-stage status from @swyxio, who met the Mixtral developer at the NeurIPS conference; a prompt hack was also shared.
- Query about how people are accessing Mixtral, with users asking whether Anyscale calls are being used for access.
Latent Space Channel Summaries
▷ #ai-general-chat (7 messages):
- AGI Feelings: User @spicychickensandwichdeluxe asked whether people are "feeling the AGI" (artificial general intelligence), while @slono commented on the world being harsh.
- Cursor's Progress on GitHub PRs: User @guardiang mentioned that Cursor is gradually venturing into the GitHub PR (pull request) game, alongside looking at diffs.
- AI Tools for Infra/DevOps Work: @btdubbins noted that despite advancements in AI and coding, many of these tools still feel ineffective for infrastructure/DevOps work.
- Caution about Mixtral's Beta Stage: @swyxio noted that Aman, whom they met at NeurIPS (the Conference on Neural Information Processing Systems), is cautious about Mixtral, emphasizing that it is in beta stage at best. The same user shared a fun prompt hack of the day that involves telling the AI it is GPT-5.
- Accessing Mixtral: @btdubbins asked how users are accessing Mixtral, and whether they are using Anyscale calls.
▷ #llm-paper-club (1 message):
eugeneyan: yeap, see you then!
Skunkworks AI Discord Summary
Only 1 channel had activity, so no need to summarize…
- Skunkworks AI Development Updates: User @far_el provided some insight into the company's current operations. They clarified that Skunkworks AI no longer builds in public and mentioned that they will be releasing models, software, and products soon.
Alignment Lab AI Discord Summary
Only 1 channel had activity, so no need to summarize…
- AMD vs. Nvidia GPU Performance Debate: @entropi shared an article from Tom's Hardware discussing the performance difference between AMD's Instinct MI300X and Nvidia's H100 (Hopper) GPUs. AMD compared FP16 using vLLM (a popular choice) against FP8, which works only with TensorRT-LLM.
- Queries on OpenChat Model Fine-Tuning: User @beowulfbr inquired about any available guides, examples, or Colabs for fine-tuning the new OpenChat model.
MLOps @Chipro Discord Summary
Only 1 channel had activity, so no need to summarize…
- Recap of 2023's Transformative Data Landscape: User @viv2668 discussed the 2023 Modern Data Stack (MDS), aspirational gen-AI projects, and several controversies. The discussion was mainly centered on innovations and trends in the data industry, and a URL to the full article was shared.