If you've ever wanted to see a multimodal model being trained in front of your eyes, now's the time for [Project Obsidian](https://github.com/NousResearch/Obsidian):

Teknium has just opened a new channel in the Nous Discord tracking it, open to the general public.

There was also discussion on 4M: Massively Multimodal Masked Modeling and Reason.dev, a TS framework for LLM apps.

[TOC]

OpenAI Discord Summary

  • PC Specifications for TensorFlow JS & Image Detection: User @z3wins initiated discussion on the hardware requirements for running TensorFlow JS, with @marchy providing insight and sharing a related codesandbox link.
  • Security API Service for Filtering Inappropriate Images: A ‘Security API Service’ idea by @z3wins led to discussions on data transfer costs and the feasibility of on-device vs. server-side processing.
  • Bias in AI & Applications: @whynot66k20ni engaged in dialogue on inherent racial and cultural biases in AI, focusing on facial recognition technology and its potential applications in healthcare and mental health.
  • Challenges with Language Models in Word Puzzles: Users @eskcanta & @juanitosway described difficulties in applying language models, such as GPT-3.5 and GPT-4, to word puzzle games.
  • GPU Recommendations for AI Models: GPU suggestions were made by @lugui in response to @isntfunny’s query about budget GPUs suitable for running AI models.
  • GPT-4 & DALL·E 3 Functionality: There were discussions regarding GPT-4’s capabilities and potential limitations, as well as user dissatisfaction with DALL·E 3’s restrictions and outputs on the OpenAI platform.
  • Communication Limitations and Commercial Usage of GPT Models: Users expressed frustrations about message limit caps and discussed commercial usage rights for images generated using GPT models.
  • Platform Access and Usage Issues: Numerous users reported problems with OpenAI platform access and features, specifically around rate limit issues, unusual human verification frequency, and loss of custom GPT models.
  • Prompting Strategies: Users sought guidance and discussed strategies for effective prompting, focusing on limiting image generation in DALL·E 3, ensuring literal responses in ChatGPT, and creating suitable prompts for marketing research planning. Suggestions were offered on how to reduce the respectful tone in ChatGPT outputs.

OpenAI Channel Summaries

▷ #ai-discussions (126 messages🔥🔥):

  • PC specs for TensorFlow JS and image detection: User @z3wins initiated a conversation regarding the necessary specifications for a PC to run TensorFlow JS for image detection. @.marchy suggested that it can be run on a Raspberry Pi and shared a related code sandbox link.

  • Concerns about data transfer rates: The discussion pointed out the potential data transfer costs associated with real-time detection.

  • Potential for a security API service: @z3wins voiced a preliminary idea to establish a security API service that checks for inappropriate content in images. @.marchy raised questions about the advantage of such a service and proposed that this could be done more efficiently on-device rather than server-side.

  • Discussion on racial and cultural bias in AI: @whynot66k20ni engaged in a conversation with @.marchy about the potential for racial and cultural bias in AI. Specifically, they discussed challenges in facial recognition technology and underrepresented groups in datasets, plus potential applications in healthcare and mental health.

  • Impact of LLMs on word puzzle games: @eskcanta and @juanitosway discussed the difficulty of using LLMs like GPT-3.5 and GPT-4 in word puzzle games like hangman due to the models’ struggle with individual letters in a word.

  • Recommendations on budget GPU for AI models: @isntfunny inquired about a budget-friendly GPU that can run AI inference in a timely manner. @lugui suggested prioritizing the GPU with the most VRAM affordable, highlighting that VRAM is a primary requirement for AI models.
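
To put the VRAM-first advice in concrete terms, here is a back-of-the-envelope sketch (not from the discussion; the 20% overhead factor is an assumption, and KV cache and context length add on top) of how parameter count and weight precision drive memory needs:

```python
# Hypothetical rule-of-thumb VRAM estimate for inference; the overhead
# multiplier is an assumption, not a measured figure.

def estimate_vram_gb(n_params_billion: float, bytes_per_param: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to hold model weights for inference, in GB."""
    weights_gb = n_params_billion * 1e9 * bytes_per_param / 1024**3
    return weights_gb * overhead  # activations, KV cache, etc. add on top

for name, params, bpp in [("7B fp16", 7, 2.0), ("7B 4-bit", 7, 0.5),
                          ("13B 4-bit", 13, 0.5)]:
    print(f"{name}: ~{estimate_vram_gb(params, bpp):.1f} GB")
```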

▷ #openai-chatter (144 messages🔥🔥):

  • GPT-4 Capabilities: There were multiple discussions about the functionalities of GPT-4 (user @【penultimate】). The model was noted to possess vision capabilities in addition to its text-processing abilities. However, one user characterized GPT-4 not as a significantly new model but as an extension of GPT-3 that can process image inputs.

  • Dall-E 3 Usage: Users had several conversations about the image creation capabilities of Dall-E 3 (users @errorsource, @satanhashtag, @lugui, @asura0_00). Some users expressed dissatisfaction with the output and restrictions of Dall-E 3 on the OpenAI platform and discussed potential alternatives.

  • Communication Limitations of GPT Models: Several users expressed frustrations about the message limit per 3 hours (users @cherrywavescollide, @sirthatsillegal, @winchawa, @satanhashtag). Some were unsure if the cap was exactly 40 messages per 3 hours, and others noted prematurely reaching the limit.

  • GPT Commercial Usage: Users @pickle2108, @Furnance and @solbus discussed the commercial rights for images generated using GPT models. User @solbus provided a link to OpenAI’s terms of use with clarification about image ownership.

  • Platform Issues and Complaints: Several users (@sieventer, @superiornickson5312, @kingkkaktus) reported problems with the OpenAI platform, including experiencing errors, being unable to access certain features, or being blocked due to excessive usage.

▷ #openai-questions (48 messages🔥):

  • Security Concerns Regarding ChatGPT: User @RobertGOD expressed concerns about potential security vulnerabilities in ChatGPT, particularly the lack of two-factor authentication, which could allow brute-force attacks. In response, @satanhashtag directed them to OpenAI support (help.openai.com).
  • Rate Limit Issues for GPT-4: @amanshrestha hit a rate limit when using GPT-4, having reached the limit of 10,000 tokens. @lugui suggested it can be offset by adding credits to the account, and a link to the rate limits documentation was shared.
  • Unusual Human Verification Frequency on ChatGPT: Several users (@Rock, @beret90, @cssupport, @knowcryptoshow) discussed experiencing repeated human verification checks while using ChatGPT. @solbus suggested it might be due to certain privacy settings or extensions in browsers, and recommended testing in incognito mode and/or a different browser.
  • Problems with Account Access and Custom GPT Disappearance: @khoughy and @cdav respectively encountered issues with account access and the disappearance of a custom GPT from the dashboard. @elektronisade clarified that site verification is not related to account access, and suggested that usage policies forbidding “Activity that has high risk of economic harm” might be the reason for the removal of certain custom GPTs.
  • DALL-E Seed Recreation Issue: @lxreilly asked about recreating a previous image in DALL-E using a seed and was advised by @toror to reach out to [email protected]; @solbus pointed out that OpenAI support tickets should be submitted via help.openai.com.

▷ #gpt-4-discussions (1 messages):

openheroes: Oh i don’t know.. you mean it cant let you download the file ?

▷ #prompt-engineering (21 messages🔥):

  • Prompting Resources Recommendations: @akas7488 asked for recommendations on resources, courses, or YouTube channels to improve in prompting for text and images, but no responses or recommendations were noted within the given messages.
  • DALL·E 3 Image Generation: @suptrox was looking for a way to make DALL·E 3 generate images with a more limited number of elements based on a specific prompt. @alienpotus suggested using a more focused prompt that clearly states any elements that should not be included in the image.
  • ChatGPT’s Over-Interpretation of Prompts: @agowa338 expressed concern about ChatGPT reading too much into prompts and not responding to the actual question asked, especially in follow-up questions. No resolution or suggestions were given in the discussions.
  • Creating Prompts for Marketing Research Planning: @raunchotron asked for advice on creating a prompt for a marketing research plan. @eskcanta provided a general guideline for prompt engineering, emphasizing precise communication of what you want to avoid confusion or unwanted responses.
  • Reducing Respectful Tone in ChatGPT Dialogues: @stealth2077 inquired whether negative prompting could stop the AI from forcing respectful, ethical, or moral themes into each story. @eskcanta stated that negative prompting is generally ineffective and that better results come from carefully guiding the AI toward the desired output within its programming and limitations, providing a detailed example of how such guidance can lead to in-depth and engaging character development in storytelling scenarios.

▷ #api-discussions (21 messages🔥):

  • Prompting resources: @akas7488 asked for resources to improve their prompting skills for text and images. No resources were shared in the following messages.
  • DALL·E 3 image generation: @suptrox asked if there’s a way to limit the elements generated by DALL·E 3, including an example where a garden’s land is to be focused on without other distractions. In response, @alienpotus suggested a way to modify the prompt to achieve this, emphasizing the importance of specificity and exclusivity in the prompt.
  • Literal responses from ChatGPT: @agowa338 expressed concerns about ChatGPT interpreting too much into their prompts, rather than answering them literally, and asked for recommendations to make ChatGPT respond more literally. No direct solutions were provided in the following messages.
  • Creating prompts for a marketing research plan: @raunchotron inquired about creating prompts for a marketing research plan. @eskcanta advised to make the AI understand what is wanted clearly and use language as accurately as possible, check the output carefully, fact check the AI’s responses and avoid areas where the AI is known to hallucinate.
  • Reducing respectfulness in generated texts: @stealth2077 asked how to reduce the amount of respectful, ethical, or moral themes in every story. @eskcanta responded by explaining OpenAI’s content policy and the model’s inherent tendency to set good examples, and gave a comprehensive example of how to craft a prompt that guides the AI in generating a story where disagreement exists.

Nous Research AI Discord Summary

  • Advice against using Langchain came from @fullstack6209 in the context of LLM graph queries. @maxwellandrews suggested that certain topologies could serve as a foundation for building an LLM graph query system, while @pogpunk asked an unidentified user about their Twitter presence.

  • Positive reception to the idea of broader evaluations and the addition of custom tasks mentioned by @gabriel_syme in the benchmarks-log channel, with the usefulness depending on how easily custom tasks can be included.

  • Extensive discussion under interesting-links mostly focused on improving model performance and data usage strategies. Key topics included Instruction Finetuning (IFT), pretraining data, specific datasets such as GoodWiki and arXiv papers, a 1:100 ratio for mixing pretraining and IFT data, model hallucinations, the practical application of RAG (Retrieval-Augmented Generation), and the limitations of GPT-4. Notable contributors were @euclaise, @tokenbender, @gabriel_syme, @giftedgummybee.

  • In #general, wide-ranging discussions took place, covering AI model performance with models like Character AI, Bard, ChatGPT, and Claude highlighted. Comments were made on the performance of 4bit Mistral vs unquantized Mistral and the utility of gpt4-vision. Training strategies were discussed, with @fblgit suggesting a knowledge infinite-turn conversation training doctrine and @mihai4256 indicating plans to proceed with manual dataset annotation. The server reorganization and Project Obsidian received mentions. Anticipation for upcoming models like UNA Solar and predictions on MMLU (Massive Multitask Language Understanding) outcomes filled the room.

  • In-depth LLM-related discussions were held in ask-about-llms. Topics included deploying/running an LLM locally with suggestions from @teknium, fine-tuning QLORA models with noteworthy insights from @teknium, and questions on validation during QLORA training. @semantic_zone asked the group about the scarcity of GPT-4 finetuning discussions, while positive performance feedback and fine-tuning status for the Upstage Solar 10B model were provided by @jaredquek.

Nous Research AI Channel Summaries

▷ #ctx-length-research (3 messages):

  • User @fullstack6209 strongly advised against using Langchain for unspecified reasons.
  • @maxwellandrews commented that some topologies can serve as a foundation for anyone to create their own LLM graph query system, regardless of whether they decide to use the base library or not.
  • @pogpunk inquired if an unidentified user has a Twitter account.

▷ #benchmarks-log (1 messages):

  • Adding Custom Tasks: User @gabriel_syme expressed positivity towards broader evaluations, appreciating the ability to include attacks and prompt engineering. The efficiency of this technique, however, will be determined by the ease of adding custom tasks.

▷ #interesting-links:

  • IFT and Pretraining Data: The conversation started with a discussion about the efficacy of Instruction Finetuning (IFT) and the possibility of incorporating pretraining data during fine-tuning to prevent catastrophic forgetting. User @euclaise suggested including data such as GoodWiki for factuality, but cautioned that for models like Mistral, this approach could potentially degrade the model’s performance, due to Mistral’s high-quality base data.

  • Mixing High-Quality and General Data: The discussion further evolved to consider using higher-quality data like arXiv papers in combination with general pretraining data to maintain a model’s non-domain-specific capabilities, even when it is being fine-tuned towards a particular domain. This led to the recommendation of mixing pretraining data with Instruction Finetuning (IFT) data in a ratio of 1:100 if working with a large SFT dataset (a minimal sketch of such a mix appears after this list).

  • Memorization and Hallucination Risks: One caveat came from @tokenbender, who stated that if the objective is memorization, extensive finetuning on pretraining data can lead to significant hallucinations. This invoked a discussion around successful model training strategies and the importance of preserving a model’s base capabilities while minimizing hallucinations and data loss.

  • Practical Application of RAG: The members also discussed the practical application of RAG (Retrieval-Augmented Generation) in the context of AI research. @gabriel_syme and @giftedgummybee proposed a step-by-step approach, moving from structured data towards task-specific data, suggesting that it could be a more effective way of conducting retrieval, as more traditional retrieval methods were returning irrelevant selections.

  • Limitations of GPT-4: Lastly, the conversation touched upon the limitations of GPT-4, with several members expressing disappointment over its reasoning capabilities and outputs. They suggested that the model’s performance seemed to have regressed compared to an earlier iteration, which was apparently more effective.
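
As referenced in the mixing bullet above, here is a minimal sketch of what a 1:100 pretraining-to-IFT mix could look like. The sample structures and the mix_datasets helper are hypothetical illustrations, not code from the discussion:

```python
import random

def mix_datasets(ift_samples, pretrain_samples, pretrain_ratio=1 / 100, seed=0):
    """Blend roughly one pretraining sample per 100 IFT samples."""
    rng = random.Random(seed)
    # Assumes the pretraining pool is large enough to sample from.
    n_pretrain = min(int(len(ift_samples) * pretrain_ratio), len(pretrain_samples))
    mixed = list(ift_samples) + rng.sample(pretrain_samples, n_pretrain)
    rng.shuffle(mixed)
    return mixed

ift = [{"text": f"instruction {i}"} for i in range(1000)]
pretrain = [{"text": f"wiki passage {i}"} for i in range(500)]
print(len(mix_datasets(ift, pretrain)))  # 1000 IFT + ~10 pretraining samples
```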

▷ #general (98 messages🔥🔥):

  • AI Model Performance and Usage: Discussion surrounded the performance and usage of multiple AI models. @makya expressed surprise over Character AI being used more than Bard and ChatGPT. In the same vein, there was confusion around a chart suggesting Claude has much more usage than other models, prompting @night_w0lf to suggest that it represents overall usage, with each provider adding to this total.

  • Testing Models and Potential Improvements: @gitmo joe and @teknium had contrasting views on the performance of 4bit Mistral vs unquantized Mistral, with teknium arguing that 4bit Mistral performs better assuming the same dataset fine-tuning. @gabriel_syme shared their experience of testing gpt4-vision for the first time, finding it useful for data annotation in multimodal environments but unclear on how to provide prompts effectively.

  • AI Training Strategies and Developments: @fblgit proposed the idea of a knowledge infinite-turn conversation training doctrine, involving reading a dataset as an iterative and synthesized interaction for gradual improvement in model learning. Separately, @mihai4256 made their plan known to continue manually writing dataset samples.

  • Discord Server Reorganization: @teknium mentioned the restructuring of the server and the addition of a public project called Project Obsidian. A few users mentioned access issues with the new channels, but these were resolved after refreshing their Discord.

  • Upcoming AI Models and Predictions: Members anticipated the release of new AI models like UNA Solar, and predictions were made regarding the potential performance of these models on MMLU (Massive Multitask Language Understanding). Some users joked about AGI (Artificial General Intelligence) being achieved through a random model merge or by Nous Research.

▷ #ask-about-llms (47 messages🔥):

  • Deploying/Running an LLM Locally: User @deki04 asked for a guide to deploying/running an LLM locally, preferring a CLI though a GUI option would be fine too. @teknium shared an OpenHermes script for inference, which can be adapted for back-and-forth chats (a minimal sketch of such a loop appears after this list), and suggested that @deki04 look into llama.cpp for fast inference on a Mac, noting that with transformers a Mac can only utilize the CPU.
  • Fine-tuning a QLORA Model: User @.beowulfbr shared their configuration and raised a concern about training loss not decreasing significantly while fine-tuning an OpenChat 3.5 model using QLORA with a dataset of 300k entries. @teknium assured them that this is normal behavior and that the loss will stay relatively flat after epoch 1.
  • Validation during QLORA Training: User @ruggsea asked for insight into validation during QLORA training and best practices for splitting a finetuning dataset into train/validation sets.
  • GPT-4 Finetuning: @semantic_zone inquired why there hasn’t been more discussion about GPT-4 finetuning, particularly for reasoning. @giftedgummybee responded that reasons include the associated costs and the fact that access is heavily gated.
  • Testing and Fine-tuning Upstage Solar 10B model: @jaredquek shares their positive evaluation of the Upstage Solar 10B model on complex French philosophical translations and philosophical questions. They mention they are currently fine-tuning it.
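
As noted in the first item, a back-and-forth chat loop over a local model can be short with transformers. This is a hedged sketch rather than the script @teknium shared: the model id is illustrative, the loop assumes a tokenizer that ships a chat template, and device_map="auto" requires the accelerate package:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "teknium/OpenHermes-2.5-Mistral-7B"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = []
while True:
    messages.append({"role": "user", "content": input("you: ")})
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256, do_sample=True,
                            temperature=0.7)
    reply = tokenizer.decode(output[0][inputs.shape[-1]:],
                             skip_special_tokens=True)
    print("assistant:", reply)
    messages.append({"role": "assistant", "content": reply})
```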

HuggingFace Discord Summary

  • Announcement of major updates in the open-source sphere such as the debut of 4-bit Mixtral in transformers, a major release of huggingface_hub, a sizeable Gradio update, and the addition of new models and chat templates to Transformers.js, along with gsplat.js, a new JavaScript Gaussian Splatting Library.
  • Product update rollouts including the release of Mixtral in Hugging Chat, the introduction of preventative measures against commits containing valid tokens, a tool for easy datasets transfer from GitHub to Hugging Face, and new models’ availability on The Hub and MLX.
  • In the “Cool Stuff” category, the announcement of the new Games with AI course, Hugging Face’s participation in NeurIPS, the invitation to become an author and publish with Hugging Face, a review of the Year of LLM 2023, and the round of AI predictions for 2024.
  • General inquiries about the prompt format for Mistral, training custom models, a resolution of a HuggingFace Datasets error, accessing private Spaces, suggestions for small GPT models to fine-tune, issues with inference on a fine-tuned model, concerns about the evaluation of mixtral models on the leaderboard, and submission of models that require trust_remote_code to the LLM leaderboard.
  • Shared links to the new free and open Deep Reinforcement Learning course by Hugging Face, a paper discussing the efficiency of large language models (LLMs) exceeding the available DRAM capacity, a study paper on preference distillation for large vision language models (LVLMs) and an interesting approach without specific details provided.
  • Showcased projects within the Discord guild including a project for infinite scene generation, a neural style transfer architecture, a digital asset ownership project, self-trained 2x general upscaling models, and an LLM’s dataset contamination detector.
  • Reading group discussions revolving around understanding and visualizing diffusion models, writing and publishing a blog post, and understanding diffusion noise schedules and sample steps.
  • The announcement of Segmind’s new SDXL variants ‘Segmind-Vega’ and ‘Segmind-VegaRT’, offering a reduction in size and a speedup.
  • NLP discussions focusing on LLM/LORA Vocab limitation, sharing of “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition” book, updates on ctransformers library, and training queries on GPT.
  • Query on domain translation conditioning for converting Depth to RGB images using diffusion models, with no responses or resources provided in the analyzed messages.

HuggingFace Discord Channel Summaries

▷ #announcements (1 messages):

  • Open Source Updates: A variety of major updates announced, including the availability of 4-bit Mixtral in transformers source, a huge release of huggingface_hub with features like easier login in Colab source, massive Gradio update with bug fixes and new features source, and new models and chat templates added to Transformers.js source 1, source 2. Additionally, the announcement of gsplat.js, a JavaScript Gaussian Splatting Library, was made source.

  • Product Updates: Mixtral is now available in Hugging Chat source, commits that contain valid tokens are now rejected in repo updates source, datasets can be easily transferred from GitHub to Hugging Face with a new tool source, and new models are available in The Hub + MLX, with the possibility for users to submit their own source.

  • Cool Stuff: Announcement of dates for the new Games with AI course source, Hugging Face’s participation at NeurIPS source, an invitation to become an author and publish with Hugging Face in 2024 source, a review of the Year of LLM 2023 source, and a round of predictions for AI in 2024 from prominent members of the AI community source 1, source 2, source 3.

▷ #general (58 messages🔥🔥):

  • Prompt Format for Mistral: User @drummer_ asked where to find the prompt format for Mistral. @osanseviero replied that the format can be found on Mistral’s model card.
  • Training Custom Models: User @kepardev inquired about starting to train their own model, wondering if they can train it by showing it a few questions and their respective answers.
  • HuggingFace Datasets Error: User @twisterheather encountered an error (DatasetGenerationError) while trying to download some datasets from Hugging Face.
  • Accessing Private Spaces: User @tractrixarch tried to access one of his private Spaces from a public Space without committing the token, which wasn’t working. @Cubie | Tom suggested adding the token to the public Space’s secrets and loading it in the code with os.environ.get("...") (a minimal sketch of this pattern appears after this list).
  • GPT Model for Fine-tuning: User @vishyouluck asked for recommendations of a small GPT model to fine-tune. @Cubie | Tom recommended gpt2-large and TinyLlama/TinyLlama-1.1B-Chat-v0.6.
  • Issues with Inference on Fine-tuned Model: @vishyouluck reported an issue with inference on their fine-tuned model VishalMysore/cookgptlama, which was showing an Internal Server Error.
  • Evaluation of Mixtral Models: User @DavidG88 reported a problem with the evaluation of mixtral models in the leaderboard and asked for contact suggestions for the leaderboard team. @cakiki suggested opening an issue on the space itself.
  • Submitting a model that requires trust_remote_code to LLM leaderboard: User @testgggggggggg queried if there’s a way to submit a model that requires trust_remote_code = True to the LLM leaderboard.
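
The secrets pattern from the Spaces item above, sketched under assumptions: HF_TOKEN is whatever secret name you configure in the public Space's settings, and the private repo id and filename are hypothetical:

```python
import os
from huggingface_hub import hf_hub_download

# Read the token from the Space's secrets instead of committing it.
token = os.environ.get("HF_TOKEN")

path = hf_hub_download(
    repo_id="username/private-space",  # hypothetical private Space
    filename="config.json",
    repo_type="space",
    token=token,
)
print(path)
```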

▷ #today-im-learning (1 messages):

  • Request to Join RL-Study-Group: User @cloudhu inquired about how to join the rl-study-group channel, as it was locked for them. No further discussion or responses were given in the provided messages.

▷ #cool-finds (4 messages):

  • Hugging Face Deep Reinforcement Learning Course: @samanofficial shared a link to the new free and open Deep Reinforcement Learning course by Hugging Face. The course aims to teach Deep Reinforcement Learning from beginner to expert.
  • Efficiently running Large Language Models: @osanseviero expressed excitement for a paper that discusses a method to efficiently run large language models (LLMs) that exceed the available DRAM capacity by storing model parameters in flash memory.
  • Improving the Ability of Large Vision Language Models: @merve3234 shared an interesting paper exploring preference distillation for large vision language models (LVLMs), aiming to enhance their ability to generate helpful responses faithfully anchored in the visual context. The model and dataset used in the study are available on the Hub.
  • Innovative Approach Discussion: @martinmunch pointed out an interesting approach but didn’t provide specific details about the content of the paper.

▷ #i-made-this (18 messages🔥):

  • Infinite Scene Generation Project: @Lil K and @shehuiwojiege shared a project for infinite scene generation, providing links to the code (https://www.github.com/HyoKong/DreamD), a demo (https://www.huggingface.co/spaces/imsuper), and the main project page (https://www.hyokong.github.io/dreamdrone-pag).
  • Neural Style Transfer Architecture: @om7059 mentioned an implementation of Gatys et al.’s neural style transfer architecture in PyTorch and shared a Twitter link with results from their implementation.
  • Digital Asset Ownership Project: @vedsayys introduced a project by Mngl.club to enhance the digital asset ownership experience and invited users to their X profile and Telegram community via provided links and https://t.me/mngl_club.
  • Upscaling Models Demo: @helaman introduced his latest released self-trained 2x general upscaling models with a demo and more details available here.
  • LLM’s Dataset Contamination Detector: @yeyito777 created a space to test an LLM’s dataset contamination as per this paper and shared the link to the space. They explained that models with scores above 0.95 were likely to have seen the data they were tested on before.

▷ #reading-group (7 messages):

  • Understanding and Visualizing Diffusion Models: @asrielhan shared a GitHub link to a paper on how specific training data impacts the image generation process of a diffusion model.
  • Blog Post Announcement: @chad_in_the_house announced that they are almost done with writing a blog post, despite being in a different timezone.
  • Hugging Face Community Blogs: @merve3234 suggested that @chad_in_the_house could publish their blog post in the community blogs section on hf.co/blog.
  • Medium Blog Post: @chad_in_the_house shared the link to their blog post on Medium about understanding diffusion noise schedules and sample steps. The blog post was inspired by GitHub user @bghira’s model based on the research paper “Common Diffusion Noise Schedules and Sample Steps are Flawed”.
  • Hugging Face Blog Post: @chad_in_the_house confirmed they will also create a blog post on Hugging Face, providing a simplified version for the presentation.

▷ #core-announcements (1 messages):

  • Segmind’s New SDXL Variants: @sayakpaul announced the release of two new smaller versions of SDXL by Segmind. The first model is ‘Segmind-Vega’, a distilled version of Stable Diffusion XL (SDXL) offering a 70% reduction in size and a 100% speedup. Try out this model at Segmind-Vega; the model card can be seen here.
  • The second model is ‘Segmind-VegaRT’, another distilled model. Real-time inference of this model can be tried here and the API can be accessed here. The model card can be seen here.

▷ #diffusion-discussions (2 messages):

  • Domain Translation with Diffusion: User @rapelaheitor sought advice on learning domain translation conditioning - specifically translating a Depth image to an RGB image - using diffusion models. They requested any suitable resources or materials to study; no responses or resources were provided in the analyzed messages.
  • Blend Command: User @alchemistaccelerator_22034 responded with a brief remark: /blend.

▷ #NLP (16 messages🔥):

  • LLM/LORA Vocab Limitation: User @opencuiguy questioned how to ensure that LLM/LORA models generate only from a fixed vocabulary, such as [“true”, “false”]. @vipitis surmised that @opencuiguy was using a decoder model for a classification task and suggested looking at the probabilities of the two candidate tokens and choosing the higher one (a minimal sketch appears after this list).

  • Speech and Language Processing Book: @stroggoz shared a link to the free draft book “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition” by Daniel Jurafsky and James H. Martin. @vipitis commented on the book’s eternal draft status.

  • Update on ctransformers Library: @aiman1993 pointed out that the ctransformers library hasn’t been updated for the last 4 months, making current llama.cpp models tough to run through it. They also inquired about future updates to the library.

  • Hugging Face Book on NLP: @merve3234 mentioned a useful book from Hugging Face on Natural Language Processing, to which @pomidorich_ requested a link. @cakiki shared a link to the book on O’Reilly’s website.

  • Training Query on GPT: @exponentialxp asked if the quality of text would improve more during GPT training when the loss decreases by 5% at 60k iterations compared to other instances where it decreased by the same percentage. They also inquired about the potential negative effects of changing the learning rate by 10x mid-training.
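
A minimal sketch of the label-scoring approach referenced in the first bullet, under assumptions: gpt2 stands in for the actual model, and each candidate label is assumed to map to a single token (multi-token labels would need summed log-probabilities):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Question: Is the sky blue? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

candidates = [" true", " false"]  # leading space matters for GPT-2's BPE
ids = [tokenizer.encode(c)[0] for c in candidates]
best = candidates[int(torch.argmax(logits[ids]))]
print(best.strip())
```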

OpenAccess AI Collective (axolotl) Discord Summary

  • Exploration of 4M: Massively Multimodal Masked Modeling shared by user @lightningralf along with this link.
  • Discussion on fine-tuning the Llama-2-7b-hf model with Huggingface for sequences of length 8192, with possible workarounds such as using Mistral or altering the sequence length to 8k, though it was noted that such adjustments might impact quality.
  • Mention of a test with a dropout setting of 0.5 on a Mistral model by @nruaif, which had not overfit at the start of the 2nd epoch.
  • Query by @lightningralf on adding new knowledge to a fine-tuned model like Hermes without affecting the fine-tuning, and a suggestion by @nruaif to use RAG instead, with an emphasis on the risk of forgetting other knowledge.
  • Inquiry about an exception encountered when multiple models were found in llama.cpp, with no at-hand solution provided.
  • Brief exchange on 40GB A100s, in which @faldore stated their lack of usefulness and @yamashi denied their existence.
  • Query by @seungduk on the possibility of Axolotl merging samples within the sequence length when sample packing is enabled and speculation on its use of binary search to find the next sample.
  • Inquiry by @latentfog on whether Fill In The Middle (FIM) is supported for fine-tuning code base models.
  • Question from @wizmak about a strategy for mapping models according to tasks where the user prompt determines the model to be used.
  • Inquiry from @enima on how to fine-tune a pretrained large language model (LLM) unsupervised, focusing on domain adaptation and suggestion by @noobmaster29 to continue pre-training with additional text data.
  • A showcase article on the potential of locally trainable LLMs was shared by @visuallyadequate, emphasizing the possibility of injecting knowledge into models.
  • Endorsement of RAG (Retrieval-Augmented Generation) for adding specific information to a model by @noobmaster29 who also recommended ChipNeMo from NVIDIA as a valuable resource.
  • Request by @_awill for help related to understanding the internals of llama.cpp.

OpenAccess AI Collective (axolotl) Channel Summaries

▷ #general (34 messages🔥):

  • 4M: Massively Multimodal Masked Modeling: User @lightningralf shared a link to the 4M framework, which trains multimodal and multitask models by applying masked modeling to various tokenized modalities, adding that it could be interesting for @257999024458563585 and @208256080092856321.

  • Llama Model Fine-tuning with Huggingface: User @staticpunch raised a query about fine-tuning the Llama-2-7b-hf model with Huggingface for sequences of length = 8192, despite the model’s config.json file having "max_position_embeddings": 4096. In response, @nanobitz suggested that using Mistral could be an option, or altering the sequence length to 8k within yaml, although the quality might be affected.

  • Mistral and Dropout: @nruaif revealed they were running a test with a dropout setting of 0.5 on a Mistral model, adding later that the model had not overfit at the beginning of the 2nd epoch.

  • Inserting New Knowledge into Fine-tuned Models: User @lightningralf asked if it was possible to enhance a fine-tuned model like Hermes by inserting new knowledge in a pre-trained manner, without impacting the fine-tuning. In response, @nruaif suggested using RAG for this purpose, and reiterated that attempting to insert knowledge into a pretrained model could lead to it forgetting other knowledge.

  • Multiple Models Found Exception in llama.cpp: User @dangfutures asked for a workaround for an exception encountered when multiple models were found in llama.cpp. No solution was provided within the given messages.

▷ #axolotl-dev (5 messages):

  • A100s 40GB Discussion: @faldore mentioned that 40GB A100s are not useful to him; however, @yamashi countered that 40GB A100s don’t exist.
  • Sample Packing in Axolotl: @seungduk asked whether Axolotl merges samples within the sequence length when sample packing is enabled, with position ids restarting for each packed sample (e.g. 0, 1, 2, 3, …, 0, 1, 2, 3, …), and noticed that Axolotl seems to find the next sample to merge using binary search. A minimal sketch of this packing scheme appears after this list.
  • FIM Support for Fine-tuning Code Base Models: @latentfog raised a query about whether Fill In The Middle (FIM) is supported for fine-tuning code base models.
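
An illustrative sketch of the packing scheme described above, not Axolotl's actual implementation; a greedy fit stands in for the binary search @seungduk observed:

```python
def pack_samples(samples: list, max_len: int):
    """Pack tokenized samples into one sequence up to max_len,
    restarting position ids at the start of each packed sample."""
    input_ids, position_ids = [], []
    for tokens in samples:
        if len(input_ids) + len(tokens) > max_len:
            break  # a real packer would search for a sample that still fits
        input_ids.extend(tokens)
        position_ids.extend(range(len(tokens)))
    return input_ids, position_ids

ids, pos = pack_samples([[101, 7, 8, 102], [101, 9, 102]], max_len=16)
print(pos)  # [0, 1, 2, 3, 0, 1, 2]
```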

▷ #general-help (19 messages🔥):

  • Model Mapping According to Task: @wizmak asked about an approach or framework for mapping models according to tasks where user prompt determines the allocation of the request to a specific model.
  • Unsupervised Fine-tuning of Pretrained LLM: @enima asked for advice on how to fine-tune a pretrained large language model (LLM) in an unsupervised way, focusing on domain adaptation. @noobmaster29 suggested continued pre-training with additional text data and mentioned the mixed consensus on adding knowledge to LLMs using lora/qlora tuning.
  • Examples of LLM Fine-tuning: @visuallyadequate shared an article showcasing the potential of locally trainable LLMs, which argues that despite potential challenges and pitfalls, it is possible to inject knowledge into models.
  • Use of RAG to Inject Knowledge: When it comes to adding specific information to a model, @noobmaster29 highly recommended the use of RAG (Retrieval-Augmented Generation). ChipNeMo from NVIDIA was also mentioned as a favorite paper on the topic.
  • Understanding Llama.cpp internals: @_awill asked for help related to the internals of llama.cpp.

▷ #rlhf (1 messages):

emperor: only 90%?


Latent Space Discord Summary

  • Discussion on GPT-4 Fine-tuning originated by @semantic_zone, questioning the limited conversation on its fine-tuning capabilities.
  • Skepticism expressed by @slono regarding the usage of chat.openai.com, proposing alternatives like mixtral or 70b models that provide speed and code generation facilities.
  • Reference to RΞASON framework, an open-source Typescript backend for building LLM applications, shared by @lightningralf.
  • Introduction to Model Merging technique, highlighted by @swyxio who shared an article.
  • Probabilistic Programming discussed as a challenging first-class feature in programming languages, a viewpoint shared by @swizec.
  • Showcased application of AI in flying drones in latent space, as shared by @kylemathews in a blogpost.
  • Announcement of an upcoming discussion on the LlamaGuard/Purple Llama paper, led by <@458440277548335125>, and the release of a new podcast episode acknowledged by @Swyxio.
  • Appreciation for a Meta research paper for its readability expressed by @swizec, alongside @swyxio sharing a Tweet from Andrej Karpathy recommending several papers.
  • Clarification provided by @swyxio about meetings occurring on Discord, in response to @ayenem’s request for a Zoom link.
  • Questioning by @swizec on the authors’ decision to release Llama Guard weights without subsequent fine-tuning on ToxicChat.
  • Scheduled discussion of a paper recommended by Karpathy for the following week, with @eugeneyan considering presenting it.

Latent Space Channel Summaries

▷ #ai-general-chat (12 messages🔥):

  • GPT-4 Fine-tuning: @semantic_zone raised a query about access to the GPT-4 fine-tuning API for reasoning and asked why there wasn’t much discussion on its fine-tuning possibilities.
  • Chat.OpenAI.Com Usage: @slono expressed doubts about the utility of using chat.openai.com when alternatives like mixtral or 70b models offer speed and code generation capabilities.
  • RΞASON Framework: @lightningralf shared the RΞASON framework, which provides a backend open-source Typescript infrastructure for building applications using Large Language Models (LLMs).
  • Model Merging: @swyxio linked to an article about model merging, a technique people are beginning to explore.
  • LLMs As a Primitive: @swizec opined that probabilistic programming is difficult and that there have been various attempts to make it a first-class feature in programming languages.
  • Applying AI for flying drones in latent space: @kylemathews shared a blogpost on bricolage.io discussing the application of AI for flying drones in latent space.

▷ #ai-event-announcements (2 messages):

  • LlamaGuard/Purple Llama Paper Discussion: @Swyxio announced that a session on the LlamaGuard/Purple Llama paper would be led by <@458440277548335125> in 30 minutes. The paper was authored by an extensive team including Hakan Inan, Kartikeya Upasani, Jianfeng Chi, and others. Interested members are advised to join <#1107320650961518663> to receive Discord notifications.
  • New Podcast Episode: @Swyxio announced the release of a new podcast episode, thanking <@194927177265840128> for their contribution. The podcast can be listened to on this link.

▷ #llm-paper-club (9 messages🔥):

  • Discussion on a readable paper: @swizec expressed their enjoyment of a paper from the Meta research team, complimenting the readability of their writing.
  • Paper Recommendations: @swyxio shared a Tweet from Andrej Karpathy providing several paper recommendations, mentioning that Karpathy suggested looking at this paper.
  • Request for Zoom link: @ayenem asked for a Zoom link to join the discussion. However, @swyxio clarified that meetings now happen on Discord in a specific channel.
  • Query on Llama Guard Weights: @swizec questioned the authors’ decision to release the base Llama Guard weights but not the weights from further fine-tuning on ToxicChat.
  • Next Week’s Paper: @swyxio announced that the paper for next week’s discussion will be this paper applauded by Karpathy, and encouraged new participants to lead the discussion. @eugeneyan found the paper interesting and is considering presenting it.

LangChain AI Discord Summary

  • Issues identified in using OpenAI’s return_direct while using a callback method for streaming, which yields no predictable ‘Final Answer’. Also, a query on integrating the OpenAI Assistant API with databases without writing a script for the entire Assistant API.
  • Shared training resources and practice prompts for Language Learning Models (LLMs), along with a desire for a LeetCode-like platform specifically for LLMs.
  • Challenges reported with adjusting the max_tokens parameter in local LLM models, with a max_tokens setting of 600 still yielding outputs of roughly 400 tokens.
  • Request for assistance on how to determine the number of indexed and added documents in PGVector in LangChain, an issue described in this StackOverflow post.
  • Interest in devising a system for collating results from multiple LLM requests into a comprehensive result for user inquiries, with requests for project templates or starters involving RunnableParallel.
  • Difficulties encountered in using LangServe, such as parsing output from template applications, including a case of chat history disappearing during attempts to filter the JSON object display; the pertinent template application is available here. Issues were also identified with adding routes in LangServe and subsequent unexpected errors.
  • Identification of output_key="output" as a necessary setting in ConversationBufferMemory for a functional LangServe setup, although the standalone AgentExecutor can operate without it.
  • Shared article on Analytics Vidhya exploring the transformative role of ChatGPT plugins in digital storytelling and user engagement.
  • Cryptocurrency Jobs: An invitation to a Discord channel focused on roles within the crypto field was shared by @gauravmandal, likely relevant to a broad audience.

LangChain AI Channel Summaries

▷ #general (12 messages🔥):

  • Handling return_direct and Streaming: User @a404.eth is struggling with handling return_direct while using a callback method for streaming. They stated that if they use the stream method there’s no predictable Final Answer. The issue hasn’t been resolved yet.
  • OpenAI Assistant API Experience: @refik0727 is seeking assistance on how to connect with their database using the OpenAI Assistant API without writing a script for the whole Assistant API, specifically within the OpenAI platform.
  • LLM Training Resources: @seththunder shared a link to a practice prompt injection/task for those interested in Language Learning Models (LLMs). Meanwhile, @schimazing inquired if there exists a LeetCode type of site specifically for LLMs.
  • Trouble Adjusting Max Tokens in LLM Model: @ninamani, a beginner, is encountering issues when adjusting the max_tokens parameter value in a locally hosted LLM model. Specifically, when they set max_tokens to 600, the generated output still tends to stay around 400 tokens.
  • Seeking Help with PGVector: @alekseyr1987 is seeking help regarding PGVector in LangChain, specifically on how to find out how many documents have been indexed and added. This user provided a link to the specific issue on Stack Overflow.
  • Chain Query Project Templates: @squadzero. mentioned an interest in developing a chain that will gather results from multiple LLM requests into one whole for user queries. They are looking for any project templates or starters, possibly involving RunnableParallel.
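
One plausible starter for the fan-out/merge pattern in the last item, assuming a recent langchain-core; the RunnableLambda branches are stand-ins for real model calls (e.g. prompt | ChatOpenAI() chains):

```python
from langchain_core.runnables import RunnableLambda, RunnableParallel

fast_llm = RunnableLambda(lambda q: f"fast answer to: {q}")
careful_llm = RunnableLambda(lambda q: f"careful answer to: {q}")

# Fan the query out to both branches in parallel, then merge the dict of
# results into a single combined answer.
fan_out = RunnableParallel(fast=fast_llm, careful=careful_llm)
merge = RunnableLambda(lambda d: "\n".join(d.values()))

chain = fan_out | merge
print(chain.invoke("What is LangChain?"))
```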

▷ #langserve (8 messages🔥):

  • Parsing Output in LangServe: User @rodralez encountered an issue when trying to parse the output from a template application in LangServe. They wanted to display only the “output” key of a JSON object in the Output window instead of the entire JSON object, but trying to achieve this with lambda x: x["output"] caused the chat history to disappear. They were seeking possible solutions to this problem. The template application being used can be viewed here.

  • LangServe Route Adding Issue: @vilelaone had a problem adding routes using LangServe with their AgentExecutor. While their agent execution worked standalone, it failed when added with LangServe. An attempt to use a custom input and output model led to a ValueError of expecting one output key but receiving more.

  • ConversationBufferMemory Impact: A small discussion suggested that the use of ConversationBufferMemory could be causing the above issues, as the chat history was disappearing for @rodralez and the LangServe route addition was failing for @vilelaone.

  • Solving Problem with output_key: @vilelaone was able to resolve their issue by using output_key="output" in ConversationBufferMemory. It was noted that this is necessary for LangServe, even though the standalone AgentExecutor works fine without it.
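
A minimal sketch of that fix, assuming an agent setup like @vilelaone's; the agent construction and route wiring are elided and the variable names are illustrative:

```python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="output",  # required for LangServe; optional standalone
)

# executor = AgentExecutor(agent=agent, tools=tools, memory=memory)
# add_routes(app, executor, path="/agent")
```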

▷ #share-your-work (2 messages):

  • Crypto Jobs Discord: @gauravmandal shared a link to a Discord group focused on cryptocurrency jobs.
  • ChatGPT Plugins: @soumyadarshani posted a link to an Analytics Vidhya article, which discusses transforming user interactions with ChatGPT plugins. It suggests that these plugins are revolutionizing digital storytelling and user engagement.

▷ #tutorials (1 messages):

  • Crypto Jobs Discord Invite: User @gauravmandal shared an invite link to a Discord channel focused on jobs in the crypto industry. He tagged @everyone and @here, signaling that the information might be of broad interest to the group.

DiscoResearch Discord Summary

  • Dialogue regarding fine-tuning challenges with 7b models, with observations that 7b models can be hard to fine-tune using LoRA due to a propensity for performance degradation and catastrophic forgetting. User @.calytrix speculated this could be due to mixtral’s dense yet low-redundancy 7b models.
  • Comparative conversation between Foundation 7b vs Old 7b models. User @fernando.fernandes mentioned that the new 7b models seem more challenging for everyone despite the old 7b models being denser and more sensitive to catastrophic forgetting.
  • Theory by @fernando.fernandes discussing self-attention orthogonality and performance, proposing that performance could be linked to the volume of information in the self-attention layers. In particular, in poorly performing models such as the undi95 mixtral fine-tune, the self-attention layers are more orthogonal.
  • Proposed solutions for fine-tuning from _jp1_ and @fernando.fernandes, recommending methods such as higher dropout rates, freezing router layers, and potentially freezing the embed_token layer. All suggestions aim to improve the performance of models like mixtral.
  • Discussion around Disco Research by @le_mess and @.pathos, focusing on the 1970s disco music impact.
  • Update on the LeoLM preprint release as shared by @bjoernp: despite a delay due to ongoing improvements and evaluations, the release is confirmed for the future.
  • Detailed information about LeoLM training was provided by @bjoernp: the training of Mistral 7b ran on A100s at ~3000 tokens/s/GPU, using approximately 65 billion tokens.

DiscoResearch Channel Summaries

▷ #mixtral_implementation (9 messages🔥):

  • Fine-tuning Challenges with 7b Models: User @.calytrix has observed that the 7b models are particularly challenging to fine-tune with LoRA without causing performance degradation or catastrophic forgetting. This user speculates that the issue might be tied to mixtral’s dense, low-redundancy 7b models which might be less tolerant to LoRA fine-tuning.
  • Foundation 7b vs Old 7b: @fernando.fernandes. shares the observation that everyone, regardless of their finetuning methods, appears to be struggling with the new 7b models. This is contrary to the experience with the older 7b models, which are even denser and thus more susceptible to catastrophic forgetting.
  • Self-Attention Orthogonality and Performance: @fernando.fernandes. proposes that the amount of information stored in self-attention layers, conceptualized as databases, is related to their rank and orthogonality. He noted that for models with poor performance, such as the undi95 mixtral finetune, the self-attention layers tend to be more orthogonal. Here, orthogonality is measured via the Frobenius norm calculated between the weights of self-attention modules from different experts (a minimal sketch of such a measurement appears after this list).
  • Potential Solutions for Fine-tuning: User _jp1_ proposes that QLoRA fine-tuning may not work well with router (or gate) layers, thus requiring higher dropout rates. Future rounds of finetuning, incorporating frozen router layers and additional bugfixes/improvements, could significantly improve performance. @fernando.fernandes. agrees with this and suggests that it may be necessary to freeze the embed_token layer as well, though the reason for its potential positive impact still needs to be understood.
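
One plausible reading of that orthogonality measurement, sketched with random matrices standing in for real expert weights; the exact normalization used in the discussion was not specified:

```python
import torch

def frobenius_overlap(w_a: torch.Tensor, w_b: torch.Tensor) -> float:
    """||A^T B||_F for Frobenius-normalized A and B; values near 0
    indicate the two weight matrices are close to orthogonal."""
    a = w_a / torch.linalg.norm(w_a)
    b = w_b / torch.linalg.norm(w_b)
    return torch.linalg.norm(a.T @ b).item()

expert_a = torch.randn(1024, 1024)  # stand-in for one expert's attention weights
expert_b = torch.randn(1024, 1024)  # stand-in for another expert's
print(frobenius_overlap(expert_a, expert_b))
```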

▷ #general (9 messages🔥):

  • Disco Research: Participants @le_mess and @.pathos mentioned that they are conducting research on the impact of disco music in the 1970s.
  • LeoLM Preprint Release: User @zitronesimo asked @bjoernp about the release of the LeoLM preprint. @bjoernp responded that the preprint is delayed due to ongoing work on improving the contribution and additional evaluations, along with other projects, but would certainly be released in due course.
  • LeoLM Training Details: Upon further queries from @zitronesimo, @bjoernp provided specific details about the training of Mistral 7b. He stated that the training was conducted on A100s at speeds of about 3000 tokens/s/GPU, and ~65 billion tokens were used for training.

LLM Perf Enthusiasts AI Discord Summary

  • Conversation about Cursor vs. VS Code Copilot and decision-making criteria including code output quality, context construction, and codebase-oriented discussions. An outline of recent enhancements in Copilot was provided, punctuated by a YouTube link demonstrating these capabilities.
  • Raising of performance concerns with the Assistants API and GPTs integration, with discussion of possible speed improvements including caching results, waiting for OpenAI to remediate, and building one’s own solutions. A sudden improvement in the product’s speed was also noted with some amusement.
  • @dongdong0755 suggested an experiment in prompt splitting and mentioned an ongoing issue regarding extractions in their work. The potential usefulness of an embeddings search functionality for Discord was also highlighted.

LLM Perf Enthusiasts AI Channel Summaries

▷ #general (1 messages):

joshcho_: llamaindex most likely

▷ #opensource (1 messages):

joshcho_: holy

▷ #offtopic (4 messages):

  • VS Code Copilot vs Cursor: User @robhaisfield expressed uncertainty over whether to stick with Cursor or switch back to using VS Code Copilot exclusively due to recent feature and UX changes in Copilot. The user considered Cursor to have an edge in yielding better outputs and having advanced context construction, despite Copilot’s improvements.
  • Benefits of Cursor: @robhaisfield highlighted one advantage of using Cursor is that all conversations about a code base are grouped within that codebase, creating a more organized system compared to having the conversations distributed across all ChatGPT conversations.
  • Questions about new features: @jeffreyw128 inquired about the recent features added to VS Code Copilot.
  • Explaining new features in VS Code Copilot: In response, @robhaisfield cited several enhancements including inline chat, workspace search commands, capability to load documentation from sites or repositories, and editing code chunks through chat commands. A detailed demonstration of these capabilities was linked through a YouTube video.

▷ #speed (5 messages):

  • Performance of Assistants API + GPTs: @joshcho_ expressed concerns about the slow speed of the Assistants API and GPTs integration, and inquired about ways to overcome this, such as caching results (a toy caching sketch appears after this list).
  • OpenAI Product Release Concern: @jeffreyw128 suspected that OpenAI might have shipped products prematurely, causing the slowness, and suggested either waiting for OpenAI to rectify the issue or building one’s own solutions to speed up the process.
  • @joshcho_ noted an apparent improvement in product speed and was somewhat amused by it.
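
A toy sketch of the caching idea from the first item, under assumptions: call_assistant is a hypothetical stand-in for the actual Assistants API round trip, and requests are assumed deterministic enough to be worth caching:

```python
import hashlib
import json
import sqlite3

db = sqlite3.connect("assistant_cache.db")
db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)")

def cached_call(payload: dict, call_assistant) -> str:
    """Return a cached response if this exact payload was seen before."""
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    row = db.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]
    result = call_assistant(payload)  # the slow API round trip
    db.execute("INSERT INTO cache VALUES (?, ?)", (key, result))
    db.commit()
    return result

print(cached_call({"message": "hi"}, lambda p: "slow response"))
```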

▷ #feedback-meta (1 messages):

joshcho_: i think retrieval would be useful. like an embeddings search for discord

▷ #prompting (2 messages):

  • Experiment with Prompt Splitting: User @dongdong0755 suggested an experiment of splitting the prompt into two parts to see if there would be any difference in the performance.
  • Issues with Extractions: User @dongdong0755 also mentioned facing a dilemma regarding extractions in their work.

Skunkworks AI Discord Summary

Only 1 channel had activity, so no need to summarize


  • Custom Filtered Dataset for Enhancing Reasoning: @far_el discussed training on a custom filtered dataset formatted to enhance reasoning; the model is also trained on multiple prompt formats. Far_el is interested in feedback on this model. (source)
  • Use Case - Understanding Large Code Bases: @spirobel shared their use case of understanding large code bases and expanding them. Spirobel experimented with Phind codellama 34b and Mistralic, and found that Mistralic performs better than Mistrallite by Amazon for their specific use case of detecting important function names from a git diff output. Spirobel wishes to understand why Mistralic performs better for this specific retrieval task. (source)
  • Superiority of Mistralic over Mistrallite for Code Retrieval: @spirobel notes that Mistralic performs better at the task of code retrieval than Mistrallite, even though Mistrallite was supposedly optimized for retrieval. Spirobel speculates that the concept of “retrieval” may vary in different contexts. (source)
  • Better Generalization of Mistralic: @far_el hypothesizes that Mistralic’s better performance may be due to their use of multiple prompt formats, which could potentially enable it to generalize better. Far_el will be investigating this further and plans to open source whatever they have for Mistralic-1. (source)
  • Axolotl Docker Image for H100: @tcapelle inquires about the availability of an Axolotl docker image compatible with H100. (source)
  • Mistralic vs OpenHermes 2.5 Performance: @spirobel states that upon experimenting, they found Mistralic to be more robust and produce better quality output compared to OpenHermes 2.5. Mistralic’s output was found to be often perfectly formatted in markdown. (source)

Alignment Lab AI Discord Summary

  • Discussion in ‘ai-and-ml-discussion’ of a new tool, introduced via the article Introducing Text-to-CAD shared by @entropi.
  • Announcement of available collaborations on open source and research projects by users @algomancer and @rabiussany in ‘looking-for-collabs’, with mention of specific areas of interest and openness to private-message discussions.
  • Mention by @teknium in ‘general-chat’ that the fine-tuning source code for their project is on GitHub, with @propback providing the link to the openchat repository.

Alignment Lab AI Channel Summaries

▷ #ai-and-ml-discussion (1 messages):

entropi: https://zoo.dev/blog/introducing-text-to-cad

▷ #looking-for-collabs (2 messages):

  • Open Source Contribution: User @algomancer offered to contribute to open source and open research projects over the holiday season. Their areas of interest include variable rate compute at inference, non-standard classes of generative models, Jepa style models with anything beyond an autoregressive decoder, and conditioning schemes for enhanced controllability. They expressed comfort with writing Triton/PyTorch and data pipelines.

  • Research Project Collaboration: User @rabiussany is offering help with any deep learning research projects. They are open to private messages for collaboration discussions.

▷ #general-chat (3 messages):

  • Fine-tuning Code Source: User @teknium relayed that the entire fine-tuning code for their project is hosted on GitHub but didn’t provide the link. User @propback followed up with the link to the openchat repository, containing the training instructions for the project.