Teknium has just opened up a new channel tracking it in the Nous Discord, to the general public.
There was also discussion on 4M: Massively Multimodal Masked Modeling and Reason.dev, a TS framework for LLM apps.
OpenAI Discord Summary
- PC Specifications for TensorFlow JS & Image Detection: User @z3wins initiated a discussion on the hardware requirements for running TensorFlow JS, with @marchy providing insight and sharing a related CodeSandbox link.
- Security API Service for Filtering Inappropriate Images: The "Security API Service" idea by @z3wins led to discussions on data transfer costs and the feasibility of on-device vs. server-side processing.
- Bias in AI & Applications: @whynot66k20ni engaged in dialogue on inherent racial and cultural biases in AI, specifically around facial recognition technology, and on potential applications in healthcare and mental health.
- Challenges with Language Models in Word Puzzles: Users @eskcanta and @juanitosway described difficulties in applying language models, such as GPT-3.5 and GPT-4, to word puzzle games.
- GPU Recommendations for AI Models: @lugui made GPU suggestions in response to @isntfunny's query about budget GPUs suitable for running AI models.
- GPT-4 & DALL·E 3 Functionality: There were discussions regarding GPT-4's capabilities and potential limitations, as well as user dissatisfaction with DALL·E 3's restrictions and outputs on the OpenAI platform.
- Communication Limitations and Commercial Usage of GPT Models: Users expressed frustrations about message limit caps and discussed commercial usage rights for images generated using GPT models.
- Platform Access and Usage Issues: Numerous users reported problems with OpenAI platform access and features, specifically around rate limit issues, unusual human verification frequency, and loss of custom GPT models.
- Prompting Strategies: Users sought guidance and discussed strategies for effective prompting, focusing on limiting image generation in DALL·E 3, ensuring literal responses in ChatGPT, and creating suitable prompts for marketing research planning. Suggestions were offered on how to reduce the respectful tone in ChatGPT outputs.
OpenAI Channel Summaries
▷ #ai-discussions (126 messages🔥🔥):
- PC specs for TensorFlow JS and image detection: User @z3wins initiated a conversation about the PC specifications needed to run TensorFlow JS for image detection. @.marchy suggested that it can be run on a Raspberry Pi and shared a related CodeSandbox link.
- Concerns about data transfer rates: The discussion pointed out the potential data transfer costs associated with real-time detection.
- Potential for a security API service: @z3wins voiced a preliminary idea to establish a security API service that checks images for inappropriate content. @.marchy questioned the advantage of such a service and proposed that the check could be done more efficiently on-device rather than server-side.
- Discussion on racial and cultural bias in AI: @whynot66k20ni and @.marchy discussed the potential for racial and cultural bias in AI, specifically challenges in facial recognition technology and underrepresented groups in datasets, plus potential applications in healthcare and mental health.
- Impact of LLMs on word puzzle games: @eskcanta and @juanitosway discussed the difficulty of using LLMs like GPT-3.5 and GPT-4 in word puzzle games like hangman, because the models struggle with the individual letters in a word.
- Recommendations on budget GPU for AI models: @isntfunny inquired about a budget-friendly GPU that can run AI inference in a timely manner. @lugui suggested prioritizing the GPU with the most VRAM the budget allows, highlighting that VRAM is a primary requirement for AI models.
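The VRAM advice above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the 7-billion-parameter model size and the precisions are assumed values that were not mentioned in the discussion, and the estimate covers the weights alone (activations, KV cache, and framework overhead come on top).

```python
def estimate_weight_vram_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the model weights, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


if __name__ == "__main__":
    # Hypothetical 7B-parameter model at three common precisions.
    for bits in (16, 8, 4):
        gib = estimate_weight_vram_gib(7e9, bits)
        print(f"7B parameters at {bits}-bit: ~{gib:.1f} GiB for weights")
```

Comparing the result against a card's VRAM gives a rough lower bound on what fits; it does not predict inference speed.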
▷ #openai-chatter (144 messages🔥🔥):
- GPT-4 Capabilities: There were multiple discussions about the functionalities of GPT-4 (user
@ăïœïœ ïœïœïœïœïœïœïœïœïœ ă
). The AI was noted to possess vision capabilities in addition to its text processing abilities. However, participants described GPT-4 as not a significantly new model but rather an extension of GPT-3 that can process image inputs.
- DALL·E 3 Usage: Users had several conversations about the image creation capabilities of DALL·E 3 (users @errorsource, @satanhashtag, @lugui, @asura0_00). Some users expressed dissatisfaction with the output and restrictions of DALL·E 3 on the OpenAI platform and discussed potential alternatives.
- Communication Limitations of GPT Models: Several users expressed frustration about the message limit per 3 hours (users @cherrywavescollide, @sirthatsillegal, @winchawa, @satanhashtag). Some were unsure whether the cap was exactly 40 messages per 3 hours, and others noted reaching the limit prematurely.
- GPT Commercial Usage: Users @pickle2108, @Furnance and @solbus discussed the commercial rights for images generated using GPT models. @solbus provided a link to OpenAI's terms of use with clarification about image ownership.
- Platform Issues and Complaints: Several users (@sieventer, @superiornickson5312, @kingkkaktus) reported problems with the OpenAI platform, including errors, being unable to access certain features, or being blocked due to excessive usage.
▷ #openai-questions (48 messages🔥):
- Security Concerns Regarding ChatGPT: User @RobertGOD expressed concerns about potential security vulnerabilities in ChatGPT, particularly the lack of two-factor authentication, which could allow brute-force attacks. In response, @satanhashtag directed them to official OpenAI support (help.openai.com).
- Rate Limit Issues for GPT-4: @amanshrestha faced a rate limit issue when using GPT-4, having reached the limit of 10,000 tokens. @lugui suggested that the limit can be raised by adding credits to the account. A link to the rate limits documentation was shared.
- Unusual Human Verification on ChatGPT: Several users (@Rock, @beret90, @cssupport, @knowcryptoshow) discussed experiencing repeated human verification checks while using ChatGPT. @solbus suggested it might be due to certain privacy settings or browser extensions, and recommended testing in incognito mode and/or using a different browser.
- Problems with Account Access and Custom GPT Disappearance: @khoughy and @cdav respectively encountered issues with account access and the disappearance of their custom GPT from the dashboard. @elektronisade clarified that site verification is not related to account access, and suggested that usage policies forbidding "Activity that has high risk of economic harm" might be the reason for the removal of certain custom GPTs.
- DALL-E Seed Recreation Issue: @lxreilly asked about recreating a previous image in DALL-E using a seed and was advised by @toror to reach out to [email protected]. @solbus pointed out that OpenAI support tickets should be submitted via help.openai.com.
▷ #gpt-4-discussions (1 messages):
openheroes: Oh i don't know.. you mean it cant let you download the file ?
▷ #prompt-engineering (21 messages🔥):
- Prompting Resources Recommendations: @akas7488 asked for recommendations on resources, courses, or YouTube channels to improve at prompting for text and images, but no responses or recommendations were noted within the given messages.
- DALL·E 3 Image Generation: @suptrox was looking for a way to make DALL·E 3 generate images with a more limited number of elements based on a specific prompt. @alienpotus suggested using a more focused prompt that clearly states any elements that should not be included in the image.
- ChatGPT's Over-Interpretation of Prompts: @agowa338 expressed concerns about ChatGPT reading too much into prompts and not responding to the actual question asked, especially in follow-up questions. No resolution or suggestions were given in the discussion.
- Creating Prompts for Marketing Research Planning: @raunchotron asked for advice on creating a prompt for a marketing research plan. @eskcanta provided a general guideline for prompt engineering, emphasizing precise communication of what you want in order to avoid confusion or unwanted responses.
- Reducing Respectful Tone in ChatGPT Dialogues: @stealth2077 asked whether negative prompting can stop the AI from forcing respect, ethics, or moral topics into each story. @eskcanta stated that negative prompting is generally ineffective, and that better results can be achieved by carefully guiding the AI to produce the desired output within its programming and limitations. @eskcanta also provided a detailed example to illustrate how effectively guiding the AI can lead to in-depth and engaging character development in storytelling scenarios.
▷ #api-discussions (21 messages🔥):
- Prompting resources: @akas7488 asked for resources to improve their prompting skills for text and images. No resources were shared in the following messages.
- DALL·E 3 image generation: @suptrox asked if there's a way to limit the elements generated by DALL·E 3, giving an example where the image should focus on a garden's land without other distractions. In response, @alienpotus suggested a way to modify the prompt to achieve this, emphasizing the importance of specificity and exclusivity in the prompt.
- Literal responses from ChatGPT: @agowa338 expressed concerns about ChatGPT reading too much into their prompts rather than answering them literally, and asked for recommendations to make ChatGPT respond more literally. No direct solutions were provided in the following messages.
- Creating prompts for a marketing research plan: @raunchotron inquired about creating prompts for a marketing research plan. @eskcanta advised making clear to the AI what is wanted, using language as accurately as possible, checking the output carefully, fact-checking the AI's responses, and avoiding areas where the AI is known to hallucinate.
- Reducing respectfulness in generated texts: @stealth2077 asked how to reduce the amount of respect, ethics, or moral topics in every story. @eskcanta responded by explaining OpenAI's content policy and the model's inherent tendency to set good examples, and gave a comprehensive example of how to craft a prompt that guides the AI in generating a story where disagreement exists.
Nous Research AI Discord Summary
- @fullstack6209 advised against using Langchain in the context of LLM graph queries. @maxwellandrews suggested that topologies could serve as a basis for building an LLM graph query system, and @pogpunk asked an unidentified user about their Twitter presence.
- @gabriel_syme responded positively in the benchmarks-log channel to the idea of broader evaluations and the addition of custom tasks, noting that the efficiency depends on how simple the tasks are to include.
- Extensive discussion under interesting-links focused mostly on improving model performance and data usage strategies. Key topics included Instruction Finetuning (IFT), pretraining data, specific datasets such as GoodWiki and ARXIV papers, a ratio of 1:100 for mixing IFT and pretraining data, model hallucinations, practical application of RAG (Retrieval-Augmented Generation), and the limitations of GPT-4. Notable contributors were @euclaise, @tokenbender, @gabriel_syme, @giftedgummybee.
- Wide-ranging discussions took place in general. AI model performance was a major topic, with Character AI, Bard, ChatGPT, and Claude being highlighted. Comments were made on the performance of 4bit Mistral vs unquantized Mistral and on the utility of gpt4-vision. Training strategies were discussed, with @fblgit suggesting a concept of a knowledge infinite-turn conversation training doctrine and @mihai4256 indicating plans to proceed with manual dataset annotation. The server reorganization and Project Obsidian received mentions. Members anticipated upcoming models like UNA Solar and made predictions on MMLU (Massive Multitask Language Understanding) outcomes.
- In-depth LLM-related discussions were held in ask-about-llms. Topics included deploying/running an LLM locally, with suggestions provided by @teknium; fine-tuning QLoRA models, with noteworthy insights from @teknium; and questions on validation during QLoRA training. @semantic_zone asked the group about the scarcity of GPT-4 finetuning discussions, while @jaredquek provided positive performance feedback and a fine-tuning status update for the Upstage Solar 10B model.
Nous Research AI Channel Summaries
▷ #ctx-length-research (3 messages):
- User @fullstack6209 strongly advised against using Langchain, for unspecified reasons.
- @maxwellandrews commented that some topologies can serve as a foundation for anyone to create their own LLM graph query system, regardless of whether they decide to use the base library or not.
- @pogpunk inquired whether an unidentified user has a Twitter account.
▷ #benchmarks-log (1 messages):
- Adding Custom Tasks: User @gabriel_syme expressed positivity towards broader evaluations, appreciating the ability to include attacks and prompt engineering. The efficiency of this technique, however, will be determined by the ease of adding custom tasks.
▷ #interesting-links (185 messages🔥🔥):
- The conversation started with a discussion about the efficacy of Instruction Finetuning (IFT) and the possibility of incorporating pretraining data during fine-tuning to prevent catastrophic forgetting. User @euclaise suggested including data such as GoodWiki for factuality, but cautioned that for models like Mistral, this approach could potentially degrade the model's performance, due to Mistral's high-quality base data.
- The discussion further evolved to consider the possibility of using higher-quality data like ARXIV papers in combination with general pretraining data to maintain a model's non-domain-specific capabilities, even when it is being fine-tuned towards a particular domain. This led to the recommendation of mixing pretraining data with Instruction Finetuning (IFT) data in a ratio of 1:100 if working with a large SFT dataset.
- @tokenbender raised one caveat: if the objective is memorization, extensive finetuning on pretraining data could lead to significant hallucinations. This prompted a discussion around successful model training strategies and the importance of preserving a model's base capabilities while minimizing hallucinations and data loss.
- The members also discussed the practical application of RAG (Retrieval-Augmented Generation) in the context of AI research. @gabriel_syme and @giftedgummybee proposed a step-by-step approach, moving from structured data towards task-specific data, suggesting that it could be a more effective way of conducting retrieval, as more traditional retrieval methods were resulting in irrelevant selections.
- Lastly, the conversation touched upon the limitations of GPT-4, with several members expressing disappointment over its reasoning capabilities and outputs. They suggested that the model's performance seemed to have regressed compared to an earlier iteration, which was apparently more effective.
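The 1:100 mixing recommendation above could be sketched roughly as follows. This is a minimal illustration, not the procedure anyone in the channel described: the function name `mix_datasets` is hypothetical, and reading the ratio as one pretraining sample per 100 IFT samples is an assumption, since the summary does not say which side of the ratio is which.

```python
import random


def mix_datasets(ift_samples, pretrain_samples, pretrain_per_100_ift=1, seed=0):
    """Return all IFT samples plus a small random slice of pretraining samples.

    Assumption: the ratio is read as `pretrain_per_100_ift` pretraining
    samples for every 100 IFT samples.
    """
    rng = random.Random(seed)
    wanted = len(ift_samples) * pretrain_per_100_ift // 100
    n_pretrain = min(len(pretrain_samples), wanted)
    mixed = list(ift_samples) + rng.sample(list(pretrain_samples), n_pretrain)
    rng.shuffle(mixed)  # interleave so pretraining text is spread across training
    return mixed


if __name__ == "__main__":
    ift = [("ift", i) for i in range(1000)]
    pretrain = [("pretrain", i) for i in range(50)]
    mixed = mix_datasets(ift, pretrain)
    print(len(mixed), sum(1 for kind, _ in mixed if kind == "pretrain"))
```

Shuffling after concatenation is a design choice of this sketch; a real training pipeline might instead sample from the two sources with fixed probabilities at each step.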
▷ #general (98 messages🔥🔥):
- AI Model Performance and Usage: Members discussed the performance and usage of multiple AI models. @makya expressed surprise over Character AI being used more than Bard and ChatGPT. In the same vein, there was confusion around a chart suggesting Claude has much more usage than other models, prompting @night_w0lf to suggest that it represents overall usage, with each provider adding to this total.
- Testing Models and Potential Improvements: @gitmo joe and @teknium had contrasting views on the performance of 4bit Mistral vs unquantized Mistral, with @teknium arguing that 4bit Mistral performs better assuming the same fine-tuning dataset. @gabriel_syme shared their experience of testing gpt4-vision for the first time, finding it useful for data annotation in multimodal environments but being unclear on how to provide prompts effectively.
- AI Training Strategies and Developments: @fblgit proposed the idea of a knowledge infinite-turn conversation training doctrine, which involves reading a dataset as an iterative and synthesized interaction for gradual improvement in model learning. Separately, @mihai4256 said they plan to continue manually writing dataset samples.
- Discord Server Reorganization: @teknium mentioned the restructuring of the server and the addition of a public project called Project Obsidian. A few users mentioned access issues with the new channels, but these were resolved after refreshing Discord.
- Upcoming AI Models and Predictions: Members anticipated the release of new AI models like UNA Solar and made predictions about how these models might perform on MMLU (Massive Multitask Language Understanding). Some users joked about AGI (Artificial General Intelligence) being achieved through a random model merge or by Nous Research.
▷ #ask-about-llms (47 messages🔥):
- Deploying/Running an LLM locally: User @deki04 asked for a guide to deploying/running an LLM locally, preferably via a CLI, though a GUI option would be good too. @teknium shared an OpenHermes script for inference, which can be adapted for back-and-forth chats, and suggested that @deki04 look into llama.cpp for fast inference on a Mac. They noted that a Mac can only utilize the CPU with transformers.
- Finetuning a QLoRA model: User @.beowulfbr shared their configuration and raised a concern about training loss not decreasing significantly while fine-tuning an OpenChat 3.5 model using QLoRA with a dataset of 300k entries. @teknium assured them that this is normal behavior and that the loss will stay relatively flat after epoch 1.
- Validation during QLoRA Training: User @ruggsea asked for insight into validation during QLoRA training and best practices for splitting a finetuning dataset into train/validation sets.
- GPT-4 Finetuning: @semantic_zone asked why there hasn't been more discussion about GPT-4 finetuning, particularly for reasoning. @giftedgummybee responded with a couple of reasons, including the associated costs and the fact that access is heavily gated.
- Testing and Fine-tuning the Upstage Solar 10B model: @jaredquek shared their positive evaluation of the Upstage Solar 10B model on complex French philosophical translations and philosophical questions. They mentioned they are currently fine-tuning it.
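The train/validation question above went unanswered in the summarized messages. As a neutral starting point, a plain random holdout split can be sketched as below; the function name `train_val_split` and the 5% default are arbitrary assumptions for illustration, not a recommendation from the channel.

```python
import random


def train_val_split(samples, val_fraction=0.05, seed=42):
    """Shuffle indices once, then hold out a fixed fraction for validation."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = max(1, int(len(samples) * val_fraction))
    val_indices = set(indices[:n_val])
    train = [s for i, s in enumerate(samples) if i not in val_indices]
    val = [samples[i] for i in indices[:n_val]]
    return train, val


if __name__ == "__main__":
    data = [f"example-{i}" for i in range(300)]
    train, val = train_val_split(data)
    print(len(train), len(val))
```

Fixing the seed keeps the held-out set stable across runs, so validation loss stays comparable between experiments.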
HuggingFace Discord Summary
- Announcement of major updates in the open-source sphere, such as the debut of 4-bit Mixtral in transformers, a major release of huggingface_hub, a sizeable Gradio update, and the addition of new models and chat templates to Transformers.js, along with gsplat.js, a new JavaScript Gaussian Splatting library.
- Product update rollouts, including the release of Mixtral in Hugging Chat, the introduction of preventative measures against commits containing valid tokens, a tool for easy dataset transfer from GitHub to Hugging Face, and the availability of new models on The Hub and MLX.
- In the "Cool Stuff" category, the announcement of the new Games with AI course, Hugging Face's participation in NeurIPS, the invitation to become an author and publish with Hugging Face, a review of the Year of LLM 2023, and a round of AI predictions for 2024.
- General inquiries about the prompt format for Mistral, training custom models, a HuggingFace Datasets error, accessing private Spaces, GPT model suggestions for fine-tuning, issues with inference on a fine-tuned model, concerns about the evaluation of Mixtral models in the leaderboard, and submission of models that require trust_remote_code to the LLM leaderboard.
- Shared links to the new free and open Deep Reinforcement Learning course by Hugging Face, a paper on efficiently running large language models (LLMs) that exceed the available DRAM capacity, a paper on preference distillation for large vision language models (LVLMs), and an interesting approach for which no specific details were provided.
- Showcased projects within the Discord guild, including a project for infinite scene generation, a neural style transfer architecture, a digital asset ownership project, self-trained 2x general upscaling models, and an LLM dataset contamination detector.
- Reading group discussions revolving around understanding and visualizing diffusion models, writing and publishing a blog post, and understanding diffusion noise schedules and sample steps.
- The announcement of Segmind's new SDXL variants "Segmind-Vega" and "Segmind-VegaRT", offering a reduction in size and a speedup.
- NLP discussions focusing on LLM/LoRA vocabulary limitation, sharing of the book "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition", updates on the ctransformers library, and training queries on GPT.
- Query on domain translation conditioning for converting Depth to RGB images using diffusion models, with no responses or resources provided in the analyzed messages.
HuggingFace Discord Channel Summaries
▷ #announcements (1 messages):
- Open Source Updates: A variety of major updates were announced, including the availability of 4-bit Mixtral in transformers, a huge release of huggingface_hub with features like easier login in Colab, a massive Gradio update with bug fixes and new features, and new models and chat templates added to Transformers.js. Additionally, gsplat.js, a JavaScript Gaussian Splatting library, was announced.
- Product Updates: Mixtral is now available in Hugging Chat, commits that contain valid tokens are now rejected in repo updates, datasets can be easily transferred from GitHub to Hugging Face with a new tool, and new models are available in The Hub + MLX, with the possibility for users to submit their own.
- Cool Stuff: Announcement of dates for the new Games with AI course, Hugging Face's participation at NeurIPS, an invitation to become an author and publish with Hugging Face in 2024, a review of the Year of LLM 2023, and a round of predictions for AI in 2024 from prominent members of the AI community.
▷ #general (58 messages🔥🔥):
- Prompt Format for Mistral: User @drummer_ asked where to find the prompt format for Mistral. @osanseviero replied that the format can be found on Mistral's model card.
- Training Custom Models: User @kepardev inquired about starting to train their own model, wondering if they can train it by showing it a few questions and their respective answers.
- HuggingFace Datasets Error: User @twisterheather encountered an error (DatasetGenerationError) while trying to download some datasets from Hugging Face.
- Accessing Private Spaces: User @tractrixarch tried to access one of his private Spaces from a public Space without committing the token, which wasn't working. @Cubie | Tom suggested adding the token to the public Space's secrets and loading it in the code with os.environ.get("...").
- GPT Model for Fine-tuning: User @vishyouluck asked for recommendations of a small GPT model to fine-tune. @Cubie | Tom recommended gpt2-large and TinyLlama/TinyLlama-1.1B-Chat-v0.6.
- Issues with Inference on Fine-tuned Model: @vishyouluck reported an issue with inference on their fine-tuned model VishalMysore/cookgptlama, which was showing an Internal Server Error.
- Evaluation of Mixtral Models: User @DavidG88 reported a problem with the evaluation of Mixtral models in the leaderboard and asked for contact suggestions for the leaderboard team. @cakiki suggested opening an issue on the Space itself.
- Submitting a model that requires trust_remote_code to the LLM leaderboard: User @testgggggggggg asked if there's a way to submit a model that requires trust_remote_code = True to the LLM leaderboard.
▷ #today-im-learning (1 messages):
- Request to Join RL-Study-Group: User @cloudhu inquired about how to join the rl-study-group channel, as it was locked for them. No further discussion or responses were given in the provided messages.
▷ #cool-finds (4 messages):
- Hugging Face Deep Reinforcement Learning Course: @samanofficial shared a link to the new free and open Deep Reinforcement Learning course by Hugging Face. The course aims to teach Deep Reinforcement Learning from beginner to expert.
- Efficiently running Large Language Models: @osanseviero expressed excitement for a paper that discusses a method to efficiently run large language models (LLMs) that exceed the available DRAM capacity by storing model parameters in flash memory.
- Improving the Ability of Large Vision Language Models: @merve3234 shared an interesting paper that explores preference distillation for large vision language models (LVLMs), with the aim of enhancing their ability to generate helpful and faithful responses anchored in the visual context. The model and dataset used in the study are available on the Hub.
- Innovative Approach Discussion: @martinmunch pointed out an interesting approach but didn't provide specific details about the content of the paper.
▷ #i-made-this (18 messages🔥):
- Infinite Scene Generation Project: @Lil K and @shehuiwojiege shared a project for infinite scene generation, providing links to the code https://www.github.com/HyoKong/DreamD…, a demo https://www.huggingface.co/spaces/imsuper…, and the main project https://www.hyokong.github.io/dreamdrone-pag….
- Neural Style Transfer Architecture: @om7059 mentioned an implementation of Gatys et al.'s neural style transfer architecture in PyTorch and shared a Twitter link with results from their implementation.
- Digital Asset Ownership Project: @vedsayys introduced a project by Mngl.club to enhance the digital asset ownership experience and invited users to their X profile and Telegram community via provided links and https://t.me/mngl_club.
- Upscaling Models Demo: @helaman introduced his latest released self-trained 2x general upscaling models with a demo and more details available here.
- LLM Dataset Contamination Detector: @yeyito777 created a Space to test an LLM's dataset contamination as per this paper and shared the link to the Space. They explained that models with scores above 0.95 were likely to have seen the data they were tested on before.
▷ #reading-group (7 messages):
- Understanding and Visualizing Diffusion Models: @asrielhan shared a GitHub link to a paper on how specific training data impacts the image generation process of a diffusion model.
- Blog Post Announcement: @chad_in_the_house announced that they are almost done with writing a blog post, despite being in a different timezone.
- Hugging Face Community Blogs: @merve3234 suggested that @chad_in_the_house could publish their blog post in the community blogs section on hf.co/blog.
- Medium Blog Post: @chad_in_the_house shared the link to their blog post on Medium about understanding diffusion noise schedules and sample steps. The blog post was inspired by GitHub user @bghira's model based on the research paper "Common Diffusion Noise Schedules and Sample Steps are Flawed".
- Hugging Face Blog Post: @chad_in_the_house confirmed they will also create a blog post on Hugging Face, providing a simplified version for the presentation.
▷ #core-announcements (1 messages):
- Segmind's New SDXL Variants: @sayakpaul announced the release of two new smaller versions of SDXL by Segmind. The first model is "Segmind-Vega", which is a distilled version of Stable Diffusion XL (SDXL) and offers a 70% reduction in size and 100% speedup. Try out this model at Segmind-Vega. The model card can be seen here.
- The second model is "Segmind-VegaRT", which is another distilled model. Real-time inference of this model can be tried here and the API can be accessed here. The model card can be seen here.
▷ #diffusion-discussions (2 messages):
- Domain Translation with Diffusion: User @rapelaheitor sought advice on learning domain translation conditioning, specifically translating a Depth image to an RGB image, using diffusion models. They requested any suitable resources or materials to study.
- Blend Command: User @alchemistaccelerator_22034 responded with a brief remark: /blend.
▷ #NLP (16 messages🔥):
- LLM/LoRA Vocab Limitation: User @opencuiguy asked how to ensure that LLM/LoRA models generate only from a fixed vocabulary, such as ["true", "false"]. @vipitis surmised that @opencuiguy was using a decoder model for a classification task, and suggested looking at the probabilities of the two tokens and choosing the higher one.
- Speech and Language Processing Book: @stroggoz shared a free draft book titled "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition" by Daniel Jurafsky and James H. Martin. @vipitis commented on the book's eternal draft status.
- Update on ctransformers library: @aiman1993 pointed out that the ctransformers library hasn't been updated for the last 4 months, which makes llama.cpp tough to run. They also inquired about future updates to the library.
- Hugging Face Book on NLP: @merve3234 mentioned a useful book from Hugging Face on Natural Language Processing, for which @pomidorich_ requested a link. @cakiki shared a link to the book on O'Reilly's website.
- Training Query on GPT: @exponentialxp asked if the quality of text would improve more during GPT training when the loss decreases by 5% at 60k iterations compared to other instances where it decreased by the same percentage. They also inquired about the potential negative effects of changing the learning rate by 10x mid-training.
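The two-token comparison suggested in the vocab limitation item above could look roughly like this. It is a minimal sketch over plain Python lists: the logits and token ids are made-up illustrative values, `pick_label` is a hypothetical helper, and it assumes each label corresponds to a single token. In practice the logits would come from the model's next-token output and the ids from its tokenizer.

```python
import math


def pick_label(next_token_logits, label_token_ids):
    """Choose among a fixed set of labels by comparing their next-token scores.

    next_token_logits: sequence of floats indexed by token id.
    label_token_ids: mapping from label text to its (single) token id.
    """
    scores = {label: next_token_logits[tid] for label, tid in label_token_ids.items()}
    # Softmax restricted to the candidate labels (shifted by the max for stability).
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    logits = [0.1, 2.0, -1.0, 3.5]    # made-up logits for a 4-token vocabulary
    labels = {"true": 1, "false": 3}  # made-up token ids
    best, probs = pick_label(logits, labels)
    print(best, probs)
```

Because every other token is ignored, the output is always one of the allowed labels, whatever the model would have generated freely.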
▷ #diffusion-discussions (2 messages):
- Domain Translation Conditioning: User @rapelaheitor asked for educational material on domain translation conditioning. They expressed a specific interest in converting Depth to RGB images, using the Depth image as conditioning. No responses or resources were provided in the analyzed messages.
OpenAccess AI Collective (axolotl) Discord Summary
- Exploration of 4M: Massively Multimodal Masked Modeling, shared by user @lightningralf along with this link.
- Discussion on fine-tuning the Llama-2-7b-hf model with Huggingface for sequences of length 8192. Possible workarounds included using Mistral or setting the sequence length to 8k, though it was noted that such adjustments might affect quality.
- @nruaif mentioned a test with a dropout setting of 0.5 on a Mistral model, which had not overfit at the start of the 2nd epoch.
- Query by @lightningralf on adding new knowledge to a fine-tuned model like Hermes without affecting the fine-tuning. @nruaif suggested using RAG instead, emphasizing the risk of the model forgetting other knowledge.
- Inquiry about an exception raised when multiple models were found in Llama CCP; no solution was provided.
- Brief exchange on 40gb A100s, where @faldore said they are not useful to him and @yamashi denied that they exist.
- Query by @seungduk on whether Axolotl merges samples within the sequence length when sample packing is enabled, with speculation that it uses binary search to find the next sample.
- Inquiry by @latentfog on whether Fill In The Middle (FIM) is supported for fine-tuning code base models.
- Question from @wizmak about a strategy for mapping models to tasks, where the user prompt determines the model to be used.
- Inquiry from @enima on how to fine-tune a pretrained large language model (LLM) in an unsupervised way, focusing on domain adaptation. @noobmaster29 suggested continuing pre-training with additional text data.
- @visuallyadequate shared a showcase article on the potential of locally trainable LLMs, emphasizing the possibility of injecting knowledge into models.
- @noobmaster29 endorsed RAG (Retrieval-Augmented Generation) for adding specific information to a model and also recommended ChipNeMo from NVIDIA as a valuable resource.
- Request by @_awill for help with understanding the internals of llama.cpp.
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (34 messages🔥):
- 4M: Massively Multimodal Masked Modeling: User @lightningralf shared a link to 4M, a framework for training multimodal and multitask models that is applied to various tokenized modalities, adding that it could be interesting for @257999024458563585 and @208256080092856321.
- Llama Model Fine-tuning with Huggingface: User @staticpunch raised a query about fine-tuning the Llama-2-7b-hf model with Huggingface for sequences of length 8192, despite the model's `config.json` file having `"max_position_embeddings": 4096`. In response, @nanobitz suggested that using Mistral could be an option, or altering the sequence length to 8k within the yaml, although the quality might be affected.
- Mistral and Dropout: @nruaif revealed they were running a test with a dropout setting of 0.5 on a Mistral model, adding later that the model had not overfit at the beginning of the 2nd epoch.
- Inserting New Knowledge into Fine-tuned Models: User @lightningralf asked if it was possible to enhance a fine-tuned model like Hermes by inserting new knowledge in a pre-training manner, without impacting the fine-tuning. In response, @nruaif suggested using RAG for this purpose, and reiterated that attempting to insert knowledge into a pretrained model could lead to it forgetting other knowledge.
- Multiple Models Found Exception in Llama CCP: User @dangfutures asked for a workaround for an exception encountered when multiple models were found in Llama CCP. No solution was provided within the given messages.
▷ #axolotl-dev (5 messages):
- A100s 40gb Discussion: @faldore mentioned that 40gb A100s are not useful to him; @yamashi countered that 40gb A100s don't exist.
- Sample Packing in Axolotl: @seungduk asked whether Axolotl merges samples within the sequence length when sample packing is enabled, with position ids that restart for each sample, like 0, 1, 2, 3, …, 0, 1, 2, 3, … He also noticed that Axolotl seems to find the next sample to merge using binary search.
- FIM Support for Fine-tuning Code Base Models: @latentfog raised a query about whether Fill In The Middle (FIM) is supported for fine-tuning code base models.
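The sample-packing question above can be made concrete with a small sketch. This is not Axolotl's implementation; it is a minimal illustration, assuming a greedy strategy that keeps sample lengths sorted and uses binary search to pick the largest sample that still fits in the remaining space. The names `pack_samples` and `position_ids` are hypothetical.

```python
from bisect import bisect_right


def pack_samples(lengths, max_len):
    """Greedily pack sample lengths into bins of at most max_len tokens.

    Illustrative only: returns a list of bins, each a list of sample lengths.
    """
    remaining = sorted(lengths)  # ascending order, so bisect can search it
    bins = []
    while remaining:
        space, current = max_len, []
        while remaining:
            # Binary search: index just past the largest sample that still fits.
            i = bisect_right(remaining, space)
            if i == 0:
                break  # nothing fits in the leftover space
            length = remaining.pop(i - 1)
            current.append(length)
            space -= length
        if not current:
            # Even the smallest remaining sample is longer than max_len.
            raise ValueError(f"sample of length {remaining[0]} exceeds max_len={max_len}")
        bins.append(current)
    return bins


def position_ids(bin_lengths):
    """Position ids restart at 0 for every sample packed into the same sequence."""
    return [pos for length in bin_lengths for pos in range(length)]
```

For example, `pack_samples([3, 5, 2, 4], 8)` groups the samples into two full sequences, and `position_ids([3, 2])` restarts the counter at the sample boundary, matching the 0, 1, 2, …, 0, 1, … pattern described in the discussion.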
▷ #general-help (19 messages🔥):
- Model Mapping According to Task: @wizmak asked about an approach or framework for mapping models to tasks, where the user prompt determines which model a request is routed to.
- Unsupervised Fine-tuning of Pretrained LLM: @enima asked for advice on how to fine-tune a pretrained large language model (LLM) in an unsupervised way, focusing on domain adaptation. @noobmaster29 suggested continued pre-training with additional text data and mentioned the mixed consensus on adding knowledge to LLMs using lora/qlora tuning.
- Examples of LLM Fine-tuning: @visuallyadequate shared an article showcasing the potential of locally trainable LLMs. It was noted that, despite potential challenges and pitfalls, it is possible to inject knowledge into models.
- Use of RAG to Inject Knowledge: When it comes to adding specific information to a model, @noobmaster29 highly recommended the use of RAG (Retrieval-Augmented Generation). ChipNeMo from NVIDIA was also mentioned as a favorite paper on the topic.
- Understanding Llama.cpp internals: @_awill asked for help related to the internals of llama.cpp.
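To illustrate the RAG recommendation above, here is a toy sketch of the retrieve-then-prompt pattern. It assumes a bag-of-words cosine similarity in place of a real embedding model, and the helper names (`retrieve`, `build_prompt`) and the prompt wording are hypothetical; a production setup would use an embedding model and a vector store.

```python
import math
import re
from collections import Counter


def _vector(text):
    """Lowercased bag-of-words term counts (a stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = _vector(query)
    ranked = sorted(documents, key=lambda d: _cosine(q, _vector(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Put the retrieved text into the prompt instead of into the model weights."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The point of the pattern is that new knowledge lives in the document store, so the fine-tuned weights are left untouched and nothing is forgotten.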
▷ #rlhf (1 messages):
emperor: only 90%?
Latent Space Discord Summary
- Discussion on GPT-4 Fine-tuning initiated by @semantic_zone, questioning the limited conversation on its fine-tuning capabilities.
- Skepticism expressed by @slono regarding the usage of chat.openai.com, proposing alternatives like mixtral or 70b models that provide speed and code generation facilities.
- Reference to the RΞASON framework, an open-source Typescript backend for building LLM applications, shared by @lightningralf.
- Introduction to the Model Merging technique, highlighted by @swyxio, who shared an article.
- Probabilistic Programming discussed as a challenging first-class feature in programming languages, a viewpoint shared by @swizec.
- Showcased application of AI in flying drones in latent space, as shared by @kylemathews in a blogpost.
- Announcement of an upcoming discussion on the LlamaGuard/Purple Llama paper, led by <@458440277548335125>, and the release of a new podcast episode announced by @Swyxio.
- Appreciation for a Meta research paper for its readability expressed by @swizec, alongside @swyxio sharing a Tweet from Andrej Karpathy recommending several papers.
- Clarification provided by @swyxio about meetings occurring on Discord, in response to @ayenem's request for a Zoom link.
- Questioning by @swizec of the authors' decision to release Llama Guard weights without subsequent fine-tuning on ToxicChat.
- Scheduled discussion of a paper recommended by Karpathy for the following week, with @eugeneyan considering presenting it.
Latent Space Channel Summaries
▷ #ai-general-chat (12 messages🔥):
- GPT-4 Fine-tuning: @semantic_zone raised a query about access to the GPT-4 fine-tuning API for reasoning and asked why there wasn't much discussion on its fine-tuning possibilities.
- Chat.OpenAI.Com Usage: @slono expressed doubts about the utility of using chat.openai.com when alternatives like mixtral or 70b models offer speed and code generation capabilities.
- RΞASON Framework: @lightningralf shared the RΞASON framework, which provides open-source Typescript backend infrastructure for building applications using Large Language Models (LLMs).
- Model Merging: @swyxio linked to an article about model merging, a technique people are beginning to explore.
- LLMs As a Primitive: @swizec opined that probabilistic programming is difficult and that there have been various attempts to make it a first-class feature in programming languages.
- Applying AI for flying drones in latent space: @kylemathews shared a blogpost on bricolage.io discussing the application of AI for flying drones in latent space.
▷ #ai-event-announcements (2 messages):
- LlamaGuard/Purple Llama Paper Discussion: @Swyxio announced that a session on the LlamaGuard/Purple Llama paper, led by <@458440277548335125>, would start in 30 minutes. The paper was authored by an extensive team, including Hakan Inan, Kartikeya Upasani, Jianfeng Chi, and others. Interested members are advised to join <#1107320650961518663> to receive discord notifications.
- New Podcast Episode: @Swyxio announced the release of a new podcast episode, thanking <@194927177265840128> for their contribution. The podcast can be listened to on this link.
▷ #llm-paper-club (9 messages🔥):
- Discussion on a readable paper: @swizec expressed their enjoyment of a paper from the Meta research team, complimenting the readability of their writing.
- Paper Recommendations: @swyxio shared a Tweet from Andrej Karpathy providing several paper recommendations, mentioning that Karpathy suggested looking at this paper.
- Request for Zoom link: @ayenem asked for a Zoom link to join the discussion. However, @swyxio clarified that meetings now happen on Discord in a specific channel.
- Query on Llama Guard Weights: @swizec questioned the authors' decision to release the Llama Guard weights but not the weights obtained after further fine-tuning on ToxicChat.
- Next Week's Paper: @swyxio announced that the paper for next week's discussion will be this paper applauded by Karpathy, and encouraged new participants to lead the discussion. @eugeneyan found the paper interesting and is considering presenting it.
LangChain AI Discord Summary
- Issues identified with handling `return_direct` while using a callback method for streaming, described as an unpredictable "Final Answer". Also, a query on integrating the OpenAI Assistant API with databases without creating a full script for the entire Assistant API.
- Shared training resources and practice prompts for Large Language Models (LLMs), with a desire for a LeetCode-like platform specifically for LLMs.
- Challenges reported with adjusting the `max_tokens` parameter in local LLM models: setting `max_tokens` failed to yield the expected output length.
- Request for assistance on how to determine the number of indexed and added documents in PGVector in LangChain; the issue is described in this StackOverflow post.
- Interest in devising a system for collating results from multiple LLM requests into a comprehensive result for user inquiries, with requests for project templates or starters involving `RunnableParallel`.
- Difficulties encountered in using LangServe, such as parsing output from template applications. In one case the chat history disappeared during attempts to filter the JSON object display; the pertinent template application is available here. Issues were also identified with adding routes in LangServe, which led to unexpected errors.
- Identification of `output_key="output"` as a necessary setting in `ConversationBufferMemory` for a functional LangServe, although the standalone `AgentExecutor` can operate without it.
- Shared article on Analytics Vidhya exploring the transformative role of ChatGPT plugins in digital storytelling and user engagement.
- Cryptocurrency Jobs: Invitation to a Discord channel focusing on roles within the crypto field shared by gauravmandal, likely relevant to a broad audience.
LangChain AI Channel Summaries
▷ #general (12 messages🔥):
- Handling `return_direct` and Streaming: User @a404.eth is struggling with handling `return_direct` while using a callback method for streaming. They stated that if they use the `stream` method there's no predictable `Final Answer`. The issue hasn't been resolved yet.
- OpenAI Assistant API Experience: @refik0727 is seeking assistance on how to connect to their database using the OpenAI Assistant API without writing a script for the whole Assistant API, specifically within the OpenAI platform.
- LLM Training Resources: @seththunder shared a link to a practice prompt injection/task for those interested in Large Language Models (LLMs). Meanwhile, @schimazing inquired whether a LeetCode-style site exists specifically for LLMs.
- Trouble Adjusting Max Tokens in LLM Model: @ninamani, a beginner, is encountering issues when adjusting the `max_tokens` parameter value in a locally hosted LLM model. Specifically, when they set `max_tokens` to 600, the generated output still tends to stay around 400 tokens.
- Seeking Help with PGVector: @alekseyr1987 is seeking help regarding PGVector in LangChain, specifically on how to find out how many documents have been indexed and added. This user provided a link to the specific issue on Stack Overflow.
- Chain Query Project Templates: @squadzero. mentioned an interest in developing a chain that will gather results from multiple LLM requests into one whole for user queries. They are looking for any project templates or starters, possibly involving `RunnableParallel`.
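The fan-out-and-collate idea in the last item can be sketched without any framework. The example below is not LangChain code: it uses only `asyncio`, and `fake_llm` is a hypothetical stub standing in for real model calls. In LangChain, `RunnableParallel` fills the fan-out role; the stub version just shows the shape of the data flow.

```python
import asyncio


async def fake_llm(name, prompt):
    """Hypothetical stand-in for a real LLM request."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"[{name}] answer to: {prompt}"


async def fan_out(prompt, model_names):
    """Send one prompt to several models concurrently and collate answers by name."""
    answers = await asyncio.gather(*(fake_llm(n, prompt) for n in model_names))
    return dict(zip(model_names, answers))


def combine(results):
    """Merge per-model answers into one block, e.g. as input to a final summarizing call."""
    return "\n".join(f"{name}: {text}" for name, text in results.items())
```

A caller would run `asyncio.run(fan_out(prompt, ["summary", "details"]))` and pass the combined text to a final step that writes the user-facing answer.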
▷ #langserve (8 messages🔥):
- Parsing Output in Langserve: User @rodralez encountered an issue when trying to parse the output from a template application in Langserve. They wanted to display only the "output" key of a JSON object in the Output window instead of the entire JSON object. Trying to achieve this with `lambda x: x["output"]` made the chat history disappear. They were seeking possible solutions to this problem. The template application being used can be viewed here.
- LangServe Route Adding Issue: @vilelaone had a problem adding routes using LangServe with their `AgentExecutor`. While their agent execution worked standalone, it failed when added with LangServe. An attempt to use a custom input and output model led to a ValueError that expected one output key but received more.
- ConversationBufferMemory Impact: A small discussion suggested that the use of `ConversationBufferMemory` could be causing the above issues, as the chat history was disappearing for @rodralez and the `LangServe` route addition was failing for @vilelaone.
- Solving Problem with output_key: @vilelaone was able to resolve their issue by using `output_key="output"` in `ConversationBufferMemory`. It was noted that this is necessary for `LangServe`, even though the standalone `AgentExecutor` works fine without it.
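Why an explicit `output_key` resolves the "one output key expected" error can be shown with a simplified stand-in. `SketchMemory` below is hypothetical and is not LangChain's class; it only mimics the general behaviour, assuming the memory must pick exactly one output to store and cannot guess when the chain returns several keys (the `intermediate_steps` key is used here purely as an illustration).

```python
class SketchMemory:
    """Simplified stand-in for a conversation memory; not LangChain's actual class."""

    def __init__(self, output_key=None):
        self.output_key = output_key
        self.history = []

    def save_context(self, inputs, outputs):
        if self.output_key is None:
            # With no key configured, the choice is only unambiguous for a single output.
            if len(outputs) != 1:
                raise ValueError(f"One output key expected, got {list(outputs)}")
            key = next(iter(outputs))
        else:
            key = self.output_key
        self.history.append((inputs["input"], outputs[key]))


# An agent-style result with more than one key (illustrative values).
result = {"output": "42", "intermediate_steps": []}

SketchMemory(output_key="output").save_context({"input": "question"}, result)  # works
# SketchMemory().save_context({"input": "question"}, result) would raise ValueError
```

Per the discussion, the corresponding setting in LangChain is `output_key="output"` on `ConversationBufferMemory`.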
▷ #share-your-work (2 messages):
- @gauravmandal shared a link to a discord group focusing on cryptocurrency jobs.
- ChatGPT Plugins: @soumyadarshani posted a link to an Analytics Vidhya article, which discusses transforming user interactions with ChatGPT plugins. It suggests that these plugins are revolutionizing digital storytelling and user engagement.
▷ #tutorials (1 messages):
- Crypto Jobs Discord Invite: User @gauravmandal shared an invite link to a Discord channel focused on jobs in the crypto industry. He tagged @everyone and @here, signaling that the information might be of broad interest to the group.
DiscoResearch Discord Summary
- Dialogue regarding fine-tuning challenges with 7b models, with observations that 7b models can be hard to fine-tune using LoRA due to a propensity for performance degradation and catastrophic forgetting. User @.calytrix speculated this could be due to mixtral's dense yet low-redundancy 7b models.
- Comparative conversation between Foundation 7b vs Old 7b models. User @fernando.fernandes mentioned that the new 7b models seem more challenging for everyone, even though the old 7b models are denser and more sensitive to catastrophic forgetting.
- Theory by @fernando.fernandes on self-attention orthogonality and performance, proposing that performance could be linked to the information volume in the self-attention layers. In particular, in poorly performing models such as the undi95 mixtral fine-tune, the self-attention layers are more orthogonal.
- Proposed solutions for fine-tuning from _jp1_ and @fernando.fernandes. They recommend methods such as higher dropout rates, freezing router layers, and potentially freezing the embed_token layer. All suggestions aim to improve the performance of models like mixtral 7b.
- Discussion around Disco Research by @le_mess and @.pathos, focusing on the impact of 1970s disco music.
- Update on the LeoLM Preprint release as shared by @bjoernp. The preprint is delayed due to ongoing improvements and evaluations, but its release is confirmed for the future.
- Detailed information about LeoLM Training was provided by @bjoernp. Mistral 7b was trained on A100s at ~3000 tokens/s/GPU, using approximately 65 billion tokens.
DiscoResearch Channel Summaries
▷ #mixtral_implementation (9 messages🔥):
- Fine-tuning Challenges with 7b Models: User @.calytrix has observed that the 7b models are particularly challenging to fine-tune with LoRA without causing performance degradation or catastrophic forgetting. This user speculates that the issue might be tied to mixtral's dense, low-redundancy 7b models which might be less tolerant to LoRA fine-tuning.
- Foundation 7b vs Old 7b: @fernando.fernandes. shares the observation that everyone, regardless of their finetuning methods, appears to be struggling with the new 7b models. This is contrary to the experience with the older 7b models, which are even denser and thus more susceptible to catastrophic forgetting.
- Self-Attention Orthogonality and Performance: @fernando.fernandes. proposes that the amount of information stored in self-attention layers, conceptualized as databases, is related to their rankings and orthogonality. He noted that for models with poor performance, such as the undi95 mixtral finetune, the self-attention layers tend to be more orthogonal. Here, orthogonality is measured via the Frobenius norm calculated between the weights of self-attention modules from diverse experts.
- Potential Solutions for Fine-tuning: User _jp1_ proposes that QLoRA fine-tuning may not work well with router (or gate) layers, thus requiring higher dropout rates. Future rounds of finetuning, incorporating frozen router layers and additional bugfixes/improvements, could significantly improve performance. @fernando.fernandes. agrees with this and suggests that it may be necessary to freeze the embed_token layer as well, though the reason for its potential positive impact still needs to be understood.
▷ #general (9 messages🔥):
- Disco Research: Participants @le_mess and @.pathos mentioned that they are conducting research on the impact of disco music in the 1970s.
- LeoLM Preprint Release: User @zitronesimo asked @bjoernp about the release of the LeoLM preprint. @bjoernp responded that the preprint is delayed due to ongoing work on improving the contribution, additional evaluations, and other projects, but that it would certainly be released in due course.
- LeoLM Training Details: Upon further queries from @zitronesimo, @bjoernp provided specific details about the training of Mistral 7b. He stated that the training was conducted on A100s at about 3000 tokens/s/GPU and that ~65 billion tokens were used.
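The two figures quoted above are enough for a rough compute estimate. The arithmetic below is a back-of-the-envelope sketch: it takes ~3000 tokens/s/GPU and ~65 billion tokens at face value and assumes perfect scaling; the 64-GPU cluster size is a hypothetical value, not something stated in the discussion.

```python
TOKENS = 65e9                  # ~65 billion training tokens (from the discussion)
TOKENS_PER_SEC_PER_GPU = 3000  # ~3000 tokens/s/GPU on A100s (from the discussion)

gpu_seconds = TOKENS / TOKENS_PER_SEC_PER_GPU
gpu_hours = gpu_seconds / 3600


def wall_clock_days(num_gpus):
    """Days of wall-clock time, assuming perfect scaling across num_gpus."""
    return gpu_hours / num_gpus / 24


print(f"{gpu_hours:,.0f} GPU-hours")
print(f"{wall_clock_days(64):.1f} days on 64 GPUs (hypothetical cluster size)")
```

This works out to roughly 6,000 A100 GPU-hours in total, so the wall-clock time depends almost entirely on how many GPUs were used in parallel.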
LLM Perf Enthusiasts AI Discord Summary
- Conversation about Cursor vs. VS Code Copilot and decision-making criteria including code output quality, context construction, and codebase-oriented discussions. An outline of recent enhancements in Copilot was provided, accompanied by a YouTube link demonstrating these capabilities.
- Performance concerns raised about the Assistants API and GPTs integration. The discussion explored possible speed improvements, including caching results, waiting for OpenAI to fix the issue, and building custom solutions. A sudden improvement in the product's speed was also mentioned humorously.
- @dongdong0755 suggested an experiment in prompt splitting and raised an existing issue regarding extractions in their work. The usefulness of a potential embeddings search functionality for Discord was also highlighted.
LLM Perf Enthusiasts AI Channel Summaries
▷ #general (1 messages):
joshcho_: llamaindex most likely
▷ #opensource (1 messages):
joshcho_: holy
▷ #offtopic (4 messages):
- VS Code Copilot vs Cursor: User @robhaisfield expressed uncertainty over whether to stick with Cursor or switch back to using VS Code Copilot exclusively due to recent feature and UX changes in Copilot. The user considered Cursor to have an edge in yielding better outputs and having advanced context construction, despite Copilot's improvements.
- Benefits of Cursor: @robhaisfield highlighted that one advantage of using Cursor is that all conversations about a code base are grouped within that codebase, creating a more organized system compared to having the conversations distributed across all ChatGPT conversations.
- Questions about new features: @jeffreyw128 inquired about the recent features added to VS Code Copilot.
- Explaining new features in VS Code Copilot: In response, @robhaisfield cited several enhancements including inline chat, workspace search commands, capability to load documentation from sites or repositories, and editing code chunks through chat commands. A detailed demonstration of these capabilities was linked through a YouTube video.
▷ #speed (5 messages):
- Performance of Assistants API + GPTs: @joshcho_ expressed concerns about the slow speed of Assistants API and GPTs integration, and inquired if there were ways to overcome this like caching results.
- OpenAI Product Release Concern: @jeffreyw128 suspected that OpenAI might have shipped products prematurely, causing the slowness, and suggested either waiting for OpenAI to rectify the issue or building one's own solutions to speed up the process.
- @joshcho_ noted an apparent improvement in product speed and was somewhat amused by it.
▷ #feedback-meta (1 messages):
joshcho_: i think retrieval would be useful. like an embeddings search for discord
▷ #prompting (2 messages):
- Experiment with Prompt Splitting: User @dongdong0755 suggested an experiment of splitting the prompt into two parts to see if there would be any difference in the performance.
- Issues with Extractions: User @dongdong0755 also mentioned facing a dilemma regarding extractions in their work.
Skunkworks AI Discord Summary
Only 1 channel had activity, so no need to summarize…
- Custom Filtered Dataset for Enhancing Reasoning: @far_el said they trained a model on a custom filtered dataset that is formatted to enhance reasoning. The model was also trained on multiple prompt formats. Far_el is interested in feedback on this model. (source)
- Use Case - Understanding Large Code Bases: @spirobel shares their use case of understanding large code bases and expanding them. Spirobel experimented with the Phind codellama 34b and Mistralic, and found that Mistralic performs better than Mistrallite by Amazon for their specific use case of detecting important function names from a git diff output. Spirobel wishes to understand why Mistralic performs better for this specific retrieval task. (source)
- Superiority of Mistralic over Mistrallite for Code Retrieval: @spirobel notes that Mistralic performs better at the task of code retrieval than Mistrallite, even though Mistrallite was supposedly optimized for retrieval. Spirobel speculates that the concept of "retrieval" may vary in different contexts. (source)
- Better Generalization of Mistralic: @far_el hypothesizes that Mistralic's better performance may be due to their use of multiple prompt formats, which could potentially enable it to generalize better. Far_el will be investigating this further and plans to open source whatever they have for Mistralic-1. (source)
- Axolotl Docker Image for H100: @tcapelle inquires about the availability of an Axolotl docker image compatible with H100. (source)
- Mistralic vs OpenHermes 2.5 Performance: @spirobel states that upon experimenting, they found Mistralic to be more robust and produce better quality output compared to OpenHermes 2.5. Mistralic's output was found to be often perfectly formatted in markdown. (source)
Alignment Lab AI Discord Summary
- Discussion in "ai-and-ml-discussion" on a new tool introduced via the article Introducing Text-to-CAD, shared by @entropi.
- Announcement of availability for collaboration on open source and research projects by users @algomancer and @rabiussany in "looking-for-collabs", with mention of specific areas of interest and openness to private message discussions.
- Fine-tuning project source code discussed in "general-chat" by @teknium, with the GitHub repository link provided by @propback, found at openchat.
Alignment Lab AI Channel Summaries
▷ #ai-and-ml-discussion (1 messages):
entropi: https://zoo.dev/blog/introducing-text-to-cad
▷ #looking-for-collabs (2 messages):
- Open Source Contribution: User @algomancer offered to contribute to open source and open research projects over the holiday season. Their areas of interest include variable rate compute at inference, non-standard classes of generative models, Jepa style models with anything beyond an autoregressive decoder, and conditioning schemes for enhanced controllability. They expressed comfort with writing Triton/PyTorch and data pipelines.
- Research Project Collaboration: User @rabiussany is offering help with any deep learning research projects. They are open to private messages for collaboration discussions.
▷ #general-chat (3 messages):
- Fine-tuning Code Source: User @teknium relayed that the entire fine-tuning code for their project is hosted on GitHub but didn't provide the link. User @propback followed up with the link to the openchat repository, containing the training instructions for the project.