And people are rightfully cheering. They also announced their API platform today.
[TOC]
Nous Research AI Discord Summary
- The guild discussed GPU hardware options for training transformer models and fine-tuning language models, weighing two RTX 4070s against a single A4500 or RTX 3090s with an NVLink; a YouTube video showcasing a Mistral 8x7B large language model running on an A100 GPU was also shared.
- There were key conversations on whether the Mixtral model by Mistral AI could rival GPT-4 thanks to its high-quality Sparse Mixture of Experts (SMoE) design, and on fine-tuning Mixtral and generating quantized versions of the model.
- An interest in curating a high-quality coding dataset was expressed, with potential tasks ranging from generating code to debugging, translating, commenting, explaining, and expanding/cleaning/transforming it.
- Guild members shared and discussed key resources: a YouTube video on deploying open source models, an Arxiv paper, GitHub resources, a blog post on Mixture-of-Experts (MoE), and a link to a GGUF-format model file for Mistral's Mixtral 8X7B v0.1.
- Discussions covered potential future releases such as an open-source version of GPT-3.5 Turbo by OpenAI and LLama-3, with sources hinting at these developments shared, including tweets from @futuristflower and @apples_jimmy and an article from The Information.
- Clarifications were given on running OpenHermes 2.5 (inference versus fine-tuning) on a Mac M3 Pro and on the VRAM requirements for running Mixtral 8x7b, alongside the sharing of a Hugging Face link to the Llava v1.5 13B - GPTQ model.
Nous Research AI Channel Summaries
▷ #off-topic (13 messages🔥):
- Hardware Discussion: User @airpods69 asked the community for advice on selecting GPU hardware for training transformer models and fine-tuning language models. The discussion revolved around the choice between two RTX 4070s or a single A4500, the former arising from concerns about the A4500's high price. The idea of sourcing an RTX 4090 was also floated by @giftedgummybee.
- Alternative GPU Options: @giftedgummybee proposed two RTX 3090s with an NVLink as an alternative and pointed out that the A4500 seemed overpriced in comparison to these options.
- Nvidia's EULA: User @kazamimichiru brought up that Nvidia's EULA restricts the use of RTX-series GPUs in data centers, but this was countered by @airpods69, who clarified that the setup would be at a home location, not a much stricter data center environment.
- Running Mistral on A100: @pradeep1148 shared a YouTube video showcasing the process of running a Mistral 8x7B large language model (LLM) on an A100 GPU.
- Open Source AI Web Development: @.plot offered to help build a website for the community, showing interest in the open source AI niche and mentioning past projects such as open-neuromorphic.org and aimodels.org.
▷ #interesting-links (13 messages🔥):
- Should You Use Open Source Large Language Models?: @teknium shared a YouTube video discussing the deployment of open source models on WatsonX.
- Arxiv Paper: @euclaise shared a link to a research paper, though its specifics weren't discussed.
- Awesome-llm-role-playing-with-persona: @kazamimichiru pointed out a GitHub repository providing resources for using large language models for role-playing with assigned personas.
- Mixture-of-Experts Discussion: @.beowulfbr shared their blog post discussing Mixture-of-Experts (MoE) as the future of large language models.
- Mistral's Mixtral 8X7B v0.1 - GGUF: @cyborgdream shared a link to a GGUF-format model file for Mistral's Mixtral 8X7B v0.1, mentioning that the model can be run on one 3090 or on any CPU with 32GB of RAM (a minimal local-inference sketch follows this list).
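A minimal sketch of that kind of local GGUF inference using llama-cpp-python. The filename, context size, and layer split below are illustrative assumptions, not details from the discussion:

```python
# Minimal local-inference sketch for a quantized Mixtral GGUF file with
# llama-cpp-python. Adjust n_gpu_layers to what fits your card
# (0 = CPU-only, which needs roughly 32GB of system RAM for a 4-bit file).
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-v0.1.Q4_K_M.gguf",  # hypothetical local filename
    n_ctx=4096,        # context window
    n_gpu_layers=20,   # partial offload for a 24GB GPU such as a 3090
)

out = llm("Explain mixture-of-experts routing in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```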
▷ #general (545 messages🔥🔥🔥):
- There is an ongoing discussion about the performance and potential of the Mixtral model, a high-quality Sparse Mixture of Experts (SMoE) model recently released by Mistral AI. Users are particularly interested in its fine-tuning capabilities, multilingual support, and potential to rival GPT-4 in performance.
- Various users express their intention to experiment with fine-tuning Mixtral and generating quantized versions of the model. Challenges and issues regarding quantization are mentioned, with a particular focus on understanding Mixtral's VRAM requirements.
- @nagaraj_arvind discusses the router auxiliary loss in mixture-of-experts models and points to a recent PR in Hugging Face Transformers that adds Mixtral MoE support. This PR reportedly includes a setting that automatically calculates the auxiliary loss, which according to @euclaise helps balance the use of experts in MoE models.
- @wlrd proposes the idea of curating a high-quality coding dataset. @teknium supports the idea and outlines the types of tasks such a dataset could contain, including generating code, translating code to other programming languages, debugging code, commenting code, explaining code, and expanding/cleaning/transforming code.
- Kenshin9000's thread claiming that GPT-4 beats Stockfish (the best chess engine) is discussed. The user said they will post evidence for this claim in two weeks.
▷ #ask-about-llms (50 messages🔥):
- Future Open Sourcing of GPT-3.5 Turbo: User @cyborgdream shared several leaks hinting at the possibility of OpenAI releasing an open-source version of GPT-3.5 Turbo, with sources including tweets from @futuristflower and @apples_jimmy and an article from The Information. The discussion indicated that such a release could boost OpenAI's reputation among developers.
- LLama-3: User @cyborgdream mentioned LLama-3, a model predicted to outperform GPT-4 and expected to be multimodal. The release of this model is reportedly set for February.
- Inference and Fine-tuning on a Mac M3 Pro: Users @teknium and @night_w0lf responded to @httpslinus's question on whether an M3 Pro machine could run OpenHermes 2.5. Both suggested that the computer could run it for inference but not fine-tuning.
- Inference with Mixtral and Llava 13B: User @chhillee inquired about the best tokens/sec achieved with Mixtral. @papr_airplane shared a Hugging Face link to the Llava v1.5 13B - GPTQ model during a discussion on running inference with the Llava 13B model.
- VRAM Requirement for Mixtral 8x7b: User @gerred asked whether 96GB of VRAM would be sufficient to run Mixtral 8x7b. The conversation didn't provide a concrete answer.
OpenAI Discord Summary
- Extensive discussions pertaining to the philosophy of truth in scientific communities, fairness metrics, and algorithmic bias in AI methodology. The debate was initiated by @light.grey.labs with contributions from @whynot66k20ni, @lhc1921, @solbus, among others.
- Shared concerns and issues regarding the performance and functionality of OpenAI's services, including the ChatGPT message limit, GPT-4 subscriptions, and video color grading. Users @robg2718, @rjkmelb, @lyrionex, @marcelaze, @lumirix, @croc_cosmos, @null_of_gehenna, @thunder9289, @prncsgateau, @solbus, @gd2x, @elektronisade, @swedikplay, and @lugui participated in these discussions.
- Various concerns about GPT-4, including the long waitlist, issues with contextual understanding, and inconsistent performance. @kyper, @eveiw, @Rock, @.pywiz, @drcapyahhbara, @pietman, @solbus, @chotes, and @napoleonbonaparte0396 added to the discussion.
- Extensive queries and issues related to prompt engineering, focusing on improving model performance, instruction comprehension, censorship issues, the understanding of markup languages, and metrics for measuring success in prompt engineering. This line of discussion saw contributions from @m0bsta, @cat.hemlock, @tp1910, @you.wish, @madame_architect, @exhort_one, @pfakanator, and @bambooshoots.
- Various technical issues related to GPT-4, including problems with donations for access, network issues, dissatisfaction over performance and functionality, possible account suspension, and frustrations over creating custom GPTs, were reported by @strange073, @inspectorux, @lucianah, @slo_it_down, @kurailabs, @lumirix, @michaelyungkk, @rjkmelb, @maledizioni, @panospro, @maticboncina, @chealol, @digitallywired, and @mfer.pirx.
- AI art was a notable topic of discussion, with debates over modifying rules, evaluation of AI art tools such as Bard and Gemini Pro, and resources for AI news suggested by users including @rchap92, @lugui, @fluffy_dog__, @staniscraft, @avalani, @thunder9289, and @julien1310.
- Concerns over potential copyright violation were raised by @swedikplay, with @lugui confirming OpenAI's awareness of the issue and urging the forwarding of additional information.
- Heated debates and diverse viewpoints on matters relating to prompt style, DALL-E policy, AI performance, and expanded AI context, resulting in significant interaction across channels, with @m0bsta, @cat.hemlock, @pfakanator, @mysticmarks1, @fluffy_dog__, @bambooshoots, @madame_architect, @eskcanta, among others, contributing to the discussion.
OpenAI Channel Summaries
▷ #ai-discussions (69 messages🔥🔥):
- Philosophy of Truth in Science: A conversation started by @light.grey.labs questioning the motivation behind the quest for truth in the scientific community. It evolved into a broader discussion about reality, observability, and the nature of quantum physics, with contributions from @whynot66k20ni, @lhc1921, @solbus, and others.
- Fairness Metrics and Algorithmic Bias: Offhand remarks made by @whynot66k20ni related to algorithmic fairness and bias, with no detailed discussion on the topic.
- AI Art Modification Rules: A short discussion between @rchap92 and @lugui about the guidelines for generating images of real people using AI tools, with mention of platforms like Bing AI and Krea.ai.
- Evaluation of AI Art Tools: Positive comments about Bard and Gemini Pro by @fluffy_dog__ and @staniscraft respectively, as well as a brief exchange about Grok AI between @avalani, @lugui, and @thunder9289.
- Resources for AI News: @julien1310 inquired about the best resources for AI news, to which multiple users suggested sources like Y Combinator, Perplexity, and arXiv. In particular, @shadowoftheturing shared direct links to recent arXiv submissions in Computational Linguistics (cs.CL) and Artificial Intelligence (cs.AI).
- Upscaling AI Art: The channel concluded with a discussion initiated by @sgsd_ about upscaling AI art, with several suggestions from @elektronisade including free and paid services like Stable Diffusion, Magnific AI, and Topaz.
▷ #openai-chatter (131 messages🔥🔥):
- Speech Feature on Android: User @gd2x asked about the absence of a speech feature for ChatGPT on Android devices. @elektronisade suggested that an ad-blocking DNS or service might be interfering; after disabling it, @gd2x confirmed the issue was resolved.
- Various Issues and Discussions about OpenAI: Numerous users brought up concerns about various aspects of OpenAI's services. Topics included unclear information about the ChatGPT message limit over time (@robg2718, @rjkmelb, @lyrionex), the waitlist and availability of the GPT-4 subscription (@marcelaze, @lumirix, @satanhashtag), the possibility of using GPT-4 for video color grading (@croc_cosmos, @null_of_gehenna, @thunder9289), and accessibility issues on the iOS app for VoiceOver users (@prncsgateau, @solbus). In these discussions, various users, including @lugui, @offline, and @mrcrack_, provided information or referred to appropriate help resources.
- Potential Copyright Violation: @swedikplay raised concern about a third-party bot on Discord that potentially violates OpenAI's identity. @lugui confirmed OpenAI's awareness of the issue and encouraged @swedikplay to pass any supporting information via DM.
- Features and Updates on OpenAI: Various users inquired about rumored upcoming announcements (@merpnderp), the ability to upgrade to ChatGPT Plus through iOS (@alpha33589), and the awaited launch of the GPT store (@emiliaaaaa_). However, definitive responses were not available.
- Confusion and Complaints About GPT Usage and Performance: Users @becausereasons, @mrcrack_, and @Meme Popperz expressed dissatisfaction with the performance of GPT services, citing issues with instruction following, decreasing creativity, message quotas, and website lag during use.
▷ #openai-questions (97 messages🔥🔥):
- Accessing GPT-4: @strange073 and @inspectorux discussed the donation criteria for accessing GPT-4; however, no clarification was provided in the chat about how to make a $1 donation for access.
- Performance Issues: @lucianah and @inspectorux expressed frustration with network errors and slow processing times, with @lucianah suspecting possible usage throttling due to the high number of Plus users. @slo_it_down also mentioned recurring error messages, especially after file inputs. Minimal troubleshooting was provided by the chat community.
- Use of Custom GPT for Complex Tasks: @kurailabs expressed frustration over GPT-4's reluctance to fully generate law papers in response to specific instructions, compared with GPT-3.5's willingness to do so. @lumirix provided some explanation and shared OpenAI's usage policies concerning high-risk government decision-making.
- Subscription Issues: @michaelyungkk reported problems with multiple credit card denials during attempted subscription. @rjkmelb suggested subscribing via the iPhone app, then recommended contacting OpenAI support via their website when this didn't work.
- Account Suspension: @maledizioni requested urgent help with account reactivation after a mistaken age-verification error, but was redirected to OpenAI support by @rjkmelb.
- Creating Custom GPTs: Questions and issues with creating custom GPTs were raised by @panospro, @maticboncina, @chealol, @digitallywired, and @mfer.pirx. Assistance was provided by @mysticmarks1.
- Corrupted and Long Chat Threads: @maf2829 discussed an issue of getting a "message in conversation not found" error. @elektronisade suggested the possibility of thread corruption and asked whether @maf2829 was using any unofficial browser extensions for ChatGPT. The issue remained unresolved.
▷ #gpt-4-discussions (69 messages🔥🔥):
- Limitations on Tools for Custom GPTs: @kyper raised a question about the limits on the number of functions a custom GPT can handle and whether including tools consumes tokens.
- Issues with GPT-4: Several users, including @eveiw, @Rock, @.pywiz, and @drcapyahhbara, expressed concerns about GPT-4's performance, including difficulty remembering context, inconsistent performance, and a long waitlist.
- Instructions in a Custom GPT: There was a discussion about whether it's better to include instructions in a custom GPT's configuration or in a file placed in its knowledge, with suggested strategies from @smilebeda and @offline.
- Creating a Variable in a GPT: @pietman asked for advice on creating variables in a GPT to reference in instructions. @solbus and @chotes offered strategies and resources to read in order to achieve this.
- Limits on Creating GPTs: @napoleonbonaparte0396 asked whether there is a limit on how many GPTs one can create.
▷ #prompt-engineering (93 messages🔥🔥):
- Prompts and Model Performance: @mysticmarks1 shared concerns over bias and issues with the DALL-E 3 model and suggested they have tweaked certain code to improve its performance, though not everyone agreed with their viewpoints. There was a discussion about how instruction prompts can be modified to get more accurate results, as articulated by @pfakanator and @bambooshoots. @cat.hemlock also shared a detailed guide on Markdown formatting for instructing models.
- GPT's Understanding of Instructions: @tp1910 asked about the difference between adding instructions to the configuration versus the knowledge section of a custom GPT. No clear answer was given in the chat.
- OpenAI GPT Gaming Queries: @you.wish asked for advice on tweaking a game-related query (Dead by Daylight) that was being censored by OpenAI. @madame_architect provided a suggestion that appeared to suit the user's needs.
- Markup Language Inquiry: @exhort_one sought clarification about Markdown, a markup language.
- Measurement of Prompt Engineering Success: @madame_architect initiated a discussion on metrics for measuring success in prompt engineering, concentrating on converting qualitative aspects of language into quantitative metrics. @cat.hemlock suggested evaluating consistency as a measure of success.
▷ #api-discussions (93 messages🔥🔥):
- Discussion about Prompt Style: @m0bsta expressed difficulties with creating effective prompts due to comprehension issues, while @cat.hemlock provided examples of how to create effective prompts using Markdown and suggested not delaying the process.
- DALL-E Policy: @cat.hemlock shared detailed instructions for the DALL-E (image generation) usage policy, covering points such as its image-generation limitations along with contextual restrictions and ethical guidelines. @cat.hemlock further provided an example of a default prompt for DALL-E in TypeScript, asking for user @215370453945024513's thoughts.
- Feedback and Interactions about AI Performance: @pfakanator shared that instructing the agent to "understand things in a way that makes sense" improved responses. @mysticmarks1 expressed dissatisfaction with current prompt set-ups and shared an improved version. @fluffy_dog__ asked for thoughts on the performance of Bard compared to ChatGPT, which @eskcanta redirected to a different channel.
- Expanded AI Context: @bambooshoots discussed the implementation of cross-conversation context management for more coherent and extended conversations with the AI.
- Intense Personal Interactions: @bambooshoots and @mysticmarks1 engaged in a heated debate, with differing viewpoints expressed regarding code contribution and personality traits.
- Quantitative Measures for Prompt Engineering: @madame_architect sought to understand how to convert qualitative aspects of language into quantitative metrics for measuring prompt-engineering success, and solicited advice and input from others.
DiscoResearch Discord Summary
- Discussion primarily revolved around the implementation, performance, and future expectations of the Mixtral model, with various technical issues and proposed solutions being discussed. Key topics included output issues, VRAM requirements, multi-GPU compatibility, model quantization, and auxiliary losses, as found in Mixtral on Huggingface. "@nagaraj_arvind pointed out that base MoE models use the standard language model loss function…"
- Users shared varying experiences regarding Mixtral's performance, with consensus around its capable handling of extensive contexts despite its weaker translation ability. "@goldkoron mentioned that the model's translation abilities were inferior to other models like GPT-3.5."
- Technical updates included Huggingface's addition of Mixtral model support via a pull request, and a report of an in-progress pull request on vllm from the Mistral side in the #benchmark_dev channel.
- Individuals in the #general channel discussed the differences between LeoLM 70b Chat and DiscoLM 70b. "@bjoernp clarified that Leo 70b chat is finetuned on only German instruction data, while DiscoLM includes mostly English instructions."
- The spotting of a refined version of the Mistral-7B model dubbed "Mistral-7B-v0.2" elicited community interest. "_jp1_ spotted 'Mistral-7B-v0.2' on the Mistral AI models page…"
- A callout was made by @tarikoctapm for potential collaborators on a distributed computing project focused on training an LLM during idle periods.
- The community also engaged in more casual discussions, such as celebrating the birthday of @fernando.fernandes in the #mixtral_implementation channel.
DiscoResearch Channel Summaries
▷ #disco_judge (1 messages):
nagaraj_arvind: They are the same
▷ #mixtral_implementation (322 messages🔥🔥):
- Mixtral Implementation Issues and Resolutions: Users in the channel faced various issues in implementing the Mixtral model, related to model performance, VRAM requirements, multi-GPU compatibility, and the need for model quantization. Several solutions, including the use of specific library versions and the inclusion of auxiliary losses, were proposed to tackle these problems.
- @goldkoron stated that running the DiscoLM Mixtral model was causing output issues but found that disabling exllama might solve it. Error responses about memory allocation were also reported by @goldkoron.
- @nagaraj_arvind pointed out that base MoE models use the standard language model loss function, and that if you set output_router_logits = True, the auxiliary loss is calculated automatically. If you want to add your own losses, you can import from the Switch Transformer implementation and use the returned logits to compute them, as seen in this section of the Mixtral model in the Huggingface repository (a minimal sketch follows this list).
- @datarevised noted that Mixtral currently lacks multi-GPU support for inference, but this issue is being addressed, as seen in this pull request.
- @armifer91 indicated that they are trying to run Mixtral using the LLaMA implementation provided here.
- @fernando.fernandes suggested that Mixtral v0.1 can work in low-RAM settings with contexts like 8k using 4-bit quantization, as seen here, where GGUF-format model files with instructions like "python -m venv venv" can be installed.
- Mixtral Performance Evaluation: Users shared their experiences with Mixtral's performance, with @goldkoron mentioning that the model's translation abilities were inferior to other models like GPT-3.5. The model's capability to handle more extensive context was valued by users like @goldkoron and @fernando.fernandes.
- Mixtral on Huggingface Transformers: A Mixtral pull request on Huggingface was shared by @flozi00; it added support for the Mixtral model to Transformers. The PR has been merged into Transformers, as reported by @le_mess.
- Birthday Celebration: @fernando.fernandes stated that it is his birthday week, and members of the channel, such as @datarevised and .grey_, wished him a happy birthday.
- Mixtral Model Size Speculations and Future Expectations: @dyngnosis sparked a discussion about the size of experts in a future release of the Mixtral model, with users speculating the size could range from 30 to 70. @fernando.fernandes mentioned that an "all-in-one" model like Mixtral 7B could be very beneficial for specific tasks.
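A minimal sketch of the auxiliary-loss behaviour described above, assuming a transformers release with Mixtral support; the inputs are placeholders and the output field names follow the Hugging Face MoE output classes at the time of writing, so they are worth verifying against the installed version:

```python
# Sketch: with output_router_logits=True, the Hugging Face Mixtral
# implementation adds the load-balancing auxiliary loss to the LM loss
# automatically when labels are supplied. Loading the full model needs
# substantial memory; this is illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    output_router_logits=True,   # enables automatic aux-loss computation
)

batch = tok("The quick brown fox", return_tensors="pt")
out = model(**batch, labels=batch["input_ids"])
print(out.loss)       # LM loss plus the weighted router auxiliary loss
print(out.aux_loss)   # the load-balancing term on its own
```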
▷ #general (15 messages🔥):
- DiscoResearch Discord Link Fix: @philpax reported that the Discord invite link on the DiscoResearch page on HuggingFace had expired. The issue was resolved by _jp1_.
- Difference Between LeoLM 70b Chat and DiscoLM 70b: @apepper inquired about the differences between LeoLM 70b Chat and DiscoLM 70b. @bjoernp clarified that Leo 70b chat is finetuned on only German instruction data, while DiscoLM includes mostly English instructions.
- Translation Model Recommendations: @apepper asked which model would be suitable for translation from English to German. @bjoernp suggested that both LeoLM and DiscoLM might not be the best fit, as translation data isn't explicitly included in their datasets. However, @noobmaster29 shared a link to a GitHub resource which might be helpful for finetuning on translation.
- Mistral-7B-v0.2 Spotting: _jp1_ spotted "Mistral-7B-v0.2" on the Mistral AI models page and noted that although this is not an improved 7b base, it is a better fine-tune of the initial Mistral-7B.
- Collaboration Inquiry: @tarikoctapm put out a callout for potential collaborators on their distributed computing project, where they plan to train an LLM when nodes are idle and not being rented.
▷ #benchmark_dev (2 messages):
- Adding a Backend for llama.cpp: @rtyax reported that adding a backend for llama.cpp to run models is straightforward, but that integrating with other backends poses a challenge due to their use of Hugging Face configuration and tokenizer features.
- Progress on the vllm PR: @flozi00 mentioned that there is work in progress on the vllm pull request from the Mistral side, according to the documentation.
HuggingFace Discord Summary
- Graph Extension using Fuyu: User @heumas faced challenges in extending graphs. @doctorpangloss proposed using Fuyu and provided a demonstration through a Google Colab link.
- Audio Classification Models: @doctorpangloss suggested audioclip and wav2vec for audio classification in response to @vara2096's query.
- Accelerate Framework on Mistral: @ghimiresunil shared an error encountered when using the Accelerate framework on the Mistral model, along with a code sample, via a GitHub Gist.
- Decentralized Pre-training: User neuralink shared that they implemented 0.01% of DiLoCo decentralized pre-training.
- Webinar on LLM-based Apps & Their Risks: @kizzy_kay announced a webinar by Philip Tannor on "Evaluating LLM-based Apps & Mitigating Their Risks". A registration link was shared.
- HuggingFace Summarization Models: User @kaycebasques shared their experience of using HuggingFace summarization models on Sphinx site pages via a blog post.
- TV Show Quote Scraping: joshuasundance shared a TV quote dataset available on HuggingFace (link here).
- AI Model Badging System: User @.plot suggested an open-source badging system for AI models, available here.
- Reading group discussions focused on Magvit2, the Eliciting Latent Knowledge (ELK) paper (here), and a paper on text-to-image (T2I) diffusion models (here).
- Difficulties Running mm_sdxl_v10_beta.ckpt with Animatediff: @happy.j reported difficulties running this implementation and had to resort to using the implementation from the animatediff GitHub repo.
- Computer Vision Discussions: Topics included extracting text from bounding boxes, photogrammetry, and mesh extraction (link to the Sugar project).
- Chatbot architecture, LLMs on Amazon EC2 G5g instances, and sentiment analysis were the main topics in the NLP channel; issues such as CUDA incompatibility and memory errors were addressed.
HuggingFace Discord Channel Summaries
▷ #general (70 messages🔥🔥):
- Use of Fuyu for Graph Extension: User @heumas was having problems extending or creating graphs using AI models. @doctorpangloss suggested using Fuyu to extract data from graphs, although it doesn't have the capability of adding new image data to graphs coherently. He also offered a demonstration through Google Colab.
- Discussion on Audio Classification Models: User @vara2096 asked for an open-source model that can use raw vocal audio as input effectively, with the aim of classifying a large pile of audio files. @doctorpangloss suggested trying audioclip or wav2vec (a minimal classification sketch follows this list).
- Issues with Accelerate Framework on Mistral Model: User @ghimiresunil posted about an error he encountered while using the Accelerate framework to train the Mistral model across seven A100 GPUs, seeking help to fix it. The error and a code sample were shared via a GitHub Gist.
- Compression of Large Datasets: User @guactheguac sought advice on using ML/DL for compression of large datasets collected from LiDAR, large-format photogrammetry, and multispectral imagery. @doctorpangloss replied that expectations for neural approaches should be moderate, but provided no specific suggestions or resources.
- Fine-tuning Llama-2 with PPO: User @harrison_2k mentioned he was using PPO for fine-tuning Llama-2 and was looking for suggestions or documentation regarding the appropriate reward range for this process.
▷ #today-im-learning (1 messages):
neuralink: the last three days i learned: implemented 0.01% of DiLoCo decentralized pre-training
▷ #cool-finds (1 messages):
- Upcoming Webinar on LLM-based Apps & Mitigating Their Risks: @kizzy_kay from the Data Phoenix team announced a free webinar titled "GPT on a Leash: Evaluating LLM-based Apps & Mitigating Their Risks". The speaker for this webinar is Philip Tannor, Co-founder and CEO of Deepchecks.
- Webinar Date and Time: The event is scheduled for December 12 at 10 am PST.
- Learning Opportunities: Attendees can expect to learn about evaluating and mitigating risks in LLM-based applications, testing AI systems involving text and unstructured data, and how to navigate the complexities of providing contextually appropriate responses.
- Registration: Interested parties are encouraged to register before the event to secure a spot.
- Q&A Session: The webinar will also include a Q&A session for attendees to explore specific concerns relating to the topic.
▷ #i-made-this (6 messages):
- Page Summarization for Sphinx Sites: @kaycebasques shared his exploration of using HuggingFace summarization models to generate summaries for Sphinx site pages. Based on his experimentation, the endeavor seems promising but inconclusive. The blog post explains the potential advantages of implementing page summarization on technical documentation sites (a minimal sketch of the approach follows this list).
- TV Show Quote Scraping: @joshuasundance has scraped quotes from TV shows from wikiquote.org. The quotes are available on HuggingFace as the dataset joshuasundance/wikiquote_tv, which contains 103,886 rows of data, available here.
- AI Model Badging System: User @.plot suggested a badge-style open-source information system for AI models, similar to Creative Commons badges, and sought public feedback. The proposed system consists of badges such as Open Model (OM) and Open Model - Open Weights (OM-OW), among others, to foster transparency and collaboration.
- Positive Reception: The AI model badging system received positive feedback from @tonic_1.
- Possibilities with TV Show Quotes: @joshuasundance and @tonic_1 brainstormed potential applications for the scraped quotes, such as fine-tuning language models or creating a "RAG-type" bot that can assume any character from TV shows.
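A minimal sketch of the page-summarization idea, assuming a generic summarization checkpoint; the model choice and the way page text is obtained are assumptions rather than details from the blog post:

```python
# Summarize the extracted body text of a documentation page with a generic
# summarization model. Long pages would need chunking; the truncation below is
# deliberately naive and only for illustration.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

page_text = open("docs/page.txt").read()   # assumed: pre-extracted Sphinx page body
summary = summarizer(
    page_text[:3000],    # naive truncation to stay within the model's input limit
    max_length=120,
    min_length=40,
    do_sample=False,
)
print(summary[0]["summary_text"])
```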
▷ #reading-group (2 messages):
- Magvit2 Discussion: @chad_in_the_house suggests presenting on Magvit2 and shares the paper link here.
- ELK Research: @chad_in_the_house also considers discussing a paper on Eliciting Latent Knowledge (ELK). The abstract indicates research on "quirky" language models, and the paper is accessible here.
- Text-to-image Diffusion Models: Finally, @chad_in_the_house shows interest in a paper on text-to-image (T2I) diffusion models which discusses the computational costs of these models. The paper can be accessed here.
▷ #diffusion-discussions (1 messages):
- Difficulty Running mm_sdxl_v10_beta.ckpt with Animatediff: @happy.j reported difficulties running mm_sdxl_v10_beta.ckpt with the diffusers animatediff implementation and had to resort to using the implementation from the animatediff GitHub repo.
▷ #computer-vision (15 messages🔥):
- Extracting Text from Bounding Boxes: @navinaananthan inquired about how to extract text from a set of bounding boxes, specifically from a newspaper. @boriselanimal7 suggested using Optical Character Recognition (OCR) and provided a Medium article as a resource. @merve3234 also highlighted that text extraction is the objective and recommended @947993236755054633 for professional guidance.
- Photogrammetry and Mesh Extraction: @n278jm and @individualkex discussed the use of photogrammetry for 3D modeling, especially concerning interior design applications for non-local or budget-conscious clients. @n278jm raised concerns over the level of precision attainable without lidar, and later shared a link to a project named Sugar focusing on precise and fast mesh extraction from 3D Gaussian Splatting.
- Machine Learning/Deep Learning in Computer Vision: @guactheguac asked if anyone was using machine/deep learning in computer vision, sparking potential further discussion in the channel.
▷ #NLP (34 messages🔥):
- Discussion on Chatbot Architecture: @ppros666 and @vipitis discussed transformers in chatbot implementations, where @ppros666 clarified that the paper they referred to mentioned a transformer with some modifications, not "dropping the encoder" as in many chatbot applications.
- Running an LLM Application on Amazon EC2 G5g Instances: @lokendra_71926 inquired about running an LLM application on Amazon EC2 G5g instances using auto-gptq. @merve3234 clarified that auto-gptq can be used to quantize the model if it's too big to fit on the EC2 instance.
- Troubleshooting GPU Availability on Torch: @lokendra_71926 faced an issue where torch.cuda.is_available() returned False despite the GPU being visible with the nvidia-smi command. @merve3234 suggested that there might be a mismatch between the CUDA-related packages and the GPU requirements, or that the GPU might not be CUDA-compatible (a short diagnostic sketch follows this list).
- Sentiment Analysis with a TinyBERT Model: @blood_bender64 sought advice for a sentiment analysis problem using a TinyBERT model, which was performing poorly on the validation set despite various learning-rate adjustments. @merve3234 and @9alx suggested checking the distribution of data in the validation and test sets, investigating class noise, and monitoring loss changes from epoch to epoch to understand whether the model was underfitting.
- RuntimeError: CUDA out of memory: @blood_bender64 encountered a CUDA out-of-memory issue during training. @vipitis suggested checking whether optimizer states were kept on the GPU and inquired about the number of batches and gradient accumulation steps.
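A short diagnostic sketch for the "nvidia-smi sees the GPU but PyTorch does not" situation above: it prints what the installed PyTorch build was compiled against so it can be compared with the driver and toolkit reported by nvidia-smi. Nothing here is specific to the EC2 setup discussed in the channel:

```python
# Print the CUDA build information of the installed PyTorch wheel.
# A CPU-only wheel (built with CUDA "None") is a common reason for
# torch.cuda.is_available() returning False even though nvidia-smi works.
import torch

print("torch version:  ", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuda available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device count:   ", torch.cuda.device_count())
    print("device 0:       ", torch.cuda.get_device_name(0))
```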
▷ #diffusion-discussions (1 messages):
- Running mm_sdxl_v10_beta.ckpt with Animatediff: User @happy.j enquired about difficulties in running mm_sdxl_v10_beta.ckpt with the diffusers animatediff implementation. They mentioned that various attempts were unsuccessful and they had to revert to using the implementation from the animatediff GitHub repository.
OpenAccess AI Collective (axolotl) Discord Summary
- Concerning the Axolotl project, a work-in-progress branch that centers on Mixtral Sharded was shared by @caseus_. View the specifics of the branch here.
- Questions surrounding Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF) were raised, with the Hugging Face TRL DPO Trainer documentation cited as evidence that the two closely relate.
- Queries were raised about hosting a LLava 13b model on a server or on vLLM, with trouble making a Python request to pass an image to the API.
- The feasibility of training the Mixtral-MoE model with Axolotl using qLoRA on a single 24GB GPU was debated, citing ~27GB as the requirement just to run inference in 4 bits.
- A comparison was drawn between V100s and 3080s: a finetune of opt-350m on a single GPU runs at about 3.5 iterations per second (it/s) on a 3080 and 2 it/s on a V100.
- The script to train Mixtral using tinygrad was shared, along with related conversation about training Mixtral with OpenOrca on 3xA40, specifically using the DiscoResearch model.
- Updates on Mistral's Mixtral 8x7B release, a high-quality sparse mixture-of-experts (SMoE) model with open weights, were mentioned here.
- Queries about Mixtral's high VRAM requirements were addressed: with adequate quantization and optimization, the resources needed should align with those of running a 12B model.
- Transformers now supports LLaVA natively, simplifying integration.
- Questions about multipack usage and clarity in relation to token packing, positional encoding, and loss computation were raised; a documentation page was offered for reference.
- Members encountered a FileNotFoundError during a Mixtral training run involving checkpoints; a potential solution was shared, and the user was advised to monkeypatch the local file in the virtual environment.
- Lastly, @joshuasundance created a dataset of more than 103,886 rows of quotes from various TV shows, accessible here.
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (64 messages🔥🔥):
- Axolotl Work-In-Progress Branch: @caseus_ shared a link to a work-in-progress branch of the Axolotl project on GitHub related to Mixtral Sharded. The details of the branch can be seen at this link.
- DPO and RLHF: @noobmaster29 was curious whether DPO is a form of RLHF. @nanobitz pointed to the Hugging Face TRL DPO Trainer documentation as evidence that they might be the same thing.
- Hosting the LLava 13b Model on a Server: @papr_airplane inquired about hosting the LLava 13b model either on a server or on vLLM, but was having trouble making a Python request to pass the image to the API.
- Axolotl Training of Mixtral-MoE: @gururise wondered whether the Mixtral-MoE model can be trained with Axolotl using qLoRA on a single 24GB GPU. @whiskeywhiskey doubted it's possible, as it took ~27GB to run inference in 4 bits.
- Training Speeds of V100s: In a discussion with @gururise about training speeds, @whiskeywhiskey mentioned that a finetune of opt-350m on a single GPU runs at about 3.5 iterations per second (it/s) on a 3080 and 2 it/s on a V100.
▷ #axolotl-dev (16 messages🔥):
- Mixtral with Tinygrad: @caseus_ shared the script for training Mixtral using tinygrad from its official GitHub page (mixtral.py in tinygrad/tinygrad).
- Training Mixtral with OpenOrca: @whiskeywhiskey discussed training Mixtral with OpenOrca on 3xA40, and mentioned having used the DiscoResearch model, which works with transformers@main.
- Mistral Mixtral Release: @jaredquek and @nanobitz discussed Mistral's release of Mixtral 8x7B, a high-quality sparse mixture-of-experts (SMoE) model with open weights (Mixtral of Experts).
- VRAM Requirement and Optimization: @jaredquek expressed concerns over the high VRAM requirement (~28GB, per Lelio). @_dampf assured that with quantization and optimization the resources needed would be more aligned with handling a 12B model, which should allow users to run Mixtral if they can operate 13B models.
- Native LLaVA Support in Transformers: @caseus_ noted that transformers now supports LLaVA natively, which could make integration easier.
▷ #general-help (9 messages🔥):
- Multipack Understanding: @tim9422 asked for clarification on how multipack works in relation to token packing, positional encoding, and loss computation, referencing a documentation page on the topic (a toy packing sketch follows this list).
- Mixtral Training Issue: @colejhunter encountered a FileNotFoundError during a Mixtral training run involving checkpoints. @nanobitz mentioned that another user had a similar issue previously.
- Possible Solution for FileNotFoundError: @caseus_ suggested a potential solution to the FileNotFoundError, referencing a GitHub issue and advising that the user monkeypatch the local file in their virtual environment. @colejhunter and @whiskeywhiskey showed appreciation for the advice.
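A toy illustration of the packing idea behind multipack, under the assumption that packing means concatenating several short tokenized samples into one fixed-length sequence while restarting position ids per sample. This is a simplified sketch, not Axolotl's actual implementation (which also builds block-diagonal attention masks and handles loss masking):

```python
# Greedily pack token-id lists into fixed-length sequences, restarting the
# position ids at 0 for each packed sample so positional encoding does not
# leak across sample boundaries.
from typing import List, Tuple

def pack(samples: List[List[int]], max_len: int) -> List[Tuple[List[int], List[int]]]:
    """Return (input_ids, position_ids) pairs, each at most max_len tokens."""
    packs, ids, pos = [], [], []
    for sample in samples:
        if ids and len(ids) + len(sample) > max_len:
            packs.append((ids, pos))
            ids, pos = [], []
        ids.extend(sample)
        pos.extend(range(len(sample)))  # positions restart per sample
    if ids:
        packs.append((ids, pos))
    return packs

# Three short "samples" packed into sequences of length <= 8.
print(pack([[5, 6, 7], [8, 9], [10, 11, 12, 13]], max_len=8))
```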
▷ #datasets (1 messages):
- TV Show Quotes Dataset: @joshuasundance scraped quotes from TV shows from wikiquote.org and compiled them into a dataset, which is publicly available on Hugging Face under the name "wikiquote_tv". The dataset contains quotes from various TV series and comprises more than 103,886 rows. Dataset link: huggingface.co/datasets/joshuasundance/wikiquote_tv (a minimal loading example follows).
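A minimal sketch of loading that dataset with the Hugging Face datasets library; the split name and column layout are assumptions, so check the dataset card for the actual schema:

```python
# Load the scraped TV-quote dataset from the Hub and peek at the first record.
from datasets import load_dataset

ds = load_dataset("joshuasundance/wikiquote_tv", split="train")  # split name assumed
print(ds)      # row count and column names
print(ds[0])   # first quote record
```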
LangChain AI Discord Summary
- Implementation of the kork package with pandas: User @xstepz engaged in discussions on how to incorporate the kork package to limit the pandas functions accessible to bots.
- Integration of LangChain with React.js and Python: Queries about the importance of Python in AI development when using LangChain with React.js, discussed alongside an active agri-tech project on diagnosing plant diseases. Suggested resource for learning Python and LangChain: deeplearning.ai courses.
- Criticism of LangChain: User @philipsman shared a Reddit post illustrating criticism of LangChain's implementation, advising caution.
- Issues with the ChatOpenAI API: @a404.eth expressed difficulty with the ChatOpenAI API when using ConversationalChatAgent.from_llm_and_tools along with custom Tools.
- Model performance measurement methodologies: @martinmueller.dev initiated a discussion on approaches to gauge changes in model performance and specific code over time, with automation in view.
- Connection timeouts with Chroma.from_documents(): @wei5519 reported connection timeout errors when using the Chroma.from_documents() API.
- Avoiding redundancy in RAG responses: @b0otable discussed ways to eliminate repeated phrases in RAG responses, suggesting prompt hints as a potential solution.
- Understanding AgentExecutor operation: @egeres sought to understand how AgentExecutor functions, i.e. whether actions are planned ahead or chosen in real time.
- Use of Tracers in LangChain: @lhc1921 recommended using tracers like LangSmith and Langfuse in LangChain for clearer insight than console logs.
- Performance comparison of models: @_egeres raised a query about cases where a 7B model outperforms larger models (34B/70B), asking whether this was due to gaming the evaluation process or to unique fine-tuning techniques.
- TV Show Quote Scraping for Deep Learning: @joshuasundance shared that they scraped TV show quotes for deep learning, making a dataset of approximately 103,886 rows available on Hugging Face, with samples from the TV series "10 Things I Hate About You".
LangChain AI Channel Summaries
▷ #general (69 messages🔥🔥):
- Implementing the kork package to limit pandas functions: User @xstepz requested examples of how to implement the kork package to restrict the pandas functions accessed by the bot.
- Discussion about Using LangChain with React.js: @yasuke007 raised a question about the necessity of Python in AI development when using LangChain with React.js. The discussion extended to include a project on agricultural tech for diagnosing plant diseases using AI. @lhc1921 suggested resources for learning Python and LangChain: deeplearning.ai courses.
- LangChain Critique on Reddit: @philipsman shared a Reddit post criticizing LangChain's implementation and advised caution.
- Issues with the ChatOpenAI API: @a404.eth expressed confusion over using the ChatOpenAI API with ConversationalChatAgent.from_llm_and_tools and custom Tools.
- Bedrock API Calls and Model Performance Measurement: @martinmueller.dev inquired about methodologies to measure the performance of models and specific code as they evolve, with the aim of automating the process.
- Error with the Chroma.from_documents() API: @wei5519 experienced errors related to connection timeouts when using the Chroma.from_documents() API (a minimal usage sketch follows this list).
- Eliminating Redundancies in RAG Responses: @b0otable discussed an issue regarding redundant phrases in the responses from a RAG workflow, and shared a potential prompt hint as a solution.
- Understanding the Operation of AgentExecutor: @egeres sought clarification on how the AgentExecutor operates, i.e. whether it first makes a plan of the actions to take or chooses actions on the go.
- Utilizing Tracers in LangChain: @lhc1921 suggested the use of tracers like LangSmith and Langfuse in LangChain for better comprehension than console logs.
- Discussion on Model Performance: @_egeres posed a question about instances when a 7B model beats larger models like 34B/70B, inquiring whether this could be attributed to tricking the evaluation process or to innovative fine-tuning approaches.
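A minimal sketch of the Chroma.from_documents() call in question, shown with OpenAI embeddings. The import paths match langchain releases from around this period and may differ in newer versions; the timeout reported above most likely occurs while the embedding requests are being made, so smaller batches and retries are the usual first things to try:

```python
# Build a small Chroma vector store from documents and run a similarity search.
# Requires the chromadb package and an OPENAI_API_KEY in the environment.
from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

docs = [Document(page_content="Mixtral is a sparse mixture-of-experts model.")]

vectordb = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db",   # optional on-disk persistence
)
print(vectordb.similarity_search("What kind of model is Mixtral?", k=1))
```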
▷ #share-your-work (1 messages):
- Scraping Quotes from TV Shows for Deep Learning: User @joshuasundance shared that they have scraped quotes from TV shows from wikiquote.org. The dataset, containing around 103,886 rows, is available on Hugging Face. They provided several examples from the TV series "10 Things I Hate About You".
Latent Space Discord Summary
- Discussion regarding various topics in the AI field, including the popular theme of efficiency at the NeurIPS Expo Day, as shared by @swyxio in a recap.
- A question by @aristokratic.eth on creating personal datasets for fine-tuning ML models, without any evident responses.
- Sharing of a Twitter post by @swyxio that provided insights into Mixtral, sparking a discussion.
- Positive feedback from @kaycebasques on the utility of Latent Space Benchmarks 101, with requests for future 101 installments; a response from @fanahova indicated Algorithms 101 may be the next topic.
- A tweet shared by @swyxio recapping AI events in November.
- A mention of Humanloop by @swyxio in an unclear context, leading to a discussion without many specifics.
Latent Space Channel Summaries
▷ #ai-general-chat (8 messages🔥):
- NeurIPS Expo Day Recap: User @swyxio shared a recap of NeurIPS expo day 0, highlighting efficiency as the popular theme of the event.
- Humanloop Inquiry: User @swyxio started a discussion about Humanloop but didn't provide a specific question or context.
- Creating Own Datasets: @aristokratic.eth posed a question to the community about creating their own datasets for fine-tuning ML models.
- Mixtral Breakdown: @swyxio shared a Twitter post from Guillaume Lample providing a breakdown of Mixtral.
- Latent Space Benchmarks 101 Feedback and Future 101s: @kaycebasques found Latent Space Benchmarks 101 useful and inquired about future 101 releases. @fanahova replied that they'll send out a survey for 101 requests, with Algorithms 101 under consideration as the next topic.
▷ #ai-event-announcements (1 messages):
swyxio: Nov recap here! https://fxtwitter.com/latentspacepod/status/1734245367817093479
Alignment Lab AI Discord Summary
- Conversation about a Mixtral-based OpenOrca test initiated by @lightningralf, with reference to a related fxtwitter post from the OpenOrca development team.
- Speculation on the speed of the machine learning process; the proposed solution involved using server 72 8h100 to enhance performance.
- @teknium's declaration of testing an unidentified model, and the need for further clarification of which model was meant.
- Inquiry from @mister_poodle on ways to extend or fine-tune Mistral-OpenOrca for specific tasks, namely boosting NER-task performance using their datasets and generating JSON outputs.
Alignment Lab AI Channel Summaries
▷ #oo (5 messages):
- Discussion about a Mixtral-based OpenOrca Test: @lightningralf asked @387972437901312000 if they tested Mixtral based on OpenOrca, linking a fxtwitter post.
- Question about Process Speed: @nanobitz expressed surprise about the speed of the process, with @lightningralf suggesting the use of server 72 8h100.
- Unidentified Model Testing: @teknium mentioned testing some model, but being uncertain about which one.
▷ #open-orca-community-chat (1 messages):
- Extending/Fine-tuning Mistral-OpenOrca for Specific Tasks: User @mister_poodle expressed interest in using their datasets to boost Mistral-OpenOrca's performance on an NER task with JSON outputs. They sought examples or suggestions for extending or fine-tuning Mistral-OpenOrca to achieve this goal.
Skunkworks AI Discord Summary
- Discussion occurred about potentially instruction-tuning Mixtral, mentioned by zq_dev.
- @lightningralf shared a Tweet about the development of a fine-tuned chat version built on slim openorca.
- A YouTube link was shared by pradeep1148 without providing additional context.
Skunkworks AI Channel Summaries
▷ #finetune-experts (1 messages):
zq_dev: Anybody attempting to instruction tune mixtral yet?
▷ #moe-main (1 messages):
- Fine-tuned Chat Version Based on Slim OpenOrca: @lightningralf shared a Tweet about a fine-tuned chat version based on slim OpenOrca.
▷ #off-topic (1 messages):
pradeep1148: https://www.youtube.com/watch?v=yKwRf8IwTNI
MLOps @Chipro Discord Summary
- An event notification was shared by user ty.x.202. in the #events channel, with an invitation link: https://discord.gg/FKYww6Fn?event=1183435830711296032
- User fehir in the #general-ml channel mentioned new EU legislation without providing further details or context; this topic is omitted due to insufficient context.
MLOps @Chipro Channel Summaries
▷ #events (1 messages):
ty.x.202.: @everyone https://discord.gg/FKYww6Fn?event=1183435830711296032
▷ #general-ml (1 messages):
fehir: New EU legislation in nutshell
The Ontocord (MDEL discord) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI Engineer Foundation Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Perplexity AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.