OpenAI Discord Summary
- Issues surrounding use of the Mixtral model raised by @eljajasoriginal, including inconsistencies in responses and a comparison with Poe's Mixtral.
- Noticeable decline in GPT-4 and ChatGPT output quality, and behavioral unpredictability acknowledged by OpenAI, according to users @eljajasoriginal and @felly007.
- Comparative performance of Playground GPT-4 and ChatGPT GPT-4, indicated by @eljajasoriginal.
- Anomalous behavior in Bing's response consistency and user concerns about hallucination or cached-information usage, initially reported by @odasso.
- Discussions on user issues and performance with the ChatGPT platform, including the restriction on number of messages, finishing complex responses, typewriter-like response lags, coding with GPT-4, and financial constraints affecting subscriptions, raised by @superiornickson5312, @sieventer, @afayt, @the_boss7044, and @clockrelativity2003 respectively.
- User-reported issues and queries on the OpenAI platform, including chat lags, inaccessible voice settings on Android, password reset failures, problematic 2FA activation, non-clickable links in ChatGPT-4, GPT's misunderstanding of symmetry, file upload failures, a GPT-3.5 sidebar access issue, and repetitive pattern errors.
- Discussion of restricted access to ChatGPT-4 despite an active subscription, raised by @nicky_83270.
- Observations of OpenAI server limitations by @.cymer and @satanhashtag, and the possibility of content policy and censors affecting ChatGPT performance, raised by @.cymer.
- Inquiries and dialogues on the use of 'Knowledge' in GPT by @Rock and @cat.hemlock.
- Techniques and approaches for guiding GPT-4 outputs toward specific requirements, shared by @eskcanta, @stealth2077, @jeziboi, @rendo1, and @seldomstatic.
- Request for advice on creating a full-body perspective of a character with GPT-4 from @neurophin, and exploration of minimalistic art-style creation with DALL-E and ChatGPT-4 discussed by @errorsource and @seldomstatic.
- Queries on earning opportunities with ChatGPT and the cost of using GPT tokens, highlighted by @antonioguak.
OpenAI Channel Summaries
▷ #ai-discussions (52 messages🔥):
- Consistency of Mixtral's output: @eljajasoriginal noted that no matter how many times a response is regenerated in the Perplexity AI playground, the Mixtral model gives the same response to the same prompt. The same user found Poe's Mixtral less consistent in its responses.
- Issues with GPT-4 and ChatGPT: users like @eljajasoriginal and @felly007 discussed a noticeable quality decline in GPT-4 and ChatGPT. @eljajasoriginal cited a statement from OpenAI that model behavior can be unpredictable and that they are looking into fixing the issue.
- Comparison of GPT-4 in Playground vs. GPT-4 in ChatGPT: @eljajasoriginal shared the opinion that Playground GPT-4 might provide better results due to the absence of the internal instructions present in the ChatGPT model. The point was further debated on aspects such as safety measures and context length.
- Anomalous behavior in Bing: @odasso. shared an unusual experience with Bing holding query context across different conversations. Some users, including @brokearsebillionaire and @lugui, considered the case hallucination or usage of cached information.
- Weird setup in Bard AI model: @eljajasoriginal reported strange assertions in Bard's responses, such as bringing up past non-existent conversations and including unnecessary location details.
▷ #openai-chatter (115 messages🔥🔥):
- Limit on GPT-4 Messages: @superiornickson5312 voiced concern about a restriction on the number of messages they could send to GPT-4; @toror clarified that there is a limit on the number of messages a user can send per hour.
- GPT-4 Errors and Completion: @sieventer expressed frustration over GPT-4's inability to finish responses due to complex prompts or errors; @z3wins advised using shorter prompts and asking one step at a time.
- Clearing Delayed ChatGPT Response: @afayt complained about issues with the typing animation in ChatGPT on PC, ultimately concluding it was due to a very long conversation history.
- ChatGPT for Code Writing: @the_boss7044 asked about writing actual code with GPT-4; @jonpo suggested simply telling it to code, specifying not to chat too much.
- Subscription & Performance Issues: some users voiced dissatisfaction with ChatGPT's performance, citing frequent network errors (@fanger0ck), while others defended the system, suggesting these were temporary problems due to server load (@aminelg). @loschess expressed satisfaction with the service despite minor issues, while @clockrelativity2003 shared the need to cancel their subscription due to financial challenges.
▷ #openai-questions (66 messages🔥🔥):
- Long Chat Issues: @imythd asked for solutions when a chat becomes too long and lags. @solbus suggested summarizing crucial information into a new chat, or creating a custom GPT for storing critical context if the user has a Plus subscription.
- Voice Setting Issue on Android App: @davidssp. had trouble accessing voice settings in the ChatGPT Android app. @solbus clarified that the voice feature is only available in the apps and suggested checking the Android app or downloading it from the Play Store.
- Inability to Reset Password: @vijaykaravadra reported not receiving the password-reset email. The solution to this issue wasn't discussed in the messages provided.
- 2FA Activation Issue: @palindrom_ reported an issue activating two-factor authentication after deactivating it. @satanhashtag linked to an OpenAI article explaining that 2FA might be temporarily paused.
- Non-Clickable Links in ChatGPT-4: @mouad_benardi_98 experienced ChatGPT-4 providing non-clickable links. @satanhashtag suggested trying a new GPT-4 chat without custom instructions and plugins, or asking for solutions in a separate channel.
- GPT Misunderstands Symmetry: @neonn3mesis reported that GPT confuses horizontal and vertical symmetry. The solution to this issue wasn't discussed in the messages provided.
- Inability to Upload File: @askwho reported being unable to upload any file to ChatGPT-4. The solution to this issue wasn't discussed in the messages provided.
- Desktop GPT-3.5 Sidebar Access Issue: @d_smoov77 had trouble accessing the left tab options in the desktop version of GPT-3.5. @solbus directed them to the little arrow at the far left-center of the page.
- Repetitive Pattern Error: @slip1244 reported a `BadRequestError: 400` when calling the same system message and function multiple times. The solution to this issue wasn't discussed in the messages provided.
- Gemini Testing: @ttmor reported testing Gemini, experiencing some bugs but considering it okay overall. Further discussion on this was not present in the given messages.
▷ #gpt-4-discussions (33 messages🔥):
- Issues with ChatGPT Subscription: @nicky_83270 reported being unable to access ChatGPT-4 even after paying for a subscription, to which @solbus offered troubleshooting assistance, including checking whether the subscription renewal was successful and trying different browsers/devices.
- Discussion about OpenAI Server Limitations: @.cymer and @satanhashtag discussed potential reasons behind GPT's slowness during peak times, including server limitations and the need for more servers or optimization.
- Content Policy and ChatGPT Performance: @.cymer proposed a theory that ChatGPT's policy updates and content censors could be making it slower and less efficient over time.
- Use of Knowledge Files in Custom GPTs: @jobydorr asked whether a custom GPT only searches its knowledge files when specifically prompted or also uses them for open-ended queries. @solbus clarified that knowledge files exist as reference documents for a GPT, not as permanent context data; they can be queried and return data relevant to the specific query.
- Disappearance of GPT Model: @redash999 raised a concern about their GPT model disappearing without any notification or email. @Rock suggested loading up any test chats they may have had with the GPT, which might restore it.
▷ #prompt-engineering (62 messages🔥🔥):
- Engaging GPT-4 for Detailed Scenes and Specific Requests: @eskcanta engaged GPT-4 for a detailed scene involving complex character relationships and arguments, demonstrating how to guide the model through prompts with specific details and instructions, and emphasizing clear guidance within the model's comfort range while avoiding negative prompting. A link to a chat example was provided.
- Concerns about ChatGPT's Output and Contaminating Context: @stealth2077 expressed worries about unwanted parts of output contaminating context. @brokearsebillionaire noted that providing more context reduces the tokens available for output, resulting in shorter replies, a problem that could be addressed with larger models, targeted context, or retrieval.
- Generating a Specific Script Style: @jeziboi sought help generating a specific style of script, providing examples for reference. @alienpotus suggested a structured approach focusing on narrative structure, character development, context, and the other elements crucial to scripts like the given examples.
- Approach to Negative Instruction and Mismatched Outputs: @rendo1 recommended asking GPT to stick closely to prompts, cautioning that GPT might modify them slightly. @seldomstatic shared an approach to creating art-style-tailored outputs with GPT-4, which sparked a discussion with @errorsource on the inconsistency of outputs between Bing and GPT-4.
- Utilizing 'Knowledge' in GPT: @Rock and @cat.hemlock discussed how to make the best use of 'Knowledge' in GPT, covering the 2M-token limit and the challenges around GPT's tendency toward summarization, skipping, and inference.
▷ #api-discussions (62 messages🔥🔥):
- Generating scripts with GPT-4: @jeziboi asked for help creating scripts in a specific storytelling style using GPT-4, featuring clear narrative structures, well-developed characters, emotional depth, descriptive detail, and surprise endings (like his pasted examples). @alienpotus proposed a structured approach for creating such scripts.
- Building Character Perspectives with GPT-4: @neurophin sought assistance creating a full-body perspective of a character through GPT-4, drawing responses on ways to guide the model to stick more closely to the given prompts.
- Art-Style Generation with DALL-E and ChatGPT-4: several discussions ensued between @errorsource, @seldomstatic, and others about recreating a minimalistic yet detailed art style for landscape generation using DALL-E in ChatGPT-4; results varied between the models.
- Earning from ChatGPT and the Cost of Tokens: @antonioguak wanted to know how to make money with ChatGPT and commented on the cost of using GPT tokens.
- Study of Knowledge: @Rock shared findings from studying the use of 'Knowledge' with GPT models, noting a 2-million-token limit on knowledge usage and that inference from GPT results can be frustrating. @cat.hemlock added that getting GPT to draw from more than one knowledge file at the same time was a challenge they had yet to work around.
- Links of interest: @eskcanta shared a ChatGPT prompt example (link), and @cat.hemlock shared a study on Guidance Teacher (link).
Nous Research AI Discord Summary
- Discussions about the potential and capabilities of various AI models, with in-depth conversations around BAAI's Emu2 model and UNA Solar's yet-to-be-released model. The community also witnessed the launch of OpenDalle, a new AI model by @datarevised.
- Detailed conversation on improving AI model performance, with strategies such as applying the "Starling" method to OpenChat models and merging multiple Mistral LoRAs.
- Social Media Contact: @pogpunk asked @764914453731868685 if they have Twitter; @maxwellandrews responded, confirming their Twitter handle as madmaxbr5.
- Resource sharing and recommendations were common, with links ranging from interviews with researchers like Tri Dao and Michael Poli, to promotional offers such as free Discord Nitro, and API-related resources like CursorAI.
- Several AI-related queries emerged, including a tool for ambiguous image prompts, systematic prompt engineering, and improving latency efficiency in NLP. Conversation around running and fine-tuning language models locally also ensued.
- A "free the compute" statement was raised by @plbjt without any divulged context.
Nous Research AI Channel Summaries
▷ #ctx-length-research (2 messages):
- Social Media Contact: @pogpunk asked @764914453731868685 if they have Twitter. @maxwellandrews confirmed they have a Twitter account with the handle madmaxbr5.
▷ #off-topic (3 messages):
- Image Prompt Tool Inquiry: @elmsfeuer asked if there is a tool that allows image prompts to be ambiguous, permitting different interpretations at different optical resolutions (for example, viewing an image as a tree at low resolution but as a scuba diver at high resolution).
- Free the Compute: @plbjt posted the brief statement "free the compute". The context and meaning behind this statement were not provided in the message history.
▷ #interesting-links (55 messages🔥🔥):
- Interview sharing: @atgctg shared a link to an interview with researchers Michael Poli and Tri Dao, pointing out the value of getting firsthand insights from researchers in the AI field.
- Emu2 model discussion: a number of members, including @yorth_night and @coffeebean6887, had a lengthy conversation about the performance of BAAI's Emu2 multimodal model, discussing its capabilities, limitations, and potential uses.
- Free Discord Nitro offer: @jockeyjoe shared a link to a promotion for a free month of Discord Nitro for Opera GX browser users, though it sparked a lively debate over its legitimacy.
- Subscription vs. API key: @.beowulfbr and @night_w0lf discussed the benefits and drawbacks of using a subscription service like CursorAI over directly using API keys, with the latter recommending DIY alternatives such as open-source UIs found on GitHub and Unsaged.
- Documentation recommendation: @night_w0lf recommended reading the Emu2 model's documentation, despite it initially seeming unappealing, and offered insight into how to use it.
▷ #general (128 messages🔥🔥):
- Anticipation for UNA's Performance: users discussed anticipation for UNA's ("UNA Solar") performance, where @n8programs, @nonameusr, and @yorth_night mentioned numerical expectations and feedback on preliminary test results.
- Discussion on Model Merging with LoRAs: @ldj and @carsonpoole discussed the idea of merging multiple Mistral LoRAs. @ldj suggested saving the weight differences between every finetuned Mistral model and the base model as a "delta" and then merging those deltas (see the sketch after this list), and raised concerns about potential loss of information in @carsonpoole's method of converting full fine-tunes into LoRAs before merging.
- OpenChat Model Testing and Improvement Suggestions: @.beowulfbr sought advice after failing to improve the OpenChat model's performance using his own config and datasets (one of which belonged to @tokenbender). @tokenbender advised applying the "Starling" method to the new OpenChat model due to its previous success.
- Launch of OpenDalle by DataRevised: @datarevised announced his new model, OpenDalle, developed by applying his custom slerp method to SDXL. He requested feedback and shared two versions of the model on HuggingFace (OpenDalle and OpenDalleV1.1).
- Anticipation for Multi-modal AI: @ldj and @yorth_night discussed the future of LLMs in the context of multi-modality. @ldj expressed excitement for end-to-end multi-modal audio AI, suggesting it could surpass image-based multi-modal AI in significance; the idea of image-based AI aiding in design tasks was also entertained.
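As a rough illustration of the "delta" idea mentioned above, the sketch below averages per-parameter differences between several finetunes and a shared base model, then adds the averaged delta back onto the base. It is a minimal sketch, not anyone's confirmed method: the model IDs are placeholders, and it assumes all checkpoints share the base architecture.

```python
# Hypothetical delta-merging sketch; model IDs below are placeholders.
import torch
from transformers import AutoModelForCausalLM

BASE_ID = "mistralai/Mistral-7B-v0.1"
FINETUNE_IDS = ["org/finetune-a", "org/finetune-b"]  # hypothetical finetunes

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16)
base_sd = base.state_dict()

# Start from the base weights and add the average of the finetune deltas.
merged = {name: w.clone() for name, w in base_sd.items()}
for ft_id in FINETUNE_IDS:
    ft_sd = AutoModelForCausalLM.from_pretrained(
        ft_id, torch_dtype=torch.float16
    ).state_dict()
    for name, base_w in base_sd.items():
        delta = ft_sd[name] - base_w          # "delta" = finetune minus base
        merged[name] += delta / len(FINETUNE_IDS)

base.load_state_dict(merged)
base.save_pretrained("mistral-delta-merged")
```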
▷ #ask-about-llms (9 messages🔥):
- Systematic Prompt Engineering: @flow_love asked if anyone has done systematic prompt engineering before and whether there are any benchmarks or libraries for it.
- Running Language Models Locally: @leuyann inquired whether running language models locally also means being able to fine-tune them.
- Fine-Tuning and QLoRA: @atgctg mentioned that fine-tuning is compute-intensive and introduced QLoRA (quantized low-rank adaptation), which can be run on consumer graphics cards (see the sketch after this list). A relevant Reddit post on the topic was shared.
- Latency Efficiency in NLP: @pogpunk asked if there was a more latency-efficient way to do NLP for their search product.
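For readers unfamiliar with the QLoRA setup mentioned above, a minimal sketch using the transformers/peft/bitsandbytes stack follows; the model ID and hyperparameters are illustrative, not taken from the discussion.

```python
# Minimal QLoRA-style sketch: 4-bit quantized base model plus LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", quantization_config=bnb  # placeholder model
)
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)         # only LoRA weights are trainable
model.print_trainable_parameters()
```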
Mistral Discord Summary
- Integrating Autogen and updated chat templates, with discussions led by @cyb3rward0g about implementations, possibilities, and challenges (HuggingFace blog post).
- Differentiating between Mixtral-8x7B-v0.1 and Mixtral-8x7B-Instruct-v0.1 and their specialized use cases, clarified by @netapy and @cuzitzso.
- Benchmarking and comparing model performance, dissected by @asgnosi, pointing out the role of fine-tuning and the expected performance of GPT-4.
- Questions about the rates and limits of the Mistral API raised by @michaelwechner, with potential solutions for maximum utilization.
- Discussion about implementing stopwords in Mistral, including the alternative usage of an [END] token.
- Several conversations about GPU requirements for model training and memory needs, shared by @dutchellie and @nootums; a notable mention was the request for benchmark metrics for the Mistral 7B v2 model.
- Detailed discussion of Mixtral's performance on different systems, notably an Apple M2 Max, where @sublimatorniq asked about potential improvement in the prompt-processing stage, while @Epoc_ (herp/derp) shared specific performance details on their Windows system.
- Mistral API queries persisted in the #finetuning channel, mostly driven by @expectopatronum6269 and @lerela, focusing on rate limits, the context window, the time-out limit, and guidance on API parameters.
- Finetuning concerns and techniques also poured over into the #ref-implem channel, involving finetuning with QLoRA (@mjak), confusion about the implementation process and necessary components (@duck), and utilizing selected models from HuggingFace (@daain).
- The #la-plateforme channel focused on tackling API rate-limit issues, with @robhaisfield, @d012394, and @lerela discussing a possible miscalculation in token output and the subsequent investigation by Mistral staff.
- The #showcase channel featured a Mistral Playground by @bam4d, though without further details or context.
Mistral Channel Summaries
▷ #general (35 messages🔥):
- Mistral API with Autogen and Updated Chat Templates: @cyb3rward0g discussed using specific implementations of Mistral, linked a blog post on HuggingFace, and considered updating chat templates to include the "SYSTEM" role, seeking advice on whether such a template would be feasible for agents that require a "SYSTEM" prompt during their creation.
- Difference between Mixtral-8x7B-v0.1 and Mixtral-8x7B-Instruct-v0.1: answering a question from @hasanurrahevy, @netapy and @cuzitzso clarified the distinction; the instruction-tuned model is finetuned to analyze a given instruction and formulate a proper response.
- Benchmarking and Model Comparison: @asgnosi shared observations about the performance of different models on various tests (the "killers" and "wizards" questions). Further discussion highlighted the role of fine-tuning and the performance of GPT-4.
- Rate and Limits in Mistral API: @michaelwechner referenced the Mistral API documentation's rate limit of 2M tokens per minute, acknowledging the potential for parallelizing requests to fully utilize it; other users affirmed this solution (see the sketch after this list).
- Stopword Implementation in Mistral: @tomaspsenicka raised a query about using stopwords in Mistral and discussed using an [END] token after each message as an alternative approach.
- Acquiring an API Key: @harvey_77132 inquired about getting an API key and reaching the customer success team. @brokearsebillionaire provided the link to the Mistral console where the keys are typically found.
- Error with Autogen: @aha20395 reported an error when using Autogen. @brokearsebillionaire provided some insights and suggested leveraging the LiteLLM translator for Mistral API calls, linking to the relevant documentation.
- Performance of Mixtral Instruct: finally, @ldj reported that Mixtral Instruct was outperforming other models, including Claude 2.1, Gemini Pro, and all versions of GPT-3.5-turbo, based on human preferences rated through the LMSys Arena.
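A minimal sketch of the parallelization idea follows, fanning out chat-completion requests with a concurrency cap so aggregate usage stays under a per-minute token budget. The endpoint shape follows Mistral's public REST API; the model name and semaphore size are illustrative assumptions.

```python
# Hedged sketch: concurrent Mistral chat completions under a rate budget.
import asyncio
import os

import httpx

API_URL = "https://api.mistral.ai/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

async def complete(client: httpx.AsyncClient, sem: asyncio.Semaphore,
                   prompt: str) -> str:
    async with sem:  # cap in-flight requests to respect the rate limit
        resp = await client.post(API_URL, headers=HEADERS, json={
            "model": "mistral-medium",  # illustrative model choice
            "messages": [{"role": "user", "content": prompt}],
        })
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

async def run(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(8)  # tune against the 2M tokens/minute budget
    async with httpx.AsyncClient(timeout=120) as client:
        return await asyncio.gather(*(complete(client, sem, p) for p in prompts))

print(asyncio.run(run(["Say hi.", "Name three moons of Jupiter."])))
```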
▷ #models (2 messages):
- GPU Requirements for Model Training: @dutchellie suggested that 24GB of VRAM may not be sufficient for model training and recommended looking into second-hand Nvidia P40s, as they are cost-effective and have 24GB of VRAM.
- Request for Benchmark Metrics for the Mistral 7B v2 Model: @nootums inquired about the availability of benchmark metrics comparing v1 and v2 of the Mistral 7B Instruct model, as they are considering upgrading their self-hosted v1 model to v2.
▷ #deployment (5 messages):
- Mixtral Performance on Different Systems: @sublimatorniq shared performance details of Mixtral on different systems, remarking that performance seems sluggish during the prompt-processing stage, especially on their Apple M2 Max system: "The eval rate I'm getting is certainly fast enough. Just hoping the former (prompt eval rate) can be improved!"
- Mistral Performance Metrics: @Epoc_ (herp/derp) disclosed that "On my system, windows, LM Studio, Mixtral 8x7B Q8_0 uses 47.5GB VRAM and 50.5GB RAM, runs ~17t/s. 13900k, 64GB DDR4, A6000 48GB".
- Mixtral Performance Inquiry on Jetson Orin 64GB: @romillyc asked if anyone is using Mixtral 8x7B Q4_K_M or Q6_K on a Jetson Orin 64GB, mentioning that while "llama.cpp runs fine on smaller Jetsons", their Jetson Xavier's 16GB seemed to be a limiting factor. @derpsteb seemed to begin a question or discussion, but it wasn't completed in the provided conversation.
▷ #ref-implem (4 messages):
- Finetuning Mistral with narrative data: @mjak mentioned they are trying to finetune Mistral with narrative data using QLoRA, adding that they are uncertain whether all data should be formatted as QA pairs.
- Reference Implementation and Deployment Steps: @duck asked if the reference implementation relies on the deployment steps mentioned in the repo's README, expressed confusion about whether the Python script interacts with a container service, and was looking into running without LM Studio, Ollama, and the like.
- Cherry-picking models from HuggingFace: @daain shared that they have been picking relevant models from HuggingFace, such as Dolphin 2.1, Open Hermes 2.5, and Neural Chat, instead of fine-tuning themselves.
▷ #finetuning (4 messages):
- Max API Time-out Query: @expectopatronum6269 shared their plan to scale their newly built Mistral-API-powered app to 10,000 requests per hour, and requested details on the maximum API time-out, the maximum context window, and the request rate limit when using Mistral Medium.
- Guidance on Mistral API Parameters: @lerela responded to the query, describing the context window (32k) and rate limits (2M tokens/minute) as outlined in the documentation; the timeout was described as comprehensive. The user was also urged to set `max_tokens` and leverage response headers to track token usage, given the lack of an API for this purpose (see the sketch after this list).
- Inquiry about System Prompt and Chat Template: @caseus_ asked for advice on implementing the system prompt in the chat template, providing a link to a specific tokenizer configuration as a reference.
- Question Regarding Fine-tuning for Function Calling: @krissayrose asked whether anyone had performed fine-tuning for function calling.
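A minimal sketch of that advice, assuming Mistral's OpenAI-style REST response shape: cap output with `max_tokens`, read the `usage` field, and inspect whatever rate-limit headers the API returns. The exact header names are not confirmed here, so the code simply prints any it finds.

```python
# Hedged sketch: bound completion length and track token usage per response.
import os

import httpx

resp = httpx.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-medium",  # illustrative model choice
        "messages": [{"role": "user", "content": "Summarize RLHF in one line."}],
        "max_tokens": 128,          # keep output, and thus usage, predictable
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json().get("usage"))      # prompt/completion token counts
for name, value in resp.headers.items():
    if "ratelimit" in name.lower():  # surface any rate-limit headers returned
        print(name, value)
```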
▷ #showcase (1 messages):
bam4d: Mistral Playground
▷ #la-plateforme (14 messages🔥):
- API Rate Limit Issues: multiple users, including @robhaisfield and @d012394, experienced rate-limit errors even though the token usage shown on their dashboards didn't seem close to the 2-million-tokens-per-minute limit. @robhaisfield speculated that the issue might lie in how token output for the rate limiter is calculated (video showing the problem).
- Investigation by Mistral Staff: @lerela requested affected users to DM their Profile IDs for further investigation and later announced that changes had been pushed to increase the reliability of the API.
- LiteLLM Support for La Plateforme: @brokearsebillionaire inquired about support for La Plateforme in LiteLLM, which @ved_ikke affirmed by sharing LiteLLM's documentation on the Mistral AI API. @brokearsebillionaire later confirmed success in getting it to work.
OpenAccess AI Collective (axolotl) Discord Summary
- Active discussions on fine-tuning models occurred across channels, covering aspects such as model performance at shorter context lengths (raised by @noobmaster29) and procedures for personal chat data (raised by @enima). Further, @yamashi shared plans to train a Mixtral MoE on specific data.
- The huggingface/transformers PR was a key discussion point, concerning its potential impact on code adjustments, how to use Flash Attention 2 effectively, and support for LoftQ in PEFT. Direct links to the pull request and the LoftQ arXiv paper were shared.
- A new multimodal model was introduced by @nanobitz, with a link to the respective source. A resource on Half-Quadratic Quantization (HQQ) was also discussed but drew little user feedback.
- A range of technical questions about different methodologies and tools were raised, including llama.cpp internals, tokenizing Turkish text, the processing of LLM inference, and sliding windows for training with axolotl.
- Several suggestions for axolotl feature improvements and future projects were provided. Notably, users discussed incorporating prompt gisting and adding chat templates to the tokenizer after fine-tuning. Ongoing experiments such as freezing `*.gate.weight` were mentioned, with results expected to be shared soon.
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (24 messages🔥):
- Finetuning a Model with a Shorter Context Length: @noobmaster29 raised a question about the effects of finetuning a model at a shorter context length than the base model. @nanobitz conjectured the model would still work at the full length but might not perform as well.
- Largest Model Sequence Length: @noobmaster29 further asked about tuning Mistral at `4096` instead of its max length of `8192`, to which @nanobitz reassured it would be totally fine.
- New Multimodal Model Resource: @nanobitz shared a link to a new multimodal model developed by the Beijing Academy of Artificial Intelligence, Tsinghua University, and Peking University.
- Training Mixtral MoE on Specific Data: @yamashi mentioned planning to train a Mixtral MoE on their data in January, hoping for 85% on `medqa`, with possibly 90% by embedding answers in the prompt.
- Half-Quadratic Quantization (HQQ) Resource: @dangfutures asked if anyone had used the official implementation of Half-Quadratic Quantization (HQQ); however, no feedback was provided.
- Prompt Gisting within axolotl: @lightningralf raised the idea of incorporating prompt gisting within axolotl, to which @caseus_ replied that while potentially useful, it would likely be slow to train due to how attention masking operates for token gisting.
- Adding a Chat Template to the Tokenizer After Finetuning: @touristc asked if axolotl is able to add chat templates to the tokenizer after finetuning, a feature @caseus_ agreed would be quite beneficial (see the sketch after this list).
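For context, transformers already lets you attach a chat template to a tokenizer after the fact via its Jinja-based `chat_template` attribute; a minimal sketch follows, with a placeholder path and a made-up template format purely for illustration.

```python
# Sketch: attaching a chat template to an already-finetuned model's tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/finetuned-model")  # placeholder
tok.chat_template = (
    "{% for message in messages %}"
    "{{ '<|' + message['role'] + '|>' + message['content'] + '\n' }}"
    "{% endfor %}"
)  # hypothetical template format, not any project's official one

# The template is applied when formatting conversations for the model.
text = tok.apply_chat_template(
    [{"role": "user", "content": "Hello!"}], tokenize=False
)
print(text)
tok.save_pretrained("path/to/finetuned-model")  # persists the template
```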
▷ #axolotl-dev (13 messages🔥):
- Impact of the huggingface/transformers PR: @nanobitz alerted users to a new pull request on the huggingface/transformers GitHub that fixes the FA2 integration and pointed out a possible need for adjustments in their code. @caseus_ acknowledged the potential impact of this change and plans to address it.
- Use of Flash Attention 2: in the context of the mentioned PR, @nanobitz provided recommendations on how to use Flash Attention 2, mainly suggesting not to pass `torch_dtype` to the `from_pretrained` class method when using Flash Attention 2 and ensuring the use of Automatic Mixed-Precision training.
- Adding support for LoftQ in PEFT: @nruaif brought up support for LoftQ, a quantization method that improves Large Language Model (LLM) fine-tuning. @nruaif stated that LoftQ has been supported in PEFT since version 0.7, providing a link to the respective arXiv paper.
- Usage of LoftQ with LoRA: @caseus_ hinted at the straightforward use of LoftQ with LoRA, providing an example code snippet (not reproduced in the messages; see the sketch after this list).
- Ongoing Experiment: @theobjectivedad asked @faldore about any significant observations after freezing `*.gate.weight`. @faldore stated it's too soon to share results but promised to provide them the following day.
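Since @caseus_'s snippet isn't included in the messages, here is a minimal sketch of LoftQ-initialized LoRA as exposed by PEFT 0.7+; the model ID and hyperparameters are illustrative.

```python
# Hedged sketch: initialize LoRA weights with LoftQ via peft (>=0.7).
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
loftq = LoftQConfig(loftq_bits=4)        # quantize the backbone during init
lora = LoraConfig(
    init_lora_weights="loftq",           # LoftQ initialization for adapters
    loftq_config=loftq,
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()
```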
▷ #general-help (22 messages🔥):
- Fine-tuning Models: @enima is considering fine-tuning a model with personal chat data, similar to the approach used in the Minotaur/Mistral-Dolphin projects, and acknowledges the need to rework their dataset on a small sample scale.
- Help with llama.cpp Internals: @_awill is seeking anyone familiar with llama.cpp internals for a discussion.
- Tokenizing Turkish Text: @emperor encountered an issue while training a tokenizer for Turkish text using HF Tokenizers: despite using a Turkish dataset with no Chinese text, the resulting vocabulary had an overwhelming number of Chinese characters. The issue shrank significantly when all Chinese characters and emojis were aggressively filtered out of a different dataset, but the user still questions why less than 1% non-Turkish characters influenced the tokenizer to this extent.
- LLM Inference Processing: @JK$ asked about the processing approach of LLM inference. According to @nruaif, both parallel processing (which speeds things up but requires more VRAM) and a queue approach can be used, though the latter has greater latency.
- Sliding Windows for Training: @marijnfs asked whether axolotl supports sliding windows for training, to which @le_mess replied affirmatively for Mistral. Asked why this feature is not standard for all LLMs, @le_mess mentioned that only Mistral was trained with it; the option is enabled by default and there might not be a way to disable it.
Latent Space Discord Summary
- Engaging discussions on probabilistic programming, touching on managing fuzzy outputs and the parallel evolution of large language models (LLMs) and databases. Notable quotes include: "…challenges of probabilistic programming…" and "…LLMs should be designed in areas where probabilistic outputs…".
- In-depth conversation about the evolution of AI, with emphasis on the critical role of fine-tuning GPT-4 to create OpenAI Functions and the predicted importance of context-management patterns in future AI development.
- Thorough discussion of a potential LLM capability involving generating and validating JSON against a particular schema, and the preference for grammar-constrained sampling methods, which sample only valid tokens.
- A grammar-constrained sampling reference to Perplexity.ai given by @slono for further learning.
- Sharing of multiple resources on AI developments, including the LangChainAI "State of AI" report, GPT Engineer's hosted-service announcement, and Time magazine's overview of major AI innovations from 2023.
- Announcement of a NeurIPS recap episode preview with a request for feedback; the content can be found here.
Latent Space Channel Summaries
▷ #ai-general-chat (29 messages🔥):
- Probabilistic Programming Discussion: @swizec sparked a discussion on the challenges of probabilistic programming and reasoning about programs with fuzzy outputs, sharing a concern that error bars stack up rapidly into a chaotic system. @slono responded that unlike unpredictable distributed systems, from which high reliability is demanded, LLMs should be deployed where probabilistic outputs and constrained probabilities perform well. @optimus9973 compared the development of LLMs with databases, expecting a similar maturation process.
- AI Evolution Conversation: @optimus9973 emphasized the importance of fine-tuning GPT-4 to create OpenAI Functions, calling it an underrated step forward in 2023, almost on par with RAG in the conceptual toolchain. @slono predicted the significance of context-management patterns in future developments.
- JSON Schema Discussion: @optimus9973 proposed a future LLM functionality where, upon a request for JSON with a certain schema, the LLM repeatedly attempts generation until the output passes validation and is ready for user consumption (see the sketch after this list). @slono mentioned grammar-constrained sampling as a preferred method, since it allows only valid tokens to be sampled.
- Grammar-Constrained Sampling Reference: on @swizec's request for further reading on grammar-constrained sampling, @slono provided a link to Perplexity.ai.
- AI Developments Sharing: @swyxio shared multiple links, including one to the LangChainAI "State of AI" report, another to GPT Engineer's hosted-service announcement, and finally a link to Time magazine's overview of major AI innovations from 2023.
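The generate-then-validate loop described above is easy to prototype today; a minimal sketch using `jsonschema` follows, where `call_llm` is a placeholder for any chat-completion call and the schema is illustrative.

```python
# Hedged sketch: retry LLM generation until the JSON validates.
import json

from jsonschema import ValidationError, validate

SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # swap in a real chat-completion call

def json_with_schema(prompt: str, retries: int = 3) -> dict:
    last_error = ""
    for _ in range(retries):
        hint = f"\nPrevious attempt failed: {last_error}" if last_error else ""
        raw = call_llm(
            f"{prompt}\nReturn only JSON matching: {json.dumps(SCHEMA)}{hint}"
        )
        try:
            obj = json.loads(raw)
            validate(obj, SCHEMA)  # raises if the schema is not satisfied
            return obj
        except (json.JSONDecodeError, ValidationError) as exc:
            last_error = str(exc)  # feed the error back into the next attempt
    raise RuntimeError(f"No valid JSON after {retries} attempts: {last_error}")
```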
▷ #ai-event-announcements (1 messages):
- NeurIPS Recap Episode Preview: @swyxio provided a preview of their first NeurIPS recap episode and is looking for feedback. The preview can be accessed here.
LangChain AI Discord Summary
- In-depth discussions on batch upsert using the `from_documents` class method in vector stores; @kingkkaktus sought guidance on the implementation.
- Concentrated dialogue on backend image management, specifically ways to manage caches for user-uploaded images on a web-app backend. @tameflame explored possibilities including random server folders, in-memory caches like Redis, and other potentially efficient solutions.
- @vivek_13452 encountered a vector-store error using the `FAISS.from_documents()` method and looked for troubleshooting insights.
- A streaming limitation in the ChatVertexAI model mentioned by @shrinitg, who highlighted it and proposed a solution in a pull request.
- Advice solicited on the architecture of a chatbot capable of performing calculations on large datasets; the specific example given by @shivam51 involved counting instances of cotton shirts in a broad product catalogue.
- Noteworthy use of `ConversationBufferMemory` by @rodralez, showcasing how an output definition can enable playground-specific display.
- Exciting work presented by @cosmicserendipity on server-side running and testing of Web AI applications, offering a GitHub solution for comparing and testing new vs. old models in a standardized setup (GitHub link).
- @shving90 posted a link to a ProductHunt page, AI4Fire; no elaboration or context was provided in the message, making it difficult to gauge its importance.
- @emrgnt_cmplxty unveiled AgentSearch, an ambitious open-core project designed to deliver a major portion of human knowledge to LLM agents by embedding resources such as Wikipedia, Arxiv, a filtered Common Crawl, and more. Users were encouraged to try the search engine at AgentSearch and check out additional details in this Twitter post.
LangChain AI Channel Summaries
▷ #general (19 messages🔥):
- Batch upsert with from_documents: @kingkkaktus asked how to use the `from_documents` class method for batch upsert into vector stores (see the sketch after this list).
- Backend Image Management: @tameflame was looking for the best way to manage a cache of user-uploaded images on a web-app backend, asking whether a random server folder, an in-memory cache like Redis, or some other method would be most efficient.
- Vectorstore Error: @vivek_13452 encountered a `ValueError: not enough values to unpack (expected 2, got 1)` while trying to use the `FAISS.from_documents()` method with `texts` and `embeddings` as parameters, and asked for help understanding why.
- Streaming Limitation in the ChatVertexAI Model: @shrinitg reported that streaming is not currently supported in the ChatVertexAI model and shared a pull request that attempts to fix the issue.
- Chatbot for Large Data Query: @shivam51 sought advice on an architecture for a chatbot capable of performing calculations over large datasets, such as determining how many shirts in a large product catalogue are made of cotton.
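A minimal sketch of batched `from_documents` usage in LangChain follows. Note that `from_documents` expects `Document` objects while `from_texts` takes raw strings; passing plain texts to `from_documents` is one plausible (though unconfirmed) cause of the unpack error reported above.

```python
# Hedged sketch: build a FAISS index from the first batch, then upsert more.
from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

docs = [
    Document(page_content=f"chunk {i}", metadata={"i": i}) for i in range(1000)
]
emb = OpenAIEmbeddings()  # assumes OPENAI_API_KEY is set

BATCH = 200
store = FAISS.from_documents(docs[:BATCH], emb)     # initial index
for start in range(BATCH, len(docs), BATCH):
    store.add_documents(docs[start:start + BATCH])  # batched upserts
print(store.similarity_search("chunk 42", k=1))
```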
▷ #langserve (1 messages):
- Use of ConversationBufferMemory with output definition: @rodralez noted that using `ConversationBufferMemory` with the `output` definition allows the output to be displayed only on the Playground, through a modification in `chain.py` as shown below:

```python
chain = agent().with_types(input_type=AgentInput) | (lambda x: x["output"])
```
▷ #share-your-work (3 messages):
- Web AI Testing on the Server Side: @cosmicserendipity shared an update on running and testing Web AI applications such as TensorFlow.js and ONNX Runtime Web in a headless manner leveraging an NVIDIA T4 GPU. The solution runs the applications in a real Chrome browser via headless Chrome, which can aid in testing and comparing new web AI models against older ones in a standardized server environment. The user shared the GitHub link here.
- AI4Fire: @shving90 shared a link to a ProductHunt page, AI4Fire; however, no additional context or discussion was provided in the message.
- AgentSearch - Knowledge Accessibility for LLM Agents: @emrgnt_cmplxty introduced AgentSearch, an open-core effort to make humanity's knowledge accessible to LLM agents. The user has embedded all of Wikipedia, Arxiv, a filtered Common Crawl, and more, totaling over 1 billion embedding vectors. The search can be tried here; more details can be found in this Twitter post.
Alignment Lab AI Discord Summary
- Announcement of the alpha launch of Text-to-CAD by @entropi, a new technology that converts text to CAD models rather than the more common text-to-3D models. It was shared via the aforementioned URL.
- Introduction of OpenPipe, a "fully-managed, fine-tuning platform for developers", shared by @entropi. The platform has reportedly saved its users over $2M and has had Mistral 7B as its recommended model since its release; further details can be obtained from OpenPipe. @entropi also revealed that OpenPipe is based on a combination of Open Hermes 2.5 and Intel's SlimOrca-based Neural-Chat-v3-3.
- @emrgnt_cmplxty shared the release of their project AgentSearch, an open-core effort to curate humanity's knowledge for LLM agents, including databases from all over the internet. More than 1 billion embedding vectors are apparently available at search.sciphi.ai, as mentioned in this tweet.
- @neverendingtoast posed a question about how data for vector search is segmented in the AgentSearch project, but no response was included in the overview.
- @imonenext asked if anyone in the guild knows people from Megatron or PyTorch.
- @neverendingtoast asked for pointers to a good repository for experimenting with model merges.
Alignment Lab AI Channel Summaries
▷ #ai-and-ml-discussion (3 messages):
- Introduction of Text-to-CAD: @entropi shared the alpha launch of Text-to-CAD, an innovation enabling the conversion of text to CAD models, as opposed to the conventional text-to-3D models used predominantly for gaming assets.
- Fine-tuning Platform, OpenPipe: @entropi introduced OpenPipe, a fully-managed fine-tuning platform for developers that has saved its users over $2M; the Mistral 7B model has been its recommendation since release in September.
- Open Hermes 2.5 & Intel's SlimOrca-Based Neural-Chat-v3-3 Merge: @entropi commented that the platform is built on top of a merge of Open Hermes 2.5 and Intel's SlimOrca-based Neural-Chat-v3-3.
▷ #general-chat (2 messages):
- AgentSearch Project Release: @emrgnt_cmplxty shared a project they've been working on, AgentSearch, an open-core effort to embed humanity's knowledge for LLM agents, including all of Wikipedia, Arxiv, a filtered Common Crawl, and more. The project has produced over 1 billion embedding vectors, available at search.sciphi.ai, as cited in their tweet.
- Inquiry on Data Segmentation: @neverendingtoast inquired about how the data for vector search is segmented in the AgentSearch project.
▷ #oo (2 messages):
- @imonenext asked if anyone in the chat is acquainted with people from Megatron or PyTorch.
- @neverendingtoast requested recommendations for a good repository to conduct model merges, expressing an interest in experimenting with them.
Skunkworks AI Discord Summary
- Feedback & Code in Instruct Format: @far_el shared their appreciation for constructive feedback and acknowledged that a significant amount of code using an instruct format was present in #general.
- Successful Model Utilization: @far_el expressed satisfaction in #general that their AI model effectively served a user's specialized application.
- @lightningralf asked in #finetune-experts whether anyone has tried prompt gisting within the group.
Skunkworks AI Channel Summaries
▷ #general (3 messages):
- Feedback & Code in Instruct Format: @far_el expressed gratitude for feedback received and noted the presence of a significant amount of code using an instruct format.
- Successful Model Utilization: @far_el expressed happiness that their AI model worked effectively for a user's specific use case.
▷ #finetune-experts (1 messages):
lightningralf: Has anybody tried to do prompt gisting in this group?
LLM Perf Enthusiasts AI Discord Summary
- Discussion on exploring a custom retrieval strategy, with @daymanfan sharing that despite encountering a similar issue, the response quality was superior to other options.
- Dialogue on prompt behavior across models, as @dongdong0755 questioned whether the same prompt gives consistent results across different models or shows noticeable variation.
LLM Perf Enthusiasts AI Channel Summaries
▷ #openai (1 messages):
- Exploration of Retrieval Strategy: @daymanfan asked whether anyone was exploring their own retrieval strategy, noting that they encountered a similar issue but found the response quality superior to other options.
▷ #prompting (1 messages):
- Prompt Functionality across Models: @dongdong0755 raised a query regarding the performance consistency of the same prompt across different models, wondering whether variations in prompt outcomes might be noticeable.
DiscoResearch Discord Summary
- Conversation about model differences, specifically their potential causes. @calytrix suggested that changes in the router layers could be a factor and recommended a two-stage fine-tuning process with different parameters for the router layers.
- Request by @datarevised for feedback on their OpenDalle model, which uses a custom slerp method applied to SDXL. The user is receptive to both positive and negative comments.
DiscoResearch Channel Summaries
▷ #mixtral_implementation (1 messages):
- Possible Causes for Model Differences: @calytrix posits that the differences seen in recent models could be due to factors not present in earlier versions, with the router layers singled out as a probable cause. They suggest a two-stage fine-tuning process where the second stage fine-tunes the router layers with different parameters.
▷ #general (1 messages):
- OpenDalle Model Feedback Request: @datarevised requested feedback on the OpenDalle model they created using a custom slerp method applied to SDXL. The user welcomed both positive and negative critiques.
MLOps @Chipro Discord Summary
Only 1 channel had activity, so no need to summarize…
- True ML Talks Episode - Deploying ML and GenAI models at Twilio: @Nikunjb discussed an episode of True ML Talks with Pruthvi, a Staff Data Scientist at Twilio. Topics included the X-GPT concept, Twilio's efforts to enhance its RAG flow, and the different models Twilio is developing beyond GenAI.
- The discussion also touched on the intricacies of the various embeddings used for the vector database, and how Twilio manages OpenAI rate limits.
- The episode was praised for its insightful coverage of different aspects of Machine Learning and its infrastructure within the Twilio ecosystem.
- Link to the episode: YouTube - Deploying ML and GenAI models at Twilio
AI Engineer Foundation Discord Summary
Only 1 channel had activity, so no need to summarize…
- Introduction and Interest in ML & AI: @paradoxical_traveller09 introduced themselves and expressed an interest in connecting with other users who are passionate about Machine Learning (ML) and Artificial Intelligence (AI). They are open to discussing topics focused on ML.
The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.