- Mistral’s new 8x7B MoE model (aka “Mixtral”) - a classical attention model, done well. Andrej’s recap here.
- Mamba models, a range of models up to 3B parameters by (former guest) Tri Dao of Together
- StripedHyena 7B - a descendant of the subquadratic attention replacement Hyena, released earlier this year out of Stanford’s Hazy Research lab, which is finally competitive with Llama-2, Yi, and Mistral 7B.
This is all very substantial and shows what happens when you ship model weights instead of heavily edited marketing videos.
[TOC]
Nous Research AI Discord Summary
- In-depth discussion around AI models such as Anthropic’s Claude 2.1, including exploration of its prompting technique, criticism of “needle in a haystack”-style evaluations, and the workaround needed to bypass Claude 2.1’s alignment approach. Plans for a meeting between users @maxwellandrews and @raddka were also discussed.
- Dialogue about the applications and hacking of Dreambooth for image training, along with a request for assistance in this process. An announcement of Gemini, a new AI from Google believed to be potentially superior to GPT-4, was shared.
- Sharing of AI-related tweets, resources, and models such as the DialogRPT-human-vs-machine model on Hugging Face, which predicts whether a response seems more likely to come from a human or a machine. A Colab Notebook Demo was provided for practical engagement with the model.
- Numerous factors raised during conversation about AI model building and maintenance, including data extraction, model releases (especially the “cybertron 7b v2 GGUF” and Mistral 8x7B MoE models), the chatbot model StripedHyena-Nous-7B, plans to open-source StripedHyena’s training code, and other topics such as GPU requirements for running the Mistral 8x7B MoE model, evaluation of memory requirements, and debugging of model inference.
- Various issues and possibilities regarding the deployment of Large Language Models (LLMs), including deployment of LLMs on CPUs, project setup, the potential of LLMs for exam marking and grading, and hosting of embedding models. References to resources like Falcon 180B initial CPU performance numbers and Transformer Batching were provided.
- User engagement in the memes channel, with @nonameusr sharing a Twitter link and expressing humorous confusion towards the content of the linked post.
Nous Research AI Channel Summaries
▷ #ctx-length-research (8 messages🔥):
- Discussion about Anthropic’s Claude 2.1: @if_a shares a link to the Anthropic Claude 2.1 prompting technique study, highlighting how well the model recalls information across its 200,000-token context window. However, @if_a also considers that testing the long-context capability will require more work on prompt engineering.
- Evaluating AI Models: @gabriel_syme comments on the evaluation of AI models, specifically criticizing such evaluations as “needle in a haystack” scenarios which may not provide accurate performance measures.
- Alignment Methods: @intervitens finds it interesting that the team at Anthropic had to use a workaround to bypass their own alignment approach for Claude 2.1. @raddka sees it as an inevitable development given the stringent compliance restrictions that may have hampered the development of the model.
- Upcoming Meeting: @maxwellandrews and @raddka plan a meeting for the following week, with the tentative meeting time proposed to be between 12-5 PM. Further details are to be finalized later.
▷ #off-topic (8 messages🔥):
- Dreambooth Discussion: @gezegen and @wlrd discussed the usage of Dreambooth for image training and possible reasons for the time it takes to get results back.
- Hacking into Dreambooth: @yorth_night shared their experience of hacking into Dreambooth for the past week to conduct long-prompt training, noting some complex aspects of the code.
- Request for Assistance: @gezegen suggested that @yorth_night share their process of hacking into Dreambooth, offering potential help.
- Gemini Announcement: @pradeep1148 shared a YouTube video announcing Gemini, a new AI from Google that’s potentially better than GPT-4, with features for multimodal reasoning across text, images, video, audio, and code.
- General Interaction: @gabriel_syme greeted the channel with a good morning message.
▷ #benchmarks-log (1 messages):
nonameusr: https://huggingface.co/allenai/tulu-2-dpo-70b
▷ #interesting-links (5 messages):
- Sharing AI-related tweets: @yorth_night and @nonameusr shared tweets on responsible AI tech and CAIS theory.
- DialogRPT-human-vs-machine: @euclaise posted a link to the DialogRPT-human-vs-machine model on Hugging Face, which aims to predict whether a response is more likely to come from a human or a machine. They included a Colab Notebook Demo for users to interact with the model.
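For readers who want to poke at the model locally, here is a minimal usage sketch; it assumes the microsoft/DialogRPT-human-vs-machine checkpoint on Hugging Face and the `<|endoftext|>`-joined context/response format described on its model card.

```python
# Minimal sketch of scoring a response with DialogRPT-human-vs-machine.
# Assumes the "microsoft/DialogRPT-human-vs-machine" checkpoint and the
# "<|endoftext|>"-joined context/response format from its model card.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "microsoft/DialogRPT-human-vs-machine"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

context = "Can you recommend a good book on machine learning?"
response = "Sure! 'Pattern Recognition and Machine Learning' by Bishop is a classic."

inputs = tokenizer(context + "<|endoftext|>" + response, return_tensors="pt")
with torch.no_grad():
    score = torch.sigmoid(model(**inputs).logits[0, 0]).item()

print(f"P(human-written) ~= {score:.3f}")  # higher means the response looks more human-like
```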
▷ #general (375 messages🔥🔥):
- Discussion amongst developers on developing a similar pipeline for extracting data from GitHub repositories and .tex files. User @zakkor mentioned improving the output with an additional LLM (Large Language Model) filtering pass. User @wlrd showed interest in collaborating on this effort and brought up a related ongoing discussion thread. Link to discussion thread
- User @nonameusr announced the release of the “cybertron 7b v2 GGUF” model and that it performs close to the “yi-34b” model. Testing of the model was also discussed.
- Notable discussion on the new Mistral 8x7B MoE model, with users asking about its memory requirements, how to run inference, and what PC hardware is required to run it.
- The chatbot model StripedHyena-Nous-7B was announced by user @weyaxi and prompted a discussion among users, with @theemozilla explaining the model’s architecture and development and sharing plans for future iterations.
- Users discussed posting models from Nous Research on Hugging Face, with @nonameusr discussing “una-xaberius-34b-v1beta” as potentially the best model and referring to “xaberius” explicitly.
- User @theemozilla mentioned plans to open-source their custom training code upon approval from Together Research. The training code is related to the StripedHyena-Nous-7B (SH-N 7B) model.
- Further conversation about the MoE (Mixture-of-Experts) distribution and memory requirements of running the Mistral 8x7B MoE model, concluding with @bjoernp suggesting that two 80GB A100s seem to be enough (see the rough estimate after this list).
- Users talked about decoding methods, with @gabriel_syme referring to a Twitter link highlighting recent advances and new decoding methods. A discussion ensued about the compatibility of and need for these methods.
- Gradio image inference errors with OpenHermes were discussed. After running into server errors, @qnguyen3 suggested using the code on GitHub and editing the llava/model/multimodal_encoder/builder.py file to include ‘ikala’.
- User @euclaise shared their experience with Echo-3B, a model they created for RP/creative tasks. Some technical issues causing the model to output nonsense were mentioned.
- The Gemini API for generating datasets was brought up by @jaredquek, followed by a discussion about the potential repercussions of using it, given Google and OpenAI account-termination policies.
- User @raddka commented on the newly released Mistral 8x7B MoE model, speculating that it is a candidate for developing lightweight models to address LLM shortcomings and suggesting the creation of one specific to coding to evaluate and improve its answers.
- Issues with openhermes vision server startup were discussed and resolved with the help of @qnguyen3.
- Members briefly mentioned plans for training Mixtral MoE on dialogue datasets and compared the model’s benchmark performance. They also speculated that the benchmarks could be biased, since the posted values were too extreme to be deemed legitimate.
- Users discussed possible GPU setups for the Mistral 8x7B MoE model, including debating over collecting multiple Nvidia RTX 3090 cards, the necessity of PCIe lanes, and how to optimally cool the GPUs.
- StripedHyena-Nous-7B’s evaluation utility was highly praised by users, citing its reduced memory footprint and better performance. Its integration with various APIs and deployment platforms was also discussed.
- Continuous debugging and discussion around running the Mixtral model’s inference and the issues surrounding it. The need for its codebase and users’ efforts to run inference in various ways were also highlighted.
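For context on the two-A100 suggestion above, here is a rough back-of-the-envelope memory estimate; the ~47B total-parameter figure and the overhead factor are assumptions for illustration, not numbers from the discussion.

```python
# Rough memory estimate for serving Mixtral 8x7B in bf16 (2 bytes/param).
# Assumption: ~47B total parameters (8 experts sharing attention layers),
# plus a loose ~20% allowance for activations and KV cache at modest batch sizes.
total_params = 47e9
bytes_per_param = 2              # bfloat16
weights_gb = total_params * bytes_per_param / 1e9
overhead_gb = 0.2 * weights_gb   # workload dependent

print(f"weights ~= {weights_gb:.0f} GB, with overhead ~= {weights_gb + overhead_gb:.0f} GB")
# ~94 GB of weights, ~113 GB with overhead: fits across 2x 80 GB A100s,
# consistent with @bjoernp's suggestion; 4-bit quantization would shrink it further.
```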
▷ #ask-about-llms (80 messages🔥🔥):
- LLMs (large language models) and CPUs: @.beowulfbr initiated a conversation about deploying a 7B or 70B LLM purely on CPUs. @raddka mentioned it would be incredibly slow. @coffeebean6887 and @decruz reiterated that commercially viable LLMs require a GPU and also discussed the advantages of batched infrastructure on GPUs. They referred to Falcon 180B initial CPU performance numbers and Transformer Batching for relevant discussions.
- Project Setup for LLMs: @akhxl discussed troubleshooting a 404 error with Litellm and Ollama. This was resolved by ignoring the api_base line.
- LLMs’ Learning Capability: A substantial debate was initiated by @akhxl around whether and how an LLM could understand new information and topics it had not previously encountered in its training data. @adamsvoboda explained this could be achieved through RAG, by incorporating the external information into the prompt as context. However, the user still had further queries about the LLM’s interpretation and contextual understanding, which were not fully addressed in the given conversation.
- LLMs in Exam Marking: User @.___init___ suggested the potential use of LLMs in exam marking and grading, while @coffeebean6887 indicated that the math part might be complicated.
- Hosting Embedding Models: @coco.py asked about tools to host embedding models similarly to how vLLM hosts language models. @decruz recommended open-source solutions like gte-small, which could be hosted on an edge function. However, no specific information about hosting embedding models in an OpenAI API-compatible manner was provided.
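As a rough illustration of self-hosting a small embedding model like gte-small (not an OpenAI-compatible server), here is a minimal sketch assuming the thenlper/gte-small checkpoint on Hugging Face and the sentence-transformers and FastAPI libraries; the endpoint shape is illustrative only.

```python
# Minimal sketch: serve gte-small embeddings over HTTP with FastAPI.
# Assumes the "thenlper/gte-small" checkpoint and sentence-transformers;
# the /embed endpoint shape is illustrative, not OpenAI API-compatible.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("thenlper/gte-small")  # ~30M params, CPU-friendly

class EmbedRequest(BaseModel):
    texts: list[str]

@app.post("/embed")
def embed(req: EmbedRequest):
    # normalize_embeddings=True returns unit vectors, handy for cosine similarity
    vectors = model.encode(req.texts, normalize_embeddings=True)
    return {"embeddings": [v.tolist() for v in vectors]}

# Run with: uvicorn embed_server:app --port 8000
```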
▷ #memes (2 messages):
- A user
@nonameusr
shared a Twitter link from Vega Holdings on the channel. Further, the user expressed confusion or surprise towards the content of the linked post with the comment: “wtf is this 😭”.
OpenAI Discord Summary
- Cross-channel discussion highlighted the anticipation, skepticism, and discussion surrounding Google’s Gemini AI model and future versions of OpenAI’s GPT model. Users voiced their opinions about Google’s launch video for Gemini, and speculation was made about the groundwork OpenAI has already laid for versions up to GPT-7 (quotes by @DawidM and @aiden4729).
- Users in various channels reported server issues, performance problems, and communication challenges with OpenAI products like ChatGPT and GPT-4. This included messages disappearing, decreased performance (especially strange conversation names), long wait times for support replies, payment system issues, and non-responsiveness of chatbots (quotes by @_steeven_, @maybedara, @dabuscusman, @mrcrack_, @seliathas, @bubbarob19, @ryanxcharles, @spider7124, @exobyt, and @solbus).
- The community also discussed AI’s potential impact on scientific research and debated its capability to outcompete human researchers, especially in terms of mathematical discovery (conversation between @axiomaticself and @feltsteam0).
- Discussion of VPN usage with OpenAI technologies emerged across the channels, with users stating their VPNs weren’t being detected or blocked, and further clarifying that multiple users from the same IP could be filtered out (discussion by @youraveragedev, @satanhashtag, @lugui, and @olympusdev).
- Numerous technical questions, suggestions, and clarifications were made across the channels, spanning topics such as the GPT token limit, interactions among custom GPTs, the use of a VPN to access certain AI models, prompting ChatGPT for specific response formats, API usage for web search capabilities, the performance of GPT-4 with extended token length, and extracting a DALL-E prompt from an image, among others.
- Users shared and critiqued resources; notably, @zelia13 asked for feedback on an AI Knowledge Management video, and @iyioio shared a link to OpenAI’s function-calling documentation. Explorative prompts and their outputs were also shared through links.
- Lastly, reminders about OpenAI general conduct emerged, detailing incidents such as violation of posting rules and potential consequences.
OpenAI Channel Summaries
▷ #ai-discussions (92 messages🔥🔥):
- Gemini vs GPT-4: Users expressed mixed feelings about the anticipated launch of Google’s Gemini AI model, with some excited about its potential while others expressed skepticism due to Google’s alleged misrepresentation in the demo video. @DawidM claims that the video was post-edited to present the product in an overly positive manner, calling it a “shady marketing tactic”.
- OpenAI Model Updates: There’s speculation about the release of future versions of OpenAI’s GPT model, with user @aiden4729 pointing out that OpenAI has already laid the groundwork for versions up to GPT-7 according to trademark filings. @thepitviper predicts the announcement of GPT-4.5 in competition with Google’s Gemini.
- Concerns over AI in Scientific Research: User @axiomaticself expressed concerns about the potential for AI to outclass human researchers in the field of mathematics. @feltsteam0 provided reassurance, stating that we are still years away from AI fully automating scientific discovery and that the future likely lies in augmented human-AI research teams.
- Feedback Request on AI Knowledge Management Video: User @zelia13 shared a video on knowledge management using AI for project managers and requested feedback on it.
- Challenges with AI/Chatbot Access: Some users discussed difficulties in accessing certain AI and chatbot models due to regional restrictions. User @offline suggested using a VPN as a workaround.
▷ #openai-chatter (246 messages🔥🔥):
- Server Issues with ChatGPT: Multiple users including @_steeven_, @maybedara, and @dabuscusman reported experiencing server issues with ChatGPT. User @_steeven_ mentioned messages disappearing upon hitting enter, and several users assumed server overload might have caused it. However, some users like @pruo and @dabuscusman later confirmed that they had successful access to ChatGPT.
- ChatGPT Performance: @mrcrack_ expressed dissatisfaction with the current state of ChatGPT, mentioning a decrease in performance over the previous few months and specifically citing strange conversation names such as “Ebola virus and flu”.
- VPN Usage: @youraveragedev asked whether OpenAI now permits VPN usage, as their VPNs weren’t being detected or blocked. This was confirmed by @satanhashtag and clarified by @lugui and @olympusdev, who suggested that VPNs have always been allowed, but multiple users from the same IP could be filtered out.
- OpenAI Support: User @seliathas expressed frustration with OpenAI’s support system, stating their communication wasn’t answered for a prolonged period. @elektronisade explained that, depending on the topic, replies may take a substantial amount of time.
- Product Availability and Usage: @Foufou flagged a translation error on the website, citing a miscommunication of chat restrictions in French. User @theyonatan asked for suggestions on making ChatGPT re-paste complete code. Meanwhile, @goni0755 inquired about the reliability of the DALL-E image generation API compared to others. Also, @zyrqlo and @thepitviper discussed using “Bard” with “Gemini Pro”.
▷ #openai-questions (69 messages🔥🔥):
- API Usage and Web Search Capabilities: User @shikhar1209 inquired about the possibility of using web search with GPT-4 through the API, though no clear answer was provided.
- Issues with GPT-4 Performance and Functionality: Several users including @spider7124, @exobyt, and @solbus reported experiencing errors and non-responsiveness with ChatGPT. Additionally, user @yoruiopz expressed dissatisfaction with GPT-4’s declining performance and recurring browsing time-outs.
- Fine-Tuning GPT Token Limit: @pearlyeti raised a query regarding the token limit per example for fine-tuning, specifically whether it remained at 4096 for GPT-4, but no clear response was provided.
- ChatGPT Misunderstandings and Miscommunications: Several users including @treytay and @maguiresfuture expressed concerns about misleading information or promises being given by ChatGPT during certain interactions.
- Billing Issues and Subscription Problems: Users @bubbarob19 and @ryanxcharles reported experiencing continuous issues with the payment system on the platform, hindering access to services.
▷ #gpt-4-discussions (31 messages🔥):
- Issues with Custom GPT Disappearance: Users @_interstitialism_, @a2jhagrhbm92awno, and @budadude reported the disappearance of their custom GPTs. The issue seemed to resolve on its own, as @a2jhagrhbm92awno later reported that their GPTs came back.
- Utilizing Images with GPT: @thermaltf asked if there exists a GPT that can generate a DALL-E prompt from an uploaded image, but received no direct response. @demo_02817 needed help with image editing using OpenAI and shared code snippets, but no solution was provided in this chat history.
- Interaction Among Custom GPTs: @moraj123 inquired about the possibility of creating interactions between custom GPTs, to which @rjkmelb responded that it was not directly possible.
- Clearing Conversation History: @sjjj. asked if there was a way to delete some chat conversations. @pietman revealed that there was no such feature at the moment and shared information about a helpful Chrome extension for this purpose.
- Larger Token Length and Model Performance: @the_unbeatable_gusty_chatterbox raised a question about the performance of GPT-4 with 128K token length, commenting on the lower-quality outputs with longer inputs. No answer was given in the discussion history.
- Setting Up Public GPTs: @serenejay asked if creating public GPTs is only possible for those who host them. @solbus clarified that it can be enabled by verifying a website or adding a billing name on the user’s Builder profile. There is no official central repository for GPTs at the moment, but one is expected next year.
▷ #prompt-engineering (11 messages🔥):
- Prompting the Assistant’s API for a Specific Response Format: User .jkyle asks about prompting the assistant’s API to provide a response formatted as JSON for parsing and output. User iyioio suggests using functions and provides OpenAI’s function-calling documentation as a resource (a minimal sketch follows after this list). .jkyle intends to use the model output as the actual desired output.
- Discussion on the Functions Feature in the API: Between .jkyle and iyioio, there is a discussion about understanding the functions feature. The dialogue covers how the assistant defines an input specification for an external function, how to ensure output aligns with the format, and how the execution process occurs.
- Interesting Prompts: Users mindfultatiana and eskcanta share and explore interesting prompts, including explaining concepts ‘as if to an AI’, ‘as if I were a fish’, or ‘as if I were a library book of poetry’. They experiment with these prompts on the model, sharing the resulting output via links to OpenAI chat shares.
- Directing the AI’s Behavior: User clumsylulz experiments with directing the AI’s behavior by asking it to respond as if it were a human with schizoaffective disorder.
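A minimal sketch of the function-calling approach discussed above, assuming the openai>=1.0 Python SDK; the extract_order function name and schema are hypothetical examples, not from the conversation.

```python
# Sketch: forcing structured JSON output via OpenAI function calling.
# Assumes the openai>=1.0 Python SDK; the function name and schema below
# (extract_order) are hypothetical examples.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "extract_order",
        "description": "Extract a structured order from the user's message.",
        "parameters": {
            "type": "object",
            "properties": {
                "item": {"type": "string"},
                "quantity": {"type": "integer"},
            },
            "required": ["item", "quantity"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "I'd like three lattes please."}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_order"}},
)

# The "function call" is just a specially formatted message; its arguments
# field is a JSON string, which is the structured output we wanted.
args = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
print(args)  # e.g. {"item": "latte", "quantity": 3}
```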
▷ #api-discussions (11 messages🔥):
- API Prompting for JSON Outputs: User @jkyle asks for advice on prompting the assistant’s API to always provide a JSON response. User @iyioio suggests using the function feature of the API and describes how it works. @jkyle then interprets this functionality to mean that it provides an output object that can be fed to an external function, then returned to close the loop. @iyioio confirms that this is almost right, highlighting that the function call received from the assistant is just another message in the thread with special formatting.
- Instructive Prompts: User @mindfultatiana shares an effective prompt: "Can you explain to me like I'm 5 years old...".
- Interactive Prompts: User @eskcanta demonstrates the use of prompts instructing the model to provide explanations from various perspectives, such as ‘as if to an AI’, ‘as if I were a fish’, and ‘as if I were a library book of poetry’, providing corresponding chat links for each.
- Conditioned Prompts: User @clumsylulz proposes a conditional response scenario, instructing the model to act as if it were a human with schizoaffective disorder.
- Links:
OpenAccess AI Collective (axolotl) Discord Summary
- AI Models Training: Various AI models were discussed by users @noobmaster29, @le_mess, @nanobitz, @metaldragon01, @casper_ai, and @yamashi. Mamba’s sluggish performance compared to Mistral was highlighted, and an evaluation problem with Mamba was identified. Interest was expressed in the announcement of a new model from Mistral. Users also compared the performance of Llama models and GPT-3.5.
- AI in Different Languages: The need for effective AI models for different languages, including French and Dutch, was pointed out.
- Unified Memory and Mac’s M Processor Advantage: The advantages of using Mac’s M processor and a unified memory system for AI inference and possibly training were discussed. A recent release by Apple demonstrating how to train models on their machines was mentioned.
- Challenges in Model Deployment and Adjustment: A dialogue took place around the issues faced in deploying and adjusting AI models, particularly those from Mistral. It was hinted that users wouldn’t be able to finetune the new 70B model.
- Mistral’s New Model Release: The release of the new Mistral model was discussed extensively, with speculations that the model could feature a Mixture of Experts (MoE). Transition steps to use the model, including converting the model to PyTorch and adding megablocks as a requirement, were mentioned.
- Tokens’ Impact on Model Training: A discussion unfolded about whether a response with 120 tokens would have more influence compared to a response with 60 tokens during model training. It was suggested that tokens trained is probably a better metric for measuring the influence on the model.
- DPO Fine-tuning Support: Questions were raised regarding the possibility of axolotl offering support for DPO fine-tuning. Evidence that DPO is possible was offered with a link to DPOpenHermes-7B on Hugging Face, along with detailed YAML configuration code.
- Segmenting in User Input: Interest was shown in segmenting in the user input and the possibility of using the same process for segmenting retrieved docs was confirmed.
- Multi-GPU Training Issue: Problems were reported with training on runpod using 8 A40 GPUs. The issue seems to be with finetuning where it hangs after the first eval pass showing 100% GPU utilization on all 8 GPUs. A mismatch between collective operations in PyTorch distributed was identified as a possible reason.
- Relevant URLs: Various links were shared to aid discussions and provide resources:
- Reminder on Guild Conduct: Members were reminded about guild conduct, with a note on repetitive advertising.
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (179 messages🔥🔥):
- Discussion on Training Various AI Models: Over the course of this conversation, several users, including @noobmaster29, @le_mess, @nanobitz, @metaldragon01, @casper_ai, and @yamashi, discussed the training of various artificial intelligence models such as Mamba and Mistral. It was pointed out by @caseus_ that Mamba seems to be slow compared to Mistral, and @nanobitz mentioned an evaluation issue with it. There was also a discussion comparing the performance of Llama models and GPT-3.5. The participants expressed interest in the announcement of a new model from Mistral.
- Application of AI in Different Languages: Various users discussed the application of AI models in different languages. @yamashi pointed out the need for an effective model that works in French and Dutch.
- Advantages of Unified Memory and Mac’s M Processor for Training: Conversation between @yamashi and @noobmaster29 highlighted the benefits of using Mac’s M processor and unified memory system for AI inference and possibly even training. @yamashi mentioned a recent release by Apple demonstrating how to train models on their machines.
- Challenges with Model Deployment and Adjustment: Several users, including @faldore, @yamashi, and @noobmaster29, discussed the challenges they face in deploying and adjusting AI models, particularly those from Mistral. @faldore mentioned a conversation with a Mistral developer who indicated that users wouldn’t be able to finetune the new 70B model.
- Relevant URLs:
- GitHub repository for a Production-ready Reinforcement Learning AI Agent Library by Meta https://github.com/facebookresearch/pearl
- Dataset on Huggingface that could potentially affect model rankings https://huggingface.co/Q-bert/Optimus-7B
- YouTube interview with Arthur Mensch about Mistral AI https://www.youtube.com/watch?v=auQBhg692Js
- Open-source UI chat interface to use with OpenAI backend https://github.com/huggingface/chat-ui
- Twitter post by Mistral AI announcing the release of a new model https://twitter.com/MistralAI/status/1733150512395038967
- Hugging Face repository for the Mixtral model weights https://huggingface.co/someone13574/mixtral-8x7b-32kseqlen/tree/main
▷ #axolotl-dev (56 messages🔥🔥):
- New Mistral Model Release: The team discussed the release of a new Mistral model, shared by @casper_ai (Twitter link). Members speculated that the model, named 8x7b, could feature a Mixture of Experts (MoE).
- Model Specifications and Download: @yamashi shared the model’s specifications, noting it had a 32k sequence length and appeared to feature MoE. Download speeds varied, with @casper_ai initially reporting a slow download speed of 239 kb/s while @yamashi managed to download at 11 MB/s.
- Conversion and Use of the Model: Multiple members, including @caseus_, discussed the steps needed to use the new model, including converting the model to PyTorch and adding megablocks as a requirement. @bjoernp was working on a conversion script and invited anyone interested to help out by joining a call via his Discord.
- Implementing Mixtral in Llama: @casper_ai shared a GitHub link to a forked Llama repo where someone had implemented Mixtral, potentially helpful for the conversion process.
- Content Detailing the New Model: Various links to content detailing the new model were shared, including a GitHub link by @_jp1_ and an arXiv link shared by @c.gato.
▷ #general-help (12 messages🔥):
- Clarification on train_loss: @nanobitz explained that the single dot labeled ‘train/loss’ represents the final train_loss and is different from ‘train/train_loss’; the latter is presented as a graph during training.
- Impact of Tokens on Model Training: @c.gato had a discussion about whether a response with 120 tokens would have more influence than a response with 60 tokens during model training. @le_mess advised that tokens trained is probably a better metric for measuring influence on the model.
- Constant Improvement: @c.gato also expressed an understanding of the need for continuous refinement to keep up with improvements in their model.
▷ #rlhf (3 messages):
- Possibility of DPO Fine-tuning Support: User @josh.sematic.dev asked about the potential for axolotl to provide support for DPO fine-tuning.
- No Current Implementation: User @le_mess stated that the topic has been discussed many times, but indicated there hasn’t been an implementation yet.
- Evidence of DPO Capability: @noobmaster29 shared that DPO is understood to already be possible and supported this with a link to DPOpenHermes-7B on Hugging Face, along with a detailed YAML configuration (see the sketch below for the general shape of a DPO training run).
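For orientation only, here is a minimal sketch of what DPO fine-tuning generally looks like, using Hugging Face’s TRL library rather than axolotl (which, per the discussion, had no DPO implementation at the time); the base model, dataset name, and hyperparameters are placeholders.

```python
# Sketch of DPO fine-tuning with Hugging Face TRL (not axolotl).
# Base model, dataset name, and hyperparameters are placeholders; the dataset
# must provide "prompt", "chosen", and "rejected" columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)  # frozen reference copy
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset with prompt/chosen/rejected preference pairs.
dataset = load_dataset("your-org/preference-pairs", split="train")

trainer = DPOTrainer(
    model,
    ref_model,
    beta=0.1,                      # strength of the KL penalty toward the reference model
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
    args=TrainingArguments(
        output_dir="dpo-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=5e-6,
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()
```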
▷ #community-showcase (2 messages):
- Discussion on Segmenting: User @gabriel_syme showed interest in segmenting the user input and asked whether the same process could be utilized for segmenting retrieved docs. In response, @le_mess affirmed this possibility.
▷ #runpod-help (10 messages🔥):
- Multi-GPU Training Issue: User @vastolorde95 reported issues with training on RunPod using 8 A40 GPUs. The issue appears to be with finetuning, where it hangs after the first eval pass, showing 100% GPU utilization on all 8 GPUs.
- Error Analysis: A possible NCCL error was suspected by @vastolorde95. After enabling detailed debug information with the environment variables TORCH_DISTRIBUTED_DEBUG=DETAIL NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,P2P, a mismatch between collective operations in PyTorch distributed was identified: RuntimeError: Detected mismatch between collectives on ranks. Rank 0 is running collective: CollectiveFingerPrint(OpType=ALLGATHER, TensorShape=[4], TensorDtypes=Float, TensorDeviceTypes=TensorOptions(dtype=float (default), device=cuda, layout=Strided (default), requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt))), but Rank 2 is running collective: CollectiveFingerPrint(OpType=ALLGATHER).
- Successful Single-GPU Training: @vastolorde95 noted that the same setup works with a single H100 GPU, though it was too slow, indicating that the issue was mainly with the multi-GPU setup.
- Checkpointing Lag: @casper_ai suggested that it could also be related to checkpointing being slow on 8 GPUs, possibly due to the machine having a slow disk speed.
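A minimal sketch of reproducing that debug setup in a torchrun-launched training script; the env-var values mirror the ones quoted above, everything else (script name, launch command) is illustrative.

```python
# Sketch: enable the distributed-debug info described above before NCCL initializes.
# The values mirror the ones from the discussion; the script name is illustrative.
import os

os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"   # adds collective fingerprint checks
os.environ["NCCL_DEBUG"] = "INFO"
os.environ["NCCL_DEBUG_SUBSYS"] = "INIT,P2P"

import torch.distributed as dist

dist.init_process_group(backend="nccl")
# ... training loop; with DETAIL enabled, mismatched collectives across ranks
# raise the "Detected mismatch between collectives on ranks" error seen above.

# Typically launched with: torchrun --nproc_per_node=8 train.py
```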
▷ #docs (1 messages):
le_mess: Bro stop advertising this. I’ve said it before 😅
Latent Space Discord Summary
- Jetbrains AI Assistant appreciation by @slono, highlighting that its consistent docstring style saves significant time.
- Anticipation for Duet AI from Google, gleaned from @swyxio’s discussion of an upcoming technology release while referring to the release of Mistral 8x7B.
- Importance of AI compression and an introduction to the Mixture of Experts (MoE) paradigm, instigated by links and a YouTube video shared by @coffeebean6887 on compressing the 1.6 trillion parameter SwitchTransformer-c2048 model.
- Mention of the promising AI startup scene in France by @slono.
- Debate comparing Cursor vs VS Code with varying opinions: a tollbooth presenting an OpenAI API (@btdubbins), beneficial AI search functionality (@guardiang), VS Code made for enterprise users with data security in mind (@mitch3x3), and the effectiveness of such tools attributed to integration with codebases and the surrounding UI (@slono).
- Announcement of a new podcast episode shared by @swyxio via a Twitter link on the ai-event-announcements channel.
- @swyxio’s share of Latent Space University Day 4 content about Image Generation via the link.
- Detailed discussion of the impact of formatting on model training, with a specific mention of a space character affecting the model’s output, brought up in @eugeneyan’s anecdote.
- @slono’s elaboration on the role of whitespace in code and its influence on tokenization and learning, also noting the significant differences in token counts caused by varying use of whitespace.
- @slono’s query and @eugeneyan’s clarification regarding the usage of the [INST] context.
- @__chef__’s humorous remark about the complexity involved in training large models, speculating “How many parameters does it take to learn a space? Over 7 billion”.
Latent Space Channel Summaries
▷ #ai-general-chat (45 messages🔥):
- Jetbrains AI Assistant: User @slono mentioned they were appreciating the Jetbrains AI chat assistant and highlighted that it had been fine-tuned to a consistent docstring style, saving a significant amount of time.
- AI Upcoming Technology Release and Current States: @swyxio discussed the release of Mistral 8x7B, gave an indirect hint about the release of Duet AI from Google, and referred to a tweet as an interesting artifact of modern data.
- AI Compression and New Research Papers: User @coffeebean6887 shared a few links related to compressing the 1.6 trillion parameter SwitchTransformer-c2048 model, along with a YouTube video about the Mixture of Experts (MoE) paradigm.
- AI Startup Scene in France: User @slono mentioned considering moving back to France due to potentially interesting opportunities in the AI startup scene there.
- Cursor vs VS Code Discussion: There was an ongoing debate on the benefits and drawbacks of Cursor versus VS Code. @btdubbins showed some skepticism towards Cursor, feeling it was simply a tollbooth presenting an OpenAI API. Still, @guardiang found advantage in its AI search functionality, and @mitch3x3 countered that MS makes VS Code for enterprise users, taking data security into account. @slono believed that the usefulness of such tools is more related to integration with codebases and the UI surrounding that.
▷ #ai-event-announcements (2 messages):
- New Podcast Episode: @swyxio announced the release of a new episode by sharing a Twitter link.
- Latent Space University Day 4: @swyxio shared the fourth day’s content about Image Generation from Latent Space University. The provided link covers how to leverage the DALL-E API for image generation.
▷ #llm-paper-club (8 messages🔥):
- Impact of Formatting on Model Training: @eugeneyan described a scenario where fine-tuning a 7B model to generate responses in a specific JSON format yielded unexpected results, with model outputs varying significantly depending on whether the training data had a space following the starting syntax (<s> [INST] versus <s>[INST]); see the tokenizer sketch after this list.
- Discussion on the Role of Whitespace in Code: @slono indicated that whitespace in code (such as in languages where whitespace is significant) can have an impact as strong as key symbols such as {, potentially influencing tokenization and learning.
- Token Counts and Whitespace: @slono also noted that varying the use of whitespace can lead to significant differences in token counts.
- Confusion about “[INST]” Context: @slono asked for clarification on the [INST] context used by @eugeneyan, who explained that it is part of Mistral’s prompt format.
- Learning Parameter Space: @__chef__ humorously pondered, “How many parameters does it take to learn a space? Over 7 billion”, hinting at the complexity and scale involved in training large models.
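A quick way to see why that single space matters is to compare the token ids the two prefixes produce; a minimal sketch assuming a SentencePiece-based Llama/Mistral-family tokenizer (the checkpoint name is just an example):

```python
# Sketch: how one space changes tokenization of the Mistral/Llama prompt prefix.
# Assumes a locally available SentencePiece-based tokenizer; the checkpoint name
# below is an example, any Llama/Mistral-family tokenizer illustrates the point.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

for prefix in ("<s>[INST]", "<s> [INST]"):
    ids = tokenizer(prefix, add_special_tokens=False)["input_ids"]
    print(repr(prefix), "->", ids, tokenizer.convert_ids_to_tokens(ids))

# When the two variants tokenize differently, a model fine-tuned on one of them
# effectively sees out-of-distribution prompts when served with the other.
```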
LangChain AI Discord Summary
- There were discussions regarding technical difficulties with LangChain, including concerns about embedding-model support for JS/TS, issues with viewing old documentation, compatibility inquiries about the Llama2 model with SQLDatabaseToolkit, web scraping and source-retrieval questions, and installation problems with the langchain Python package on Python versions below 3.10. A user also raised worries about consistent document retrieval from Milvus standalone. Another user experienced cache issues with InMemoryCache() and SQLAlchemyCache in the Conversational Retrieval Chain. A solution for converting a Document to JSON was sought due to a serialization issue.
- A lively exchange over database preferences was observed, with users expressing support for LanceDB.
- In the share-your-work channel, one user, @marvinbraga, demonstrated his Voice Virtual Assistant on WhatsApp, as well as promoting a discount for his book and pointing to his GitHub repo.
- Another user, @bigansh, announced the launch of version 2 of myGPTBrain, with a plethora of new features, an updated landing page, document parsers, and the introduction of a subscription model. This was accompanied by a user guide on Loom and a public launch blog post. Feedback on potential new features and design suggestions was requested.
LangChain AI Channel Summaries
▷ #general (40 messages🔥):
- Support for Embedding Models: User @jungle_jo was initially unsure about JS/TS support for embedding models. However, the user later updated their query, saying that they had found langchain to have a lot of support for this.
- Viewing Old Documentation Versions: User @ellen0609 inquired about how to view previous versions of langchain’s documentation. @.trouble_ offered to help the user via Direct Message.
- Compatibility of Llama2 with SQLDatabaseToolkit: User @daksana_40937 asked about the compatibility of the Llama2 model with SQLDatabaseToolkit.
- Web Scraping and Source Retrieval: User @abed7053 asked if there’s a way to perform web scraping with TypeScript/JavaScript in langchain. This user also couldn’t find how to retrieve source context in the API response.
- Langchain Installation Issues on Python Below 3.10: User @infinityexists. was having issues with the langchain Python package not installing beyond version 0.0.27 on Python 3.8.0. @quantumqueenxox suggested upgrading Python to 3.10 or above.
- Consistent Document Retrieval Concerns: User @ranjith8249 had issues retrieving consistent documents from Milvus standalone using the langchain conversational chain.
- Cache Issue in Conversational Retrieval Chain: User @seththunder encountered an issue with InMemoryCache() and SQLAlchemyCache, stating that neither worked for storing the provided answers in the cache while using the Conversational Retrieval Chain in LangChain.
- Declared Support for LanceDB: Users @hucki_rawen.io and @timcarambat discussed their preference for LanceDB, a solution they found well-suited to their needs.
- Converting Document to JSON: User @b0otable asked for help converting a Document to JSON, citing the error message “Object of type Document is not JSON serializable”.
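One common workaround for that serialization error is to dump the Document’s fields by hand; a minimal sketch assuming the langchain 0.0.x Document schema (page_content plus metadata):

```python
# Sketch: serializing LangChain Documents to JSON by hand.
# Assumes the langchain 0.0.x Document schema (page_content + metadata).
import json
from langchain.schema import Document

docs = [
    Document(page_content="LanceDB is a vector database.", metadata={"source": "notes.md"}),
]

# json.dumps can't handle Document objects directly, so convert each one
# to a plain dict first (doc.dict() also works on pydantic-based versions).
serialized = json.dumps(
    [{"page_content": d.page_content, "metadata": d.metadata} for d in docs],
    indent=2,
)
print(serialized)

# Round-trip back into Document objects:
restored = [Document(**d) for d in json.loads(serialized)]
```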
▷ #share-your-work (2 messages):
- Creation of Voice Virtual Assistant: @marvinbraga shared a YouTube video explaining how he created a voice virtual assistant that interacts via WhatsApp. The video covers topics such as OpenAI integration, building an API that stores conversations by ID, integration with WhatsApp via the Facebook API, and audio processing with Pygame.
  - Additionally, Marvin shared a discount coupon for his book ‘Python, ChatGPT e Django REST’ and suggested visiting his GitHub repository, which contains the project’s source code.
- Update Launch of myGPTBrain: @bigansh announced the launch of version 2 of myGPTBrain, including new features, an updated landing page, document parsers, and the introduction of user subscriptions. A product update guide is available on Loom, along with a public launch blog post. The user also requested feedback on potential new features and design suggestions.
LLM Perf Enthusiasts AI Discord Summary
- Discussion on ChatGPT’s performance led by users @jeffreyw128, @res6969, and @pantsforbirds. Three possibilities were proposed: significant “lobotomization”, an update to prioritize using fewer tokens, or infrastructure issues leading to a loss of parameters.
- Inquiry by @robhaisfield in the #finetuning channel about procuring a JSON file for fine-tuning the 3.5-turbo model.
- Conversations around UNA’s ability to align the Mixture of Experts (MoE) at any level within a neural network, with the standout model Xaberius 34B v1 “BETA” specifically mentioned. A future focus on Mixtral was raised by @the.palmik, who also inquired whether anyone had successfully run Mixtral, followed by @robhaisfield asking about the requirements to get Mixtral running. Hacker News Post
- User activity notifications and engagement in the #irl channel: @thisisnotawill announced a temporary leave, @psychickoala checked for active users, @frandecam confirmed user presence but announced their departure, and @res6969 expressed positive retrospection about their time spent.
LLM Perf Enthusiasts AI Channel Summaries
▷ #gpt4 (8 messages🔥):
- ChatGPT Performance Discussion: Users @jeffreyw128, @res6969, and @pantsforbirds express concerns about a perceived performance drop in ChatGPT. @res6969 speculated that ChatGPT has been significantly “lobotomized”. @pantsforbirds proposed that the system might have been updated to prioritize using fewer tokens, or that there could be infrastructure issues resulting in a loss of parameters.
▷ #finetuning (1 messages):
robhaisfield: Anyone have a JSON I can use to fine-tune 3.5-turbo so I can just see how it works?
▷ #opensource (3 messages):
- UNA Aligning the MoE: @the.palmik mentioned that UNA can align the Mixture of Experts (MoE) at almost any level within a neural network. They specifically mentioned Xaberius 34B v1 “BETA” as a noteworthy example and expressed a future focus on Mixtral. Related Post on Hacker News
- Inquiry Regarding Mixtral Implementation: @the.palmik asked if anyone had successfully run Mixtral. This was followed up by @robhaisfield asking about the requirements to get Mixtral running.
▷ #irl (4 messages):
- User @thisisnotawill announced they would be away for a bit.
- User @psychickoala later asked if people were still present in the chat.
- @frandecam responded that people were indeed still active, but that they themselves would be leaving.
- User @res6969 expressed their enjoyment of their time in the chat, retrospectively.
Skunkworks AI Discord Summary
- Discussion on the GPU requirements for bf16 was triggered by @ufghfigchv, noting that it requires newer GPUs such as the A6000 or A100 for optimal performance.
- @teknium shared a Twitter link in the general channel without providing further context.
- An exploration of the Megablocks research paper shared by @.mrfoo, with additional insight into different GitHub repositories related to Megablocks: MistralAI’s version, which includes custom code, and the official version by Stanford Futuredata, which was stated to have more recent updates.
- The announcement of Mistral’s new 8x7B model by @moonlightgarden in the moe-main channel.
- A link to Mistral AI’s status page shared by @huevosabio, which unfortunately led to an error page.
- Pradeep1148 shared a YouTube video in the off-topic channel without any additional context.
Skunkworks AI Channel Summaries
▷ #general (2 messages):
- GPU Requirements for bf16: @ufghfigchv mentioned that bf16 is faster, but it requires newer GPUs like the A6000 or A100.
- Twitter Link Shared by teknium: @teknium shared a Twitter link with no further context provided.
▷ #papers (3 messages):
- Megablocks Research Paper: @.mrfoo shared a research paper titled Megablocks, which can be found at the following link.
- Megablocks GitHub Repo by MistralAI: The MistralAI repository for Megablocks was shared by @.mrfoo.
- Official Megablocks GitHub Repo: @stereoplegic noted that the official Megablocks repository (owned by Stanford Futuredata) was updated more recently. The repo can be found at this link.
- Custom Code in Mistral’s Version of Megablocks: @.mrfoo noted that Mistral’s Megablocks repository contains custom code focused on their new MoE, and mentioned there is a separate branch for this as well.
▷ #moe-main (2 messages):
- Mistral’s New 8x7B Model: User @moonlightgarden informed the channel that Mistral has released a new 8x7B model.
- Mistral AI Status Link: User @huevosabio shared a link to Mistral AI’s status page, but the page showed an error message: “Something went wrong, but don’t fret — let’s give it another shot.”
▷ #off-topic (1 messages):
pradeep1148: https://youtu.be/mAGLD5598cs
MLOps @Chipro Discord Summary
- An upcoming event titled Novus #14 was shared by @jaskirat with a link for details and registration, prompting @Raigon to inquire about the possibility of the event being recorded.
- There was a discussion on segmentation model selection, with @mattrixoperations recommending models like FastSAM, MobileSAM, SAM, and Yolo-seg, especially endorsing the YOLOV8-seg model. He particularly cautioned against using SAM for microscopy tasks, urging the use of a smaller model fine-tuned on tagged data, with the source mentioned.
- @erisianrite announced their plan to use YOLO as a baseline for performance comparison with their own model, also showing interest in studying segmentation models and expressing gratitude for @mattrixoperations’ advice.
MLOps @Chipro Channel Summaries
▷ #events (2 messages):
- @jaskirat posted a link to an event titled Novus #14. The post included a cover image.
- Afterward, @Raigon asked if this event was recorded, but no response was provided in this dataset.
▷ #general-ml (2 messages):
- Segmentation Models Suggestion: User @mattrixoperations shared some insights on segmentation model choices, suggesting FastSAM, MobileSAM, SAM, and Yolo-seg. He particularly recommended the YOLOV8-seg model but advised against using SAM for microscopy tasks, recommending instead a smaller model fine-tuned on some tagged data (source).
- Use of YOLO for Baseline Comparison: User @erisianrite mentioned their plan to utilize YOLO as a baseline to compare performance against the model they develop. They also expressed their intent to study segmentation models and thanked @mattrixoperations for the suggestions.
Alignment Lab AI Discord Summary
- The OpenML Guide was introduced to the community by @severus_27. This guide provides an array of open-source, free resources related to AI, covering various topics like computer vision, NLP, deep learning, AI in healthcare, robotics, and the mathematical principles underpinning AI.
- The OpenML Guide is accessible via its website, and its GitHub repository is available for contributions or support via a GitHub star.
Alignment Lab AI Channel Summaries
▷ #open-orca-community-chat (1 messages):
- OpenML Guide Introduction: User @severus_27 introduced the OpenML Guide, mentioning that it offers a wealth of free resources such as books, courses, papers, guides, articles, tutorials, notebooks, and more for learning AI-related topics like computer vision, NLP, deep learning, AI in healthcare, robotics, and the mathematics behind AI’s core principles. Additionally, all the resources are open source and freely accessible.
- OpenML Guide Website and GitHub Repo: The OpenML Guide can be accessed via its website and also has a GitHub repository where users can contribute or show their support by giving it a star.
▷ #looking-for-workers (1 messages):
- Introduction to OpenML Guide: @severus_27 introduced the OpenML Guide, an open-source, free resource offering a considerable array of AI-related content such as books, courses, papers, guides, articles, tutorials, notebooks, and more. The guide caters to a variety of AI interests, including computer vision, NLP, deep learning, AI in healthcare, robotics, and the mathematics behind AI’s core principles. The OpenML Guide website and the project’s GitHub repository were shared.
The Ontocord (MDEL discord) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI Engineer Foundation Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Perplexity AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.