Shameless plug time - the Latent Space Pod shipped the [first NeurIPS Best Papers recap pod](https://twitter.com/latentspacepod/status/1738709627829883346)!


3hrs of the best papers of 2023. enjoy.

[TOC]

Nous Research AI Discord Summary

  • Discussions regarding optimizing AI performance, with a focus on using shorter contexts for better results (an observation said to hold even for GPT-4) and a wish to publish these insights as blog posts.
  • An active exchange about malware security in relation to HuggingFace, with a user sharing personal experience of a potential malware threat via email.
  • Continued interest in video and music content, with users sharing YouTube links of various genres, including a discussion on a possible upscaling of all of YouTube. Issues around clickbait in AI reporting were also raised, with calls for more honest representation in AI-related media.
  • In-depth conversation on technical advancements in machine learning, featuring research papers about DYAD, a novel alternative to linear layers, and Apple’s newly launched ML Ferret. Users also navigated the process of accessing PALM2 through an API key, with plans to discuss Markovian-type planning for agent LLMs.
  • Discussion on Large Language Models (LLM) centered around building specialized models, handling data scaling (with the sharing of relevant code), embedding and vector database management, exploring model merging strategies, and LLM interpretability resources. The community also shared various AI model performance results (e.g., Hermes 2.5, GPT-4, and Mistral).
  • Active exploration of the Striped Hyena architecture and quantization in the ask-about-llms channel, covering quantization challenges, RMSNorm issues, and potential fixes. Users also brought attention to a red hint added to “Attention is All You Need” and discussed issues with a NousResearch model.

Nous Research AI Channel Summaries

▷ #ctx-length-research (3 messages):

  • Using Shorter Context for Better Results: User @cognitivetech shared their view that shorter context yields better results when working with chatbot AI, rather than trying to use a long context and summarize a large amount at once. This observation holds true even when transitioning to GPT-4, according to them.
  • Request for Published Insights: @cognitivetech expressed a wish for these insights to be published in a blog for easy reference in future discussions.

▷ #off-topic (32 messages🔥):

  • Possible YouTube Upscale Discussion: User @fullstack6209 shared a YouTube link and expressed surprise at a possible upscaling of all of YouTube. The subject of the video is Gabrielle Drake’s portrayal in the SHADO UFO series.
  • Malware through HuggingFace Reference: User .beowulfbr described receiving an email from a supposed South Korean researcher offering a $15 Amazon gift card in exchange for completing a study form, warning that it could be a malware attempt. The email was received due to the user’s activity on HuggingFace.
  • Song Recommendation and Appreciation: @fullstack6209 shared another YouTube music video link to the song “Anvil” by Lorn. User .beowulfbr expressed admiration for the shared tune and requested @fullstack6209 to share their playlist.
  • AI YouTube Channels Discussion: User @Error.PDF expressed their disdain for YouTube channels reporting on AI but using misleading clickbait thumbnails, especially images of robots from the ‘Black Mirror’ series. @n8programs expressed a desire for an AI YouTube channel delivering non-clickbait actual news.
  • AI Explained - YouTube Channel Recommendation: @henriqueln7 shared the YouTube link to the “AI Explained” channel, suggesting it as a good source for AI news, albeit slightly clickbait.
  • Discussion on DYAD: @euclaise shared a link to a research paper about DYAD, a layer designed to serve as a faster and more memory-efficient alternative to linear layers (nn.Linear() in Pytorch). This is used in common subcomponents like the ff module of Transformers. Link to the research paper.
  • ML Ferret by Apple: @tofhunterrr shared a link to Apple’s machine-learning repository ML Ferret, described as an end-to-end multimodal LLM that can accept any form of referring and ground anything in its responses. Link to ML Ferret.
  • Access to PALM2 through the API key: @night_w0lf and @fullstack6209 had a discussion about accessing PALM2 through an API key provided at https://makersuite.google.com/app/apikey.
  • Markovian Planning for Agent LLMs: @gabriel_syme suggested setting up a meeting to discuss Markovian-type planning for agent LLMs.
  • Reflection on the Power of Scale in Language Modelling: @gabriel_syme shared a blog post discussing how the power of scale revealed in language modeling leads back to compositionality. Link to the blogpost.

▷ #general (303 messages🔥🔥):

  • Building Specialized Models and Data Scaling: User @nanowell brought up the topic of building a set of specialized models that function differently but work together, to which @n8programs suggested the idea of training each expert model on different areas. @emrgnt_cmplxty also shared experiences and challenges in managing large amounts of data (4TB database) and talked about the necessity of more scalable strategies to handle about 100TB of high quality data. User @tokenbender discussed the balance between cost, latency, and accuracy in data management (Link to code).
  • Embedding and Vector Database Discussion: Users @fullstack6209, @gabriel_syme and @emrgnt_cmplxty had an in-depth discussion on various vector database solutions and embedding generation at scale. They shared experiences with solutions like Qdrant, pg-vector, Weaviate, Chroma/Pinecone, and Jina, highlighting challenges in managing and scaling vector databases.
  • Model Merging: The chat saw an ongoing discussion regarding model merging, specifically using methods like SLERP, TIES, and others for merging pretrained large language models. Tools like MergeKit were suggested for those looking into model merging (a minimal SLERP sketch appears after this list).
  • Large Language Model (LLM) Interpretability: User @mayhem1349 shared a repository dedicated to resources on LLM Interpretability (GitHub link). This collection includes open-source tools, papers, articles, and groups focused on interpreting LLMs.
  • Model Performance: Different AI models were discussed across various aspects. @weyaxi shared results from slerp merge without additional training. The community also reflected on models such as Hermes 2.5, GPT-4, and Mistral regarding their performance in coding and reasoning tasks.
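
For readers curious about the mechanics, here is a minimal sketch of SLERP weight merging under stated assumptions: both checkpoints share an identical architecture and each tensor is interpolated independently after flattening. Production tools like MergeKit handle per-layer interpolation factors and edge cases more carefully.

```python
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    a_unit, b_unit = a / (a.norm() + eps), b / (b.norm() + eps)
    omega = torch.acos(torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0))  # angle between the weight vectors
    if omega.abs() < eps:  # nearly colinear: fall back to plain linear interpolation
        return (1 - t) * w_a + t * w_b
    so = torch.sin(omega)
    merged = (torch.sin((1 - t) * omega) / so) * a + (torch.sin(t * omega) / so) * b
    return merged.reshape(w_a.shape).to(w_a.dtype)

# Merge two state dicts key by key (assumes identical architectures).
def merge_state_dicts(sd_a: dict, sd_b: dict, t: float = 0.5) -> dict:
    return {k: slerp(sd_a[k], sd_b[k], t) for k in sd_a}
```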

▷ #ask-about-llms (25 messages🔥):

  • Red Hint Added to “Attention is All You Need”: User @ex3ndr noticed a red hint added to “Attention is All You Need”. @fullstack6209 speculated it could be for legal reasons, while @lightningralf considered it part of Google’s push to assert ownership of the transformer.
  • Issues with NousResearch-Yarn-Llama-2-7b-64k.Q8_0.gguf: User @cognitivetech reported issues with NousResearch-Yarn-Llama-2-7b-64k.Q8_0.gguf model, wondering if there was a specific prompt template to use. .beowulfbr suggested possibly using ChatML.
  • Striped Hyena Architecture: @casper_ai enquired about the Striped Hyena architecture with a view to supporting it in AutoAWQ. User @teknium pointed him to the main contributor of Striped Hyena.
  • Striped Hyena Quantization: A detailed discussion took place around quantizing Striped Hyena. User @casper_ai mentioned various challenges such as the inability to quantize the filter layer, though attention and MLP layers could be quantized. @zymrael provided helpful insights on sensitivity to quantization and the elements that can’t be quantized.
  • Problem with RMSNorm in Striped Hyena: @casper_ai mentioned encountering an AttributeError related to the 'RMSNorm' object in the context of Striped Hyena, and considered creating a new scaling function for RMSNorm. @zymrael confirmed that the scale is equivalent to the weight in RMSNorm (see the reference sketch below this list).
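
For context on that last point: in most implementations RMSNorm’s learnable scale is stored as the module’s `weight` attribute, which is what the “scale is equivalent to the weight” remark refers to. A minimal reference sketch (an illustration, not Striped Hyena’s actual code):

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # the learnable scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square of the features, then apply the scale.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight
```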

OpenAI Discord Summary

  • Debates and knowledge exchange on OpenAI’s tools, particularly GitHub Copilot’s compatibility with JetBrains and its effectiveness for inline coding. “GitHub Copilot works well with JetBrains” -@infidelis and @jessicant.
  • Critiques and recommendations concerning AI applications, such as Bing being called one of the worst AI apps for schoolwork due to frequent freezing and nonsensical output – @caledordragontamer.
  • Exploration of the untapped potential of quantum computers in AI training and the suggestion to read research papers to gain a deeper understanding – @michael_6138_97508.
  • A sudden account ban was raised by @0718.eth, highlighting questions around moderation and account security.
  • Discussion on the use of Mixtral 8x7b and Mistral Medium models in OpenAI utilities by @eljajasoriginal.
  • Users reporting GPT-4’s limited capability to analyze data and extract data from files, with some speculating this could be linked with Bing integration.
  • Users sharing their experiences with error messages counting towards their usage limits and the suggestion to implement user feedback to solve errors – @busybenss, @xv_versus, @lugui.
  • User discussion on features expected to be introduced in future versions of ChatGPT, such as the “My ChatGPT” feature – @jaicraft, @dino.oats.
  • Users sharing dissatisfaction with current GPT-4 responses and suggestions for getting improved responses – @gionta21, @rendo1.
  • Dialogues on challenges with OpenAI API connections and potential solutions, with @bluehipp0. sharing their experience resolving OpenAI API issues.
  • Discussions on problems encountered while upgrading ChatGPT PLUS subscriptions and speculation on possible cause – @ixtatica, @7877.
  • User queries on prompt engineering and feedback for DALL-E image generation and potential issues and improvements in chatbots – @eskcanta, @.shaw93, @madame_architect.
  • Conversations about Dall-E image generation matching specific user requirements, the importance of clear instructions, and potential conversion to pixel art for game usage – @eskcanta and @suptrox.
  • Suggestions for better system message structure for chatbots and discussions about using a knowledge base for extensive system information – @madame_architect, @.shaw93.

OpenAI Channel Summaries

▷ #ai-discussions (11 messages🔥):

  • OpenAI Word Blacklisting: @afterst0rm and @i_am_dom_ffs noted that the word ‘openai’ was originally filtered in the UI, but pointed out that the filtering has since been fixed and is no longer applied.

  • GitHub Copilot with JetBrains: @infidelis and @jessicant noted that GitHub Copilot works well with JetBrains. Meanwhile, @exx1 added that Copilot is effective for inline completions.

  • AI Apps for School: @caledordragontamer voiced a critical opinion about Bing, stating it’s one of the worst AI apps for schoolwork due to frequent freezing and nonsensical information.

  • Quantum Computers and AI: @moldy21 expressed interest in AI training on quantum computers despite the technology not being fully developed. @michael_6138_97508 advised reading research papers and consulting ChatGPT for a more solid understanding.

  • Account Banning Issue: @0718.eth reported their account was suddenly banned while they were using it for code completion, seeking guidance on where to get help.

  • Preference for Mixtral Models: @eljajasoriginal commended the performance of Mixtral 8x7b and Mistral Medium models, noticing they have fewer restrictions and can even provide opinions on a variety of subjects.

▷ #openai-chatter (108 messages🔥🔥):

  • Issues with data analysis capability in GPT-4: User @rendo1 reported issues with GPT-4’s ability to analyze and extract data from files, a capability that worked a month ago but seems to be having issues now. User @cozy_artist suggested this could be related to Bing integration.
  • Errors counting towards usage limit: Users @busybenss, @xv_versus, and @lugui discussed error messages counting towards their usage limit. Despite @lugui’s claim that error messages had never counted towards the limit, other users reported contrary experiences. User @offline suggested incorporating user feedback to resolve errors and possibly refund usage.
  • Anticipated feature updates: Users @jaicraft and @dino.oats discussed upcoming updates for ChatGPT, particularly a feature known as “My ChatGPT” that was briefly rolled out last month. This feature purportedly personalizes ChatGPT based on user conversations.
  • Restricted access: User @hra42 reported an issue of not being able to access the ChatGPT website without a VPN, suggesting a potential IP or regional issue. @_univurse also highlighted that error messages show when trying to access AI text classifier.
  • ChatGPT’s quality of response: User @gionta21 expressed dissatisfaction with GPT-4’s responses, stating that GPT-3.5 provided more full and insightful responses. @rendo1 suggested that the user be more specific in their prompts to GPT-4.
  • ChatGPT availability: Several users expressed concerns over the functionality of ChatGPT. While @kaveen and @dino.oats asserted that ChatGPT isn’t broken, @jaicraft humorously suggested that it never existed to begin with. @lumirix jokingly termed ChatGPT as an “optical illusion”.

▷ #openai-questions (148 messages🔥🔥):

  • Lang Chain Discussion: @openheroes expressed unfamiliarity with Lang Chain, a topic GPT-3.5 was also unfamiliar with.
  • GPT-4 Verification Hurdle: @Denis Volkov shared his experience of GPT-4 attempting to verify whether he was human.
  • Access to GPT Lists with Disabled Chat History: @toutudouhou had a question about accessing GPT lists after disabling chat history. @openheroes confirmed that it’s not possible and history needs to be enabled.
  • OpenAI API Connection Issues: @bluehipp0. encountered and solved a problem with OpenAI API connection errors, which initially appeared to be an issue with using "https://api.openai.com/v1" (a minimal connection check appears after this list).
  • ChatGPT PLUS Subscription Issues: @ixtatica had difficulties upgrading to ChatGPT PLUS when their card, normally usable, kept getting declined. Although they hinted at the possibility of the company being compromised, another user, @7877, found the situation amusing.
  • Support for ChatGPT: @tuxmaster was looking for ways to open a support ticket for ChatGPT, having waited two weeks for a response. They were advised by @satanhashtag to seek help at help.openai.com, but @tuxmaster expressed dissatisfaction with the support service, suspecting it to be run by a limited AI bot.
  • User Verification Requests for Paying Members: @3daisy and @knowcryptoshow discussed the inconvenience of having to constantly verify their identities despite being paying members, speculating that such a process might be more relevant for freemium users.
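
For anyone debugging similar connection errors, a minimal sanity check with the official `openai` Python client (v1.x) is sketched below; `https://api.openai.com/v1` is already the default base URL, so setting it explicitly is only needed when routing through a proxy or gateway, and the model name is illustrative.

```python
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment by default.
client = OpenAI(base_url="https://api.openai.com/v1")

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```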

▷ #gpt-4-discussions (34 messages🔥):

  • GPT-4 Features and Capabilities: User @mrkarumin inquired about the capabilities of GPT-4, specifically regarding its ability to access data from 2023. @jaicraft confirmed that GPT-4 Turbo can access up-to-date information and is enabled by default on ChatGPT 4. They also mentioned additional features in ChatGPT 4 including Dall-e 3 and Code Interpreter.
  • GPT-4 Speed: @mrkarumin noted the excellent response speed of GPT-4, compared to GPT-3.5 and @jaicraft suggested trying GPT-3.5 again with Plus for a super-fast experience.
  • ChatGPT Plus Access: @mrkarumin inquired about getting premium access, which according to @jaicraft will give access to the advanced features like web search, Dall-e 3, and the Code Interpreter mentioned earlier.
  • Actions Function in GPT: @froggy_chacko asked for explanations for the function of “Actions” in GPT, to which @jaicraft replied it enables GPT to access external things using APIs. @sudojames suggested checking out the ‘ACTIONS GPT’ from OpenAI for examples and potential use cases.
  • Disruption in GPT Functionality: @dystopia78 experienced an issue with custom GPT vanishing, while @happyg faced a problem with custom GPTs forgetting instructions but resolved it without external help.

▷ #prompt-engineering (15 messages🔥):

  • Prompt Engineering and Feedback Mechanism for DALL-E Generation: @eskcanta discussed their specific requirements for image generation with @suptrox. Feedback was provided to hone in on a precise output, including tweaking the details of the scene, focusing on elements within the garden, and eventually specifying the desired style as pixel art. Through iterative conversation, @suptrox managed to generate the desired image style.
  • Potential Issue and Improvement Suggestions for Chatbots: @.shaw93 raised a concern about a chatbot divulging information prematurely, before establishing necessary prerequisites like if the recipient is a new client. @madame_architect suggested moving the crucial “first message should ask” instruction to the top and repeating it at the end of the prompt, but also highlighted that an overall quality check may be needed due to the lengthy system message.
  • Utilization of the Knowledge Base: @madame_architect pointed out that significant parts of @.shaw93’s system instruction details might be more appropriate for a knowledge base.

▷ #api-discussions (15 messages🔥):

  • Dall-E Image Generation: User @eskcanta sought advice from @suptrox on generating specific images matching their ideal preference using Dall-E, specifically wanting to avoid elements like skies and trees. @suptrox emphasized the importance of specific instructions to the AI, adding, “Your ability to communicate exactly what you want is how you succeed with AI.”
  • Pixel Art Generation: @eskcanta later sought to convert these Dall-E generated images into pixel art suitable for game usage, which led to further discussions between @eskcanta and @suptrox.
  • Chatbot and Lead Generation: User @.shaw93 solicited help with utilizing the assistants API for a chatbot that initially provides information before asking particular questions. They wished to ensure that certain information is only unveiled to new clients after verifying that they are indeed new clients.
  • System Message Improvement: @madame_architect offered a quick fix to @.shaw93’s problem, advising them to move certain key instructions in their system message and repeat them at the end for better results, and suggested a quality check from a prompt engineer. They also noted that some of the system instruction details seemed more appropriate for a knowledge base.
  • Children’s Book Illustration Request: User @yantou. made a brief request for a children’s book illustration.

OpenAccess AI Collective (axolotl) Discord Summary

  • Extensive discussions about the use and specifications of Mixtral on AWQ, including the required amount of VRAM and issues around loading large models. Additionally, there were talks about using regular and instruct Mixtral with the same configurations. A GitHub link was shared to the tool named ml-ferret.
  • Gemini API Calls were queried by @noobmaster29, who was then informed that they are not free.
  • In-depth deliberations regarding support for ROCm and the capabilities of the new AMD MI300x card. The conversation also touched on the VRAM requirements for full model tuning, with mention of potential solutions to fit the model on an 80GB card. There was a call for compute contributions to optimize Mixtral training, with several members ready to contribute.
  • The community shared insights and problems regarding changes in the code for merging LoRAs into models with the latest transformers, suggesting possible solutions like downgrading peft or using axolotl for merging. They also shared their experiences testing merged models using the airoboros and LDJnr/Puffin front-ends.
  • A discussion around ways to lower the targeted parameter count when training certain layers using QLoRA, and a query on canceling the completion in axolotl.cli.inference without terminating the entire application. @dangfutures asked for assistance in initiating DPO on the dpo branch.
  • Other topics included methods to encode math for a fine-tuning dataset and the use of a tool for a more human-readable dataset preview, with Visual Studio Code suggested as a potential option. It was confirmed that the output from Nougat is compatible with Mathpix.

OpenAccess AI Collective (axolotl) Channel Summaries

▷ #general (21 messages🔥):

  • Gemini API Calls: @noobmaster29 asked about the availability of free Gemini API calls. Replying to @nafnlaus00, @yamashi clarified that the API calls are not free.
  • Using Mixtral on AWQ: @dangfutures asked how much memory is required to load Mixtral on AWQ. @casper_ai indicated that 24GB of VRAM would be sufficient if the operating system isn’t using the GPU; otherwise, at least 32GB would be needed. They added that this fits the Mixtral model with a 512-token context length and 512 decoded tokens.
  • Large Model Loading Issues: @dangfutures experienced kernel-dying issues while loading Mixtral on AWQ. @casper_ai explained that notebooks are usually not efficient at loading large models.
  • Use of Instruct Mixtral: @dangfutures asked whether regular and instruct Mixtral use the same configurations; @casper_ai confirmed that the same settings work for both.
  • Resource Sharing: @dangfutures shared a GitHub resource named ml-ferret.

▷ #axolotl-dev (53 messages🔥):

  • Discussion on ROCm and AMD MI300x Support: User @yamashi initiated a discussion about the support for ROCm and the capabilities of the new AMD MI300x card, with an emphasis on its suitability for high-performance computing. User @noobmaster29 provided a press release link discussing the card’s application in inference contexts and expressed a desire for a 48GB consumer card.

  • Considerations for GPU Upgrade: A debate ensued between @yamashi and @noobmaster29 regarding the VRAM requirements for full model tuning, with an indirect suggestion of AMD as a potential solution due to its more generous VRAM provisions. @yamashi expressed the need to upgrade from a 4xA100 setup for a full fine-tune (FFT) of the Mixtral model.

  • Discussion on Fulltune vs LoRA: @dreamgen asked @yamashi about the differences perceived between Fulltune and LoRA, specifically in a medical context.

  • Potential Solution to Fit Model on 80GB Card: @nruaif suggested freezing the expert layers and using DeepSpeed ZeRO-3 to fit the full model on 4 A100 GPUs, but @yamashi clarified that even a 7-billion-parameter model requires around 70GB of memory (a minimal freezing sketch appears after this list).

  • Call for Compute Contribution to Optimize Mixtral Training: User @casper_ai invited others to contribute compute for optimized training of Mixtral with plans to import stuff from MegaBlocks for efficient training. Both @le_mess and @caseus_ offered to help with available compute resources.
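
A minimal sketch of the freezing idea, assuming the Hugging Face Mixtral implementation where expert weights live under parameter names containing `.experts.`; that name filter is an assumption, so check `model.named_parameters()` for the actual layout.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# Freeze every expert parameter; attention, router, and norm weights stay trainable.
frozen = 0
for name, param in model.named_parameters():
    if ".experts." in name:  # assumption: HF Mixtral stores expert weights under this name
        param.requires_grad = False
        frozen += param.numel()

total = sum(p.numel() for p in model.parameters())
print(f"Froze {frozen / 1e9:.1f}B of {total / 1e9:.1f}B parameters")
```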

▷ #general-help (11 messages🔥):

  • Changes in Merging LoRAs into Models: @jaredquek mentioned that the code for merging LoRAs into models has changed greatly in the latest transformers and provided a link to the relevant documentation. They also found that their old code no longer worked. In response, @nanobitz suggested trying to downgrade peft or using axolotl for merging (see the peft sketches after this list).
  • Testing Merged Loras with Front-ends: @self.1 communicated having successfully tested merged models using both airoboros and LDJnr/Puffin front-ends, and mentioned that the second new line and stop token may be unnecessary.
  • Freezing Layers in Training with QLoRA: @xzuyn asked if there is a way to lower the targeted parameter count when training certain layers using QLoRA; their target count remained at 209M parameters despite setting fewer layers to train (the sketch after this list also shows restricting LoRA to a layer subset).
  • Axolotl CLI Inference Query: @marijnfs inquired whether there is a way to cancel the completion in axolotl.cli.inference without terminating the entire application.
  • Initiating DPO: @dangfutures asked for assistance in initiating dpo on the dpo branch.
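
Two hedged sketches for the peft topics above, assuming a recent `peft` release (model IDs and paths are placeholders): merging a trained LoRA adapter into its base model with `merge_and_unload()`, and restricting which transformer blocks a LoRA targets via `layers_to_transform`.

```python
import torch
from peft import LoraConfig, PeftModel
from transformers import AutoModelForCausalLM

# --- Merging a trained LoRA back into the base weights ---
base = AutoModelForCausalLM.from_pretrained("base-model-id", torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights
merged.save_pretrained("merged-model")

# --- Restricting LoRA to a subset of layers (relevant to the QLoRA question) ---
config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],  # which sub-modules receive adapters
    layers_to_transform=list(range(8)),   # only the first 8 transformer blocks
)
```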

▷ #datasets (3 messages):

  • Encoding Math for Fine-tuning Dataset: User @noobmaster29 shared an interest in finding the best method to encode math for a fine-tuning dataset.
  • Dataset Preview Tool: @noobmaster29 expressed interest in a tool to preview encoded information in a human-readable format, with the Markdown preview feature in Visual Studio Code noted as a potential option.
  • Nougat Output Clarification: @noobmaster29 confirmed that the output from Nougat is in fact compatible with Mathpix, resolving any initial confusion.

Mistral Discord Summary

  • Users compared experiences running MistralAI’s models locally on systems such as an M1 machine with 16GB of memory and a Lenovo ThinkCentre i5 with 32GB of RAM.
    • User @djmango shared that they successfully ran the Mistral models on an M1 machine with 16GB of memory.
    • @ved_ikke added that they can run most Mistral models on a Lenovo ThinkCentre i5 with 32GB of RAM.
  • Dialogue on Mistral Medium’s performance, specifically on creative writing in Chinese, and skepticism about an open-source release. Whether detailed model specifics matter for models that will never be run locally was also debated.
    • User @ken70wtf expressed his admiration for Mistral Medium’s performance on creative writing in Chinese via poe.com, stating it’s faster than gpt-3.5-turbo.
    • User @tom_lrd questioned the importance of having the specifics of models that will never be run locally.
  • Discussion on optimization ideas to speed up MistralAI using an open-source package named Unsloth. Queries were also raised about other optimization methods, such as using float16 or 8-bit & 4-bit quantization via bitsandbytes, and the use of Flash Attention 2 with the MistralAI/Mixtral-8x7B-Instruct-v0.1 model.
    • A Reddit post about the open-source package Unsloth claims it makes finetuning the Mistral model via QLoRA 2.2x faster with 62% less memory by leveraging OpenAI’s Triton language.
  • Conversations on whether an 8k-token context window is sufficient for creating AGI, along with suggestions about using a graph database for efficient code generation.
    • @poltronsuperstar believes that an AGI could be created with an 8k context window, despite its inability to contain whole codebases.
    • @daain proposed using a graph database to hold a semantic understanding of parsed context, for efficient code generation.
  • Discussion on the need and potential benefits of finetuning, as well as the upcoming features in Mistral API – particularly, function calling support for platforms like MemGPT.
    • Replying to @krissayrose, user @poltronsuperstar asked whether finetuning is really needed, as they had been using function calling with GPT-3 before RLHF simply through few-shot learning.
    • @flyinparkinglot clarified that although this feature is currently lacking, it is scheduled for future implementation.

Mistral Channel Summaries

▷ #general (49 messages🔥):

  • Accessing MistralAI’s Models: Several users, including @antononcube and @sublimatorniq, discussed how to access MistralAI’s models programmatically. @antononcube initially had trouble with the GET method and API-key setup, but eventually managed with help and direct code examples from @sublimatorniq.
  • Running Mistral Locally: @rosethelocalfemboy shared that they successfully ran the Mistral model, specifically the 8x7b version, on their local machine. They found it to be of high quality, even though it was a quantized version.
  • Possibly Speeding Up MistralAI: User @red_code shared a link to a Reddit Post about an open-source package named Unsloth that claims to make finetuning via QLoRA of the Mistral model 2.2x faster and use 62% less memory by leveraging OpenAI’s Triton language.
  • Hardware Requirements for Running Mistral: In response to a query by @daveburstein, @djmango shared that they successfully ran the Mistral models on an M1 machine with 16GB of memory, pointing out that memory speed is usually the constraint rather than CPU or GPU strength. @ved_ikke added that they can run most Mistral models on a Lenovo ThinkCentre i5 with 32GB RAM.
  • Fitting Mistral 8x7b on a 24GB Card: @jdwebprogrammer asked if it is possible to fit the Mistral 8x7b model on a 24GB card, noting that when quantized to 4-bit the card maxed out and the model looked like it would need about 25GB (see the rough arithmetic below this list).
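
That observation matches a back-of-the-envelope estimate, assuming Mixtral 8x7B’s roughly 46.7B total parameters:

```latex
46.7 \times 10^{9}\ \text{params} \times 0.5\ \tfrac{\text{bytes}}{\text{param}} \approx 23.4\ \text{GB of weights at 4-bit}
```

Weights alone nearly saturate a 24GB card before accounting for the KV cache and activations, which is consistent with the ~25GB figure.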

▷ #models (8 messages🔥):

  • Mistral Medium’s Performance and Open Sourcing: User @ken70wtf expressed his admiration for Mistral Medium’s performance on creative writing in Chinese via poe.com, stating it’s faster than gpt-3.5-turbo. However, he expressed skepticism over the possibility of its release as an open source and open weight model.
  • LLM Learning Resources: User @Bharat requested resources for learning LLMs at the architecture level in order to contribute to the open-source LLM community.
  • Processing Speed Comparison between Platforms: User @sublimatorniq asked about the processing speed of the Mistral endpoints versus the Perplexity endpoint on poe.com, in response to @ken70wtf’s statement on Mistral Medium’s superior performance.
  • Inquiry on Model Details: User @tom_lrd questioned the importance of having the specifics of models that will never be run locally during their exchange with @ken70wtf.
  • Reference to Eric Hartford’s Work: In response to @alex_deng’s inquiry, user @sublimatorniq provided links hosted by huggingface.co to uncensored models published by Eric Hartford.

▷ #deployment (5 messages):

  • Personal Interactions: User @alex_deng asked if @sublimatorniq was from Cambodia to which they responded affirmatively.
  • Dolphin 2.6 Mixtral 8X7B: @dutchellie shared a link to the Dolphin 2.6 Mixtral 8X7B model. @jdwebprogrammer expressed surprise at discovering the Dolphin model and that it was already at version 2.6.

▷ #finetuning (1 message):

  • Necessity of Finetuning: Replying to @krissayrose, user @poltronsuperstar asked whether finetuning is really needed, as they had been using function calling with GPT-3 before RLHF simply through few-shot learning.

▷ #showcase (1 message):

antononcube: https://rakuforprediction.wordpress.com/2023/12/23/wwwmistralai/

▷ #random (8 messages🔥):

  • Optimizations for MistralAI/Mixtral-8x7B-Instruct-v0.1: User @husain3739 initiated a discussion about the optimizations used for executing MistralAI/Mixtral-8x7B-Instruct-v0.1 in Hugging Face Chat. They asked whether the default model was executed in full precision or with modifications, such as float16 or 8-bit & 4-bit quantization via bitsandbytes, to reduce memory requirements. They also enquired about the use of Flash Attention 2 (a hedged loading example appears after this list).

  • Need for an Extended Context Window: @poltronsuperstar and @daain discussed the necessity of an 8k token context window for AGI. @poltronsuperstar believes that an AGI could be created with an 8k context window, despite its inability to contain whole codebases. @daain proposed using a graph database to hold a semantic understanding of parsed context, for efficient code generation.

  • Potential of Lower-Tier Mistral Models: @jackson_97091 expressed interest in the new Mistral update to their API, with a 32k token limit. While it isn’t on par with the top-tier models, they consider the move valuable given a perceived shift of higher-tier models toward corporate liability concerns.
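
A minimal example of loading the model with the reduced-precision options mentioned, as a sketch: 4-bit weights via bitsandbytes with fp16 compute, plus Flash Attention 2. The `attn_implementation` argument assumes a recent transformers release with the flash-attn package installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # 4-bit storage, fp16 compute
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant,
    attn_implementation="flash_attention_2",  # requires flash-attn
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```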

▷ #la-plateforme (4 messages):

  • Function Call Support in Mistral API: User @brokearsebillionaire inquired about the presence of function calling support in the Mistral API. @flyinparkinglot clarified that although this feature is currently lacking, it is scheduled for future implementation. This news was well-received by @brokearsebillionaire and @antoniofiliona, with the latter expressing eagerness to test the feature with MemGPT.

HuggingFace Discord Discord Summary

  • Discussions revolved around the Transformers library’s new LLaVA support, an undergrad seeking practical experience in RL, how libraries like Langchain map prompted responses to actual tool invocations, suggestions for small quantized models for CPUs and mobile devices, inquiries about Hugging Face APIs for an Android application, the internship application process at Hugging Face, and problems with a neural network’s code and with fine-tuning Stable Diffusion (general).

  • Showcased member-made apps and tools, such as an AI Emoji Translator app, the Mamba model architecture, an investment-strategy video utilizing AI technologies, a Christmas-themed video, and a Chrome browser extension for Musicgen continuations (i-made-this).

  • Discussions on the definitions recurring in the DDPM and DDIM papers, particularly the symbols alpha_bar_t and alpha_t, issue exploration, and the search for a text embedding that yields coherent image generation (diffusion-discussions).

  • A query about conducting key point detection on tooth X-ray images and calculating distances among detected points (computer-vision).

  • Confusion regarding the differences between AutoModelForQuestionAnswering and AutoModelForSeq2SeqLM called for community insight (NLP).

HuggingFace Discord Channel Summaries

▷ #general (20 messages🔥):

  • LLaVA Support in Transformers: @meatfucker mentioned that the Transformers library just added LLaVA support.
  • Undergrad Seeking Advice on RL: @swadine, an undergrad student, is seeking advice on how to gain practical experience in Reinforcement Learning (RL) as the advanced course in Deep RL is not running in the upcoming semester at their university.
  • Langchain Library and Prompt Responses: @somefuckingweeb asked how libraries like Langchain map prompted responses to actual tool invocations.
  • Quantized Models for CPUs and Mobile Devices: @vishyouluck requested suggestions for small, quantized models which can run on CPUs and smartphones for basic Q&A and text generation tasks. @kubuxu suggested Quantized Mistral 7B.
  • Query About Hugging Face API: @abrarsharif asked whether Hugging Face offers APIs, similar to OpenAI’s, that can be integrated into an Android application.
  • Internship Application at Hugging Face: @_aabidk asked about the application process for multiple internship positions at Hugging Face.
  • Code Issue with Neural Network: @power9799 sought help with a code issue in a neural network. The problem was related to mismatched dimensions in batches.
  • Difficulty in Fine-tuning Stable Diffusion: @isleepwhenimcomfortable requested assistance with fine-tuning Stable Diffusion on Colab due to directory errors.

▷ #i-made-this (9 messages🔥):

  • Emoji Translator AI App: User @gospace7575 shared the creation of an interesting AI app named Emoji Translator which is capable of translating text into emojis and vice versa. This app can generate entire stories with only a few emojis.
  • Mamba Model Architecture: User @qbert000 announced the successful implementation of the Mamba model architecture. They have made it available on GitHub and also on Hugging Face under the collection name Q-bert/Mamba-130M.
  • Investment Strategy Video: An investment strategy video utilizing Stable diffusion, Leonardo Motion, and Pika was shared by @andysingal. The video can be viewed here.
  • Christmas Vibes Video: A Christmas-themed video, seemingly created with AI, was shared by @andysingal to celebrate Christmas with the rest of the Hugging Face team. The video can be viewed here.
  • Chrome Browser Extension for Musicgen Continuations: User .bigdookie shared his project, a Chrome browser extension for Musicgen continuations that listens for your position in a YouTube track and starts a continuation from there, while ensuring the continuation stops at the end of a bar. He also mentioned a component for arranging remixes. The project’s updates were shared on this Twitter link.

▷ #diffusion-discussions (3 messages):

  • Understanding the DDPM and DDIM Papers: User @wukong7752 is studying the DDPM and DDIM papers and expressed confusion about the symbol alpha_t. They noted that alpha_t is defined as a decreasing sequence in the DDIM paper, yet many implementations define DDIM’s alpha_t the same way as alpha_bar_t in DDPM, and asked whether this is coincidental or deliberate (see the notation note after this list).
  • @pseudoterminalx told @lorenz1392 they would look into a particular issue; the details were not given in the shared chat snippet.
  • @vipitis engaged with @lorenz1392’s search for a text embedding that generates coherent images and performs well under a specific differentiable XAI evaluation metric.
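
The confusion is a known notational trap rather than an implementation error: DDPM defines a per-step alpha_t and a cumulative product alpha_bar_t, while the DDIM paper reuses the symbol alpha_t for that cumulative product. In DDPM notation:

```latex
\alpha_t = 1 - \beta_t, \qquad
\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \qquad
\alpha_t^{\text{DDIM}} = \bar{\alpha}_t^{\text{DDPM}}
```

So implementations that set DDIM’s alpha_t equal to DDPM’s alpha_bar_t are following the two papers’ differing conventions.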

▷ #computer-vision (1 message):

  • Tooth X-ray Key Point Detection and Distance Measurement: User @navinaananthan inquired about the feasibility of performing key point detection on tooth X-ray images and determining the distance between the detected points using any pre-existing models.

▷ #NLP (2 messages):

  • Difference between AutoModelForQuestionAnswering and AutoModelForSeq2SeqLM: @opencuiguy asked for clarification on the differences between AutoModelForQuestionAnswering and AutoModelForSeq2SeqLM. The conversation awaits input from other participants who can offer further insight.


Skunkworks AI Discord Summary

  • A user reported being banned from both ChatGPT Discord and LM Studio Discord without given reasons; discussion on this matter might follow up.
  • A plasma physics application was shared by a user, adding to the compilation of AI-related tools and projects within the community.
  • An AI tool directed for education was noted to be in development, without further details provided.
  • The topic of multilingual models surfaced, with hopes that the Aya project will soon provide support for them.
  • Curiosity was expressed in developing a model that can generate and answer questions from a large dataset like RedPajama. The idea of a self-searching model or a Retrieval-Augmented Generation method for filling knowledge gaps was also discussed.
  • There was reference to the significant potential of large corpora in improving long context comprehension in models, given the presence of questions, instructions, and insights.
  • A novel idea was proposed to use the Language Model itself as a hypernetwork, predicting parameters to expand or implement new layers, via a Layer-wise Relevance Propagation method specific to each task.
  • The Nasty Teacher paper was discussed, with queries raised on modifying the output versus altering the loss function, and on the implications when the probabilities of all classes need to be considered.
  • The challenges of monetizing AI apps were brought up, with a user stating involvement in a project aimed at simplifying selling API access.

Skunkworks AI Channel Summaries

▷ #general (7 messages):

  • Ban from ChatGPT Discord and LM Studio Discord: User @antdx316 stated that they got banned from both the ChatGPT Discord and the LM Studio Discord. No reason for the ban was provided in the messages given.

  • Link to Plasma Physics Application: User @anjor_20331 shared a link to a plasma physics application but did not provide any further information or context about it.

  • AI Tool For Education: User @fred_fups stated that he is building an AI tool for education. No further details about the project were provided in the messages given.

▷ #datasets (3 messages):

  • Multilingual Models: @stereoplegic mentioned the development of multilingual models and expressed hopes that Aya would assist with this task soon.

  • Question Generation/Answering from a Large Corpus: @stereoplegic plans to develop a model able to generate and answer questions from a large corpus, like RedPajama. He also expressed interest in the model being able to self-search or use a Retrieval-Augmented Generation method to fill gaps in its knowledge based on a given prompt or a large corpus.

  • Long Context Comprehension in Large Corpora: He also mentioned the significant potential of large corpora in improving a model’s comprehension in long contexts, provided that relevant questions, instructions, and related insights are present.

  • Using the LLM as its own Hypernetwork: @stereoplegic proposed a niche idea of using the language model itself as a hypernetwork, predicting parameters to expand its existing layers or add new ones, possibly via a Layer-wise Relevance Propagation method specific to the task. He noted it could be beneficial when the loader has surplus free VRAM to utilize.

▷ #ft-study-main (1 message):

  • Nasty Teacher Paper Discussion: @mootverick brought up the Nasty Teacher paper, summarizing the methodology as creating random spikes at a few incorrect labels so the teacher keeps its apparent accuracy while resisting distillation. They raised two questions on this approach (a toy sketch of the output-perturbation variant follows this list):
    • The method might not be helpful when the probabilities of all classes need to be taken into account, not just the top class.
    • Could the same result be achieved by modifying the output, as opposed to altering the loss function?
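
To make the second question concrete, here is a toy sketch of the output-modification variant: spiking a few incorrect classes to just below the top logit at serving time, preserving the argmax (and thus top-1 accuracy) while distorting the soft-label distribution a student would distill from. This illustrates the question only; it is not the paper’s actual method, which trains the teacher with a modified loss.

```python
import torch

def nastify_logits(logits: torch.Tensor, num_spikes: int = 3) -> torch.Tensor:
    """Spike a few incorrect classes to just below the top logit, keeping the argmax."""
    out = logits.clone()
    top_idx = logits.argmax(dim=-1)  # the protected prediction, per row
    for row in range(out.shape[0]):
        cand = torch.randperm(out.shape[-1])[:num_spikes]
        cand = cand[cand != top_idx[row]]  # never spike the top class
        out[row, cand] = out[row, top_idx[row]] - 0.1 * torch.rand(cand.shape[0])
    return out
```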

▷ #off-topic (1 message):

  • Monetizing AI apps: User @megazord initiated a discussion about challenges in monetizing AI apps and mentioned working on a project aimed at simplifying the process of selling API access.

LangChain AI Discord Summary

  • User @shivam51 requested a LangSmith referral code in the general channel.
  • Various technical issues were discussed in the LangChain context:
    • @ninamani was facing problems while trying to run the Llama-2 chat model on LangChain, encountering errors with both ChatOllama and Llama2Chat methods.
    • @ninamani also explored the possibility of merging features of llama-cpp-python with LangChain, mentioning earlier unsuccessful attempts and discrepancies in chat prompt templates.
  • The importance of ensuring that all essential information is properly transmitted to the Retrieval and Question Answer step in RAG chains was discussed by @lhc1921 in a reply to @a404.eth.
  • @motaz_hashhoush made an inquiry regarding Prompt Acquisition in ConversationalRetrievalChain, particularly when using ConversationSummaryMemory. A function to count the number of tokens was proposed.
  • In the ‘share-your-work’ channel, @rajib2189 shared a YouTube video and a GitHub repository demonstrating how to use AWS Bedrock Agent programmatically. Furthermore, @rajib2189 opened a discussion on Prompt Optimization, inviting inputs from those who have attempted optimization.

LangChain AI Channel Summaries

▷ #general (8 messages🔥):

  • Langsmith Referral Code Request: @shivam51 has requested a LangSmith referral code.
  • Llama-2 Chat Model Issues on LangChain: @ninamani has been facing issues while attempting to run the Llama-2 chat model on LangChain. They are receiving errors while using both ChatOllama and Llama2Chat methods.
  • Merger of llama-cpp-python and LangChain: @ninamani also inquired about the potential to marry features of llama-cpp-python and LangChain, highlighting a previous failed attempt and mentioning discrepancies in chat prompt templates.
  • Retrieval and Question Answer Step for RAG chains: Replying to @a404.eth, @lhc1921 discussed the importance of ensuring all necessary information is effectively passed to the Retrieval and Question Answer step in RAG chains.
  • Prompt Acquisition in ConversationalRetrievalChain: @motaz_hashhoush asked whether it is possible to obtain the full prompt from ConversationalRetrievalChain before it is fed to the model, especially while using ConversationSummaryMemory. He further specified the need for a function that counts the number of tokens (a minimal token-counting sketch appears after this list).
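
For the token-counting part, a minimal helper using `tiktoken` is sketched below; it counts tokens in a finished prompt string, and the model name is an assumption.

```python
import tiktoken

def count_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Count the tokens a model's tokenizer would produce for `prompt`."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(prompt))

# e.g. inspect a fully assembled chain prompt before sending it:
print(count_tokens("Use the following context to answer the question..."))
```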

▷ #share-your-work (2 messages):

  • Using AWS Bedrock Agent Programmatically: User @rajib2189 shared a YouTube link to a video demonstrating programmatic access to AWS Bedrock Agent. The linked GitHub code repository was also provided.
  • Optimization of Prompts: @rajib2189 expressed a suspicion that the prompts are not optimized and welcomed input from anyone who may have tried to further optimize them.

Latent Space Discord Summary

  • In the AI General Chat, there was a resource sharing discussion. User @swyxio shared a link to an episode of “The Cognitive Revolution”, enriching the community with more knowledge on AI-related content.
  • An idea was proposed by @lightningralf about the utility of developing an IPTC metadata filler that could insert keywords, descriptions, and more.
  • User @gratchie1188 raised a question regarding the best practices for interfacing with time series databases due to the perceived lack of solutions for them compared to text and SQL databases.
  • @swyxio made an announcement in the AI Event Announcements channel about a special podcast episode which is a recap of NeurIPS (Part 1). A link to the tweet announcing the new episode was shared for easy access.

Latent Space Channel Summaries

▷ #ai-general-chat (4 messages):

  • Mamba Explainer: User @swyxio shared a link to an episode of “The Cognitive Revolution”, a podcast that provides explainers on various AI-related topics.
  • IPTC Metadata Filler Request: User @lightningralf suggested the utility of an IPTC metadata filler with features such as keyword insertion, descriptions, and more.
  • Time series DB Interfacing: @gratchie1188 asked for recommendations for interfacing with time series databases, noting a lack of solutions compared to those for text and SQL databases.

▷ #ai-event-announcements (1 message):

  • NeurIPS Recap Part 1: User @swyxio announced the release of part 1 of their special weekend podcast episode - a recap of NeurIPS. They shared a link to the Tweet announcing the new episode for easy access.

Alignment Lab AI Discord Summary

  • A curated Github repository for Large Language Model (LLM) Interpretability, featuring open-source tools, papers, articles, and groups was shared by @mayhem1349. The resources can be found at this link
  • @burnydelic suggested adding the Mech Interp Discord group to the list of resources on LLM Interpretability.
  • An intriguing poster sighting was reported by @neilbert. in a reply to @teknium; however, insufficient details were provided about it.

Alignment Lab AI Channel Summaries

▷ #general-chat (4 messages):

  • LLM Interpretability Resources: @mayhem1349 has shared a Github repository containing a curated list of open-source tools, papers, articles, and groups related to Large Language Model (LLM) Interpretability.
  • Additional Group for LLM Interpretability: @burnydelic suggested adding the Mech Interp Discord group to the list of resources on LLM Interpretability.

▷ #general-chat (1 message):

  • Poster Sighting: Replying to @teknium, user @neilbert. shared an experience of spotting an intriguing poster, suspecting its creator to be an author of the content presented on it. No additional details were provided regarding the poster’s content or the potential UW connection the user postulated.

DiscoResearch Discord Summary

Only 1 channel had activity, so no need to summarize…

  • Custom MoE Models: @jp1 discussed a custom 2-bit quant with 4 experts that produces consistent output up to 500 tokens. They shared a link to experimental quants of 4-expert MoE Mixtrals in various GGUF formats on Hugging Face here.
  • 4-Expert MoE Mixtrals: The goal, according to @jp1, is to create the best-performing MoE below 10GB. They also shared experimental q8 and q4 files available for training and finetuning, specifying that “no sparsity tricks yet” have been used.
  • Installation of llama.cpp: @jp1 provided a brief guide to downloading and running llama.cpp from GitHub, concluding that their 8.4GB custom 2-bit quant works okay up to 512-token lengths, after which it starts looping (a minimal loading sketch follows).
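
For readers who want to try the quants, here is a minimal loading sketch using the llama-cpp-python bindings rather than the llama.cpp CLI @jp1 described; the GGUF file name is a placeholder.

```python
from llama_cpp import Llama

# Load an experimental 2-bit, 4-expert MoE quant (path is a placeholder).
llm = Llama(model_path="./mixtral-4expert-q2.gguf", n_ctx=512)

out = llm("Q: What is a mixture-of-experts model?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```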