Inspired by Patrick Collison's [Fast Grants](https://future.com/what-we-learned-doing-fast-grants/), Jan Leike is [launching his own](https://openai.com/blog/superalignment-fast-grants).

The Notion page has research directions to explore in more detail, with many good papers.

[TOC]

OpenAI Discord Summary

  • Introduction of the new OpenAI Developers handle on Twitter under the username @OpenAIDevs. Community members were encouraged to follow this account for updates. Twitter link.
  • Discussions around the differences between Google's Gemini and Bard chatbots, with affirmation that both bots can read each other's instructions and provide newer, unique perspectives on solving coding problems.
  • Various issues with OpenAI's services were highlighted, ranging from problems with OpenAI's website to features like archiving and the phishing captcha. Also included were ongoing conversations around GPT customization and dissatisfaction with the business relationship between OpenAI and Microsoft.
  • A heated debate took place over GPT-4's performance and functionality. Users reported various problems with GPT-4, such as unfriendly responses, inaccessibility, and capacity issues. Furthermore, there was speculation about the potential release of an updated version, GPT-4.5.
  • Users shared their experiences with difficulties in custom GPTs, their struggles with PDF file attachments, and the potential of using DALL-E 3 in GPT for style imitation. An announcement revealed that a bug with image recognition had been resolved.
  • People discussed issues around prompt engineering, like struggles with getting the AI to adhere to detailed guidelines for modifying code and difficulties in getting the AI to rewrite original text into a format suitable for reading aloud. Introduction of an enhancement in Convo-lang with JSON mode support was also noted.
  • In the API discussions, there were further elaborations on the newly introduced JSON Mode support in Convo-lang, rules for producing Venn diagrams, issues with GPT-4 stripping comments and introducing placeholder comments when modifying large scripts, and challenges in making human-written text sound more natural for TTS in informative 'talking head' videos, especially in the medical field.

OpenAI Channel Summaries

▷ #announcements (1 messages):

  • Announcement of New OpenAI Developers Twitter Handle: @abdubs introduced the new OpenAI Developers handle on Twitter and encouraged users to follow for updates. The handle can be accessed via this link.

▷ #ai-discussions (91 messages🔥🔥):

  • Gemini vs Bard: User @offline discusses their experience with Google's Gemini and Bard chatbots. They mention that the bots can read each other's instructions, providing unique perspectives on coding issues, and giving newer information than what GPT provides. User @muyfashionista shares that they have not noticed significant differences between these bots.

  • Use of AIs for Illegal Activities: User @sylviajade raises a concern about ChatGPT being used to generate malware, wondering if any tracking or monitoring of such uses is in place. Respondent @lugui clarifies that violating the Terms of Service (ToS), which prohibit illegal activities, can result in OpenAI account bans.

  • Customizing GPTs: Users @rudish, @elektronisade, and @aznironman discuss how to customize GPT for various purposes, including use in Discord bots and for creating an offline chatbot. @webhead recommends using Llama2-based models for most cases due to their permissive licensing and the availability of various model variants.

  • Concerns with GPTs and Alternatives: Users @stevenramenby and @jerrybobs share their dissatisfaction with GPTs, citing issues with rambling words and plugin failures. @jerrybobs suggests looking into AI Assistants, which are harder to set up but purportedly more reliable.

  • Perplexity chatbot: User @chief_executive shares their positive experience with the Perplexity chatbot, commending its real-time browsing capability for information gathering. Comparatively, they find Perplexity superior to both ChatGPT’s browsing tool and Bing AI. They also mention that Perplexity provides 600 uses a day under its Plus version for $20 a month.

▷ #openai-chatter (544 messages🔥🔥🔥):

  • Performance issues and concerns with GPT-4: Users @koll0212, @stevenramenby and @lindeberg02 reported experiencing issues with GPT-4, describing it as "acting up," giving random words and emojis as responses, being unusable, and even cancelling subscriptions due to these problems. @stevenramenby shared several instances of the issues in chat transcripts, which were discussed with users @offline, @rjkmelb, and @solbus. The issue seemed to persist across custom agents for @stevenramenby, prompting further help requests in the specified channels.
  • ChatGPT Down Experience: Users @zappymonster, @7877, @andrew.lol, @gardener_windson, @acaba1806, @bricky___, @smokzz and @thepitviper reported experiencing a downtime with the ChatGPT platform. They described it as returning 502 errors, timing out, or being unable to open the site.
  • Usage Cap and User Interface Concerns: Users @lindeberg02, @openheroes, @solbus and @kyoei discussed the usage cap on GPT-4 for ChatGPT Plus. @solbus clarified that the cap is currently 40/3hrs for the base GPT-4 usage and custom GPTs are limited to 25/3hrs. @lindeberg02 complained about restrictions after payment and sought more information about the cap.
  • Expectation and Speculation on GPT-4.5 Release: Users @realmunknown, @DawidM, @realgavok, @mrcrack_, and @lugui engaged in discussions and speculation about the potential release of an update, specifically GPT-4.5. These were based on rumors and leaks, with @DawidM noting that a user on Reddit was a source of information about the potential model update.
  • GPT Performance and Function Concerns: Users @skrrt8227, @you.wish, @afterst0rm, @kyper, and @bad3r voiced criticisms about GPT’s perceived limitations, including an inability to effectively format text, a lack of consistency, and the impact of a usage cap. @bad3r specifically criticized the server capacity of the platform, arguing that ChatGPT Plus users should be allocated more bandwidth.

▷ #openai-questions (251 messages🔥🔥):

  • Issues with OpenAI Services: Several users (@jcrabtree410, @mohammad_soqar101, @l3xp3rt, @kamikolart, @dsssssssss, @cosmosfractal) reported problems with OpenAI's website, while others hit issues with specific features, such as archiving (@nickiee, @xhls) and a phishing captcha popping up despite logging in from a familiar device (@vassiou).
  • Chatbot Discussions: Multiple users sought assistance or had conversations around GPT usage and customization. Users such as @stevenramenby, @bionicle1337, and .cybersenpai discussed issues and questions around custom GPTs, plugin behavior, and fine-tuning, and received several clarifications from @elektronisade and @satanhashtag, among others.
  • Disputes Around OpenAI and Microsoft: User @bionicle1337 expressed dissatisfaction with the business relationship between OpenAI and Microsoft, alleging unfair business practices and antitrust violations. Some community members like @elektronisade and @Rock provided different perspectives, but the discussion remained unresolved.
  • Product Features and Improvements Discussions: @stevenramenby reported anomalies with custom GPT’s output, @jtensetti discussed the name toggle issue with Plus subscription. In line with this, .staycurious inquired about setting up a feature for automatic motivation emails, and @jah777 asked about integrating a ChatGPT bot on Discord. Relevant suggestions were provided by other members.
  • Multiple Language Discussions: Various users were engaging in conversations in multiple languages, being reminded by @satanhashtag to use English for broader communication.

▷ #gpt-4-discussions (16 messages🔥):

  • Chat Limit Issues: User @zeriouszhit expressed concerns about facing a chat limit even when encountering network errors or bugs on OpenAI’s end, describing a situation where they were unable to process images and were penalized with chat limit reductions.
  • Image Recognition Bug Fix: @solbus mentioned that an image recognition bug has been resolved according to reports on a specific Discord thread.
  • Custom GPT Disappearances: @ps5671 and @pietman talked about issues with custom GPTs disappearing or not being editable, which seem to have resolved eventually.
  • GPT Ability to Generate Detailed Technical Analysis Charts: @consciousness3783 inquired about GPT-4’s capabilities in generating technical analysis charts with details like price action indicators through images and text within a single prompt.
  • Issues with PDF Files: @jazy_87295 encountered difficulties when trying to attach a PDF file in GPT, wondering if a conversion to the TXT format is necessary.
  • Style Imitation with DALL-E 3: @xchronos shared an idea of using DALL-E 3 in GPT to create images that imitate the style of other provided images.

▷ #prompt-engineering (22 messages🔥):

  • Convo-Lang Update: User @iyioio reported an enhancement in Convo-lang, with JSON mode support now available. They provided detailed examples of how to define structs and utilize JSON for various data formats, including a discussion of using it with Vision and function calling. Find more details in the NPM docs for Convo-lang.
  • Instructions for Chatbots: User @offline suggested the strategy of prefixing instructions with numbers for easy referencing, along with emphasizing that the bot should only produce VENN diagrams with 3 circles unless otherwise stated by the user.
  • Script Modification Guidelines: @joeyjoejoeshabadoo42 is experiencing difficulties in getting the AI to adhere to their detailed guidelines for modifying code, specifically when the code size reaches or exceeds 110 lines. The rules include preserving all code comments, modifying as little code as possible, clearly explaining modifications, honoring the code's history, and always providing a full modified script without placeholder comments.
  • Natural Language Processing: @.ursium expressed difficulty in getting the AI to rewrite original text into a format suitable for reading aloud, in a specific context of creating medical videos. They highlighted issues where the AI changes nuances in the text, sometimes affecting the meaning of important information.
  • NLP responses: @world_designer suggested informing the AI about the context and nature of the script for a more accurate and nuanced understanding, which @.ursium agreed to try.

▷ #api-discussions (22 messages🔥):

  • JSON Mode support in Convo-lang: @iyioio introduced JSON Mode support in Convo-lang with examples demonstrating application in both simple and complex scenarios, such as listing planets in the solar system and describing people in an image. More details can be found in the NPM documentation for convo-lang.

  • Rule for Producing Venn Diagrams: @offline stated a clear rule that Venn diagrams produced should only have 3 circles unless the user specifies otherwise.

  • Custom Instructions for Code Modifications: @joeyjoejoeshabadoo42 has been facing issues with large python scripts when using custom instructions in the context of GPT-4 via the web chat UI. The AI fails to adhere to instructions, especially when the script size reaches around 110 lines. It gives placeholder comments and strips code comments.

  • Making Text Sound Natural for 'Talking Head' Videos: @.ursium is trying to make human-written text sound more natural for TTS in informative 'talking head' videos, specifically within the medical field. The issue is that when asked to simplify the text or remove difficult-to-pronounce words, the text is rewritten and critical words such as 'may' or 'sometimes' are replaced or removed, changing the meaning.

  • TTS AI Model Vs ChatGPT: In the conversation between @.ursium and @world_designer, @world_designer initially proposed using a separate TTS AI model for @.ursium’s problem. However, after @.ursium clarified that the issue involves adjusting written text for spoken presentation and not generating TTS, @world_designer suggested explaining the script context to ChatGPT.


OpenAccess AI Collective (axolotl) Discord Summary

  • Extensive dialogue on AI performance and training strategies involving models like PHI-2 and hardware like A6000s. Key conversations covered potential overalignment of AI models, strategies for securing free computational power, handling CUDA out-of-memory issues with techniques like reducing micro batch size, and the risk of overtraining during fine-tuning.
  • Deep dive into DPO training was initiated by @faldore where he shared a link to a RL/DPO pull request #935 on the OpenAccess AI Collective Axolotl repository.
  • Significant discussions around template and token configurations, for instance the thread on the logic behind SeparatorStyle.LLAMA2, which seemed to skip index 0 in messages and thus drop any instructions, and the issue of unexpected space additions linked to the Hugging Face tokenizer, raised by @Aleksa Gordić.
  • Users also shared their explorations of handling end-of-sentence (EOS) token generation issues, when to load the ChatML template for QLoRA during inference, and finding the right configuration format for training models.
  • Suggestions and fixes for the aforementioned challenges were often accompanied by Git pull requests, such as PR #952 submitted by @hamelh to rectify the Llama templating concern.
  • The guild had detailed conversations about datasets, discussing topics like the token-to-TB ratio of models, hardware requirements, and the duration of model training. There was also a query about function calling datasets, with the Glaive Function Calling v2 dataset shared as a potential resource.

OpenAccess AI Collective (axolotl) Channel Summaries

▷ #general (85 messages🔥🔥):

  • AI Overalignment and Performance: @noobmaster29 and @faldore discussed how their AI models might be overaligned. @faldore shared a humorous system prompt he used for his AI model, and the group discussed the potential consequences if an AI were to realize the incentives offered to it were false.
  • AI Training: @faldore discussed his plans to train a phi2 model. @le_mess asked for advice on getting free compute, to which @faldore suggested applying for Microsoft’s Level 3 startup programme.
  • AI Model Fine-tuning and OOM Issues: @mihai4256 experienced CUDA out-of-memory issues while using DeepSpeed ZeRO-3 to fine-tune a Yi model on 8 x A100 (that is, 8 * 80 GB); @nruaif suggested reducing the micro batch size and increasing accumulation steps (see the sketch after this list). @dangfutures also encountered out-of-memory issues while fine-tuning the full model.
  • DPO Training and Git Repo: The users discussed DPO training, with @faldore sharing a link to a RL/DPO pull request #935 on the OpenAccess AI Collective Axolotl repository.
  • Troubleshooting Issues: @matts9903 and @dangfutures reported issues when saving checkpoints and after one epoch of model training. They received some troubleshooting suggestions from @caseus_ and @dangfutures.
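
As a rough illustration of the memory trade-off described above (smaller micro-batches, more accumulation steps, same effective batch size), here is a minimal PyTorch sketch with hypothetical sizes; it is not the axolotl/DeepSpeed code path itself:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical sizes: an effective batch of 32 built from 8 micro-batches of 4.
micro_batch_size = 4
accumulation_steps = 8

model = nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = TensorDataset(torch.randn(256, 512), torch.randn(256, 512))
loader = DataLoader(data, batch_size=micro_batch_size)

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y) / accumulation_steps  # scale so accumulated grads average out
    loss.backward()  # gradients accumulate across micro-batches, keeping peak memory low
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one optimizer update per effective batch of 32
        optimizer.zero_grad()
```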

▷ #axolotl-dev (111 messages🔥🔥):

  • Fine-tuning and Overfitting: Users @caseus_, @faldore, @noobmaster29, and @nruaif discussed how much full fine-tuning and QLoRA (Quantized Low-Rank Adaptation) to apply to PHI-2 models on A6000s, with the topic raising concerns about overtraining. Users suggested trying full fine-tuning or QLoRA first, highlighting that adjustments can be made if the model begins to overfit.

  • Optimization Concerns on Mixtral: User @casper_ai shared his unsuccessful experience attempting to work with optimization suggestions from unsloth.ai. He is looking forward to their potential pull request (PR) to check if improvements can indeed be achieved, especially with Mixtral models where memory is crucial.

  • Llama2 Templating Issue: User @Aleksa Gordić brought up an issue regarding the logic behind airoboros, llama-2, and other prompts relying on SeparatorStyle.LLAMA2, as it appears to skip index 0 in the messages, thus dropping any instructions. He proposed a solution where index 0 should be yielded together with the system prompt. This evolved into a conversation about more robust solutions, test cases, and other Llama2-related issues like prompt assembly.

  • Fix for Llama Templating Issue: User @caseus_ highlighted a PR opened by @hamelh to fix the Llama templating issue. The PR #952 addresses errors in the EOS/BOS application amongst other issues.

  • Token Space and Decoding Issues: User @Aleksa Gordić identified an issue regarding unanticipated space additions linked to the Hugging Face tokenizer. It sparked a discussion about token manipulation complexities, with @hamelh suggesting using Hugging Face chat templates as the "source of truth" to resolve the issues.

▷ #general-help (79 messages🔥🔥):

  • Discussion on Special Tokens and Configuration: User @noobmaster29 discussed a challenge they were having with their models not generating an end-of-sentence (EOS) token, causing their model to make continuous outputs up to the token limit. They shared various configuration settings they had tried, including different methods for setting the EOS token, and asked other users for advice.
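
For context on the kind of check involved, here is a minimal, generic Hugging Face sketch (the model name is purely illustrative, and this is not @noobmaster29's actual configuration) of verifying the registered EOS token and passing it to generation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative placeholder; substitute the fine-tuned checkpoint in question
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# If the chat template uses a dedicated end token (e.g. ChatML's <|im_end|>),
# it must exist in the vocabulary, be registered as the EOS token,
# and actually terminate the training examples.
print(tokenizer.eos_token, tokenizer.eos_token_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    eos_token_id=tokenizer.eos_token_id,  # generation stops once this id is produced
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```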

  • Running Qlora with ChatML Template: Regarding the use of Qlora with a ChatML template, @noobmaster29 expressed confusion about when to load the ChatML template during the inference stage. @caseus_ confirmed that the template should be loaded in that format.

  • Configuration Format for Training Models: There were also discussions on what kind of configuration format should be used when training models. @noobmaster29 and @self.1 were not sure about the correct token configurations for running prompts with the ChatML format.

  • Inference Issues: In addition, @noobmaster29 reported that when they used a particular conversational setting during inference, the model didn’t adhere to the ChatML format. @nanobitz suggested checking the full output using the preprocess CLI with --debug.

  • Links of Interest: @noobmaster29 shared a link to a Hugging Face model with a configuration that worked well with ChatML, but upon trying to replicate it, they didn’t achieve the same results.

▷ #datasets (33 messages🔥):

  • Dataset sizes and training metrics discussions: A detailed discussion took place between users @natefyi_30842, @noobmaster29 and @nanobitz. They discussed the token-to-TB ratio of models and the duration and hardware required for training, with references to PHI-2 and OPT models. User @natefyi_30842 clarified a misunderstanding about the time and GPUs required for training different models (PHI-2 card, OPT-125m card).
  • Training smaller models: User @noobmaster29 suggested that models in the hundred-million-parameter range might be feasible for individual efforts, linking to the OPT model on Hugging Face.
  • Enquiry about Function Calling Dataset: User @le_mess asked for recommendations on function calling datasets, and shared the Glaive Function Calling v2 dataset and a leaderboard for function calling. @bjoernp showed interest in this as well.

Nous Research AI Discord Summary

  • In-depth discussions and speculation focusing on AI model training strategies and the handling of datasets within the guild. Specifically, extensive conversation centered around potential contamination of the MetaMath dataset, with proposals for an automatic contamination check despite potential cost concerns. The focus also extended to strategies such as Neural Architecture Search (NAS) over merge parameters, in place of an excessive concentration on training models for benchmarks.
  • Chatter around upcoming models, including speculative user anticipation over the capabilities of Llama 3 and Galactica 2.0. A particular focus on coding-specialized models like Dolphin 2.5 Mixtral 8x7b was evident, with acknowledgement of its proficiency in coding.
  • Discussion on creating a decoupled and continuously improving world model, voiced by @maxwellandrews, who is contemplating a high-level approach to it.
  • A shared interest in high-performance workstation setups for purposes beyond gaming, such as 3D rendering and machine learning. Specific link shares included a high-performance workstation named "Big Boss" and a discussion regarding NixOS and Python package management with Nix on the NixOS forums.
  • Probing dialogues on AI models and tokenization, including Based model architecture, MorphPiece as a potential BPE replacement, and the Phi-2 performance. A clear disappointment with Phi-2 was expressed for not patching holes left by the previous Phi-1/1.5 models. Links shared involved a tweet on Based, a Paper on MorphPiece, a research paper by OpenAI, and a blog post on FunSearch.
  • Model performance updates from users, with reports on the average score of the Solar model from @gabriel_syme, who later suggested fine-tuning and slerping the Solar model, and on the RedPajama 3B model working, albeit at a slower pace, from @bevvy.
  • Critical comments on the pricing of GPT4-Turbo, with the general consensus categorizing the costs as overly high even for high-value tasks.
  • Queries and pointers raised in the Ask-about-llms channel relating to Solar 10’s preset, alternatives to MLC with a suggestion for Android/iOS setup, and a reference to LLM Farm.
  • Lastly, a professional tone of discussion was flavored with personal updates from @gezegen on recovering from flu and missing some conversations.

Nous Research AI Channel Summaries

▷ #ctx-length-research (1 messages):

  • Decoupled and Continuously Improving World Model: User @maxwellandrews is contemplating a high-level approach to creating a decoupled and continuously improving world model.

▷ #off-topic (27 messages🔥):

  • Graphics Card Discussion: @everyoneisgross shared some issues with their graphics card, specifically mentioning black elements and checker renders. They have considered the limitations of their Colorful 3060 card.
  • High-performance Workstation: @skadeskoten shared a link to ExtremeTech about a high-performance workstation from Germany named "Big Boss", which has 7 RTX 4090 GPUs and a 64-core Threadripper Pro 5995WX. The workstation is designed for 3D rendering and machine learning rather than gaming.
  • Model Finetuning: @euclaise stated their GPT-7b models can be finetuned on a single GeForce 3090 using their custom optimizer. They speculated that the Mixtral model would potentially fit on the "Big Boss" workstation setup too.
  • NixOS and Nix Package Management: @euclaise asked if anyone in the channel uses NixOS and manages Python packages with Nix. They also shared a link to a discussion on the NixOS forums.
  • Training Models for Specific Language: @skadeskoten expressed a desire to train language models for Norwegian, specifically medical Norwegian. @euclaise responded that it would take a long time to complete such a task.

▷ #benchmarks-log (2 messages):

  • Solar Model Performance: @gabriel_syme reported that the Solar model holds an average of 74, though no specification was provided about what this metric measures.
  • Finetuning and Slurping Solar Model: In a subsequent message, @gabriel_syme suggested the need to finetune and slerp the Solar model.
  • Discussion on Phi-2 Performance: Users @bevvy and @georgejrjrjr discussed the performance of Phi-2. @georgejrjrjr expressed disappointment that Phi-2 did not patch holes left by the previous Phi-1/1.5 models and speculated it might actually be a case of Phi-CTNL.
  • Based Model Architecture & MorphPiece Tokenization: There was a discussion on new model architectures including Based, with a link shared by @nods to a tweet by Simran Arora detailing its simplicity and efficiency. @georgejrjrjr noted the rapid advent of new model architectures, also bringing up MorphPiece as a potential BPE (Byte Pair Encoding) replacement. He shared a Paper Link on the subject.
  • Fine-tuning Dataset Creation: User @atgctg brought up the topic of creating instruction datasets for fine-tuning as an important parallel task to pre-training data curation.
  • Pricing of GPT4-Turbo: A discussion led by @gabriel_syme and @giftedgummybee critiqued the high pricing of GPT4-Turbo. The prices were deemed improper even for high-value tasks.
  • Weak to Strong Generalization Paper & FunSearch: @giftedgummybee shared a link to a research paper by OpenAI, and @coffeebean6887 shared the GitHub repository with the code. A blog post on FunSearch was shared by @nods, and sparked a discussion among the users regarding its application and novelty.

▷ #general (139 messages🔥🔥):

  • Discussion on MetaMath Dataset: There was significant debate regarding the potential contamination of the MetaMath dataset. @bjoernp mentioned the importance of avoiding contaminated data in model training, and @mihai4256 confirmed potential contamination after running a test, scoring 0.96 where anything above 0.85 is deemed as a high likelihood of contamination. There was also discussion on automatically checking for contamination when models are submitted, but potential costs were raised as a concern.

  • AI Model Training: Various users discussed model training strategies, with @nonameusr expressing concern over too much focus being placed on model training for benchmarks. @euclaise suggested employing Neural Architecture Search (NAS) over merge parameters, prompting further discussion on model merging.

  • Upcoming Models: Users such as @Error.PDF and @gabriel_syme speculated about the potential release and capabilities of future models like Llama 3 and Galactica 2.0. However, no specific release dates or features were confirmed.

  • Focus on Coding: Models specialized in coding were discussed by users @nonameusr and @metaldragon01, focusing on the recently released Dolphin 2.5 Mixtral 8x7b model which claims to be very good at coding.

  • Model Execution: Discussion around model execution included mention of QuIP models by .beowulfbr and of how to run inference on these models effectively. @decruz mentioned running OpenHermes on a phone.

▷ #ask-about-llms (6 messages):

  • RedPajama 3B Performance: User @bevvy reported that RedPajama 3B was working, albeit at a slow pace. The prefill was slow and the system ran at 3 tokens/s on a Pixel 7.
  • Question about Solar 10 Preset: User @agcobra1 queried about the preset used by Solar 10. No responses were provided within the given message history.
  • Alternative to MLC: User @gezegen asked if there was a possible alternative to using MLC and suggested setting up on Android/iOS.
  • Reference to LLM Farm: User @orabazes mentioned LLM Farm, though the context wasn’t clear from the given message history.
  • User Illness: User @gezegen noted that they were recovering from the flu and may have missed some previous conversations.

HuggingFace Discord Summary

  • NeurIPS Meetup Outreach & LSTM Discussion: User @.yosun extended an invite to last-minute attendees for a NeurIPS meetup and an AI3D breakfast. Additionally, @vipitis explained how to work with LSTMs and shared a link to the PyTorch LSTM documentation.

  • Chatbots & VSCode Plugins Talk: A brief discussion on customizing the AI name in LLAMA chatbots was carried out by users @criticaldevx, @jeffry4754, @shrap42, @ahmad3794 and @funapple. Tips included fine-tuning and replacing the name post-generation using a script.

  • Deep Dive into RNNs, LSTMs & GRUs: User @nerdimo provided a thorough explanation of recurrent neural network concepts, along with components like LSTMs and GRUs. They also discussed the resource they are learning from, the Andrew Ng Machine Learning specialization. User .memoshi contributed by sharing a paper on the SwitchHead method on arXiv.

  • Space-Themed Twitter Account & Inclusive AI: @boggdanu_ shared their Twitter account @madpapayas focused on astronomy, and @sk21_ discussed inclusive AI design, citing a post by Dr Maria Panagiotidi.

  • AI Related Projects & Tutorials: Users shared their AI projects and asked for feedback. @marielandryceo presented AI methodologies, @rwitz_ introduced a merge of two AI models, and @appstormer_25583 unveiled an AI that analyzes logos. A query regarding fine-tuning was also resolved.

  • MagVIT2 Presentation & New Reading Material Suggestions: @chad_in_the_house introduced MagVIT2 and shared associated resources, @memehunter7209 suggested Mathematics for Machine Learning (the mml-book) as a potential reading group book, and @netskink.1 invited members to participate in their dataset project.

  • Image-Conditioned Diffusion Model & Other Discussions: @mr.frog2014 shared insights from their experiments with image-conditioned diffusion models, @nixon_88316 sought information about StableCode, and @sebys7 asked about parameter tweaks in SD-X4-Upscaler.

  • PyTorch Model Training & Hierarchical Text Classification: User @merve3234 provided instructions on training transformer models with PyTorch, and @stroggoz discussed hierarchical text classification using separate models for each level. Also, @ppros666 solicited advice on fine-tuning an LLM for the first time, highlighting a guide on the topic.

HuggingFace Discord Channel Summaries

▷ #general (82 messages🔥🔥):

  • NeurIPS Meetup: User @.yosun is calling for last-minute attendees to a NeurIPS meetup and an AI3D breakfast the next day, and shared a link to a Twitter post.
  • Working with LSTM: User @vipitis explained the functionality of LSTMs and how multi-layered LSTMs work, and suggested checking out the PyTorch LSTM documentation to understand the implementation (see the sketch after this list).
  • Customizing AI Name in LLAMA: Users @criticaldevx, @jeffry4754, @shrap42, @ahmad3794 had a conversation on how to change the AI name in LLAMA chatbots. Suggestions include fine-tuning and replacing the name post-generation using a script.
  • Training Trouble: @vishyouluck raised an issue with fine-tuning a model using autotrain_advanced, suspecting a problem with the train.csv file.
  • VSCode Plugins: @funapple asked for recommendations on VSCode plugins that work well with local large language models (LLMs) for real-time code suggestions.
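
Since the thread pointed to the PyTorch LSTM documentation, here is a minimal sketch of a stacked (multi-layer) nn.LSTM with toy dimensions:

```python
import torch
from torch import nn

# Toy dimensions: a batch of 8 sequences, 20 timesteps, 32 input features.
lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=2, batch_first=True)
x = torch.randn(8, 20, 32)

output, (h_n, c_n) = lstm(x)
print(output.shape)  # (8, 20, 64): hidden state of the top layer at every timestep
print(h_n.shape)     # (2, 8, 64): final hidden state for each of the 2 layers
print(c_n.shape)     # (2, 8, 64): final cell state for each of the 2 layers
```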

▷ #today-im-learning (6 messages):

  • Understanding Recurrent Neural Networks: User @nerdimo delved into the concepts of recurrent neural networks (RNNs), including Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). They explained that LSTMs and GRUs employ gates to update their internal memory and process relevant information, which aids in making accurate predictions and avoiding the vanishing gradient problem.
  • In additional discussions, @nerdimo mentioned the concept of bidirectional RNNs, where information flow can proceed from left to right and vice versa. Moreover, they discussed the power (and computational cost) of stacked RNNs that form a deep RNN.
  • User @merve3234 questioned if there is a significant difference in performance between GRUs and LSTMs, given that GRUs are generally more efficient to train. They also asked about the resources @nerdimo is using for learning.
  • In response, @nerdimo indicated that they are learning from the Andrew Ng Machine Learning specialization. They also expressed their intuition that LSTM would be the best choice due to its increased filtering and parameters (see the parameter-count sketch after this list).
  • SwitchHead Method for Transformers: User .memoshi shared a link to an arXiv paper about the SwitchHead method. It claims to reduce both compute and memory requirements of the self-attention layers in Transformers, achieving practical speedups without sacrificing language modeling performance.
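
As a rough illustration of the efficiency point raised above: for identical sizes a GRU carries three gates' worth of weights versus an LSTM's four, so roughly 75% of the parameters. A tiny PyTorch check with toy, purely illustrative sizes:

```python
from torch import nn

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

lstm = nn.LSTM(input_size=256, hidden_size=512, num_layers=1)
gru = nn.GRU(input_size=256, hidden_size=512, num_layers=1)

print(n_params(lstm))  # 4 gates' worth of weights and biases
print(n_params(gru))   # 3 gates' worth, i.e. ~75% of the LSTM's parameters
```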

▷ #cool-finds (2 messages):

  • Astronomy Twitter Account: @boggdanu_ shared a link to his Twitter account @madpapayas which is focused on astronomy.
  • Creating Inclusive AI: @sk21_ brought to attention a Substack post about creating inclusive AI. Subtitled "A strengths-based approach to Artificial Intelligence", the post is by Dr Maria Panagiotidi and it discusses the issues of inclusivity in AI design. The link is as follows: Creating Inclusive AI.

▷ #i-made-this (5 messages):

  • Navigating the Depths: The Critical Thinking ToT and Scientific Method CoT in AI: User @marielandryceo shared an extensive overview of two essential methodologies guiding the world of Artificial Intelligence, namely the Critical Thinking Tree of Thoughts (ToT) and the Scientific Method Chain of Thoughts (CoT). The post detailed the step-by-step process of how these methodologies function in AI, emphasizing the significance of persistently refining our understanding. The discussion hashtagged #CriticalThinkingToT, #ScientificMethodCoT, and #AIExploration.

  • Merge of AI Models: @rwitz_ shared a link to a merge of two AI models - viethq188/LeoScorpius-7B-Chat-DPO and GreenNode/GreenNodeLM-7B-v1olet - using slerp as a merge method. He detailed the slice sources and parameters involved in the merge process. The base model for the merge is viethq188/LeoScorpius-7B-Chat-DPO.
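
For readers unfamiliar with the merge method, here is a minimal sketch of spherical linear interpolation (slerp) applied to two weight tensors; it illustrates the general technique only and is not the exact implementation @rwitz_ used:

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    v0_f, v1_f = v0.flatten().float(), v1.flatten().float()
    v0_n = v0_f / (v0_f.norm() + eps)
    v1_n = v1_f / (v1_f.norm() + eps)
    omega = torch.acos(torch.clamp(torch.dot(v0_n, v1_n), -1.0, 1.0))  # angle between the tensors
    if omega.abs() < eps:
        return ((1 - t) * v0_f + t * v1_f).reshape(v0.shape)  # nearly parallel: fall back to lerp
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * v0_f + (torch.sin(t * omega) / so) * v1_f
    return out.reshape(v0.shape)

# Usage: blend two matching parameter tensors halfway (t = 0.5).
a, b = torch.randn(1024, 1024), torch.randn(1024, 1024)
merged = slerp(0.5, a, b)
```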

  • Brand Logo Analyzer GPT: User @appstormer_25583 introduced an AI, which gives design feedback and improvement tips for logos based on the uploaded image. A link to their project was provided.

  • @merve3234 asked @rwitz_ if he had submitted his merged model for the leaderboard, to which he affirmed with a "yes".

▷ #reading-group (15 messages🔥):

  • Presentation of MagVIT2: User @chad_in_the_house initiated a presentation on MagVIT2, a model introduced in the paper "MagVIT2: Language Model Beats Diffusion: Tokenizer is key to visual generation" by Carnegie Mellon and Google. The model can generate images as tokens/words and is seen to beat diffusion models. For a more in-depth discussion, a link to the blog was shared and the code is available here.
  • Suggestion for Reading Group Material: @memehunter7209 suggested studying the mml-book in the reading group for a better understanding of the math stuff required for machine learning.
  • Discussion about mml-book: @Pantera and @charlieparker8035 sought clarification about the level and content of the book respectively, and @memehunter7209 advised that it is more of a review book, referring them to Gilbert Strang's lectures and course on edX.
  • Project Invitation: @netskink.1 invited members to participate in their project working on a dataset of images and weather conditions aimed at detecting icy bridges.
  • Suggestion on Unsolved Math Problems: @caleb_sol proposed discussing the paper about AI's capacity to solve previously unsolved math problems, indicating it could be a good topic for the reading group.

▷ #diffusion-discussions (3 messages):

  • Image Conditioned Diffusion Model: User @mr.frog2014 shared their experiment with an image-conditioned diffusion model, suggesting the idea of injecting noise directly into x and subtracting it after the denoising model. They also raised a question about the potential benefit of incorporating an attention module.
  • Query about StableCode: User @nixon_88316 posed a query seeking information about stablecode.
  • Parameter Query for SD-X4-Upscaler: User @sebys7 is using the SD-X4-Upscaler with diffusers, and asked about tweaks to the parameters that might yield a specific type of result.

▷ #NLP (6 messages):

  • Model Training Tips: User @merve3234 gave detailed instructions on using transformer models with PyTorch and decoder input tokens. They suggested manually feeding decoder_input_ids during the forward pass of the training loop (see the sketch after this list).
  • Hierarchical Text Classification Discussion: @stroggoz proposed an approach to classifying academic texts using a network of smaller Bert models. The discussion raised the idea of classifying texts by primary topic (e.g., math, chemistry), then further classifying by subtopic, using separate models for each level.
  • LLM Context Length Clarification: User @ppros666 asked for clarification on the context length of Llama 2, specifically asking if the context length is 4000 or 4096. @Cubie | Tom clarified that the context length is 4096. Relevant information was linked to from the HuggingFace model’s config.json file.
  • Request for LLM Fine-Tuning Guidance: @ppros666 expressed an interest in fine-tuning an LLM for the first time and requested an up-to-date tutorial or example code. They highlighted a guide titled "Llama-2 4bit fine-tune with dolly-15k on Colab", seeking advice on its reliability.
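
A minimal sketch of what manually feeding decoder_input_ids can look like with a Hugging Face seq2seq model; the checkpoint and sentences below are placeholders for illustration, not taken from the discussion:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # illustrative placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src = tokenizer("translate English to German: The house is small.", return_tensors="pt")
labels = tokenizer("Das Haus ist klein.", return_tensors="pt").input_ids

# Manually build decoder inputs by shifting the labels right and
# prepending the decoder start token (teacher forcing).
decoder_input_ids = torch.cat(
    [torch.full((labels.shape[0], 1), model.config.decoder_start_token_id), labels[:, :-1]],
    dim=1,
)

outputs = model(
    input_ids=src.input_ids,
    attention_mask=src.attention_mask,
    decoder_input_ids=decoder_input_ids,
    labels=labels,
)
print(outputs.loss)
```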

▷ #diffusion-discussions (3 messages):

  • Image Conditioned Diffusion Model: User @mr.frog2014 discussed a potential alteration to the standard image-conditioned diffusion model, presenting a method where the condition signal is added directly to the injected noise and subtracted after the denoising model. They also asked whether implementing an attention module to inject the condition would yield better results.
  • Inquiry about Stablecode: @nixon_88316 posted asking if anyone has knowledge regarding stablecode. However, no responses or follow-ups were captured in the supplied message history.
  • SD-X4-Upscaler usage: User @sebys7 asked whether there are specific parameters for using the sd-x4-upscaler with diffusers that could produce a certain outcome in the resulting image. However, no particular image or result was provided for reference within the given context.

LangChain AI Discord Summary

  • Detailed discussions on LangChain integration and usage: official documentation, YouTube tutorials, creation of large language models (LLMs), output streaming, and implementation of metadata were extensively discussed.
  • Queries raised on the progress of Plan-and-Execute, difficulties in copying text from LangServe's 'DOCUMENTS' section, and accessing LangServe endpoints and fields led to collective troubleshooting and information exchange.
  • Contributions shared included @andysingal’s Medium blog post on LangChain Expression Language (LCEL), @appstormer_25583’s GPT-based brand logo analyzer, @pagerize_admin’s Pagerize video summarizer, @gokusan’s TinyLLM library for production applications, and @joshuasundance’s pre-commit hook to detect API keys.
  • Guild members participated in a LLM user survey hosted by Catena Labs to provide insights on market preferences, accessible here.

LangChain AI Channel Summaries

▷ #general (48 messages🔥):

  • Integration and Learning Resources for LangChain: @chasemcdo suggested utilising resources from the official LangChain site for documentation and its YouTube channel for further information and tutorials.
  • Plan-and-Execute Progress Inquiry: @casper_mars raised a question about the progress on Plan-and-Execute, to which @manlylubbin replied they are still planning.
  • Implementation of Metadata on LLM: @reddiamond69 highlighted that on generating output, LangChain allows for printing source documents along with their metadata, which can be implemented on the users’ application.
  • Difficulties with Streaming Output Through API: @menny9762 shared struggles with streaming output through Next.js. A comprehensive discussion ensued, involving @seththunder and others, on the creation and use of LLMs, streaming, and callback methods (see the sketch after this list).
  • LLM Survey from Catena Labs: A community survey was promoted by @jay_wooow to gather data on LLM usage and preferences to inform product development and provide useful market data. The survey is hosted here.
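
For reference, a minimal sketch of token streaming via a callback handler, using LangChain's Python API as it stood in late 2023 (import paths may differ in newer releases); in a web app the handler would push tokens to the response stream rather than stdout:

```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.schema import HumanMessage

# Streams tokens to stdout as they are generated (requires an OpenAI API key).
llm = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0,
)
llm([HumanMessage(content="Explain what a callback handler does, in one sentence.")])
```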

▷ #langserve (3 messages):

  • Copying Text from Output Section: @fritz4374 reported difficulties in copying text from the 'DOCUMENTS' section of a run. The user can copy from the input and output sections but not from the 'DOCUMENTS' section.
  • LangServe Endpoint & Accessing Fields: @khophi.co is seeking assistance with accessing the <payload>.history field in the langchain context when a frontend request, like { 'mymessage': 'message' }, is sent to the path="/myendpoint" in LangServe. The user is curious about how to retrieve other fields beyond the 'input' field, which LangServe retrieves automatically.

▷ #share-your-work (7 messages):

  • LangChain Expression Language (LCEL) Instructions: @andysingal shared a Medium blog post on mastering the LangChain Expression Language (LCEL), providing examples of its application in scraping Wikipedia.
  • GPT-based Brand Logo Analyzer: @appstormer_25583 showcased a brand logo analyzer GPT that offers design feedback and suggestions for improvement based on an uploaded logo image.
  • Pagerize - AI Video Summarizer: @pagerize_admin presented Pagerize, an AI summarizer for YouTube videos. Included was an example summary of a Theory of Mind LangChain and Plastic Labs webinar, viewable here.
  • TinyLLM - LLM Library for Production Applications: @gokusan developed TinyLLM, a library for running LLM applications at scale. An example of creating and evaluating an agent can be viewed here.
  • Pre-Commit Hook to Detect LLM API Keys: @joshuasundance created a pre-commit hook to prevent developers from putting their API keys into source control.

▷ #tutorials (1 messages):

potatooff: https://www.youtube.com/watch?v=mrjq3lFz23s


DiscoResearch Discord Summary

  • LLM-Assisted Scoring Evaluation Experiment: User @_jp1_ initiated a discussion regarding an early-stage AI startup's LLM-assisted batch evaluation tool for highlight extraction. This sparked debate on whether reducing score categories and using special rating tokens could improve accuracy and discrimination (link to the original post). The conversation later branched out to potential collaboration and understanding of the evaluation method, featuring @bjoernp and @_jp1_.
  • Mixtral Implementation: The conversation in mixtral_implementation channel highlighted different elements of mixtral implementation including the best practices, access to high-end hardware, and expert selection strategies along with benchmark results.
    • The main points here include a custom build of llama.cpp shared by @kalomaze and a tweet shared by @someone13574 suggesting that using 3 experts yields the best perplexity results.
  • OpenAI's Alignment Plan and Phi-2 Discussion: The alignment plan in OpenAI's new paper and the effectiveness of Phi-2 were the topics of discussion here. Also, there was a suggestion by @flozi00 to implement scoring models to enhance data quality in Disco Research.
  • Progress on llama.cpp Integration and FastEval Evaluation: Conversation revolved around llama.cpp integration into projects, the completion of a substantial part of the FastEval evaluation, and problems related to llama-cpp-python. More specifically, new developments were noted, such as the FastEval project fork on GitHub by Disco Research. Suggestions were provided by @bjoernp to @rtyax for debugging with simpler models and using a single thread for the local model.

DiscoResearch Channel Summaries

▷ #disco_judge (11 messages🔥):

  • LLM-Assisted Scoring Evaluation Experiment: User @_jp1_ shared a post from Reddit about an early-stage AI startup's results using their LLM-assisted batch evaluation tool. The tool is used for highlight extraction from news articles and the MMLU benchmark using a scoring model. They referenced three different research papers (1, 2, 3) as the basis of their evaluation method. This led to a discussion about whether reducing score categories and using special rating tokens could influence accuracy and the ability to discriminate. Link to the original post

  • Potential Collaboration: User @bjoernp expressed interest in the evaluation approach and suggested collaboration with the startup working on similar projects.

  • Evaluation Method Understanding: A discussion between @bjoernp and @_jp1_ ensued concerning the circular nature of using benchmarks for evaluation models. They talked about how a good benchmark could negate the need for an eval model and how the use of evaluation models is often resorted to when there is no easy way to benchmark or it would be too much effort to create benchmarks for everything to measure.

  • Confusion on Benchmarking and Evaluation Models: @_jp1_ expressed confusion over the evaluation method and questioned if measuring MMLU of the eval model is similar to direct model evaluation. @bjoernp acknowledged the similarity but noted a slight difference, stating that rating a response is an easier task.

  • Other potential benchmark: In their conclusion, @bjoernp suggested that a held-out test set could potentially be a good benchmark for the rating model.

▷ #mixtral_implementation (29 messages🔥):

  • Mixtral Implementation Best Practices: @nul3 asked which is better between the instruct and non-instruct versions for chat. @goldkoron recommended the instruct version.
  • Custom MoE Routing: @kalomaze shared a custom build of llama.cpp which is a modification to the Mixtral PR that lets users customize the amount of experts that are routed per token.
  • Access to High-End Hardware: @tarikoctapm offered access to a machine with 4 x RTX 4090, asking in return for the sharing of results from models.
  • Expert Selection Strategy Discussion: @chrismcmaster and @kalomaze exchanged ideas on various expert selection strategies. They discussed hardcoding a set number of experts, a 'min_p' strategy, and top-k experts. @kalomaze elaborated on 'min_p', which acts as a minimum probability threshold based on the maximum probability (see the sketch after this list).
  • Benchmark Results: @bjoernp shared preliminary benchmarks which showed sub-optimal performance when hardcoding either 1 or 4 top-k experts. @someone13574 shared a tweet suggesting that using 3 experts yields the best perplexity results. These results were critiqued by @kenjiqq who observed inconsistencies in the Q6 quant.
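
A small sketch of one plausible reading of the 'min_p' expert-selection idea described above, namely keeping every expert whose router probability is at least min_p times the top expert's probability; this is an interpretation for illustration, not @kalomaze's actual llama.cpp patch:

```python
import torch

def select_experts_min_p(router_logits: torch.Tensor, min_p: float = 0.1) -> torch.Tensor:
    """Return a per-token boolean mask of experts whose routing probability
    is >= min_p * (probability of the top expert)."""
    probs = torch.softmax(router_logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    return probs >= threshold

# Toy example: one token routed over 8 experts.
logits = torch.tensor([[2.0, 1.9, 0.3, -1.0, -2.0, 0.1, 1.5, -0.5]])
print(select_experts_min_p(logits, min_p=0.5))
```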

▷ #general (5 messages):

  • OpenAI’s Alignment Plan: @_jp1_ brought attention to the interesting content in OpenAI’s new paper regarding their alignment plan. No link was provided.
  • Discussion on Phi-2: @bjoernp and @flozi00 engaged in conversation about Phi-2, questioning its effectiveness and the legitimacy of its benchmarks. While @bjoernp showed skepticism, @flozi00 shared some optimistic observations from Twitter examples, especially for smaller-scale use cases on edge devices.
  • Data Quality Scoring Models in Disco Research: @flozi00 suggested the implementation of scoring models to enhance data quality in Disco Research. The aim is to not only deduplicate datasets, but also cleanse them of low-quality data like raw lists from Wikipedia or translation errors.

▷ #benchmark_dev (13 messages🔥):

  • Collaboration on llama.cpp Integration: @bjoernp asked @rtyax to collaborate on incorporating llama.cpp into their project and requested clarity on the tokenizer-related problems that @rtyax had mentioned earlier.
  • FastEval Evaluation Completion and Future Steps: @bjoernp outlined the next steps including adding the capacity for n repetitions of benchmarks and calculating the mean and standard deviation, thereby completing a substantial part of the evaluations. He also pointed out the continued need for the integration with llama.cpp to examine the effects of min_p and enabling grammar-based inference to fix the output format.
  • FastEval Repository on DiscoResearch GitHub: @bjoernp created a fork of FastEval with the DiscoResearch GitHub account for coordination, directing users and @rtyax to submit their changes there. The repository is found at https://github.com/DiscoResearch/FastEval.
  • Issues with Llama-cpp-python: @rtyax reported that llama-cpp-python was failing silently when running generate/chat_completion. They noted that the difficulty may not be with the tokenizer, but could not identify the specific problem. They had recently rebuilt llama-cpp-python for mixtral, and were uncertain if the problem was local or more widespread.
  • Debugging Suggestions: @bjoernp suggested debugging simpler models first, like mistral-7b, and changing fasteval evaluation constants to use a single thread for the local model. @rtyax said that only one GPU was being used in his box, but the generation was continuing as if it was never called due to a try/catch block.

Latent Space Discord Summary

  • Announcement of Anthropic's new Google Sheets tool, Claude for Sheets™, praised by @swyxio as "spreadsheets are the best prompt engineering tool". Claude for Sheets™
  • In-depth coverage of Mamba model provided through a YouTube video shared by @swyxio.
  • Introduction to the Mistral-kit project which utilizes mistral-7b and ollama; repository shared by @kevmodrome on GitHub. Github link
  • Discussion around rumored release of GPT 4.5 initiated by @swyxio with a linked tweet.
  • Access to Mistral API provided to @fanahova and @coffeebean6887; latter also noted the addition of Mistral as an endpoint on Anyscale. Anyscale URL
  • Announcement of the new podcast release by @fanahova on Twitter and Hacker News. (Twitter Link)
  • Mention of Qtransformers by @swyxio in the #llm-paper-club channel, without further elaboration or context given.

Latent Space Channel Summaries

▷ #ai-general-chat (9 messages🔥):

  • Anthropic's New Google Sheets Tool: @swyxio mentioned a new product called Claude for Sheets™, which can bring the helpful AI assistant Claude from Anthropic to Google Sheets™. @swyxio stated that "spreadsheets are the best prompt engineering tool".
  • Discussion on Mamba Model: @swyxio shared a YouTube video that provides a thorough explanation of the Mamba model.
  • Mistral-kit Project: @kevmodrome shared a Github link to the mistral-kit project, which uses mistral-7b and ollama.
  • Rumors About GPT 4.5: @swyxio mentioned some rumors about GPT 4.5 coming soon.
  • Access to Mistral API: Both @fanahova and @coffeebean6887 indicated that they received access to the Mistral API. @coffeebean6887 also pointed out that Anyscale added Mistral as an endpoint and posted the URL for it.

▷ #ai-event-announcements (1 messages):

fanahova: New pod is out! https://twitter.com/FanaHOVA/status/1735371425836568905

Also live on HN

▷ #llm-paper-club (1 messages):

swyxio: Qtransformers


LLM Perf Enthusiasts AI Discord Summary

  • In the general channel, two key discussions were highlighted:
    • Issue with GPT-3.5: @0xmmo reported unusual behavior from GPT-3.5 involving an excessive production of new lines in responses.
    • The workload of ChatGPT during finals week was humorously speculated about by @res6969, who suggested it could be overloaded due to increased usage by students.
  • The finetuning channel saw @robertchung offer advice for a text extraction task, recommending an initial 30-50 samples for fine-tuning, with the potential to add more examples if results are unsatisfactory.
  • In the opensource channel, @robhaisfield discussed a strategy to overcome rate limits by rotating among various fine-tuning providers but mentioned the challenge of accounting for the subtle behavioral differences typical of each provider’s unique fine-tunes.
  • Useful resources were shared in the resources channel, such as an insightful Twitter link posted by @nosa_.
  • The openai channel showcased two interesting references:
    • @pantsforbirds pointed out an OpenAI paper on supervising larger models with smaller ones.
    • @firefox8975 highlighted Google AI’s competitive capability in supporting function calls, referencing the official document.

LLM Perf Enthusiasts AI Channel Summaries

▷ #general (3 messages):

  • Possible Issue with GPT-3.5: @0xmmo reported that they've been getting strange responses from GPT-3.5, which has started to respond to their function calls with an excessive number of new lines.
  • AI Workload during Finals Week: On a lighter note, @res6969 humorously speculated that ChatGPT might be overloaded due to "finals week," a reference to a period of intense academic work. This was backed up by @potrock, who mentioned that many people in a graduate-level Natural Language Processing (NLP) course are using GPT-generated synthetic data for their final projects.

▷ #finetuning (2 messages):

  • Text Extraction and Fine-Tuning: @robertchung suggested that for a text extraction task, 30-50 samples may suffice for initial fine-tuning. They also suggested that if the results are not satisfactory, one can add more examples and proceed to fine-tune the already fine-tuned model.

▷ #opensource (1 messages):

  • Discussion on Rotating Among Providers: @robhaisfield shared his thoughts on a strategy to power through rate limits by programmatically rotating among various fine-tune providers. He, however, expressed concerns that the subtle differences in behavior produced by each provider’s unique fine-tunes might make it tricky to treat all providers as a commodity.
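
A toy sketch of the rotation idea with hypothetical provider callables (no real SDKs involved); the catch @robhaisfield notes, the subtle behavioral differences between each provider's fine-tunes, is not something the code itself can hide:

```python
from typing import Callable, List

class RateLimitError(Exception):
    """Stand-in for whatever rate-limit exception a given provider's SDK raises."""

def complete_with_fallback(prompt: str, providers: List[Callable[[str], str]]) -> str:
    """Try each fine-tune provider in turn, rotating when one reports a rate limit."""
    for provider in providers:
        try:
            return provider(prompt)
        except RateLimitError:
            continue  # rotate to the next provider
    raise RuntimeError("All providers are rate-limited")

# Hypothetical provider callables; in practice each would wrap a different vendor's fine-tune.
providers = [
    lambda p: f"provider-a: {p}",
    lambda p: f"provider-b: {p}",
]
print(complete_with_fallback("hello", providers))
```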

▷ #resources (2 messages):

  • Sharing Useful Resources: User @nosa_ shared a useful Twitter link with potentially valuable information for the community. User @ayenem responded positively to the shared link.

▷ #openai (2 messages):

  • Supervising Larger Models with Small Models: User @pantsforbirds shared an OpenAI paper about using small models (GPT-2) to supervise larger models (GPT-4).
  • Google AI’s Function Calling: @firefox8975 mentioned that Google AI supports function calling, which they found to be competitive with OpenAI during their exploration of it. They also provided a reference to Google AI’s official document.

Alignment Lab AI Discord Summary

  • The [oo] channel featured updates on the Depth Upscaling used by Upstage on a franken llama-mistral to reach 10.7B parameters with the model SOLAR-10.7B-Instruct-v1.0. "Upstage claims to have surpassed Mixtral with their approach."
  • New Open Source AI Grants were discussed in the same channel, with users congratulated on their grants via an announcement from a16z.
  • [oo-priority] held details about a specific event from @entropi, although the details of the event are not provided.
  • In [phi-tuning], @entropi shared an update on the Phi-2 transformer model with 2.7 billion parameters and a link to more information. The model's training data was the same as Phi-1.5, with additional new sources comprising NLP synthetic texts and websites filtered for safety and educational value. Updates to the model's weights were also mentioned.

Alignment Lab AI Channel Summaries

▷ #oo (2 messages):

  • Depth Upscaling on Franken Llama-Mistral: User @entropi shared a link to an article about Upstage's use of "Depth Upscaling" on a franken llama-mistral to get to 10.7B params with continued pretraining. Upstage claims to have surpassed Mixtral with their approach. The end result is their model SOLAR-10.7B-Instruct-v1.0.
  • New Open Source AI Grants: @entropi also congratulated users <@257999024458563585> and <@503006108928180245> for their new AI grants, sharing the announcement from a16z.

▷ #oo-priority (1 messages):

entropi: @here https://discord.com/events/

▷ #phi-tuning (2 messages):

  • Phi-2 Model Summary: @entropi shared a link about Phi-2, a Transformer with 2.7 billion parameters. The data used for its training were the same as Phi-1.5. Additionally, the Phi-2 model was also trained with a new data source consisting of various NLP synthetic texts and filtered websites for safety and educational value.
  • Phi-2 Weights Update: @entropi reported that the weights for Phi-2 were updated recently.

YAIG (a16z Infra) Discord Summary

Only 1 channel had activity, so no need to summarize…

  • AI Economics Discussion: @stevekamman shared insights on the economics of AI, specifically focusing on the comparison between training and inference costs and revenues. He said: "Inference must pay the bills - GPU, S&M, overheads…" and pointed out that the revenue generated by inference must be noticeably higher than the cost associated with training to maintain profitability in the long term.
  • Consideration on Training Costs: @stevekamman noted that the cost incurred in training can be offset by selling the foundational models but he also pointed out that it’s inevitable for the buyers of these models to fund the purchase either through revenue or efficiency gains.
  • Future of FM Companies: @spillai speculated on whether foundation model (FM) companies need to develop their own clouds to continue capturing enough long-term value from their customers and therefore attain profitability. They drew attention to the potential need for vertical AI infrastructure providers for large language models (LLMs).
  • Utilization of GPUs by FM companies: @stevekamman questioned the GPU utilization efficiency of FM companies, given the binary nature of their workloads (100% or 0% utilization), suggesting that potential scale-based utilization efficiencies might be limited.

Skunkworks AI Discord Summary

  • Open-source survey initiated by @jay_wooow to understand motivations and challenges in building with LLMs (large language models). The survey data will be used internally and to assist the wider AI community in development tool creation.
  • Raw survey data and key insights will be open-sourced in a report and published online upon reaching the target participant number.
  • Link to the survey was shared. Participation keeps developer identities anonymous.
  • Uncontextualized link to a YouTube video was posted by pradeep1148 under off-topic. Further discussion or context was not provided, thus relevance remains unclear.

Skunkworks AI Channel Summaries

▷ #general (1 messages):

  • LLM Tools Usage Survey: @jay_wooow has initiated an open-source survey aimed at understanding the motivations and challenges involved in building with LLMs (large language models). The data gathered through this survey will be used for internal productivity enhancement and to assist the wider community with market data to optimize the creation of development tools. The survey should take approximately 10 minutes to complete.
  • Results Publication: Once the target number of participants has been reached, all raw data collected as well as the key insights will be open-sourced in a report and published on a blog platform. This can be a valuable resource for AI developers to understand goals, challenges, and tool usage in the community.
  • Participation Link: The link to participate in this survey was shared. All participating developers’ identities will remain anonymous.

▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=MOimHasrCKk


MLOps @Chipro Discord Summary

Only 1 channel had activity, so no need to summarize…

  • Mastering MLOps and ML Engineering: Key Strategies for 2024:
    • @amitqwak announced a live session titled "Mastering MLOps and ML Engineering: Key Strategies for 2024" scheduled for January 17th, 2024 at 11:30 AM EST. The session aims to provide organizations with advanced insights and strategies for effectively incorporating and managing AI and ML in their business frameworks, with a focus on MLOps and ML engineering trends. The event, primarily for ML Engineers, Data Scientists, Data leaders, and Software engineering managers, is free of charge. The registration link is here.
  • Arize Holiday Special:
    • @sarahwelsh announced the Arize Holiday Special scheduled for December 15, 2023. The event comprises a series of live, virtual hands-on workshop sessions focused on prompt engineering, search and retrieval workflows, and LLM system evaluations. Speakers from Hugging Face, PromptLayer, Shopify, and Arize are participating. The registration link is here.

AI Engineer Foundation Discord Summary

Only 1 channel had activity, so no need to summarize…

  • Upcoming Event Notification: ._z posted a notification for an upcoming event with the associated Discord link provided.

The Ontocord (MDEL discord) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Perplexity AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.