Coqui, one of the leading open-source text-to-speech projects to survive from the Mozilla ML group, shut down today. The announcement tweet is beautiful and heartfelt.
Table of Contents
[TOC]
HuggingFace Discord Discord Summary
- "Fast as lightning" SDXL questions its own speed: As pointed out by user @aifartist, the claim that SDXL is 3X faster depends on specific techniques such as `torch.compile` and the removal of `fp16` and `attention`, casting doubt on how much the diffusers 0.25 features contribute to the speedup.
- "Sharing is Caring" extends to HuggingFace user tokens as well: According to @osanseviero, a HuggingFace user token can indeed be used on multiple running machines, although distinct tokens are suggested for safer operation.
- LLM Leaderboards play hide and seek: @lee0099's initial query about the LLM leaderboard not functioning was mooted, as the leaderboard was later found to be working fine.
- Creating Transformers from square one: @torres8552 shared a Kaggle notebook providing a deep dive on constructing the Transformer architecture for language-translation tasks from scratch using PyTorch.
- Shoes, Sandals, and Boots strut on the Image Dataset ramp: @andysingal introduced an image dataset containing 15k images of shoes, sandals, and boots, promoting its use for multiclass classification with deep neural networks.
- Web-crawling mysteries of Common Crawl revealed: @exponentialxp's curiosity about how Common Crawl works was satisfied by @cakiki's explanation of the process, which involves powerful computers, a URL list, and "spider" software for web crawling and indexing. An invitation for further exploration was extended via a link to the Common Crawl codebase on GitHub.
HuggingFace Discord Channel Summaries
▷ #general (85 messages🔥🔥):
- Speedy SDXL under scrutiny: User @aifartist expressed skepticism about some SDXL performance claims, such as it being 3X faster. They noted that these claims appeared to depend heavily on methods not specific to diffusers 0.25, such as using `torch.compile` and removing `fp16` and `attention`, and requested clarification on which features specific to diffusers 0.25 actually improve performance (see the `torch.compile` sketch after this list).
- HuggingFace user token on multiple machines: @dizzyme asked whether a HuggingFace user token can be employed on two or more running machines. @osanseviero confirmed that it can, but suggested that using distinct tokens is generally safer.
- Python command issue on Arch Linux: User @gez_gin hit an issue on Arch Linux where the terminal reported `from` as an unknown command. @cakiki pointed out that `from` is a Python keyword and suggested that @gez_gin start Python first to obtain a Python REPL.
- LLM Leaderboard glitch: @lee0099 asked about issues with the LLM leaderboard, stating that it was not functioning. They later updated that the problem seemed to have been resolved.
- Confusion over MoE Frankenmodels: @kquant sought assistance with their entries on the Open LLM Leaderboard. They had submitted two entries, one mistakenly labeled as an adapter, and requested help from admins to remove the incorrect entries and keep only the correct, "original" entry. They had not slept for several days and apologized for any inconvenience their errors might have caused.
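For context on the speed discussion above, here is a minimal sketch of the kind of setup those 3X claims typically rely on; it is an illustration, not the diffusers team's benchmark, and the model id and settings are assumptions:

```python
# Hedged sketch: compiling the SDXL UNet with torch.compile, one of the
# techniques @aifartist noted the speed claims depend on. Not the official
# benchmark script; the model id and dtype below are assumptions.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# torch.compile does most of the heavy lifting for headline numbers; it is
# not specific to diffusers 0.25 and adds a one-time compilation cost.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a photo of a lighthouse at dawn", num_inference_steps=30).images[0]
image.save("lighthouse.png")
```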
Links mentioned:
- Diffusers Gallery - a Hugging Face Space by huggingface-projects
- solarc-moe-10.7bx4.Q6_K.gguf · TheBloke/SOLARC-MOE-10.7Bx4-GGUF at main
- Kquant03/CognitiveFusion-4x7B-bf16-MoE · Hugging Face
- Open LLM Leaderboard - a Hugging Face Space by HuggingFaceH4
- Meet: Real-time meetings by Google. Using your browser, âŠ
▷ #today-im-learning (8 messages🔥):
- @neuralink progresses on end-to-end FP8 training: Stated they implemented 19% of end-to-end FP8 training, indicative of some interesting progress in their work with 3D parallelism.
- @duplaja discovers SpeechT5 nuances while optimizing: Shared updates on their work with SpeechT5, focusing on creating a custom handler and troubleshooting issues with numerals and larger strings pagination. They found using multiple instances on the lower AWS GPU T4 more cost-effective and shared their working handler.py here.
- @farlin9000 relearns ML basics via Luis Serrano: Shared a YouTube video by Luis Serrano on Deep Learning with Neural Networks as they review ML basics. Farlin9000 initially encountered confusion on activation functions and probabilities but later understood the accounts of truth classification.
Links mentioned:
A friendly introduction to Deep Learning and Neural Networks: A friendly introduction to neural networks and deeâŠ
▷ #i-made-this (23 messages🔥):
- Exploring Transformers from Scratch: User @torres8552 shared a Kaggle notebook on exploring and building the Transformer architecture for language-translation tasks from scratch using PyTorch, trained on the OpusBook dataset.
- Shoe vs Sandal vs Boot Image Dataset: @andysingal introduced a new image dataset containing 15,000 images of shoes, sandals, and boots, ideal for multiclass classification with deep neural networks like CNNs.
- ResNet-50 on the Shoe vs Sandal vs Boot Image Dataset: @andysingal presented a notebook using ResNet-50 on the Shoe vs Sandal vs Boot image dataset.
- Introduction of Augmentoolkit: @heralax created Augmentoolkit, a fully local, LLM-powered dataset generation tool. It turns plaintext into multi-turn conversations that can be used to finetune instruct-tuned models.
- Using Augmentoolkit on Different Datasets: @andysingal expressed interest in applying Augmentoolkit to an instruction-based dataset like the one on Kaggle. @heralax explained that this can be done by modifying a couple of cells in the notebook, but the code would differ based on the dataset structure.
Links mentioned:
- Transformer From Scratch With PyTorch🔥: Explore and run machine learning code with Kaggle …
- Question-Answer Dataset: Can you use NLP to answer these questions?
- llama_index/examples/paul_graham_essay/data/paul_graham_essay.txt at main · run-llama/llama_index: LlamaIndex (formerly GPT Index) is a data frameworâŠ
- Andyrasika/ShoeSandalBootimages · Datasets at Hugging Face
- PyTorch-ML/notebooks/resnet-50.ipynb at main · andysingal/PyTorch-ML: Contribute to andysingal/PyTorch-ML development byâŠ
▷ #reading-group (6 messages):
- Live Participation vs Async Discussions: @swyxio asked about the format of discussions, needing a heads-up for live events. @lunarflu clarified that the discussions are usually async and text-only, due to the global nature of the community.
- Blogpost Discussion Suggestion: @lunarflu suggested having discussions under each blogpost, similar to the format for papers, acknowledging that this feature is currently unavailable.
- Weekly Event for Paper Discussion: Following the format question, @lunarflu proposed creating a weekly event for discussing papers, including start times and range.
- Call-to-action for Personal Presentations: @lunarflu encouraged members to put together presentations for discussions, offering to create server-wide events once a date is provided.
- Presentation Schedule Confirmation: In response to @lunarflu's call, @dhruvdh committed to preparing a presentation by Friday.
▷ #computer-vision (5 messages):
- Concerns About Opening Images in Datasets: @xcykim_56659 asked how to open the image content in datasets and get the image data from an ImageFolder PIL object for a pretrained CVT model. @xcykim_56659 later resolved their own inquiry and reported success.
- FPS Computation Query in Object Detection Leaderboard: @anasuna expressed doubts about the frames-per-second (FPS) computation on the Object Detection Leaderboard, indicating the numbers seemed too low.
- Training a CV Model on Continuous Values: @tony_assi asked for resources on training a computer vision (CV) model using images paired with continuous numerical values rather than discrete labels.
Links mentioned:
Open Object Detection Leaderboard - a Hugging Face Space by hf-vision
▷ #NLP (4 messages):
- Common Crawl's web indexing explained: @exponentialxp asked how web data is collected for Common Crawl, and @cakiki explained that the process involves powerful computers, a list of URLs, and software referred to as a "spider" for crawling and indexing sites, similar to what search engines like Google and Bing do.
- Invitation to explore Common Crawl's codebase: @cakiki provided a link to the Common Crawl codebase on GitHub for @exponentialxp to explore if they are interested.
Links mentioned:
Common Crawl Foundation: Common Crawl provides an archive of webpages goingâŠ
Mistral Discord Summary
- Prompts Tango with Mistral-7B: @cognitivetech weighed two ways of writing system prompts for Mistral-7B, with speed and quality consistency as looming challenges.
- Deciphering Ooba's Enigma: @cognitivetech shared a prompt template from Ooba but found it confusing.
- Taking the AI Lab Home: @quantumpioneer. asked about hardware prerequisites for a local AI lab setup for experiments.
- Hit or Retrain: @maxdipper probed ways to build on a previously trained uncensored model for additional training, as a cost-effective alternative to retraining from scratch.
- Data Mining with Mixtral/Mistral: @unknownperson2156 sought feedback on user experiences using Mixtral or Mistral to extract predefined question data with LLMs.
- Big Dreams with Mistral 8x7B: @mysterious2078 was on the hunt for documents or papers about the Mistral 8x7B model.
- Unshackling the Local Runway: @michaelwechner shared success running Mistral 7B locally on a Mac M1 and in the cloud using Ollama and Scaleway.
- Tackling Virtual Limitations: @Idellarus detailed their struggle to run a model in a restricted virtual desktop environment; @duck confirmed that running inference there would still require installing some kind of software.
- vLLM vs TGI, A Mixtral Story: @andersruge asked how vLLM and TGI compare on performance metrics, answered succinctly by @casper_ai.
- Nano-Chatbots for All: @daain gave a rundown of options for deploying real-time chatbots on limited resources, including APIs and smaller models like Phi-2 or TinyLlama-1.1B-Chat-v1.0.
- GPU Hunting Season: @comcyber_12802 asked what GPU is needed for finetuning Mistral 7B; @le_mess recommended an RTX 3090, with a training-time estimate of roughly 1 hour.
- Mistral, The Open Source Mystery: @darshansharma_ asked whether Mistral is open source, with @refik0727 confirming that it is.
- AGI's Imminent Arrival?: @poltronsuperstar predicted the advent of AGI within weeks to months with an observe-build-nurture system marking the era of no-code AI, while clarifying that the eventual model would still be "absolute genius".
- The Quest to Define AGI: User @.tanuj. invited the community to share their interpretations of Artificial General Intelligence (AGI); indeed a challenge worth undertaking.
Mistral Channel Summaries
▷ #general (61 messages🔥🔥):
- Exploring System Prompts with Mistral-7B: @cognitivetech sought advice on system prompts for Mistral-7B, experimenting with two formats (#1 and #2) with varying success. Speed and quality consistency seemed to suffer when modifying the prompts.
- Template from Ooba for Implementing Prompts: @cognitivetech shared Ooba's template for implementing prompts, though found it confusing (#1).
- Hardware for Local AI Experiments: @quantumpioneer. asked about hardware specifications and power requirements for a PC setup intended for running local AI experiments (#1).
- Additional Training after Uncensored Model: @maxdipper asked whether there is a cheaper way to add further content training on top of an uncensored model, compared to training an uncensored model from scratch (#1).
- Lead Collection with Mixtral or Mistral: @unknownperson2156 asked for user experiences using Mixtral or Mistral for data or information collection, i.e. gathering predefined question data as a conversation with large language models (LLMs) (#1).
Links mentioned:
- mistralai (Mistral AI_)
- app.py · openskyml/mixtral-46.7b-chat at main
- Riff Runner: Heavy Metal: "Unleash the power of heavy metal in Riff Runner, …"
- Riff Runner Metal (Pre-Release) - Apps on Google Play
▷ #models (2 messages):
- Interest in Edge Computing: @kagevazquez expressed enthusiasm for edge computing, stating, "Nope but edge computing sounds awesome".
- Inquiry about Mistral 8x7B Documentation: @mysterious2078 asked for any available documents or papers about the Mistral 8x7B model.
▷ #deployment (34 messages🔥):
- Running Large Language Models Locally: @michaelwechner shared his experience using Ollama to run Mistral 7B locally on a Mac M1 and in the cloud through Scaleway on an Apple Mac mini M2 Pro. The discussion also touched on whether Ollama and similar tools are wrappers around llama.cpp (a rough sketch of talking to a local Ollama instance follows this list).
- Deployment Restrictions on Virtual Desktops: @kartik.07 discussed the challenge of running a model locally on a virtual desktop where he could not install new software or third-party tools. @duck confirmed that running inference would require some kind of software, which might not be possible under such restrictions.
- Comparing vLLM and TGI for Mixtral: On @andersruge's question about performance benchmarks between vLLM and TGI, @casper_ai highlighted that vLLM is generally faster as it prioritizes throughput optimization, whereas TGI focuses mainly on reducing time to first token.
- Scaling Down for Real-Time Chatbot Applications: @daain suggested options for deploying real-time chatbots with limited resources, such as using APIs, choosing smaller models like Phi-2 or TinyLlama-1.1B-Chat-v1.0, or running on an NVIDIA Jetson Nano.
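As a concrete illustration of the local setup discussed above, here is a minimal sketch of querying a locally running Ollama server from Python; it assumes Ollama's default REST endpoint on port 11434 and that `ollama pull mistral` has already been run:

```python
# Hedged sketch: asking a locally running Ollama instance for a Mistral 7B
# completion. Assumes the default Ollama REST endpoint (localhost:11434)
# and that the `mistral` model has already been pulled.
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistral",
        "prompt": "Explain what a mixture-of-experts model is in two sentences.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(json.loads(resp.text)["response"])
```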
Links mentioned:
- GitHub - jmorganca/ollama: Get up and running with Llama 2 and other large language models locally: Get up and running with Llama 2 and other large laâŠ
- Run Mistral 7B using Ollama on a Mac M2 16GB at Scaleway: I recently installed Mistral 7B using Ollama on myâŠ
▷ #finetuning (5 messages):
- Request for GPU Recommendation: @comcyber_12802 asked about the minimum GPU requirements for finetuning Mistral 7B on a dataset of about 5000 question-answer pairs. @le_mess suggested an RTX 3090, mentioning it could train on that dataset in approximately 1 hour, and offered assistance via private messages.
- Taking Time to Learn: After the GPU recommendation, @comcyber_12802 stated they would invest more time in better understanding approaches like RAG, QLoRA, Axolotl, and PEFT before proceeding, thanking @le_mess for the assistance.
- Unrelated Conversation: @akshay_1 commented that an unspecified source amounted to telling someone to "google it", to which @duck apologized if it came off as offensive.
▷ #random (13 messages🔥):
- Poltronsuperstar's Take on a No-Code AGI Platform: User @poltronsuperstar suggested a no-code platform powered by large language models (LLMs) with multiple types of agents: a generalist agent overseeing various specialist agents. The focus is on smart high-level decision-making rather than purely on implementation.
- Inter-Agent Communication & Contextual Data Storage: @poltronsuperstar argued that agents should communicate directly and via shared context. Files were suggested as ideal for storing highly varying data, emphasizing the efficiency of a filesystem, facets, and history in a slightly repurposed git repo.
- AGI Around the Corner?: In a daring prediction, @poltronsuperstar placed the advent of Artificial General Intelligence (AGI) at a matter of weeks to months, citing GPT-4-level LLMs as a possible ceiling and admitting the timeline was partly intuition-dependent.
- AGI: Simple but Genius: While AGI is predicted to be rather simple to explain (akin to a GAN), @poltronsuperstar noted that the explanation's simplicity doesn't take away from the eventual model being "absolute genius".
- Defining AGI: User @.tanuj. posed an important question: "How do y'all define AGI?", seeking to understand the variety of definitions held by the chat community.
▷ #la-plateforme (6 messages):
- Question about Mistral's Open Source Status: @darshansharma_ asked if Mistral is open source, to which @refik0727 confirmed that it is.
- Open Discussion Initiated: @lerela encouraged open question-asking on the channel.
- Request for MISTRAL_API_KEY: @carloszela mentioned that he is adding a Java library to langchain4j for mistral-ai and sought a MISTRAL_API_KEY demo.
- Medium's Performance Inquiry: @_definitely_not_sam_ asked whether other users have also experienced slow performance on Medium, but no responses were noted.
LAION Discord Summary
- LAION's Child-Pornography Contamination Dilemma: @chad_in_the_house brought up a Stanford paper uncovering child pornography in LAION datasets, provoking an imperative discussion about responsibility and dataset sanitation. Further debate on disclosure norms by @progamergov, @.undeleted, and @peacekeeper8310 raised a possible anti-FOSS-AI agenda and corporate regulatory-capture motivations.
- Decoding the LAION Conundrum: Amid rising concerns over LAION's controversial datasets, @thejonasbrothers and @chad_in_the_house discussed possible mitigation approaches, the dilemma between total eradication and reduction to an acceptable degree, and the issue's influence on how the legality of crawling and storing likely contaminated data is perceived.
- Dissecting SISR's Noise Challenge: @vrus0188 pointed to a research paper outlining how the intrinsic noise of early training steps in deep-learning-based Single Image Super-Resolution (SISR) complicates obtaining optimal results.
- Innovations for Refined Image Generation: HandRefiner and ElasticDiffusion, shared by @vrus0188, introduce strategies for refining malformed digital hand renderings and training-free arbitrary-size image generation, respectively. URLs: HandRefiner and ElasticDiffusion.
- Advancements in Boundary Modelling and Document Reasoning: @thejonasbrothers highlighted a differentiable model that uses boundary attention to excel at modelling image boundaries, and a new DocLLM approach that merges bounding-box information with spatial layout structure to refine document understanding.
- Robotics Inspired by Curiosity: @vrus0188 highlighted a YouTube video showcasing how robots can be developed to embody curiosity.
LAION Channel Summaries
▷ #general (110 messages🔥🔥):
- LAION in Hot Water Due to Unsavory Content: @chad_in_the_house discussed a recent Stanford paper that discovered child pornography in LAION datasets, forcing LAION to take them down. The community expressed concern over the issue and discussed alternatives like using Common Crawl.
- Debate Over Responsible Disclosure and Impact: Users @progamergov, @.undeleted, and @peacekeeper8310 evaluated the Stanford researchers' approach, with some stating that revealing the issue without first allowing LAION to mitigate it could be construed as reckless and not aligned with responsible-disclosure norms in the security world. They also pointed to the possibility of an anti-FOSS-AI agenda and corporations seeking regulatory capture.
- Rethinking the Strategy: More Due Diligence?: @thejonasbrothers and @chad_in_the_house debated potential solutions, acknowledging the mutable nature of illegal images and the impossibility of a 100% uncontaminated dataset. They argued for a middle-ground approach: potentially treating datasets as legal if due diligence has been done to remove Not-Safe-For-Work (NSFW) content.
- Intricacies of Content Responsibility: User @thejonasbrothers pointed out that responsibility must ultimately lie with those hosting illicit content, not with LAION for containing potentially "harmful strings". Still, the ongoing dilemma raises questions about the legality of crawling, saving, and possibly distributing potentially contaminated data.
- Hard Questions on Purging Troublesome Data: In light of the recent issues with the LAION databases, users @chad_in_the_house and @thejonasbrothers navigated the complexities of removing all problematic content. They conceded that total eradication might be impossible, but that reducing it to an acceptable degree could be the next best move. However, the paper exposing the issue could inadvertently provide a roadmap for locating illicit content on the internet, complicating the matter further.
Links mentioned:
- Electronic Tip Form | FBI
- nvidia/parakeet-rnnt-1.1b · Hugging Face
- Another Hit Piece on Open-Source AI: Stanford researchers find problematic content in LâŠ
▷ #research (8 messages🔥):
- Noise Obstacles in Image Super-Resolution Optimisation: @vrus0188 introduced a research paper highlighting the challenges posed by inherent noise during early training steps in deep-learning-based Single Image Super-Resolution (SISR). The study underscores the need for further scrutiny of the ill-posed nature of SISR.
- HandRefiner Aims to Improve Image Generation: @vrus0188 shared the HandRefiner GitHub repository, which presents a diffusion-based conditional inpainting method for refining malformed hands in generated images.
- ElasticDiffusion Offers Training-free Image Generation: @vrus0188 introduced ElasticDiffusion from its GitHub repository, a PyTorch implementation for training-free arbitrary-size image generation.
- Differentiable Model Architecture to Improve Image Boundaries: @thejonasbrothers touted a study exhibiting a differentiable model employing boundary attention, which models boundaries exceptionally well while offering superior resistance to noise, sub-pixel precision, and the ability to handle images at their native resolutions.
- DocLLM: Innovative Approach to Visual Document Reasoning: @thejonasbrothers discussed a paper presenting DocLLM, a lightweight extension to traditional large language models that attends only to bounding-box information to integrate spatial layout structure, thereby sidestepping expensive image encoders, and that tailors a pre-training objective to infill text segments. The poster also provided a direct quote from the paper.
- Robot Development Inspired by Curiosity: @vrus0188 flagged a YouTube video entitled "This Curious Robot Should Be Impossible!".
Links mentioned:
- Boundary Attention: Learning to Find Faint Boundaries at Any Resolution: We present a differentiable model that explicitly âŠ
- Noise-free Optimization in Early Training Steps for Image Super-Resolution: Recent deep-learning-based single image super-resoâŠ
- DocLLM: A layout-aware generative language model for multimodal document understanding: Enterprise documents such as forms, invoices, receâŠ
- This Curious Robot Should Be Impossible!: ❤️ Check out Weights & Biases and sign up for …
- GitHub - wenquanlu/HandRefiner: Contribute to wenquanlu/HandRefiner development byâŠ
- GitHub - MoayedHajiAli/ElasticDiffusion-official: The official Pytorch Implementation for ElasticDiffusion: Training-free Arbitrary Size Image Generation: The official Pytorch Implementation for ElasticDifâŠ
OpenAccess AI Collective (axolotl) Discord Summary
- Fine-tuning Limbo: @l_teto_l asked whether fine-tuning LLaMA 2 with the Manticore datasets could yield better results, sparking an engaging discussion with multiple users chiming in with insights and linked resources.
- Bug Hunt for Mixtral: @bratao shared a bug report pointing out some issues with Mixtral finetuning. Even so, they observed that Mixtral Instruct still performed better after applying the suggested fixes.
- Adventures in Attribution: @yamashi opened a discussion on pinpointing which tokens contribute most to an output, recommending backpropagation and input-gradient analysis. Other users suggested tools like ooba.
- Benchmark Bash: @yamashi criticized benchmark tests like medmcqa and pubmedqa for incomplete words and skewed distributions, leading to discussions about better evaluation methods.
- Bounty Hunting with Triton Kernels: @caseus_ announced a $2400 bounty for improved speed and memory efficiency in Triton kernels for full fine-tunes (FFT).
- Balancing the Act of Learning Rates: @nafnlaus00 discussed optimal learning rates, evaluation loss, and training loss, highlighting their impact on model performance and emphasizing keeping their ratios balanced.
- The Dropout Debate: @nafnlaus00 shared their insights on finding the most effective dropout rates and the ongoing process of metaparameter tuning.
- Hyperparam Magic with Axolotl: @giftedgummybee piqued interest by mentioning the use of automatic hyperparameter tuning in Axolotl.
- Skipping Workflows When Merging Multiple PRs: @caseus_ proposed using `[skip ci]` tags when merging multiple PRs in succession to reduce workflow runs, pointing to the GitHub documentation for the feature.
- Unraveling Grouped GEMM and Grouped Experts: @caseus_ and @casper_ai dug into the link between Grouped GEMM and grouped experts, also sharing a comparative GitHub link.
- Tackling Non-English Fine-tuning: @muhammad_ichsan discussed challenges in fine-tuning Mistral for a non-English language (Indonesian), prompting advice from members like @nanobitz on enlarging the tokenizer and instruction-tuning.
- Navigating Large Model Training on Multiple GPUs: @b_ryan0 sought strategies for training large models (like codellama 34b) across multiple GPUs. @noobmaster29 suggested a solution using `zero3` and micro-batching.
- Solving Non-GPU Development for Axolotl: @kcaverly asked about a feasible non-GPU development setup for Axolotl's CLI, leading @noobmaster29 to suggest affordable rental options on RunPod.
- Boosting Non-English Performance: @noobmaster29 shared an academic paper on improving the non-English performance of models like Mistral.
- Praying for Shearing Mistral Code: @dangfutures requested that the Shearing Mistral code be shared once it is figured out.
- Quest for Quantifying Token Effect: @nosa_. recommended testing whether increasing token quantity could improve the capabilities of Sheared-LLaMA using extensive datasets like SlimPajama.
- Legal Compass for Non-Copyright Content Use: @dctanner started a discussion about using non-license-restricted content to avoid legal consequences, especially after recent copyright cases.
- Casting Doubt on Bluemoon Quality: @xzuyn warned against relying solely on bluemoon due to lower content quality and advocated for an assorted book dataset within copyright limits.
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (42 messages🔥):
- Fine-tuning Dilemma: @l_teto_l asked whether finetuning LLaMA 2 with the datasets used for Manticore would yield great results. This sparked a discussion where various users chimed in with their insights and shared relevant links.
- Mixtral Finetune Bugs: @bratao shared a bug report about Mixtral finetuning but added that Mixtral Instruct still performed better even after applying certain fixes.
- Token Contribution Analysis: @yamashi sparked an interesting conversation about figuring out which tokens contribute most to the output, suggesting backpropagation and looking at the gradient for each input token. Other users such as @nanobitz mentioned tools like ooba that might provide this feature.
- Criticism of Benchmarks: @yamashi expressed frustration at the apparent shortcomings of benchmarks like medmcqa and pubmedqa, stating that they sometimes didn't provide complete words and often had skewed distributions, prompting a need for closer assessment.
- Bounty for Optimizing Triton Kernels: @caseus_ announced a $2400 bounty for optimized Triton kernels for full fine-tunes, looking for improvements in speed and memory efficiency.
Links mentioned:
- CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models: The ability to perform causal reasoning is widely âŠ
- Question · Issue #6 · pratyushasharma/laser: Hi, Thanks for releasing this code. Does this codeâŠ
- Incorrect implementation of auxiliary loss · Issue #28255 · huggingface/transformers: System Info transformers version: 4.37.0.dev0 PlatâŠ
- [BOUNTY] Optimized Triton Kernels for full fine tunes · Issue #1038 · OpenAccess-AI-Collective/axolotl: đ Feature description Weâve seen marketing frâŠ
- HellaSwag or HellaBad? 36% of this popular LLM benchmark contains errors: We analyzed HellaSwag, a popular LLM benchmark, anâŠ
- Fix load balancing loss func for mixtral by liangxuZhang · Pull Request #28256 · huggingface/transformers: What does this PR do? Fixes #28255 Before submitâŠ
▷ #axolotl-dev (10 messages🔥):
- Balancing Learning Rates and Loss Ratios: @nafnlaus00 discussed the relationship between learning rate (LR), evaluation loss, and training loss, advising to watch their ratios as they affect model performance. They noted: "Depends on your LR. Watch the ratio between eval loss and train loss, aka how focused it is on memorizing the training data." They also mentioned that the ideal divergence between evaluation and training loss should not exceed 5-10%.
- Determining Ideal Dropout Rates: @nafnlaus00 shared insights on optimal dropout rates, stating, "I had been using 0.25 dropout but I think lower is probably better. But I think higher than 0.07 is probably best." They acknowledged still being in the process of metaparameter tuning to find the best dropout and LR for their case.
- Autohyperparam Tuning in Axolotl: @giftedgummybee commented on using automatic hyperparameter tuning in Axolotl, provoking curiosity among community members.
- Skipping Workflow Runs While Merging Multiple PRs: @caseus_ suggested using `[skip ci]` tags when merging multiple PRs in a row to reduce workflow runs, sharing the GitHub docs link (Skipping workflow runs - GitHub Docs).
- Grouped Experts and MoE: @caseus_ and @casper_ai discussed the relationship between Grouped GEMM and grouped experts, with the latter saying, "Grouped GEMM = grouped experts as far as I can see". @caseus_ also highlighted a comparison link (Comparing master...moe · imoneoi/openchat) on GitHub as a further example.
Links mentioned:
- Skipping workflow runs - GitHub Docs
- Comparing masterâŠmoe · imoneoi/openchat: OpenChat: Advancing Open-source Language Models wiâŠ
▷ #general-help (50 messages🔥):
- Struggling with Non-English Fine-tuning: User @muhammad_ichsan described difficulty fine-tuning Mistral on an Indonesian Wikipedia dataset, citing stagnant training loss. @nanobitz advised increasing the tokens in the tokenizer, feeding the model a lot of tokens, and then instruction-tuning. @noobmaster29 also suggested mixing in English during the full fine-tune (FFT), given @muhammad_ichsan's report of catastrophic forgetting on English queries. Link to Wikipedia dataset.
- Mistral Vicuna 1.1 Formatting: @le_mess shared a chat template they created for Vicuna 1.1, with @nanobitz suggesting adding `\n` when making it a single line.
- Training Large Models Across GPUs: @b_ryan0 asked for a recipe for training large models like codellama 34b across multiple GPUs, and @noobmaster29 suggested a setup using `zero3` and micro-batching (a hedged sketch follows this list).
- Non-GPU Development for Axolotl: @kcaverly asked about a GPU-poor development setup for the Axolotl CLI, to which @noobmaster29 suggested renting on RunPod for affordability.
- Improving Non-English Performance: @noobmaster29 shared an academic paper (https://arxiv.org/pdf/2401.01055.pdf) that might help those seeking to improve the non-English performance of models like Mistral.
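For context on the `zero3` plus micro-batching suggestion above, here is a minimal sketch of how that combination is typically wired up with the Hugging Face Trainer and DeepSpeed; the config values are illustrative assumptions, not the exact recipe discussed in the channel:

```python
# Hedged sketch: DeepSpeed ZeRO stage 3 plus micro-batching with gradient
# accumulation, a common way to fit a ~34B model across several GPUs.
# The values below are illustrative assumptions, not a verified recipe.
from transformers import TrainingArguments

deepspeed_config = {
    "zero_optimization": {
        "stage": 3,                               # shard params, grads, optimizer states
        "offload_optimizer": {"device": "cpu"},   # optional: spill optimizer state to CPU
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # micro-batch of 1 per GPU
    gradient_accumulation_steps=16,   # effective batch = 16 x number of GPUs
    bf16=True,
    deepspeed=deepspeed_config,       # launch with `accelerate launch` or `deepspeed`
)
# `args` is then handed to transformers.Trainer together with the model and a
# tokenized dataset (omitted here).
```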
Links mentioned:
▷ #shearedmistral (7 messages):
- Request for Shearing Mistral Code: @dangfutures requested that the code for shearing Mistral be shared once figured out.
- Hypothesis on Token Quantity: @nosa_. suggested it would be interesting to test the hypothesis that investing more tokens could further improve the capability of Sheared-LLaMA.
- Debate on Data Adequacy: In the context of testing the above hypothesis, @nosa_. and @xzuyn agreed that SlimPajama might offer a large enough dataset for the test.
- Discussion on Non-Copyright Content Use: @dctanner raised concerns about using non-license-restricted content for continued pre-training to avoid potential legal issues, particularly given recent developments in the NYTimes case.
- Quality Concerns about the Bluemoon Dataset: @xzuyn advised against using bluemoon alone due to possible content-quality issues and recommended gathering a book dataset that wouldn't pose copyright challenges.
Perplexity AI Discord Summary
- Hungry for a Spanish Interface: User @juaniespeche voiced the need for a Spanish UI for Perplexity, pointing out that the AI can already respond accurately in Spanish.
- Perplexity Pricing Puzzles: @archient requested clarification on Perplexity's token pricing when using multiple models. @icelavaman and @ok.alex clarified that Perplexity operates on a prepaid credits system, with the total cost being the cumulative amount for each model based on tokens processed.
- Craving Direct Model Communication: @saltrockr asked about interacting with models directly, without internet searches. @reflext suggested using Perplexity's writing mode for this purpose.
- Unexpected Hiccups in Trial Period Payment: @ava12138 and @boredkarma discussed difficulties validating payments for the 7-day Perplexity Pro trial, observing inconsistencies in which card types are accepted.
- Striking UI Similarities between Phind and Perplexity: @neuralspace and @reflext discussed the noticeable similarities between the UIs of Phind and Perplexity. @reflext argued that such resemblances are inevitable given the central search-bar design convention.
- Gratitude for Perplexity AI's Help: @hei_veno gave positive feedback about how Perplexity AI has significantly aided in developing training content, although specifics couldn't be shared due to confidentiality. @aontoni and @whiterickruben also shared experiences of Perplexity AI assisting with a university project and exam prep, respectively.
- Showcasing Perplexity AI's Profile through an Article and a Video: @nayka3473 provided a link to an article they wrote about Perplexity and other AI chat platforms, as well as a YouTube video titled "Ranking top AI Chat Platforms: Phind, ChatGPT, Claude, Gemini Pro, Poe and more!".
- Pondering the Perplexity App's Roles: @archient posed an interesting question about the relationship between a profile in the Perplexity app and a system role in the API.
- Call for a Solar 10.7B Model: @arcinarci suggested adding a "solar 10.7b" model to the Perplexity lineup.
Perplexity AI Channel Summaries
▷ #general (65 messages🔥🔥):
- Spanish User Interface Needed: User @juaniespeche expressed a desire for a Spanish interface in Perplexity, noting that the AI already responds effectively in Spanish.
- API Pricing Clarification: @archient asked about Perplexity's token pricing when using multiple models. @icelavaman explained that the total cost is the sum of the costs for each model based on tokens processed. Further questions about usage billing led @icelavaman and @ok.alex to clarify that Perplexity operates via a prepaid credits system.
- Direct Conversations with Models: @saltrockr asked for a way to query models directly, without internet searches involved. @reflext suggested using the writing mode in Perplexity.
- Payment Issues for Trial Period: @ava12138 and @boredkarma discussed issues with payment validation for the 7-day trial of Perplexity Pro, noting inconsistencies in which cards are accepted.
- UI Similarities between Phind and Perplexity: @neuralspace and @reflext discussed the similarities between the user interfaces of Phind and Perplexity. @reflext stated that such similarities are inevitable given the central search-bar design.
Links mentioned:
- Perplexity - AI Companion: Ask anything while you browse
- Perplexity - AI Search: Upgrade your default search engine
- Getting Started with pplx-api
- Perplexity - AI Search: Upgrade your default search engine
▷ #sharing (5 messages):
- User Feedback on Perplexity AI: User @hei_veno mentioned that Perplexity AI helped a lot in developing training content, although the detailed information could not be shared due to work-related confidentiality.
- Resource Recommendation: @aontoni shared a link that they found helpful, but didn't specify further details.
- Perplexity AI Assists with MS Access: @aontoni later described how Perplexity AI helped them understand the relationship between a form and a query in MS Access for a university project.
- Perplexity AI Useful for Exam Help: User @whiterickruben mentioned that Perplexity AI helped them assist a friend with an upcoming exam.
- Article on AI Chat Platforms Including Perplexity: @nayka3473 wrote an article about Perplexity and other AI chat platforms, which they shared via a link. They also shared a YouTube video titled "Ranking top AI Chat Platforms: Phind, ChatGPT, Claude, Gemini Pro, Poe and more!" and asked for feedback.
Links mentioned:
- The Rise of AI: comprehensive list of top AI Chat Platforms: Top AI Chat Platforms of 2023
- Ranking top AI Chat Platforms: Phind, ChatGPT, Claude, Gemini Pro, Poe and more!: Discover our top-ranked AI Chat Platforms of 2023,âŠ
▷ #pplx-api (2 messages):
- Question about the Perplexity App's Profile vs the API System Role: @archient asked, "Is the profile in the perplexity app the same as a system role in the API?" (a hedged example of setting a system role via the API follows this list).
- Request for Solar 10.7B Model: @arcinarci asked about the possibility of adding a "solar 10.7b" model.
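To make the profile-vs-system-role question concrete, here is a minimal sketch of passing a system role through pplx-api; it assumes the API's OpenAI-compatible chat endpoint described in the "Getting Started with pplx-api" doc linked above, and the model name is a placeholder:

```python
# Hedged sketch: a system-role message sent through pplx-api's
# OpenAI-compatible chat endpoint. Endpoint shape and model name are
# assumptions based on the pplx-api docs, not verified here.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PPLX_API_KEY",
    base_url="https://api.perplexity.ai",
)

resp = client.chat.completions.create(
    model="mistral-7b-instruct",  # placeholder model name
    messages=[
        # Roughly what a "profile" does in the app: standing instructions.
        {"role": "system", "content": "You are a concise assistant for a Python developer."},
        {"role": "user", "content": "Summarize what a system prompt does."},
    ],
)
print(resp.choices[0].message.content)
```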
OpenAI Discord Summary
- Missing img2img Functionality in ChatGPT: @_typedef asked about an img2img model; @solbus clarified that ChatGPT does not currently support direct img2img functionality. However, DALL·E developers hinted in an AMA that future "image references" could introduce an img2img feature. AMA Link
- Ease of API Integration With Actions: @iamhere6321 complimented the ease of use and effectiveness of Actions for connecting to an external API. In contrast, @niko3757 preferred the greater flexibility of Assistants and the ability to create new threads.
- Concerns about Declining GPT-4 Efficiency: Seeing a decline in GPT-4's efficiency, @caesarzzk asked for recommendations to improve it; @my5042 suggested using a custom GPT and a "you are chatgpt" instruction for better output.
- ChatGPT Performance and Signup Issues: @wolf.lover reported lagging and errors in ChatGPT, and @zeromaid ran into problems during the signup process.
- GPT-4 Factual Accuracy Concerns: @wesego raised concerns about GPT-4's factual accuracy when generating text from an attached document, and @niko3757 suggested using interconnected APIs or CI.
- Teaching Immutable Syntax to ChatGPT: @facebreaker. asked how to teach ChatGPT an immutable, fixed syntax or structure so responses are more specific and reproducible.
- File Review with GPT's Assistance: @jferrari_75079 asked for assistance with a project where GPT reviews and summarizes file contents and recommends an action (delete, archive, or save).
- Creating the Latest Investment Articles Without Advisories: @komal0887 asked for help refining a prompt for generating articles containing only the latest investment information, specifically without any advice or evaluative sentences. They were using the gpt-3.5-turbo-instruct model for this task.
- Chatbots Mimicking Conversation Styles: @emaanios asked about chatbots that could mimic a provided conversation style, for their research into language-generation bots.
OpenAI Channel Summaries
▷ #ai-discussions (13 messages🔥):
- No Direct img2img Feature in ChatGPT Yet: @_typedef asked whether the model for txt2img is the same as for img2img. @solbus clarified that ChatGPT currently has no direct img2img functionality: it recognizes an uploaded image (img2txt), which can then be used to generate a similar image in a subsequent txt2img step. Solbus referenced an AMA in which DALL·E developers hinted at potential future "image references", which could introduce some form of img2img. The AMA link was shared but may require archive access to view.
- Image to Image, a General Query: @_typedef later clarified that their earlier question about img2img was general and not specifically about OpenAI.
- A URL without context: @jaicraft shared a URL without any preceding or succeeding context.
- Digital Exhaustion: User @mad_cat__ expressed fatigue and found it hard to navigate Discord rooms, while also mentioning their excitement about their work.
Links mentioned:
Steve Jobs Unveils Siri Chat: Created with Bard.
▷ #gpt-4-discussions (20 messages🔥):
- OpenAI Actions Ease of Use: @iamhere6321 complimented the ease of configuration and effectiveness of using Actions to connect to an external API, calling it a promising approach. @niko3757 shared an alternate perspective, preferring Assistants for their greater flexibility and ability to create new threads.
- Signup Issues Encountered: User @zeromaid reported issues with the signup process on the platform, receiving the message "Signup is currently unavailable, please try again later." They reiterated the problem, indicating they were unable to sign up.
- ChatGPT Performance Issues: @wolf.lover reported performance issues with ChatGPT, saying it had become laggy and was causing errors in Firefox. They were concerned about needing to switch chats despite having spent significant time on the current one.
- Advantages of Using Assistants: In a discussion with @iamhere6321, @niko3757 listed several advantages of Assistants over custom GPTs, including unlimited actions, the ability to package multiple actions into one, triggering new threads, and embedding more knowledge into the model, among other perks. @niko3757 also noted that these features come at a cost.
- Seeking Assistance with GPT-4 Accuracy: @wesego asked whether anyone had success getting GPT-4 to write text while accurately adhering to the facts in an attached document, noting discrepancies between the AI's generated story and the source facts. @niko3757 suggested moving away from custom GPTs and trying interconnected APIs, potentially also involving Continuous Integration (CI).
- Challenges with Imposing Fixed Syntax and Structure: @facebreaker. sought guidance on teaching ChatGPT an immutable, fixed syntax/structure. They experienced syntax drift and quality reduction over time and hoped to make the model's responses reproducible and specific to their needs.
- Issues After Switching User-Agent: @vova5963 joked about being blocked by Mouser after frequently switching their User-Agent, noting that this allowed them to watch YouTube without being blocked.
▷ #prompt-engineering (12 messages🔥):
- Refining Prompts for Article Generation: User @komal0887 asked for help refining a prompt that generates articles based on text extracted from different URLs. The generated articles should contain only the latest information, with no investment advice, calls to action, or evaluative sentences. The user is using the gpt-3.5-turbo-instruct model.
- Issues with a Lazy GPT-4: @caesarzzk expressed concerns about GPT-4 seeming to get lazier over time, omitting output code or analysis when possible and sometimes even struggling with comprehension. @my5042 suggested using instructions like "you are chatgpt" in a custom GPT for better results.
- Building an Accurate Story: @wesego asked for guidance on how to write an accurate story.
- Questions on System Prompts: @itsnp asked if they could pose their questions about system prompts in the channel.
- Chatbot Mimicking Conversation Style: @emaanios asked whether any chatbot exists that can mimic the style of conversation from provided chat logs, for their research into language-generation bots.
- Help Requested with File Management using GPT: @jferrari_75079 asked for assistance with a project in which GPT would examine each file, subfolder, and image and advise whether to delete, archive, or save it, while also providing a short summary of the file's content. The user reported that earlier attempts resulted in GPT making decisions based on superficial aspects like the file's last-modified date.
▷ #api-discussions (12 messages🔥):
- Refining Investment Article Prompts: @komal0887 asked for assistance refining the prompt given to the gpt-3.5-turbo-instruct model for generating articles from text extracted from different URLs related to financial updates. They want the output to contain only the latest information and no advice or evaluative sentences (a hedged example call is sketched after this list).
- Increasing the Efficiency of GPT-4: @caesarzzk noted GPT-4 becoming less efficient and asked for recommendations to improve the situation. @my5042 suggested using a custom GPT and adding the instruction "you are chatgpt" for better output.
- Recursive Checker for Conciseness and Thoroughness: In response to an unspecified problem, @madame_architect proposed a solution involving a recursive checker skill to balance comprehensiveness against conciseness in writing.
- Chatbots Mimicking Conversation Styles: @emaanios asked about chatbots specifically designed to mimic conversation styles based on provided chat logs, and @beanz_and_rice confirmed they exist.
- Help with GPT Reviewing Files: @jferrari_75079 sought help getting GPT to thoroughly examine files and decide whether to delete, archive, or save them based on their content, while also providing a short summary of each file. It was noted that GPT had previously based its decisions on superficial aspects like the file's last-modified date.
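For the prompt-refinement question above, here is a minimal sketch of calling gpt-3.5-turbo-instruct with explicit constraints against advice and evaluative sentences; the prompt wording and parameters are illustrative assumptions, not the user's actual prompt:

```python
# Hedged sketch: constraining a gpt-3.5-turbo-instruct completion so the
# article contains only recent facts and no advice or evaluative language.
# The prompt text and parameters are placeholders, not the user's prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

extracted_text = "..."  # text pulled from the source URLs (not shown here)

prompt = (
    "Write a short news-style article using ONLY the facts below.\n"
    "Rules:\n"
    "- Include only the most recent information.\n"
    "- Do not give investment advice or recommendations.\n"
    "- Do not include calls to action or evaluative sentences.\n\n"
    f"Facts:\n{extracted_text}\n\nArticle:"
)

resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    max_tokens=400,
    temperature=0.3,  # lower temperature keeps the output closer to the facts
)
print(resp.choices[0].text.strip())
```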
Eleuther Discord Summary
- DPO Is All About Distribution: @gabriel_syme drew focus to how DPO (direct preference optimization) is more about the distribution than individual samples.
- The Lion Roars in Optimizing: @marthinwurer shed light on the Lion optimizer, emphasizing that it avoids large loss spikes because the weights change by a fixed amount every step.
- Image Captioner Hunt: @frazermc is looking for a nimble image captioner to run over 500k images, preferring non-LM-augmented options. He shared the Awesome-Multimodal-Large-Language-Models repository for reference.
- Caught in the Mixture of Experts: @michaelmelons asked whether anyone has experimented with Mixture of Experts (MoE) using experts of varying parameter sizes, including simple and complex architecture experts.
- Transformers Learn Algorithms, and a Collaboration Proposal: @stellaathena proposed a collaboration around the study "What Algorithms can Transformers Learn? A Study in Length Generalization" and the mysteries of transformers' compositional capabilities.
- Pythia-70m Stumbles: @micpie reported a drastic underperformance by the Pythia-70m model on a benchmark, with accuracy dropping to 0.002. @hailey_schoelkopf proposed that floating-point precision with the fp16 auto dtype could be the cause and that adjusting to `float32` could rectify the issue.
Eleuther Channel Summaries
▷ #general (18 messages🔥):
- Lion Optimizer Prevents Large Loss Spikes: @marthinwurer observed a practical benefit of using the Lion optimizer: no large loss spikes, since the weights change by a fixed amount each step rather than by a multiple of the gradient.
- LLM Flipping Response Logic: @sk5544 sought community input on any paper or research that might explain why a large language model (LLM) flips its response when asked "Are you sure?".
- Seeking Efficient Image Captioner: @frazermc needs an image captioner to process 500k images, ideally not an LM-augmented one. They shared a GitHub repository on Multimodal Large Language Models for reference.
- Efficient Sequence Shifting in Huggingface Datasets: @.the_alt_man shared code that uses Huggingface datasets to shift sequences, but noted that the overhead of `torch -> list -> jax.Array` is too heavy and asked whether there is a better way to do this preprocessing natively in Huggingface (a hedged sketch of one such approach follows this list).
- Running lm-evaluation-harness in Google Colab: @lee0099 asked whether it's possible to run lm-evaluation-harness in Google Colab; @hailey_schoelkopf confirmed it is and shared a GitHub guideline on how to do so.
- Implementation of a Data-Controlled Forget Gate in an LSTM: @sentialx asked how to implement a data-controlled forget gate in an LSTM; @wonkothesensible suggested looking at RWKV for inspiration.
- Praise for Pythia LLM Analysis: @swyxio highlighted the work done by the Pythia team, sharing a Twitter thread by @rasbt that praises Pythia's comprehensive analysis of large language models.
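On the sequence-shifting question above, one commonly used pattern is to do the shift inside a batched `map` with numpy formatting so the data never round-trips through Python lists; this is a hedged sketch of that idea, not the original code from the channel:

```python
# Hedged sketch: shifting token sequences for next-token prediction inside
# Huggingface datasets itself, avoiding a torch -> list -> jax.Array detour.
# The dataset contents here are toy placeholders.
import jax.numpy as jnp
import numpy as np
from datasets import Dataset

ds = Dataset.from_dict({"input_ids": [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]})

def shift(batch):
    ids = np.asarray(batch["input_ids"])
    return {"inputs": ids[:, :-1], "labels": ids[:, 1:]}  # labels = inputs shifted by one

ds = ds.map(shift, batched=True, remove_columns=["input_ids"])
ds.set_format("numpy")  # hand back numpy arrays instead of Python lists

batch = ds[:]                           # dict of numpy arrays
inputs = jnp.asarray(batch["inputs"])   # straight numpy -> JAX, no list step
labels = jnp.asarray(batch["labels"])
print(inputs.shape, labels.shape)
```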
Links mentioned:
- lm-evaluation-harness/examples/lm-eval-overview.ipynb at main · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of autoregressâŠ
- Tweet from Sebastian Raschka (@rasbt): I am reviewing my favorite papers of the year, andâŠ
- GitHub - BradyFU/Awesome-Multimodal-Large-Language-Models at Evaluation: :sparkles::sparkles:Latest Papers and Datasets on âŠ
▷ #research (17 messages🔥):
- DPO Distribution Focus: @gabriel_syme noted that with DPO (direct preference optimization) the connection is more about the distribution than individual samples.
- Theorem 5.4 Discussion: @salmon_lemon expressed confusion regarding Theorem 5.4. @sumo43 offered some insights, suggesting that by successfully optimizing the generator its output becomes similar to the data, and explained lambda as a learning-rate parameter.
- Concept Erasure for Image Models: @voxs asked whether anyone has done concept erasure for image models and later said they found some relevant resources.
- Mobile ALOHA Imitation Learning System: @ai_waifu posted a link to Mobile ALOHA, a low-cost whole-body teleoperation system for imitating mobile manipulation tasks in robotics. @thatspysaspy admired the demo and asked about its robustness, while @ai_waifu discussed cost efficiency and claimed mass production could bring the cost down significantly.
- Mixture of Experts with Variable Parameter Sizes: @michaelmelons asked whether anyone had attempted MoE (Mixture of Experts) at scale with experts of varying parameter sizes, including simple and more complex architecture experts.
Links mentioned:
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation: by Zipeng Fu*, Tony Z. Zhao* and Chelsea Finn at SâŠ
▷ #interpretability-general (2 messages):
- Collaboration Proposal on Transformer Algorithms: User @stellaathena discussed a possible collaboration with the lead author of "What Algorithms can Transformers Learn? A Study in Length Generalization". They explored compositionality and the information-theoretic complexity of tasks as expressed in RASP(-L), and expressed interest in understanding why transformers don't achieve perfect generalization.
- Positive Response to Collaboration: User @dashiell_s expressed interest in joining the proposed collaboration.
▷ #lm-thunderdome (15 messages🔥):
- Pythia-70m Drastically Underperforms in Tests: User @micpie noticed the Pythia-70m model underperforming on a benchmark, returning an accuracy of 0.002 instead of the previous result of 0.609 (see message).
- Floating-Point Precision Might Be the Issue: @hailey_schoelkopf suggested the issue could be the model running in fp16 via the auto dtype in HF. Setting the dtype to `float32` returned more reasonable results (see message; a hedged loading sketch follows this list).
- More Pythia Models Affected: The issue seemed specific to the v1 Pythia models and more prevalent in smaller models. According to @hailey_schoelkopf, enabling torch autocast could potentially help (see message).
- Difficulty Loading Local Datasets: @micpie had trouble loading local datasets in JSON format. @hailey_schoelkopf suggested using `dataset_path: json` and `dataset_kwargs: { data_dir: /path/to/benchmark_0-2 }` as a temporary workaround, noting that they will make changes to restore the original functionality (see message).
- Pending Changes to Restore Original Functionality: Despite the suggested workaround for loading local datasets, @micpie chose to wait for the fix so they won't have to adjust their roughly 400 config files (see message).
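As a small illustration of the precision point above, here is a hedged sketch of loading Pythia-70m in float32 instead of letting an fp16 auto dtype decide; this is not the lm-evaluation-harness code itself, just the underlying Hugging Face loading step:

```python
# Hedged sketch: loading Pythia-70m with an explicit float32 dtype. Small
# models can lose noticeable accuracy when evaluated in fp16, which is the
# suspected cause of the 0.002 score discussed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-70m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Explicit float32 instead of torch_dtype="auto", which may pick fp16 from
# the checkpoint config and degrade logits on small models.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.dtype)  # torch.float32
```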
Latent Space Discord Summary
- Healing Tokens with Ayenem's Project: @ayenem unveiled TokenHealer, a project that trims and regrows prompts to align with a model's tokenizer, improving model completions and their robustness to trailing whitespace/punctuation. More context on the problem TokenHealer addresses can be found in this article.
- API Barrier for MidJourney: @kevmodrome asked whether MidJourney can be used via APIs other than Discord. @jevonm clarified that it is currently Discord-exclusive.
- Seeking an AI for Audio Analysis: @zf0 asked about a chat model capable of analyzing audio rather than just video frames. @swyxio suggested exploring riffusion-style approaches or Meta's Seamless models.
- Coqui's Closure Echoes in the AI Community: @swyxio shared the news of the shutdown of Coqui, an open-source speech technology organization.
- GPT-4 Summarizes AI/ML Papers: @intheclouddan spotlighted a tool on emergentmind.com that uses GPT-4 to summarize AI/ML papers.
- InsightPilot to Be Discussed at the LLM Paper Club: @swyxio and @eugeneyan announced a discussion of InsightPilot, an LLM-powered automated data exploration system, at the upcoming LLM Paper Club.
- Mixture of Experts (MoEs) on the Horizon: For the coming week, the LLM Paper Club will, per @swyxio, discuss a post on "Mixture of Experts", a buzzing topic in the open AI community. The link to the blog post is here.
- Noting Down the LLM Paper Club: @swyxio emphasized the need for note-taking during paper club sessions and invited suggestions for Discord note-taking bot tools.
Latent Space Channel Summaries
▷ #ai-general-chat (17 messages🔥):
- TokenHealer Released by Ayenem: User @ayenem introduced TokenHealer, a project that trims and regrows prompts to align with a model's tokenizer, improving model completions and their robustness to trailing whitespace/punctuation. A related blog post was also shared for more context on the problem TokenHealer solves (a rough conceptual sketch follows this list).
- MidJourney Platform Query: User @kevmodrome asked if MidJourney could be used via any API apart from Discord. @jevonm replied that it is currently accessible only via Discord.
- Query about a Chat Model for Audio Analysis: @zf0 asked about a chat model that can analyze audio instead of just video frames. @swyxio suggested looking into a "riffusion-style approach" or Meta's Seamless models.
- Shutdown of Coqui Announced: @swyxio shared the news of the shutdown of Coqui, an open-source speech technology organization.
- New Tool for Summarizing AI/ML Papers: @intheclouddan drew attention to a tool on emergentmind.com that uses GPT-4 to summarize AI/ML papers.
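The sketch below illustrates the general token-healing idea as described in the summary and the linked prompt-boundaries article: back the prompt up to the previous token boundary and let the model regrow the trimmed tail. It is a conceptual assumption about how TokenHealer works, not its actual code, and the tokenizer choice is arbitrary:

```python
# Hedged conceptual sketch of "token healing": trim the prompt back one
# token so a trailing space/punctuation does not force an unnatural first
# token, then constrain generation to regrow the trimmed text.
# This illustrates the idea only; it is not TokenHealer's implementation.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # arbitrary tokenizer for the demo

def heal(prompt: str) -> tuple[str, str]:
    """Return (healed_prefix, tail_to_regrow). Assumes decode round-trips the prefix text."""
    ids = tok(prompt, add_special_tokens=False)["input_ids"]
    if len(ids) < 2:
        return prompt, ""
    healed = tok.decode(ids[:-1])      # drop the final (possibly awkward) token
    tail = prompt[len(healed):]        # the text the model must reproduce first
    return healed, tail

healed, tail = heal("The name of the capital of France is ")
print(repr(healed), repr(tail))
# Generation would then start from `healed`, keeping only continuations whose
# first tokens begin with `tail`, so the trailing space no longer skews sampling.
```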
Links mentioned:
- Tweet from Josh Meyer (@josh_meyer): Coqui is shutting down. It's sad news to star…
- Tweet from Sam (@Sam_Awrabi): 1. AI funding mostly sits in the model layer for n…
- AI/ML Research, Explained | Emergent Mind: Stay informed about important new AI/ML arXiv reseâŠ
- GitHub - Ayenem/TokenHealer: Contribute to Ayenem/TokenHealer development by crâŠ
- The Art of Prompt Design: Prompt Boundaries and Token Healing: Learn how standard greedy tokenization introduces âŠ
▷ #ai-event-announcements (1 message):
- InsightPilot discussion with a leading force: <@187636841988620288> will guide a discussion on InsightPilot (copilots for data analysis) here.
- LLM Paper Club: This event is a weekly paper review of LLM papers, focusing on big ideas, their relevance, and any open-ended questions after reading.
- No Upcoming Sessions Yet: The series presently has no upcoming sessions but advises checking back regularly for updated schedules.
- Matrix for Paper Selection: The paper for review is decided a week ahead, with details shared in the #llm-paper-club channel.
- Tag In for Discord Notifications: Users are encouraged to request to be tagged in <@&1107197669547442196> for Discord notifications related to the meet-up.
Links mentioned:
LLM Paper Club (now in Discord) · Luma: A weekly paper review of LLM papers, starting fromâŠ
▷ #llm-paper-club (14 messages🔥):
- InsightPilot: LLM-Empowered Automated Data Exploration System:
@swyxio
shared details of today's paper on InsightPilot, an LLM-based automated system designed to streamline data exploration. The paper can be found at this link. - Join the InsightPilot Discussion:
@eugeneyan
invites members to join the discussion about using LLMs to analyze data via this Discord link. - Next in Line: Mixture of Experts (MoEs):
@swyxio
provides the link for next week's paper on "Mixture of Experts", a hot topic in the open-source AI community. The link to the blog post is here, and a toy sketch of an MoE layer follows this list. - Future Paper Consideration: Self-Play Fine-Tuning (SPIN):
@swizec
suggests considering a paper on Self-Play Fine-Tuning (SPIN) for a future discussion. The proposed paper can be found at this link. - Note-Taking for the Paper Club:
@swyxio
expressed a need for good note-taking during the paper club sessions and is seeking suggestions for Discord note-taking bot tools.
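For anyone who wants a concrete picture before next week's session, here is a toy, illustrative sketch of a top-2 gated Mixture-of-Experts layer in PyTorch. It is not taken from the linked blog post or any particular model; the class name and sizes are assumptions, and production MoE layers add load-balancing losses and far more efficient routing.

```python
# Toy top-2 gated Mixture-of-Experts layer (illustrative sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)          # (tokens, n_experts)
        weights, idx = gate.topk(self.k, dim=-1)          # keep top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                routed = idx[:, slot] == e                # tokens sent to expert e
                if routed.any():
                    out[routed] += weights[routed, slot, None] * expert(x[routed])
        return out

x = torch.randn(8, 64)
print(MoELayer(d_model=64, d_ff=256)(x).shape)  # torch.Size([8, 64])
```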
Links mentioned:
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models: Harnessing the power of human-annotated data throu…
- Join the /dev/invest + Latent Space Discord Server!: Check out the /dev/invest + Latent Space community…
- Mixture of Experts Explained
- InsightPilot: An LLM-Empowered Automated Data Exploration System - Microsoft Research: Exploring data is crucial in data analysis, as it …
LLM Perf Enthusiasts AI Discord Summary
- LLM eyed to rephrase Anki cards: @thebaghdaddy shared an interest in using an LLM to rephrase Anki cards for better information generalizability in the
#collaboration
channel; a small illustrative sketch follows this summary. - Exploring a Multi-agent System for smoother storytelling: @yikesawjeez proposed setting up a multi-agent system, including an "orchestrator", a "state manager", and a small model trained on play-by-post material, to manage narrative creation.
- Steering Plot with "Objectives": @yikesawjeez further suggested including an "Objectives" section checked by a "DM" in the system to help steer the plot in the intended direction.
- Aim to break the AI Narrative Loop: @yikesawjeez pinpointed a common issue with AI content generation: repetitive narrative loops. The suggested solution is to alter both the player's and the model's text to disrupt the loop.
- Long-context Models to assist narrative management: @yikesawjeez believes that long-context models managing narratives could benefit from examples of how a plot might unfold, enabling more precise few-shot direction.
- Search + Search RAG API out for beta-testing: @emrgnt_cmplxty in the
#rag
channel announced the release of a new Search + Search RAG API, inviting contributors to beta test it and give feedback on whether it would be useful for their applications. The underlying model is also open source. - Community interest in New API: @yikesawjeez expressed interest in checking out this new API and requested a link.
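As a concrete illustration of the Anki idea above, here is a minimal sketch that asks an LLM to rephrase a card's front while keeping the answer fixed. The prompt wording, model choice, and use of the openai client are assumptions for illustration, not @thebaghdaddy's actual setup.

```python
# Hedged sketch: rephrasing an Anki card with an LLM so the same fact is
# tested from a different angle. Model and prompt are illustrative choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rephrase_card(front: str, back: str) -> str:
    prompt = (
        "Rewrite this flashcard question so it tests the same fact from a "
        "different angle, without changing what the correct answer is.\n"
        f"Question: {front}\nAnswer: {back}\nRewritten question:"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return resp.choices[0].message.content.strip()

print(rephrase_card("What enzyme unwinds DNA at the replication fork?", "Helicase"))
```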
LLM Perf Enthusiasts AI Channel Summaries
▷ #collaboration (5 messages):
- Using LLM to Rephrase Content:
@thebaghdaddy
expressed interest in using an LLM to rephrase Anki cards, with the goal of improving information generalizability. - Multi-agent System for Narrative Creation:
@yikesawjeez
detailed their idea for a multi-agent system to manage narrative creation: an "orchestrator", a "state manager", and a small model trained on play-by-post material, collaborating to compress narrative information into manageable sections (a rough sketch follows this list). - Objective-driven Narrative Management:
@yikesawjeez
also mentioned the possibility of having an extra "Objectives" section checked by a "DM" to steer the plot in a specific direction. - Avoiding Narrative Loops:
@yikesawjeez
underscored the challenge of AI-generated narrative loops, where similar responses trigger repetitive text. They suggested modifying both the player's messages and the model's to break the loop. - Long-context Models for Narrative Management:
@yikesawjeez
proposed that long-context models managing the narrative can benefit from examples of how a plot might unfold, facilitating targeted few-shot direction.
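A rough sketch of how the proposed pipeline could be wired together follows; every class and function name below is hypothetical and illustrative, not an existing implementation. An orchestrator passes each player turn to a state manager that compresses history and tracks an "Objectives" list, a "DM" check nudges the plot toward unmet objectives, and the compact context goes to whatever narrator model is in use.

```python
# Hypothetical sketch of the multi-agent narrative loop described above.
from dataclasses import dataclass, field

@dataclass
class StateManager:
    """Compresses the running story and tracks open objectives."""
    summary: str = ""
    objectives: list[str] = field(default_factory=list)

    def update(self, player_msg: str, model_msg: str) -> None:
        # A small summarizer model would compress this in practice; plain
        # appending stands in for the "state manager" bookkeeping here.
        self.summary += f"\nPlayer: {player_msg}\nNarrator: {model_msg}"

    def unmet_objectives(self) -> list[str]:
        return [o for o in self.objectives if o.lower() not in self.summary.lower()]

def dm_check(state: StateManager) -> str:
    """'DM' step: nudge the plot toward any objective not yet reached."""
    pending = state.unmet_objectives()
    return f"Steer the scene toward: {pending[0]}" if pending else ""

def orchestrate_turn(player_msg: str, state: StateManager, narrator) -> str:
    # Build a compact prompt from compressed state plus the DM's nudge
    # (the proposal also rephrases player/model text here to break loops,
    # which this sketch omits).
    prompt = f"{state.summary}\n{dm_check(state)}\nPlayer: {player_msg}\nNarrator:"
    reply = narrator(prompt)  # narrator = any text-generation callable
    state.update(player_msg, reply)
    return reply

state = StateManager(objectives=["the party reaches the harbor"])
print(orchestrate_turn("We leave the tavern.", state, lambda p: "You step into the rain..."))
```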
▷ #rag (3 messages):
- New Search + Search RAG API for Beta Testing:
@emrgnt_cmplxty
announced the release of a new Search + Search RAG API and asked the community for a quick beta test and feedback, specifically on whether it would be useful for their applications. - Open Source Model:
@emrgnt_cmplxty
mentioned that the model behind this newly introduced API is open source. - Request for Link to New API: User
@yikesawjeez
showed interest and asked for a link to this new API.
DiscoResearch Discord Summary
Only 1 channel had activity, so no need to summarize…
- GPT-4 Turbo vs GPT-4 Comparison: User
@philipmay
asked for a judgement on the performance of GPT-4 Turbo (gpt-4-1106-preview) compared to regular GPT-4. - Turbo Excels in Conversations:
_jp1_
noted that GPT-4 Turbo may even be better than GPT-4 for "convenience prompts" or normal dialogues and tasks involving long contexts, based on personal impressions. - Turbo Struggles with Complex Tasks: However,
_jp1_
also mentioned that GPT-4 Turbo seems to underperform when faced with complex instructions, such as a series of custom tasks in a specific order. - Coding Contexts Prove Challenging:
@mister_poodle
expressed that in the context of coding, GPT-4 Turbo often struggles to implement the full code, even when explicitly instructed; this issue is less frequent with GPT-4 except when dealing with long context lengths. - Overall Performance of GPT-4:
@mister_poodle
observed a perceived degradation in the performance of both GPT-4 Turbo and GPT-4 since their respective launches.
Alignment Lab AI Discord Summary
- Warm Welcome to a Physics Pundit: New member
@ddt1909
aka Daniel shared his experience in ML/Computer Vision and his current project on LLM-based information extraction for enterprises; a podcast recommendation influenced his decision to join the server. - Phi-Tuning Falls Flat:
@benxh
described having had mostly negative experiences with phi-tuning, warning the community about the difficulties of tuning this model. - Hugging Face Models: Less Than a Hug, More of a Thud:
@benxh
found that the fine-tuned models available on Hugging Face are lackluster, indicating potential unidentified issues and prompting a deeper conversation about quality control and expectations around pre-trained models.
Alignment Lab AI Channel Summaries
▷ #join-in (1 message):
- New member introduction:
@ddt1909
introduced himself as Daniel, who has a physics background and has been working in ML/Computer Vision since 2017. He's currently building an information extraction product based on LLMs for the enterprise. His decision to join the server was influenced by @660097403046723594
's recommendation on a podcast.
▷ #phi-tuning (3 messages):
- Negative Experiences with Phi-Tuning: User
@benxh
expressed dissatisfaction with the phi-tuning, as theyâve had mostly negative experiences. - Lackluster Fine-Tuned Models on Hugging Face:
@benxh
also points out that the fine-tuned models present on Hugging Face are lackluster and there seems to be an unidentified issue with them.
YAIG (a16z Infra) Discord Summary
Only 1 channel had activity, so no need to summarize…
- Finding Resources on Analytical Databases: User
@pranay01
expressed an interest in learning about the state of the art in analytical databases and large-scale analytical systems and asked for suggestions on whom to follow, noting their appreciation for user <@1016864328189759488>
. - Resource Recommendation from Expert: User
@andypavlo
pointed @pranay01
to an upcoming course on this exact topic and provided a link to the course's page. - Accessibility for Non-CMU Folks:
@pranay01
followed up by asking if there was a previous version of this course that they could access, and whether non-Carnegie Mellon University students could enroll in these courses.
Links mentioned:
CMU 15-445 :: Advanced Database Systems (Spring 2024): Carnegie Mellon University
Skunkworks AI Discord Summary
- NEJM Image Challenge Dataset Now Accessible:
onuralp.
shared the NEJM Image Challenge dataset on GitHub, noting there's no need for data cleaning for users with existing models. They plan to share GPT-4V results this week and welcome suggestions for model tweaks or other amendments.
Skunkworks AI Channel Summaries
▷ #off-topic (1 message):
pradeep1148: https://www.youtube.com/watch?v=O6RPmtuGKMM
▷ #bakklava-1 (1 message):
- NEJM Image Challenge dataset shared:
onuralp.
made the dataset for the NEJM Image Challenge available on GitHub and mentioned that there is no need for data cleaning for users who already have a model in place. He also plans to upload the GPT-4V results this week and welcomed any suggestions for model changes or other modifications.
Links mentioned:
GitHub - cx0/nejm-image-challenge: NEJM Image Challenge dataset and experiments: NEJM Image Challenge dataset and experiments. Cont…
The Datasette/LLM (@SimonW) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.