Nous Research announced their seed round, and the business focus is Nous Forge.
Rabbit R1 also launched their demo at CES and opinions were very divided.
In other news, OpenAI shipped the GPT store today, and briefly leaked their upcoming personalization feature.
Table of Contents
[TOC]
Nous Research AI Discord Summary
- Breaking the LLM's Context Window Limit with Activation Beacon: @kenakafrosty shared an arXiv paper on Activation Beacon, a new approach that could potentially solve the context window limitation of Large Language Models (LLMs). @_stilic_ confirmed that the code will be available on GitHub.
- Vibes on Tech Gadgets and AI Use: In the off-topic channel, topics revolved around the M2-equipped Apple Vision Pro, the Rabbit product, the WEHEAD AI companion, layoffs at Humane, and humor about Large Language Models (LLMs) and their use.
- Curated Tech and AI Links: Interesting links shared include tools like light Activation Beacon training and LongLM Self-Extend for LLMs, AI for explaining AI systems, discussions about Rabbit.tech, model interpolation, a 2x7B MoE model, and WikiChat.
- Nous Research's Exciting Seed Financing and Future Plans: @teknium announced Nous Research's successful $5.2 million seed financing and their plan to etch the transformer architecture into chips, creating powerful servers capable of real-time voice agents, improved coding, and running trillion-parameter models. Further open-source research and the development of Nous-Forge are also in the pipeline.
- Nous Community's Various Projects on AI & LLMs: A broad range of topics was covered in the general channel, including development progress on a QLoRA fine-tune, research on a wearable AI mentor, discussions on fine-tuning large models, use of custom architectures, experiments with the WikiChat dataset, and a spontaneous Spanish-speaking session.
- LLM-related Discussions and Inquiries: In the ask-about-llms channel, discussions centered on replicating the Phi models, solutions to VRAM issues with Mixtral and Ollama, and strategies for building LLMs tailored to a proprietary knowledge base. Users considered the use of the Synthesizer tool and suggested ways to create synthetic datasets.
- Obsidian Project Code Request: In the project-obsidian channel, users expressed interest in the Obsidian script that @qnguyen3 has used for their work. The script, once shared, would be valuable to other guild members in their own projects.
Nous Research AI Channel Summaries
▷ #ctx-length-research (3 messages):
- Activation Beacon: A solution for the LLM context window issue: @kenakafrosty shared a link to an arXiv paper on a new solution named Activation Beacon. The paper states that Activation Beacon "condenses LLM's raw activations into more compact forms such that it can perceive a much longer context with a limited context window". The technique is claimed to balance memory and time efficiency during both training and inference.
- Upcoming Code for Activation Beacon on GitHub: @_stilic_ offered an update, saying that the code for Activation Beacon will be available here on GitHub.
Links mentioned:
- Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon: The utilization of long contexts poses a big challenge for large language models due to their limited context window length. Although the context window can be extended through fine-tuning, it will re…
- FlagEmbedding/Long_LLM/activation_beacon at master · FlagOpen/FlagEmbedding: Dense Retrieval and Retrieval-augmented LLMs. Contribute to FlagOpen/FlagEmbedding development by creating an account on GitHub.
▷ #off-topic (95 messages🔥🔥):
- M2-equipped Apple Vision Pro and the Rabbit product discussed: @nonameusr was excited about the M2 in the Apple Vision Pro, while @.beowulfbr expressed skepticism about the Rabbit product's cost and inference coverage. They also speculated that Apple might release a similar product this year.
- Weighing in on the WEHEAD AI companion in 2024: Several users had resonant and humorous opinions about the WEHEAD AI companion, following a link shared by @teknium. @everyoneisgross admired the lowpoly aesthetic, and @youngphlo imagined carrying the AI around like a baby. Check the post here.
- Chats on Humane layoffs ahead of their first device launching: @mister_poodle shared a link regarding layoffs at Humane ahead of the startup shipping its first device, a preordered $699, screenless, AI-powered pin. The article is available at this link.
- Humor about Large Language Models (LLMs) and their use: Users @Error.PDF and @n8programs joked about the usefulness of LLMs in understanding and communicating in foreign languages. They also humorously speculated on the next advancements in LLMs, such as Discord mods that automatically translate all screen text into the user's native language.
Links mentioned:
- Tweet from PCMag (@PCMag): The WEHEAD AI companion is not the assistant we envisioned for 2024. #CES2024
- Mcmahon Crying He Was Special GIF - Mcmahon Crying He was special WWE - Discover & Share GIFs: Click to view the GIF
- Humane lays off 4 percent of employees before releasing its AI Pin: The cuts were described as cost-cutting measures.
▷ #interesting-links (39 messages🔥):
- Exploring Light Activation Beacon Training with LongLM Self-Extend: @kenakafrosty raised the idea of combining light Activation Beacon training with LongLM Self-Extend to potentially eliminate the context window problem.
- AI Agents Unraveling AI Systems: @burnydelic shared an article on the novel approach taken by MIT CSAIL researchers, who used AI models to experiment on other systems and explain their behavior.
- Is Rabbit.tech the Next Big Thing?: @kevin_kevin_kevin_kevin_kevin_ke sparked a discussion on Rabbit.tech, a tech company offering standalone hardware for artificial intelligence. Some users (@georgejrjrjr and @teknium) expressed skepticism about the need for a separate device when smartphone apps could offer similar functionality, while others (@gezegen) defended the uniqueness of specialized hardware for AI companions.
- Criticisms and Defence of Model Interpolation: In the context of AI model development, @georgejrjrjr, @romaincosentino, and @charlie0.o discussed the limitations and potential advantages of model interpolation (a toy sketch follows this list). @romaincosentino posited that there's a lack of theoretical foundation in model interpolation, while @charlie0.o considered it a form of regularization.
- Stumbled Upon a Large MoE Model and WikiChat: @nonameusr shared two links, one to a 2x7B MoE model based on the Intel neural series v3, and a tweet mentioning WikiChat, a tool boasting improved factual accuracy over GPT-4. The latter prompted @decruz to ask how it differs from systems grounded with RAG.
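As a toy illustration of the interpolation-as-regularization view above, here is a minimal, hypothetical sketch (not from the conversation) of linearly interpolating two checkpoints' weights; real merges often use spherical interpolation (slerp) per tensor, as tools like mergekit do.

```python
import torch

def lerp_state_dicts(sd_a, sd_b, t: float):
    """Linearly interpolate two state dicts with identical keys/shapes.

    t=0 returns model A's weights, t=1 returns model B's. Intermediate
    values pull each parameter toward the average of the two fine-tunes,
    which is the "regularization" framing discussed above.
    """
    assert sd_a.keys() == sd_b.keys(), "models must share an architecture"
    return {k: (1.0 - t) * sd_a[k] + t * sd_b[k] for k in sd_a}

# Hypothetical usage: merge two fine-tunes of the same base model.
# model_a, model_b = <torch.nn.Module instances loaded elsewhere>
# merged = lerp_state_dicts(model_a.state_dict(), model_b.state_dict(), t=0.5)
# model_a.load_state_dict(merged)
```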
Links mentioned:
- rabbit — home: r1: your pocket companion. Order now: $199, no subscription required.
- fblgit/UNAversal-2x7B-v1 · Hugging Face
- AI agents help explain other AI systems: FIND (function interpretation and description) is a new technique for evaluating automated interpretability methods. Developed at MIT, the system uses artificial intelligence to automate the explanati…
- Tweet from Owen Colegrove (@ocolegro): This result is so fascinating: WikiChat: Stopping LLM Hallucination - Achieves 97.9% factual accuracy in conversations with human users about recent topics, 55.0% better than GPT-4! Anyone interes…
▷ #announcements (1 message):
- Nous Research Announces $5.2M Seed Financing Round: @teknium announced the successful conclusion of the $5.2 million seed financing round, co-led by Distributed Global and OSS Capital, with participation from several angel investors, including Vipul (founder and CEO at Together AI), Yonatan Ben Shimon (founder at Matchbox DAO), Balaji, Thibaud (founder at OpenRouter and OpenSea), Chris Prucha (founder at Notion), the founder and CEO at Glaive AI, and Gavin (founder and CEO at etched.ai).
- Burning the Transformer Architecture Into Chips: The intention was revealed to create the world's most powerful servers for transformer inference by burning the transformer architecture into chips.
- Products Impossible with GPUs: @teknium outlined the projected capabilities of Nous Research's servers, emphasising real-time voice agents, improved coding through tree search, and multicast speculative decoding.
- Room for Trillion-Parameter Models: The upcoming servers are expected to be able to run trillion-parameter models, featuring a fully open-source software stack, expansibility to 100T-param models, beam search, and MCTS decoding.
- Open-Source Pursuits & Future Project, Nous-Forge: Stressing the importance of open-source research, @teknium announced that the funding will allow for continued investment in LLM Architecture, Data Synthesis, Simulation, & Agent Engineering research, and the development of Nous-Forge, set for 2024. The team of developers and advisors mentioned includes <@153017054545444864>, <@387972437901312000>, <@265269014148808716>, and <@187418779028815872>.
Links mentioned:
Etched | The World's First Transformer Supercomputer: Transformers etched into silicon. By burning the transformer architecture into our chips, we're creating the world's most powerful servers for transformer inference.
▷ #general (377 messages🔥🔥):
- Nous Team raises $5.2 million: The Nous team posted a tweet announcing that they have successfully raised $5.2m in seed funding. Members and friends of the Nous team expressed their excitement and offered congratulations. Source: @youngphlo
- Development of a QLoRA fine-tune: User @n8programs shared progress on a QLoRA (quantized low-rank adaptation) fine-tune. He successfully trained the QLoRA on a single M3 Max and concluded that it generally performed better than vanilla Mistral. Source: @n8programs
- Researching a Wearable AI Mentor: User @mg_1999 is working on a wearable AI mentor and consulted the community about the best model to use from Nous Research. They shared a link to the product's website: Sama AI.
- Discussion on Fine-tuning Large Models: The community discussed the benefits and challenges of training and merging large versus small models. Users argued for different strategies, such as merging fine-tuned models with base models and using multiple adapters. Notable links shared include LM-Cocktail and CFG.
- Suggestion of Modal and Runpod platforms: User @decruz discussed modal.com, a tool used for inference, and facilitated introductions to people at the company for those in need of GPUs. User @kenakafrosty mentioned Runpod as an alternative platform with similar capabilities.
- Use of custom architectures: User @mihai4256 asked for advice on sharing custom architectures inherited from LlamaForCausalLM with others. He was guided to use a method similar to Qwen's, including a custom modeling file and allowing import with trust_remote_code=True (a minimal loading sketch follows this list).
- Interest in the WikiChat Dataset: User @emrgnt_cmplxty expressed interest in the dataset used by the WikiChat team, stating their own tests with it produced promising results. They raised the idea of reproducing the dataset to fine-tune OpenHermes-2.5-Mistral-7B. Source: stanford-oval/WikiChat
- Spontaneous Spanish-Speaking Session: Various users engaged in a fun and humorous conversation in Spanish. It carried no technical content but ended in good cheer.
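For context on the custom-architecture pattern mentioned above, a minimal sketch of loading a model that ships its own modeling code (the repo id here is hypothetical):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "your-org/custom-llama" is a hypothetical repo that ships its own
# modeling_*.py file alongside the weights, as Qwen's repos do.
repo = "your-org/custom-llama"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
# trust_remote_code=True tells transformers to import and run the model
# class defined in the repo instead of a built-in architecture.
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
```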
Links mentioned:
- Tweet from anton (@abacaj): Why am I recommending bigger models? I mean look at this. Even Qwen-1.8B-chat fails at properly associating turns… smaller models will not cut it
- Google Colaboratory
- LM-Cocktail: LM_Cocktail
- Slerp - Wikipedia
- Evangelion Laugh GIF - Evangelion Laugh Smile - Discover & Share GIFs: Click to view the GIF
- Sama AI App: What if humans had infinite memory? Our new AI wearable is created to give you unlimited memory.
- Modal: Modal helps people run code in the cloud. We think it's the easiest way for developers to get access to containerized, serverless compute without the hassle of managing their own infrastructure.
- Stay on topic with Classifier-Free Guidance: Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be us…
- Oh God Kyle Broflovski GIF - Oh God Kyle Broflovski Stan Marsh - Discover & Share GIFs: Click to view the GIF
- Issues · stanford-oval/WikiChat: WikiChat stops the hallucination of large language models by retrieving data from Wikipedia. - Issues · stanford-oval/WikiChat
- Tweet from Nous Research (@NousResearch): Nous Research is excited to announce the closing of our $5.2 million seed financing round. We're proud to work with passionate, high-integrity partners that made this round possible, including c…
- Tweet from rabbit inc. (@rabbit_hmi): Introducing r1. Watch the keynote. Order now: http://rabbit.tech #CES2024
- GitHub - cg123/mergekit: Tools for merging pretrained large language models.: Tools for merging pretrained large language models. - GitHub - cg123/mergekit: Tools for merging pretrained large language models.
- Example reading directly from gguf file by jbochi · Pull Request #222 · ml-explore/mlx-examples: This loads all weights, config, and vocab directly from a GGUF file using ml-explore/mlx#350 Example run: $ python llama.py models/tiny_llama/model.gguf [INFO] Loading model from models/tiny_llama/…
- GGUF support by jbochi · Pull Request #350 · ml-explore/mlx: Proposed changes This adds GGUF support using the excellent gguflib from @antirez. Would there be interest in this? GGUF is currently very popular for local inference, and there are tons of models …
▷ #ask-about-llms (44 messages🔥):
- Exploration of Replicating Phi Models: @gson_arlo inquired about the existence of open models that aim to replicate the Phi series. In response, @georgejrjrjr pointed out that Owen from SciPhi has been focusing on synthetic data more than anyone else they know, and also mentioned related projects such as refuel.ai and Ben Anderson's Galactic.
- Addressing VRAM Issues with Mixtral and Ollama: @colby_04841 asked for advice on handling VRAM limitations while using Mixtral 8x7b and Ollama on a system with 4 RTX 3090 GPUs.
- RAG vs Fine-Tuning for a Proprietary Knowledge Base: @bigdatamike sought insights on whether to use RAG, fine-tuning, or both for building a language model tailored to a company's proprietary knowledge base. The users gave mixed opinions, with @colby_04841 favoring RAG and @georgejrjrjr suggesting potentially transforming the data in the retrieval store to better match the output format.
- Usefulness of the Synthesizer Tool in Data Creation: @georgejrjrjr recommended Synthesizer, a multi-purpose LLM framework for RAG and data creation developed by SciPhi-AI. User @everyoneisgross confirmed its addition to their project list.
- Best Way to Create Synthetic Datasets: @gezegen asked about generating synthetic datasets, where @emrgnt_cmplxty pointed to open-source models as a scalable solution, emphasizing the need to maintain accuracy (a generation-loop sketch follows this list).
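As a rough illustration of the open-source-model approach mentioned above, here is a minimal, hypothetical sketch of a synthetic-data generation loop with the transformers text-generation pipeline; the model choice and prompt template are assumptions, not from the discussion.

```python
import json
from transformers import pipeline

# Any capable open-source instruct model could be used; this particular
# choice is an assumption for illustration.
generator = pipeline("text-generation", model="teknium/OpenHermes-2.5-Mistral-7B")

seed_topics = ["linear algebra", "SQL joins", "HTTP caching"]
records = []
for topic in seed_topics:
    prompt = f"Write one question and a correct answer about {topic}.\n"
    out = generator(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"]
    # In practice a verification pass (another model or heuristics) should
    # filter inaccurate samples before they enter the dataset.
    records.append({"topic": topic, "raw": out})

with open("synthetic.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```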
Links mentioned:
GitHub - SciPhi-AI/synthesizer: A multi-purpose LLM framework for RAG and data creation.: A multi-purpose LLM framework for RAG and data creation. - GitHub - SciPhi-AI/synthesizer: A multi-purpose LLM framework for RAG and data creation.
▷ #project-obsidian (4 messages):
- Obsidian script sharing request: @qnguyen3 mentioned they had just run Obsidian for their work. In response, @vic49. requested that @qnguyen3 share the script; @thewindmom also expressed interest in it.
OpenAI Discord Summary
- The Wait Game for GPT-4 Turbo: Member @_treefolk_ pointed out their anticipation for the full release of GPT-4 Turbo, for cheaper usage and an increased token limit, and highlighted the vagueness of "early January" as the promised release timeframe.
- The AI vs. Coders Debate: @you.wish and @ăïœïœ ïœïœïœïœïœïœïœïœïœ ă engaged in a spirited discussion about the implications of future GPT versions for coders. While the latter envisioned a future where "Pretty much everybody on earth will be the world's best coder with the next versions of GPT," @you.wish countered that current AI models can only perform very elementary tasks.
- When Discord Rules Stir Discussions: The guild experienced an exhaustive rule-interpretation discourse when @you.wish asked for upvotes on a Reddit post, prompting dialogue about Discord's Rule 7 prohibiting "self-promotion, soliciting, or advertising".
- LimeWire Takes AI to Music: @shartok brought a musical piece composed with LimeWire AI Studio into the conversation, sparking a discussion on AI-generated music.
- Mixed Reviews on Midjourney's Latest Version: Voices in the guild like @dino.oats, @darthgustav., and @satanhashtag shared their experiences and perceptions of the newest version of Midjourney (MJ), discussing its move away from Discord and the inclusion of privacy features. However, @you.wish expressed dissatisfaction with the version.
- Brand Guidelines Puzzle: In the GPT-4 discussions, @mrbr2023 posed questions about a blacked-out section of the brand guidelines documentation for OpenAI GPTs received in an email, and about the inability to share a screenshot of or link to the document.
- Roadblocks in GPT Publishing: @mrbr2023 struggled with selecting "Publish to EVERYONE" for their GPTs, which was resolved upon realizing that both the "Name" and "Domain" boxes in the builder profile settings need to be checked for the option to be available.
- Exploring GPT Personalization: Venturing into the new GPT memory feature (personalization), @winsomelosesome and @darthgustav. shared their experiences of GPT learning from their chats.
- ChatGPT Romantically Challenged: In the prompt-engineering channel, @rchap92 pointed out ChatGPT's difficulty in crafting even a simple romantic scene without violating guidelines, a fact confirmed by @rjkmelb, who stated that ChatGPT is designed to be "G-rated".
- ChatGPT's Conservative Guidelines; Possible Workaround?: For content that might violate guidelines, @exhort_one suggested an interesting workaround: have ChatGPT craft a censored version of the content and then let the user fill in the blanks.
- AI's Potential Across Different Fields: @shoga4605 pondered the vast potential of AI and language models and their impact on fields such as linguistics, ecology, environments, and more. They also hypothesized about AI's application in agriculture, for theoretically determining the amount of food that could be produced from lawn space.
OpenAI Channel Summaries
▷ #ai-discussions (160 messages🔥🔥):
- Impatience for GPT-4 Turbo's official release: @_treefolk_ expressed eagerness for GPT-4 Turbo to move out of preview for cheaper usage and an increased token limit, questioning the specifics of the promised "early January" release.
- Discussion on the future of coding with GPT: A lively debate transpired between @you.wish and @ăïœïœ ïœïœïœïœïœïœïœïœïœ ă about the potential for future GPT versions to replace coders. While the latter believes that "Pretty much everybody on earth will be the world's best coder with the next versions of GPT," @you.wish argued that current models can only undertake very basic tasks.
- Self-promotion prompts Discord rule debate: An extensive discussion about the application of Discord's Rule 7, which prohibits "self-promotion, soliciting, or advertising," was initiated by @you.wish's request for upvotes on a Reddit post.
- AI-generated music shared: @shartok shared a link to an AI-generated music piece created with LimeWire AI Studio.
- Chat about the latest Midjourney (MJ) version: Messages from @dino.oats, @darthgustav., and @satanhashtag detailed their experiences and opinions on the newest version of MJ, including its move away from Discord and introduction of privacy features. @you.wish expressed dissatisfaction with the version.
Links mentioned:
- Using a ChatGPT made game to fool a Vet Gamedev: A lot of people were saying Games are safe from AI, so I asked AI to make a game that can fool the legendary creator of God Of War! Thanks to @DavidJaffeGa…
- Radiant Warrior - LimeWire: "Check out Radiant Warrior from shartok on LimeWire"
▷ #gpt-4-discussions (154 messages🔥🔥):
- Confusion Over Brand Guidelines: User @mrbr2023 was confused about why a section of the brand guidelines documentation for OpenAI GPTs given in an email had a black mark over it. They were also puzzled about being unable to share a screenshot or link to the document in the group.
- Voice Quality for Custom GPTs: @vantagesp asked why the voice for custom GPTs was subpar, without any response or discussion following.
- Struggles with Publishing GPTs: User @mrbr2023 displayed frustration about not being able to select "Publish to EVERYONE" for their GPTs, eventually figuring out that both the "Name" and "Domain" boxes need to be ticked in the builder profile settings for the "EVERYONE" option to become available.
- Users Encounter Technical Issues with GPTs: Several users reported that their GPTs disappeared and some website pages were not accessible, attributing it to a new update from OpenAI.
- Explored Personalization Feature of GPT: @winsomelosesome and @darthgustav. explored the new GPT memory feature (personalization) in Settings, which allows GPT to learn from your chats. However, @darthgustav. also noted that it seemed to be removed shortly after discovery.
- Appeal for GPT Feedback: User @faazdataai_71669 asked for feedback on their GPT, "Resume Tailor", sharing a link to it.
Links mentioned:
Brand guidelines: Language and assets for using the OpenAI brand in your marketing and communications.
▷ #prompt-engineering (6 messages):
- ChatGPT struggles with romantic scenarios: @rchap92 raised a concern about ChatGPT struggling to outline even a basic romantic scene without getting the "may violate guidelines" highlight. @rjkmelb confirmed that, indeed, ChatGPT is designed to be "G-rated".
- Workarounds for ChatGPT's conservative approach: @exhort_one suggested a workaround that involves asking ChatGPT to censor any part that might violate guidelines, and then letting the user fill in the blanks.
- Prompt-engineering guide shared: @scargia shared a link to the Prompt Engineering guide on OpenAI's website.
- Inspiring potential use cases for AI: @shoga4605 discussed potential use cases for AI and language models in understanding and modeling ecology, environments, habitats, and overall biodiversity. They also considered the possibilities of AI in agriculture, like determining how much food could theoretically be produced from lawn space.
- Warm welcome to a new user: @beanz_and_rice welcomed @shoga4605 to the community, appreciating their enthusiasm and ideas.
▷ #api-discussions (6 messages):
- Censoring needed content: User @rchap92 asked whether ChatGPT can create a romantic scene beyond a kiss without triggering a violation warning. @rjkmelb confirmed that ChatGPT is designed to be G-rated.
- Working around "violation" highlights: User @exhort_one suggested asking ChatGPT to censor any part that may violate guidelines, thereby enabling users to fill in the blanks.
- Prompt Engineering Guide: @scargia shared a link to OpenAI's guide on prompt engineering.
- Enthusiasm for AI potential: @shoga4605 expressed excitement about the potential of AI and language models, and contemplated their application in fields such as linguistics, ecology, environments, and more.
- Welcome to the discussion: @beanz_and_rice greeted and welcomed @shoga4605 to the chat.
LM Studio Discord Summary
- Ubuntu Server and LMStudio Compatibility Issue: User @m.t.m inquired about running LMStudio on an Ubuntu 22.04 server without an X server. @heyitsyorkie indicated that LMStudio does not support a headless or CLI option and recommended llama.cpp for such needs.
- The GPU Debate, RTX 4070 vs. RTX 4090: A discussion was held between @b0otable, @heyitsyorkie, @fabguy, @senecalouck, and @rugg0064 on whether to purchase an RTX 4070 or RTX 4090, focusing on performance benefits, VRAM considerations, and price.
- Use of LM Studio for LM-as-a-service Queried: User @enavarro_ asked about using only the LM Studio backend to set up an LM-as-a-service. They were informed by @heyitsyorkie that such a feature isn't currently offered and were directed to llama.cpp as a potential solution.
- Forward-Looking Talk on ROCm Support & ML's Future: ROCm support and the future landscape of machine learning sparked a conversation. @senecalouck pointed out that ROCm support is already enabled in ollama, hoping it would soon be in LM Studio. The focus then shifted to the future implementation of ML and emerging players like TinyBox.
- Finding a 7B-13B Model for Tool Selection and Chat: User @_anarche_ conveyed their struggle to find a locally usable 7B-13B model that can perform both tool selection (function calling) and chat, possibly in a franken/merged form. Their goal is to shift away from the gpt-3.5-turbo model.
- Stanford DSPy Highlighted as a Potential Solution: @nitex_dkr highlighted Stanford DSPy, a framework for programming—not prompting—foundation models, as a potential solution to @_anarche_'s challenge.
- Linux Loading Issue and Misleading Version Number Flagged: User @moko081jdjfjddj reported that the model won't load on Linux and noticed a discrepancy with the Linux version number on the website. These issues were addressed by @heyitsyorkie and @fabguy, who clarified the version issue and directed the user to the specific Linux Beta channel for further support.
- Unsupported Platform Issue Encountered: @keryline received an error message that their platform is not supported by LM Studio because their processor does not support AVX2. @dagbs suggested trying the AVX beta to resolve this.
- Mac vs. PC for Running Large Models Provokes Discussion: A conversation was initiated by @scampbell70 about the hardware requirements for efficiently running larger models like Mixtral 8x7b, Falcon 180b, or Goliath 120. Some members praised Macs (particularly the Mac Studio) for better performance, while concerns about their lack of upgradability were raised.
- GPUs with Memory Slots Unavailable: @doderlein asked where they could buy a GPU with memory slots, which @ptable stated isn't possible. @heyitsyorkie pointed to a unique solution from ASUS that couples a GPU with an M.2 NVMe SSD to create a hybrid storage-graphics card.
- Hardware Usage in LM Studio Clarified: User @besiansherifaj asked whether a CPU is necessary in LM Studio when one has an RTX 4090 GPU. @fabguy clarified that the CPU will always be used, even if the full model is on the GPU.
- Autogen Studio and LMStudio Usage Questioned: In the autogen channel, thelefthandofurza asked whether anyone has experience using Autogen Studio with LM Studio. The discussion did not proceed further.
LM Studio Channel Summaries
▷ #💬-general (71 messages🔥🔥):
- LMStudio Install Issues on Ubuntu Server: User @m.t.m. asked if it's possible to run LMStudio on an Ubuntu 22.04 server without an X server. @heyitsyorkie responded that LMStudio currently does not support a headless or CLI option and recommended llama.cpp for such needs.
- Nicknames and Server Rules: User @sexisbadtothebone asked if their nickname breaks the server rules about SFW content. @heyitsyorkie advised editing nicknames to abide by the rules and maintain a safe-for-work environment.
- The RTX 4070 vs. RTX 4090 Debate: @b0otable sought advice on whether to purchase an RTX 4070 or RTX 4090. The discussion involved @heyitsyorkie, @fabguy, @senecalouck, and @rugg0064, focusing on performance benefits, VRAM considerations, and price.
- Using LM Studio for LM-as-a-service: @enavarro_ inquired about the possibility of using only the LM Studio backend to set up an LM-as-a-service. The user was informed by @heyitsyorkie that such a feature is not available and was pointed to llama.cpp as a potential solution (a minimal client sketch against llama.cpp's example server follows this list).
- Discussing ROCm Support & the Future of ML: A discussion was held considering ROCm support and the future landscape of machine learning. @senecalouck mentioned that ROCm support is available in ollama, with hopes of seeing it in LM Studio soon. The discussion then evolved into exploring the future implementation of ML technology and new players like TinyBox.
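For reference, llama.cpp ships an example HTTP server that can stand in as a self-hosted LM-as-a-service; a minimal client sketch, assuming a server is already running locally on port 8080 with a model loaded:

```python
import json
import urllib.request

# llama.cpp's example server exposes a /completion endpoint.
payload = {"prompt": "Q: What is an LLM?\nA:", "n_predict": 64}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["content"])  # the generated continuation
```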
Links mentioned:
- TheBloke/LLaMA-Pro-8B-Instruct-GGUF · Hugging Face
- ROCm support by 65a · Pull Request #814 · jmorganca/ollama: #667 got closed during a bad rebase attempt. This should be just about the minimum I can come up with to use build tags to switch between ROCm and CUDA, as well as docs for how to build it. The exi…
▷ #🤖-models-discussion-chat (18 messages🔥):
- Seeking a 7B-13B Model for Tool Selection and Chat: @_anarche_ voiced the struggle to find a local 7B-13B model capable of both tool selection (function calling) and chat, potentially in a franken/merged form. They aim to transition away from the gpt-3.5-turbo model.
- Dolphin Model Serves as a Suitable All-Rounder: @dagbs recommended the Dolphin model as a good generalist option for coding, functions, tools, etc., despite mixed results with certain function-calling tools like crewai and autogen.
- Langchain Compatibility Considerations: @_anarche_ detailed their intention to integrate the new model into Langchain, for which they have already adjusted their bot to use the LM Studio API (a minimal client sketch follows this list).
- Uncensored Model Concerns on Discord: Drawing attention to potential issues with using an uncensored model in a Discord environment, @dagbs cautioned that such a move could result in a ban, underscoring the need for careful model selection.
- Stanford DSPy Shared as a Possible Solution: @nitex_dkr flagged Stanford DSPy, a framework for programming—not prompting—foundation models, which could offer a promising solution to @_anarche_'s challenge.
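LM Studio's local server speaks an OpenAI-compatible API (by default at http://localhost:1234/v1), which is what makes bot integrations like the one above straightforward; a minimal sketch using the openai Python client (the model name is a placeholder for whatever is loaded in LM Studio):

```python
from openai import OpenAI

# LM Studio's local server; the api_key is unused but required by the client.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves whichever model is loaded
    messages=[{"role": "user", "content": "Pick a tool: search or calculator?"}],
)
print(resp.choices[0].message.content)
```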
Links mentioned:
GitHub - stanfordnlp/dspy: Stanford DSPy: The framework for programming—not prompting—foundation models: Stanford DSPy: The framework for programming—not prompting—foundation models - GitHub - stanfordnlp/dspy: Stanford DSPy: The framework for programming—not prompting—foundation models
▷ #🧠-feedback (10 messages🔥):
- Linux Loading Issue: User @moko081jdjfjddj reported that the model doesn't load on Linux. @heyitsyorkie and @fabguy directed the user to Channels and Roles to select the Linux Beta role and post the issue in the specific channel. @moko081jdjfjddj also noticed a discrepancy with the version number for Linux provided on the website, to which @fabguy responded that the Beta version 0.2.10 had stability issues, hence not updated on the site.
- Unsupported Platform Issue: @keryline experienced a problem with LM Studio on their Windows machine, getting an error message stating that their platform is not supported because their processor lacks AVX2 instructions. To resolve this, @dagbs suggested trying the AVX beta.
- New Beta Version Request: @logandark requested a new beta version that includes a specific commit from the llama.cpp repository.
▷ #🎛-hardware-discussion (29 messages🔥):
- Mac vs. PC for Running Large Models: User @scampbell70 initiated a discussion on the hardware requirements for efficiently running larger models such as Mixtral 8x7b, Falcon 180b, or Goliath 120 with the least loss and best performance. @telemaq and @heyitsyorkie suggested a MacBook Pro or a Mac Studio for better performance, while @pydus acknowledged the cost-effectiveness of a Mac Studio with 192GB of unified memory at a price of $7K. However, @scampbell70 expressed concerns about Macs due to their lack of upgradability (source).
- VRAM allocation on Apple machines: @heyitsyorkie shared a Reddit thread detailing how the amount of allocated VRAM can be controlled at runtime using the command sudo sysctl iogpu.wired_limit_mb=12345 (source).
- Purchasing GPUs with Memory Slots: @doderlein asked where to buy a GPU with memory slots, a question that @ptable answered as not being possible. @heyitsyorkie mentioned a unique solution from ASUS that pairs a GPU with an M.2 NVMe SSD, creating a hybrid storage-graphics card (source).
- Mac Performance with Goliath 120b Q8: @telemaq shared a Reddit post of a user who ran Goliath 120b Q8 on a Mac Studio M2 Ultra with 192GB memory, achieving about 7 tok/s, demonstrating the Mac's capability to handle larger models (source).
- Hardware Usage in LM Studio: User @besiansherifaj asked whether a CPU is necessary in LM Studio when one has an RTX 4090 GPU. @fabguy clarified that the CPU will always be utilized, even if the full model is on the GPU.
Links mentioned:
- Reddit - Dive into anything
- Reddit - Dive into anything
- ASUS Announces Dual GeForce RTX 4060 Ti SSD Graphics Card
▷ #autogen (1 message):
thelefthandofurza: Has anyone used autogen studio with lmstudio?
Eleuther Discord Summary
- Model Performance Rises without Additional Data, Training, or Scale: A query launched by @sehaj.dxstiny about improving performance without extra resources turned up some interesting resources, courtesy of @ad8e and @vatsala2290, including the Machine Learning workshop poll results.
- Taking AI Development Mobile: @pawngrubber was steered towards mlc-llm by @_fleetwood for starting machine-learning development on mobile devices. This open-source tool develops, optimizes, and deploys AI models natively.
- Unpacking Llama-2-70B's Benchmark Conundrum: @tirmizi7715 expressed confusion over why Llama-2-70B performs worse on MT-Bench while doing well on other benchmarks.
- Any Language for Mistral: A discussion headed by @maxmatical clarified that modern tokenizers handling all Unicode characters can allow language transfer, as explained by @thatspysaspy and @stellaathena.
- Unravelling Huggingface Model Structures: @sk5544 received a coding approach for extracting the PyTorch model definition code from Huggingface, shared by @thatspysaspy.
- Kaggle LLM Contest Piques Interest: @grimsqueaker highlighted an ongoing Kaggle LLM contest of potential interest to the community.
- Mechanism Behind MLM Loss Calculation: A discussion started by @jks_pl clarified why MLM loss is only computed on masked/corrupted tokens, explained by @bshlgrs and @stellaathena.
- AI Behavior Explained through Evaluation: @burnydelic shared an interesting MIT News article discussing the development of AI models that evaluate, and are able to explain, the behavior of other AI systems.
- muP for Simplified Hyperparameter Tuning: Users @ad8e, @thatspysaspy, @ricklius, and @cubic27 shared thoughts on muP's ability to simplify hyperparameter tuning across scales, albeit not being a magic solution.
- Twitter Data Limited in Datasets: @stellaathena clarified to @rybchuk that significant amounts of Twitter data are unlikely in certain datasets.
- The Truth Behind the Mixtral Routing Analysis: A tweet highlighting a Mixtral routing-analysis misconception was shared by @tastybucketofrice, and further discussed by @stellaathena and @norabelrose.
- Gaining Insight with a GPT/LLM Visualization Tool: @brandon_xyz announced a new tool that visualizes a GPT/LLM's cognitive process, citing a tweet, and received requests for private tool access.
- Understanding Mechanistic Interpretability and BIMT: The role of Brain-Inspired Modular Training (BIMT) in boosting neural-network interpretability was discussed by @g_mine, who pointed out a paper on this issue.
- Pythia Data Preparation Standard: Queries by @joshlk about Pythia data prep received clarification from @pietrolesci that it is a standard pre-training process, even if online information is sparse.
- EOD Token Issue in the Pythia-Deduped Dataset: @pietrolesci noted that the Pythia-deduped dataset lacked EOD tokens, with possible explanations raised by @hailey_schoelkopf, including omission of the --append-eod option during tokenizing.
- EOD Tokens & Packing for Pythia Models: @pietrolesci and @hailey_schoelkopf discussed whether the difference arising from missing EOD tokens would affect the "packed" dataset that the Pythia models see during training.
- Masking Role in Document Attention: A question about the function of masks from @joshlk was answered by @hailey_schoelkopf, who clarified that they are not used to prevent cross-attention between documents.
Eleuther Channel Summaries
▷ #general (61 messages🔥🔥):
- Optimization Options for Model Performance: User @sehaj.dxstiny raised a question about improving a model's performance without additional data, training, or scale. Recommended resources included a Machine Learning workshop poll shared by @ad8e and work by the Amitava Das group at USC shared by @vatsala2290.
- Exploring ML Development on Mobile: @pawngrubber showed interest in starting machine-learning development (specifically, inference) on mobile devices. @_fleetwood suggested mlc-llm as a starting point, a tool to develop, optimize, and deploy AI models natively on devices.
- Understanding Llama-2-70B's MT-Bench Performance: @tirmizi7715 queried why the language model Llama-2-70B performed worse on MT-Bench compared to Mixtral and GPT-3.5, while performing equally well on other benchmarks.
- Japanese Pretraining on Mistral: In a discussion initiated by @maxmatical concerning StableLM's Japanese pretraining on the English language model Mistral, @thatspysaspy and @stellaathena clarified that modern tokenizers can handle all Unicode characters, allowing for language transfer.
- Understanding Huggingface Model Structure: @sk5544 sought a way to obtain the PyTorch model definition code of a model loaded from Huggingface. @thatspysaspy shared a coding approach to facilitate this (one common approach is sketched after this list).
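The exact snippet wasn't captured in the summary; one common approach, shown here as a hedged sketch, is to point Python's inspect module at the loaded model's class (transformers resolves the class, inspect retrieves its source):

```python
import inspect
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# inspect.getsource returns the Python source of the class definition,
# i.e. the modeling code transformers used to build this model.
source = inspect.getsource(model.__class__)
print(source[:500])  # first few hundred characters of the definition

# The file containing the modeling code can also be located directly:
print(inspect.getsourcefile(model.__class__))
```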
Links mentioned:
- Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think — Introducing AI Detectability Index: With the rise of prolific ChatGPT, the risk and consequences of AI-generated text has increased alarmingly. To address the inevitable question of ownership attribution for AI-generated artifacts, the …
- HITYWorkshopPoll/PollResults.pdf at main · fsschneider/HITYWorkshopPoll: Results of the poll performed for the HITY workshop at NeurIPS 2022. - fsschneider/HITYWorkshopPoll
- GitHub - mlc-ai/mlc-llm: Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.: Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. - GitHub - mlc-ai/mlc-llm: Enable everyone to develop, optimize and deploy AI models natively on every…
▷ #research (25 messages🔥):
- Kaggle LLM Contest: @grimsqueaker mentioned an ongoing Kaggle LLM contest that might be of interest to the research community.
- Discussion on Masked Language Modeling (MLM) Loss Computation: @jks_pl initiated a discussion questioning why MLM loss is calculated only on masked/corrupted tokens. @bshlgrs and @stellaathena responded that predicting an uncorrupted token the model can already see is too easy a task to provide much informational value for learning (a sketch of the standard label-masking setup follows this list).
- Novel Method of Explaining AI Behavior: @burnydelic shared an MIT News article about researchers at MIT's CSAIL who have developed AI models that can conduct experiments on other AI systems to explain their behavior.
- muP: A Boon for Hyperparameter Tuning: @ad8e shared their key takeaways about muP, emphasizing that muP simplifies the tuning of hyperparameters across different model scales. However, they also noted that muP is not a magic solution and may face issues with certain setups like tanh activations. @thatspysaspy, @ricklius, and @cubic27 concurred that the main benefit of muP is to facilitate hyperparameter transfer across scales.
- Absence of Twitter Data in Certain Datasets: In response to @rybchuk's query about extracting Twitter data from certain datasets, @stellaathena responded that none of the datasets in discussion are likely to contain a significant amount of Twitter data.
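For concreteness, the convention discussed above is usually implemented by setting labels to an ignore index everywhere except the corrupted positions; a minimal PyTorch sketch (token IDs made up for illustration):

```python
import torch
import torch.nn.functional as F

vocab_size = 10
input_ids = torch.tensor([[4, 7, 2, 9]])          # hypothetical token IDs
mask_positions = torch.tensor([[False, True, False, True]])

labels = input_ids.clone()
labels[~mask_positions] = -100  # -100 marks positions excluded from the loss

logits = torch.randn(1, 4, vocab_size)  # stand-in for the model's output

# cross_entropy skips ignore_index targets, so the loss is computed
# only over the masked/corrupted tokens.
loss = F.cross_entropy(
    logits.view(-1, vocab_size), labels.view(-1), ignore_index=-100
)
print(loss)
```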
Links mentioned:
AI agents help explain other AI systems: FIND (function interpretation and description) is a new technique for evaluating automated interpretability methods. Developed at MIT, the system uses artificial intelligence to automate the explanati…
▷ #interpretability-general (8 messages🔥):
- Mixtral routing analysis finds lack of specialization: @tastybucketofrice shared a tweet from @intrstllrninja stating that a Mixtral routing analysis showed the experts did not specialize to specific domains. @stellaathena expressed confusion about the widespread misunderstanding.
- Previous findings align with the Mixtral analysis: @norabelrose commented that another analysis showed the same thing about a year ago, indicating that this finding isn't entirely new.
- Trigram frequencies compiled on the Pythia-deduped training set: @norabelrose also shared a link to a document detailing trigram frequencies computed on 11.4% of the Pythia-deduped training set.
- Innovative tool for GPT/LLM visualization: @brandon_xyz mentioned the creation of a new tool that visualizes the thinking and understanding processes of a GPT/LLM, showcased in his tweet, and welcomed private requests for tool access.
- Mechanistic Interpretability and BIMT: @g_mine pointed out a paper discussing large language models' mechanistic interpretability and the role of Brain-Inspired Modular Training (BIMT) in enhancing neural networks' interpretability.
Links mentioned:
- Tweet from interstellarninja (@intrstllrninja): mixtral routing analysis shows that experts did not specialize to specific domains
- Evaluating Brain-Inspired Modular Training in Automated Circuit Discovery for Mechanistic Interpretability: Large Language Models (LLMs) have experienced a rapid rise in AI, changing a wide range of applications with their advanced capabilities. As these models become increasingly integral to decision-makin…
- trigrams.pkl.zst: trigrams.pkl.zst
- Tweet from Brandon (@brandon_xyzw): This is what a GPT/LLM looks like as it's thinking and understanding
▷ #gpt-neox-dev (13 messages🔥):
- Pythia Data Preparation: @joshlk noted that there is a lot of information online about data prep for fine-tuning but not much on pre-training, and asked whether Pythia's process is typical or different. @pietrolesci commented that it is generally a standard process for (decoder-only) language model training, not just for Pythia.
- Missing EOD Tokens: @pietrolesci brought up the issue that EOD tokens weren't found in the Pythia-deduped dataset. @hailey_schoelkopf found this surprising and mentioned the possibility that the --append-eod option wasn't included while tokenizing the Pile + deduped Pile into the Megatron format.
- Different Packing for Pre-Trained Pythia Models?: @pietrolesci pointed out that if no EOD token was added, the resulting "packed" dataset would differ from what the Pythia models saw during training, because there would be N missing tokens for N documents, shifting every token in the pack (a toy illustration follows this list). @hailey_schoelkopf concurred that if both the pre-shuffled and the raw idxmaps datasets lack EOD tokens, they should match each other; when packing, the NeoX codebase does not add EOD tokens itself (Source).
- Masking in Document Attention: @joshlk inquired about the appearance of masks and whether they are used to prevent cross-attention between documents. @hailey_schoelkopf clarified that masks are not used for this purpose.
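A toy illustration of the shift described above (token IDs and the EOD ID are made up): appending one EOD token per document lengthens the packed stream by one token per document, so every later token lands at a different position.

```python
EOD = 0  # hypothetical end-of-document token ID

docs = [[5, 6], [7, 8, 9], [3]]  # three tokenized documents

with_eod = [t for d in docs for t in d + [EOD]]
without_eod = [t for d in docs for t in d]

print(with_eod)     # [5, 6, 0, 7, 8, 9, 0, 3, 0]
print(without_eod)  # [5, 6, 7, 8, 9, 3]
# After the first document, every token's index differs between the two
# streams, so fixed-length training chunks cut at different boundaries.
```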
Links mentioned:
- gpt-neox/megatron/data/gpt2_dataset.py at e6e944acdab75f9783c9b4b97eb15b17e0d9ee3e · EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. - EleutherAI/gpt-neox
- Batch Viewer: Why Sequence Length 2049? · Issue #123 · EleutherAI/pythia: Hi, I am using utils/batch_viewer.py to iterate through Pythia's training data and calculate some batch-level statistics. Firstly, there are some gaps between the actual code in batch_viewer.py an…
- GitHub - EleutherAI/pythia: The hub for EleutherAI's work on interpretability and learning dynamics: The hub for EleutherAI's work on interpretability and learning dynamics - GitHub - EleutherAI/pythia: The hub for EleutherAI's work on interpretability and learning dynamics
- EleutherAI/pile-deduped-pythia-preshuffled · Datasets at Hugging Face
- EleutherAI/pythia_deduped_pile_idxmaps · Datasets at Hugging Face
OpenAccess AI Collective (axolotl) Discord Summary
- Optimizing Mistral Training: User @casper_ai shared in-depth details about optimizing Mistral model training: "MoE layers can be run efficiently on single GPUs with high performance specialized kernels. Megablocks casts the feed-forward network (FFN) operations of the MoE layer as large sparse matrix multiplications, significantly enhancing the execution speed".
- Potential slow-down in Deepspeed multi-gpu usage: @noobmaster29 highlighted an issue with accelerate==0.23 (Deepspeed integration) causing slower training for users. Downgrading to accelerate==0.22 or using the main branch was suggested, with the fix awaiting release (source).
- Tracking Experiments with Axolotl and MLFlow: @caseus_ and @JohanWork discussed adding MLFlow to Axolotl for experiment tracking (Pull Request #1059).
- Axolotl WebSocket for External Job Management: @david78901 proposed a websockets endpoint in the Axolotl project for better external job management. @caseus_ expressed interest in incorporating the idea into the main project.
- A Discussion on the Impact of System Messages in Training: In the context of model training, @le_mess stated that the content of system messages has no significant impact on model performance and can be as random as "ehwhfjwjgbejficfjeejxkwbej" (source).
- Implementing "Shearing" in ShearedMistral Training: @caseus_ pointed out a method for the shearing process, specifically referencing a GitHub repo. He also discussed the merit of using SlimPajama over RedPajama v2 for data deduplication and quality, noting RedPajama v2 no longer includes subsets (source).
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (7 messages):
- Mistral training optimization methods explained: In the channel, @casper_ai detailed key information on how to optimize training for the Mistral model (a toy routing sketch follows this list). In particular, MoE layers can be run efficiently on single GPUs with high-performance specialized kernels: Megablocks [13] casts the feed-forward network (FFN) operations of the MoE layer as large sparse matrix multiplications, significantly enhancing execution speed and naturally handling cases where different experts get a variable number of tokens assigned to them.
- Request for a Mistral model file on Ollama: @dangfutures inquired about the appropriate Mistral model file to be used on Ollama.
- Potential regression for Accelerate/Deepspeed multi-gpu users: @noobmaster29 shared a tweet from @StasBekman warning about a regression in accelerate==0.23 (the Deepspeed integration). Downgrading to accelerate==0.22 or using the latest main branch was advised to overcome this, with the fix awaiting release.
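To make the MoE mechanics concrete, here is a small, self-contained top-2 routing sketch in PyTorch (shapes and sizes invented for illustration; Megablocks itself implements the expert FFNs as block-sparse matmuls rather than this Python loop):

```python
import torch
import torch.nn as nn

num_experts, d_model, top_k = 8, 16, 2
tokens = torch.randn(10, d_model)            # 10 tokens in a batch

router = nn.Linear(d_model, num_experts)     # gating network
experts = nn.ModuleList([
    nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                  nn.Linear(4 * d_model, d_model))
    for _ in range(num_experts)
])

gate_logits = router(tokens)                     # (10, num_experts)
weights, idx = gate_logits.topk(top_k, dim=-1)   # top-2 experts per token
weights = weights.softmax(dim=-1)

out = torch.zeros_like(tokens)
for k in range(top_k):
    for e in range(num_experts):
        sel = idx[:, k] == e                 # tokens routed to expert e
        if sel.any():                        # experts see variable token counts
            out[sel] += weights[sel, k, None] * experts[e](tokens[sel])
```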
Links mentioned:
Tweet from Stas Bekman (@StasBekman): Heads up to Accelerate/Deepspeed multi-gpu users. There was a regression in accelerate==0.23 (deepspeed integration) which would make your training much slower. The fix has just been merged - so you c…
▷ #axolotl-dev (37 messages🔥):
- Axolotl to incorporate MLFlow for experiment tracking: @caseus_ discussed the addition of MLFlow for experiment tracking in the Axolotl project, proposed by @JohanWork in Pull Request #1059.
- System prompts in YAML configuration: Conversation around configuring initial system prompts for sharegpt within the YAML file. @dctanner suggested this would be cleaner than adding it to all dataset records, while @le_mess shared that they currently add it manually each time.
- Peft update fixed Phi LoRA issue: @marktenenholtz identified that an error in Phi LoRA handling was fixed by updating to peft==0.7.0. The issue related to shared memory not being handled correctly by previous peft versions, and identification of LoRA modules needed to be specific for embedding and linear layers.
- Websockets added to Axolotl for external job management: @david78901 proposed adding a websockets endpoint to the Axolotl project to allow external triggering and monitoring of jobs. @caseus_ showed interest in incorporating this into the main project.
- Accelerate Pinning: @caseus_ suggested that Accelerate needs to be pinned to the correct version, as indicated by @nanobitz, an issue highlighted in Pull Request #1080.
Links mentioned:
- GitHub Status
- Issues · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
- GitHub - kmn1024/axolotl: for testing: for testing. Contribute to kmn1024/axolotl development by creating an account on GitHub.
- pin accelerate for deepspeed fix by winglian · Pull Request #1080 · OpenAccess-AI-Collective/axolotl: see https://twitter.com/StasBekman/status/1744769944158712210
- GitHub - dandm1/axolotl: Go ahead and axolotl questions about the API: Go ahead and axolotl questions about the API. Contribute to dandm1/axolotl development by creating an account on GitHub.
- update peft to 0.7.0 by mtenenholtz · Pull Request #1073 · OpenAccess-AI-Collective/axolotl: peft==0.6.0 has a bug in the way it saves LoRA models that comes down to safetensors. This is fixed in peft==0.7.0.
- be more robust about checking embedding modules for lora finetunes by winglian · Pull Request #1074 · OpenAccess-AI-Collective/axolotl: @mtenenholtz brought up in the discord that phi has a more nuanced embedding module name. this PR attempts to handle other architectures a little more gracefully
- Add: mlflow for experiment tracking by JohanWork · Pull Request #1059 · OpenAccess-AI-Collective/axolotl: Adding MLFlow to Axolotl for experiment tracking, looked into how Weights and Biases has been set up and tried to follow the same pattern. Have tested the changes and everything looks good to me. Happy …
▷ #other-llms (1 message):
leoandlibe: I use the exllamav2 convert.py to make EXL2 quants
▷ #general-help (10 messages🔥):
- Searching for a Chat UI that Supports ChatML or chat_template: @le_mess inquired about chat interfaces that support ChatML or chat_template out of the box. In response, @nanobitz suggested ooba.
- Interest in Testing gguf: @le_mess expressed interest in testing gguf. @nanobitz recommended either LM Studio or ollama, but didn't provide specific operating instructions for ollama.
- Query about Zero2 Training Speed and GPUs: @athenawisdoms asked if there is a significant difference in Zero2 training speed between two multi-GPU systems (e.g., A6000), one using PCIe 3.0 x16 and the other PCIe 4.0 x16. The response to this query was not recorded.
▷ #rlhf (3 messages):
- Prompt Strategy a Necessity for AI Interaction: User @caseus_ suggested that a prompt strategy, which would involve formatting prompts and combining previous turns into the input, is necessary for optimal AI interaction.
- Prompt Strategy Already in Action: Following up on the discussion, user @xzuyn mentioned that they have already been implementing a prompt strategy.
▷ #community-showcase (1 message):
- System message has no impact on training performance: @le_mess stated that the content of the system message has no significant impact on the performance of the trained model. In their words, "The system message could be 'ehwhfjwjgbejficfjeejxkwbej' and the performance would probably still be the same."
▷ #shearedmistral (8 messages🔥):
- Referencing the Shearing Method Implementation: @caseus_ pointed out a particular step in the shearing process via a link to a GitHub repository. He suggested utilizing the pre-processed data from the project's Google Drive, but cautioned this would mean being tied to the same dataset.
- Consideration of SlimPajama Use: @caseus_ contemplated whether it's worthwhile to choose SlimPajama over the RedPajama v2 dataset for improved deduplication and quality. He also observed that RedPajama v2 no longer includes subsets (source).
- Positive Reception of Dataset Subsets: In response, @emrgnt_cmplxty voiced their liking for the subset feature and questioned why it was removed.
- Potential Shift to SlimPajama: @emrgnt_cmplxty suggested a possible pivot to using SlimPajama for the project.
Links mentioned:
- LLM-Shearing/llmshearing/data at main · winglian/LLM-Shearing: Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning - winglian/LLM-Shearing
- togethercomputer/RedPajama-Data-V2 · Datasets at Hugging Face
HuggingFace Discord Discord Summary
- New Channels & Updates to Open-Source Libraries: @lunarflu announced the launch of two new discussion channels, transformers.js and ML-and-cybersecurity, while celebrating diffusers reaching 20,000 GitHub stars (source tweet). Further, @vishnu5n shared their work on skew detection in document images (source link).
- Attention Shifted to Attention & Self-attention: In diffusion-discussions, @grepolian asked about the difference between attention and self-attention, but didn't receive a response (source link).
- Advancements in AI Tech for Gaming: @papasancho in cool-finds discussed AI integration in video games using Herika and Mantella, calling it a significant advance in gaming (Herika link), (Mantella link).
- Solving the Mystery of Phi-2 Behaviour: In the general channel, @admin01234 discussed a peculiar behavior of the Phi-2 model, where a correct response is followed by seemingly random answers unrelated to the input.
- LLMs & SQL Injection Attacks: The NLP channel discussed the potential vulnerabilities of web applications integrated with LLMs, particularly to SQL injection attacks, using a paper on arXiv as a reference (source link).
- Conversations on Deep Reinforcement Learning: In today-im-learning, @cloudhu announced finishing a Deep RL course; @muhammadmehroz expressed interest, and @gduteaud and @cloudhu shared the link to the course (source link).
- CCTV Query and Skew Detection: In the computer-vision channel, user @iloveh8 discussed implementing GPT-V or LLAVA for real-time CCTV usage, and @vishnu5n shared their work on skew detection in document images (source link; a minimal sketch follows below).
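The specifics of @vishnu5n's approach weren't captured; a common baseline, sketched here with OpenCV as an assumption, estimates the skew angle from the minimum-area rectangle around the text pixels and rotates to correct it (note that OpenCV's angle conventions vary across versions):

```python
import cv2
import numpy as np

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
# Invert and threshold so text pixels are nonzero.
thresh = cv2.threshold(255 - img, 0, 255,
                       cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]  # angle of the min-area bounding box
if angle < -45:                      # classic normalization of the angle range
    angle = -(90 + angle)
else:
    angle = -angle

h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
cv2.imwrite("deskewed.png", deskewed)
```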
HuggingFace Discord Channel Summaries
▷ #announcements (1 message):
- New channels and open-source updates grace HuggingFace's roster: @lunarflu announced the launch of two new discussion channels, transformers.js and the intersection of ML and cybersecurity. Plus, diffusers celebrated reaching 20,000 GitHub stars, and a new training script was released integrating pivotal tuning (referred from @cloneofsimo's cog-sdxl) and the prodigy optimizer (referred from kohya's scripts), alongside compatibility with AUTO1111. See the tweet for details.
- Transformers.js gets a 2024 glow-up: @xenovacom revealed significant improvements for Transformers.js developers, including conditional typing of pipelines, inline documentation with code snippets, and pipeline-specific call parameters and return types. See here for more.
- Direct Mistral / Llama / TinyLlama safetensors pull from the Hub in MLX: @reach_vb confirmed that MLX can now pull Mistral / Llama / TinyLlama safetensors directly from the Hub, including support for all mistral/llama fine-tunes. More information about the installation here.
- Gradio ships version 4.13 with critical fixes and compatibility: Version 4.13 comes with fixes for Button + .select() + Chatbot, security patches, and compatibility with Python 3.12. Check out the comprehensive Changelog.
- Swifter Whisper with speculative decoding: A noteworthy improvement cited was a 200% faster Whisper thanks to speculative decoding (a minimal sketch follows this list). See the tweet for more.
Links mentioned:
- Tweet from Linoy Tsaban (@linoy_tsaban): Let's go 2024: training script in 🧨 @diffuserslib leveraging techniques from the community: pivotal tuning (from @cloneofsimo cog-sdxl), prodigy optimizer (from kohya's scripts) + …
- Tweet from Sayak Paul (@RisingSayak): 🧨 diffusers reached 20k stars on GitHub. But like many others, I am not a firm believer in this metric. So, let's also consider the number of repos that rely on it and the SUM of their stars. …
- Tweet from Xenova (@xenovacom): We're kicking off 2024 with several improvements for Transformers.js developers: - Conditional typing of pipelines based on task. - Inline documentation + code snippets. - Pipeline-specific cal…
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): PSA: MLX can now pull Mistral/ Llama/ TinyLlama safetensors directly from the Hub! 🔥 pip install -U mlx is all you need! All mistral/ llama fine-tunes supported too! 20,000+ checkpoints overall!…
- Gradio Changelog: Gradio Changelog and Release Notes
- Tweet from Vaibhav (VB) Srivastav (@reach_vb): Parakeet RNNT & CTC models top the Open ASR Leaderboard! đ Brought to you by @NVIDIAAI and @suno_ai_, parakeet beats Whisper and regains its first place. The models are released under a commercialâŠ
- 2023, year of open LLMs
- Welcome aMUSEd: Efficient Text-to-Image Generation
▷ #general (22 messages🔥):
- Event Reading Group on the Block: @admin01234 enquired about the timing of the event reading group, to which @lunarflu responded affirming its occurrence later in the week, with additional synchronous discussions happening in the Discord thread.
- Machine Learning Courses Query: @daksh3551 sought suggestions for a structured course (both paid and free) on Machine Learning.
- Enigma of the Phi-2 Behaviour: @admin01234 reported a peculiar phenomenon where the Phi-2 model would output a correct response followed by seemingly random responses.
- Desire for High-Power Computing Environments: In a lengthy discourse, @s4vyss expressed difficulty using free computing resources like Kaggle and Google Colab notebooks for larger projects due to the lack of autocompletion and error debugging, and the limitations of working in a single notebook. The user wondered about alternative machine learning coding environments that could offer free computing power for local coding.
- Identifying PII in Headers using StarPII: @benny0917 shared their experience attempting to identify Personally Identifiable Information (PII) in headers using the StarPII model from Hugging Face. The model struggles to correctly identify context-dependent PII headers; a minimal usage sketch follows this list.
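As a reference for the StarPII experiment, a minimal sketch of running it as a token-classification pipeline (assuming the bigcode/starpii checkpoint, which may require accepting its license on the Hub; the header strings are made up):

```python
# Sketch: PII detection on header-like strings with StarPII (token classification).
from transformers import pipeline

pii_detector = pipeline(
    "token-classification", model="bigcode/starpii", aggregation_strategy="simple"
)

# Hypothetical header values; context-dependent PII is exactly where such models struggle.
headers = ["X-User-Email: jane.doe@example.com", "X-Request-Id: 42"]
for header in headers:
    print(header, "->", pii_detector(header))
```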
▷ #today-im-learning (7 messages):
- Benchmark Results for Mistral-7B-instruct & vLLM: User @harsh_xx_tec_87517 detailed their benchmark results for Mistral-7B-instruct with vLLM on LinkedIn, stating it's a great library for deploying OSS LLMs. Detailed benchmarking results can be found in their LinkedIn post.
- Completion of the DRL Course: @cloudhu announced the completion of their DRL course and received congratulations from @osanseviero.
- Queries about the DRL Course: @muhammadmehroz showed interest in pursuing the DRL course. In response, both @gduteaud and @cloudhu suggested the Deep Reinforcement Learning Course provided by Hugging Face, which can take one from beginner to expert level.
Links mentioned:
Welcome to the 🤗 Deep Reinforcement Learning Course - Hugging Face Deep RL Course
▷ #cool-finds (3 messages):
- Fine-tuning VLMs like LLaVA: User @silamine asked for any research papers or GitHub repos that provide guidance on fine-tuning a VLM like LLaVA.
- Embed Charts in READMEs with Mermaid: @not_lain recommended a tool for embedding charts into README files called Mermaid, and shared the GitHub link for the tool.
- AI Integration in Video Games: @papasancho shared their perspective on AI integration in video games, considering it a significant advance since the Atari 2600. @papasancho identified Herika and Mantella as examples of this innovation and shared links to the Nexus Mods pages for both Herika and Mantella, which use AI technology to enhance in-game interactions.
Links mentioned:
- GitHub - mermaid-js/mermaid: Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown: Generation of diagrams like flowcharts or sequence diagrams from…
- Herika - The ChatGPT Companion: "Herika - The ChatGPT Companion" is a revolutionary mod that aims to integrate Skyrim with Artificial Intelligence technology. It specifically adds a follower, Herika, whose responses and interactions…
- Mantella - Bring NPCs to Life with AI: Bring every NPC to life with AI. Mantella allows you to have natural conversations with NPCs using your voice by leveraging Whisper for speech-to-text, LLMs (ChatGPT, Llama, etc.) for text generation…
▷ #i-made-this (3 messages):
- World's Fastest Conversational AI Unveiled: @vladi9539 shared a YouTube video of their attempt at creating the world's fastest conversational AI software. The software enables real-time conversations with AI, and its implementation boasts the lowest-latency conversational algorithm compared to current technologies.
- AlgoPerf Competition Launched: @franks22 announced the recent launch of the AlgoPerf competition, aimed at finding the best algorithm for training contemporary deep architectures. The competition, open to everyone, offers a $25,000 prize in each of its two categories. More information can be found in their GitHub repository.
- MusicGen Extension Update Announced: @.bigdookie updated everyone about new features added to the MusicGen browser extension, including undo functionality as well as the ability to crop AI-generated music. They also invited community members to test the extension, shared through a YouTube link, and asked for help in enhancing the speed of MusicGen outputs.
Links mentioned:
- Tweet from thecollabagepatch (@thepatch_kev): ok the browser extension for ai remixes of youtube tracks now has cropping / undo functionality. lots of @_nightsweekends but it's rdy for a user or ten. dm me #buildinpublic https://youtub…
- AI roasting his programmer (I made world's fastest conversational AI): This is me having a conversation with an AI in real time. This implementation has the lowest latency conversational algorithm compared to everything I've seen…
- GitHub - mlcommons/algorithmic-efficiency: MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.: MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models. - GitHub - mlcommoâŠ
▷ #reading-group (5 messages):
- Splitting Databases for Efficient Learning: @chad_in_the_house suggested a method involving the separation of databases into a development db and a testing db. The process involves answering questions in the development db, storing correct answers along with their text embeddings and chains of thought, and using this information for in-context learning on new questions (see the sketch after this list).
- SG Event Proposal: @lunarflu indicated plans to set up an event tomorrow but did not disclose further details.
- Invitation to Discuss in Reading Group: @lunarflu showed interest in @bluematcha's topic and extended an invitation for deeper coverage in the next Reading Group discussion.
- Variational Inference Book Promotion: @ypbio shared information about a book on variational inference that claims to contain a comprehensive review of the topic and everything needed to develop world-class foundational machine learning expertise. They included a link to the book's website, www.thevariationalbook.com.
- Time Zone Challenges: @skyward2989 expressed disappointment that the discussion would take place at an inconvenient time for them, specifically 3 AM.
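A hedged sketch of the suggested dev/test workflow (the model name and records below are illustrative, not from the discussion): embed solved development-db questions, then retrieve the nearest ones as in-context examples for a new question.

```python
# Sketch: reuse solved dev-db Q&A (answers + chains of thought) as in-context examples.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical dev-db records: question, verified answer, chain of thought.
dev_db = [
    {"q": "What is 17 * 3?", "a": "51", "cot": "17*3 = 10*3 + 7*3 = 30 + 21 = 51"},
    {"q": "What is the capital of France?", "a": "Paris", "cot": "France's capital is Paris."},
]
dev_embeddings = embedder.encode([r["q"] for r in dev_db], convert_to_tensor=True)

def build_prompt(new_question: str, k: int = 1) -> str:
    """Retrieve the k most similar solved questions and prepend them as examples."""
    q_emb = embedder.encode(new_question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, dev_embeddings, top_k=k)[0]
    examples = "\n\n".join(
        f"Q: {dev_db[h['corpus_id']]['q']}\n"
        f"Reasoning: {dev_db[h['corpus_id']]['cot']}\n"
        f"A: {dev_db[h['corpus_id']]['a']}"
        for h in hits
    )
    return f"{examples}\n\nQ: {new_question}\nA:"

print(build_prompt("What is 19 * 3?"))
```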
Links mentioned:
The Variational Inference Book: A comprehensive review of variational inference in one concise book.
▷ #diffusion-discussions (3 messages):
- Loading and Fusing LoRA Weights: @sayakpaul shared detailed instructions on how to load and fuse LoRA weights into the base UNet model, via pipeline.load_lora_weights() followed by pipeline.fuse_lora() (see the sketch after this list).
- Query on Attention Mechanisms: @grepolian asked about the differences between attention and self-attention, but no explanation was provided in this chat history.
- Efficient LoRA Inference Discussed in Blog Post: @yondonfu posted a link to a recent Hugging Face blog post on optimizing LoRA inference, elaborating efficient ways to load LoRA adapters and speed up inference. Key points include the observation that batching was not significantly improving throughput for diffusers and increased latency six-fold.
- Batching with Diffusers Not Effective?: @yondonfu drew particular attention to the subject of batching with diffusers, questioning the utility of the technique given the minor throughput increase at batch size 8 contrasted with a six-fold increase in latency, and asked for further insight into why this might be the case. The questions were left unanswered at the close of the chat.
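A minimal sketch of that load-then-fuse flow in diffusers (the base model and LoRA repo ids below are placeholders):

```python
# Sketch: load LoRA weights into a pipeline, then fuse them into the base UNet.
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipeline.load_lora_weights("some-user/some-lora")  # hypothetical LoRA repo id
pipeline.fuse_lora()  # merge the adapter into the base weights for faster inference

image = pipeline("a watercolor fox in a forest").images[0]
```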
Links mentioned:
Goodbye cold boot - how we made LoRA Inference 300% faster
▷ #computer-vision (2 messages):
- Query on Real-Time CCTV Use Case: User @iloveh8 brought up a discussion around implementing GPT-V or LLAVA for real-time CCTV use such as theft detection or baby monitoring.
- Skew Detection Resource Shared: In response, @vishnu5n shared their work, which models skewed document images with their respective skewness. The detailed work can be found on Kaggle at this link, which could potentially serve as a reference for similar problem statements.
Links mentioned:
skew_detection: Explore and run machine learning code with Kaggle Notebooks | Using data from rdocuments
▷ #NLP (18 messages🔥):
- SQL Injection Vulnerabilities in LLMs: @jryarianto discussed potential latency issues with computational resources in a developing country and asked about strategies to defend against SQL injection attacks, suggesting parameterized queries for this purpose (see the sketch after this list). They referenced a paper on arXiv that provides a comprehensive examination of a type of SQL injection attack that could occur with LLM-integrated applications.
- Open-Source LLM Suggestions for a Conversational Chatbot: @jillanisofttech asked for suggestions for an open-source LLM suitable for fine-tuning on a large custom dataset of PDF, txt, and doc files. They need to develop a conversational chatbot that can handle text and voice input, and were also interested in an appropriate framework for building the application.
- Text Generation Using the NSQL-2B Model: @madi_n asked about setting max_new_tokens to a value greater than 2048 in a text generation task using the NSQL-2B model. They sought clarity on whether it is alright to increase max_new_tokens, considering the model's predefined max length.
- Fine-tuning Mistral 7B - Identifying the Correct Syntax: @denisjannot inquired about the correct syntax to use while fine-tuning Mistral 7B, noticing some irregularities when using the trained model. @asprtnl_50418 helped by suggesting the consistent use of the same prompt template that was used during the initial model training, and also provided a link to the End Of String (EOS) token.
- GPU Usage for the suno/bark-small Model in a TTS Task: @x_crash_ asked about enabling explicit GPU use with the suno/bark-small model on Google Colab, after noticing that the model did not seem to be utilizing GPU resources. They provided their Python script to illustrate their attempt.
Links mentioned:
- tokenizer_config.json · mistralai/Mistral-7B-v0.1 at main
- From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application?: Large Language Models (LLMs) have found widespread applications in various domains, including web applications, where they facilitate human interaction via chatbots with natural language interfaces. I…
▷ #diffusion-discussions (3 messages):
- Instructions on Loading LoRA: @sayakpaul shared the method to load LoRA weights into the base UNet model by calling pipeline.load_lora_weights() and pipeline.fuse_lora().
- Question about Attention Mechanisms: @grepolian posed a question about the difference between attention and self-attention (no further discussion or responses were provided).
- Deep Dive into LoRA Inference Optimization: @yondonfu referenced a section from a Hugging Face blog post discussing how LoRA inference has been optimized, noting that it explained batching with diffusers doesn't improve performance significantly and usually results in higher latency. Two questions derive from this finding (a back-of-the-envelope check follows this list):
  - Is batching with diffusers generally not worthwhile due to marginal throughput gains and significant latency increases?
  - What's the logic behind the observation that batching with diffusers does not improve performance given that there is enough VRAM available?
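A back-of-the-envelope check on why that trade-off is poor (the batch-size-1 latency below is an assumed placeholder):

```python
# Throughput = images per second = batch_size / latency.
baseline_latency = 1.0               # seconds per batch at batch size 1 (assumed)
batch_size, latency_factor = 8, 6.0  # figures reported in the discussion

throughput_b1 = 1 / baseline_latency                              # 1.00 images/s
throughput_b8 = batch_size / (baseline_latency * latency_factor)  # ~1.33 images/s
print(f"{throughput_b8 / throughput_b1:.2f}x throughput for 6x the latency")  # 1.33x
```

In other words, an 8x batch that costs 6x the latency yields only about a 1.33x throughput gain, which is why batching looked unattractive here.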
Links mentioned:
Goodbye cold boot - how we made LoRA Inference 300% faster
Perplexity AI Discord Summary
- API and Model Outage Notices: User @phinneasmctipper reported outages for the pplx-7b-online and pplx-70b-online models via the API and API sandbox. User @monish0612 in #pplx-api reported a separate 500 internal server error. @icelavaman acknowledged the issue.
- Call for Improved Citation Format in AI Responses: @Chris98 suggested replacing numerical citation references with hyperlinks in Perplexity's AI responses. The idea received support from @byerk_enjoyer_sociology_enjoyer.
- Clarification on Subscription Billing During Free Trial: @alekswath questioned unexpected immediate billing when switching from a monthly to an annual plan during a free trial, triggering a discussion on subscription pricing.
- Integration of Perplexity as a Search Engine: User @bennyhobart wanted to set up Perplexity as the default search engine in Chrome. @mares1317 shared a link to the Perplexity - AI Companion extension on the Chrome Web Store.
- Changes Noticed in Claude 2.1 Responses: A shift in Claude 2.1's tone was remarked upon by @Chris98 and @Catto, who were unsatisfied with the AI's recent responses, comparing them to GPT Copilot's style and expressing a wish for Claude 2.1's original voice.
- Issue in Citing Sources: In the #pplx-api channel, @hanover188 asked about the pplx-70b-online model's capability to cite sources, and @brknclock1215 clarified that it currently does not cite sources like the Perplexity app does, initially hinting at a potential future update but later correcting that this is not on Perplexity's roadmap.
Perplexity AI Channel Summaries
▷ #general (51 messages🔥):
- Possible Outages in Perplexity's Models: @phinneasmctipper reported encountering 500 error codes when trying to access the pplx-7b-online and pplx-70b-online models via the API and API sandbox. @icelavaman acknowledged the issue and promised a fix despite it being outside working hours. Link to the conversation
- Request for Citation Format Change in Responses: User @Chris98 raised a request to replace the numerical citation references with hyperlinks in Perplexity's AI responses. @byerk_enjoyer_sociology_enjoyer agreed and got support from @Chris98 via an emoji reaction on a relevant issue they had previously raised.
- Question on Subscription Pricing: @alekswath questioned why they were immediately billed $200 upon trying to switch to an annual plan from a monthly one during a free trial. They inquired if there were issues with the free trial.
- Utilizing Perplexity as the Default Search Engine: @bennyhobart asked how to set up Perplexity as the default search engine in Chrome. @mares1317 shared a link to the Perplexity - AI Companion extension on the Chrome Web Store to help with this.
- Change in Claude 2.1 Responses: @Chris98 and @Catto expressed dissatisfaction with Claude 2.1's recent responses, noticing a perceived decrease in quality and a shift in tone to sound more like GPT Copilot. They wished for a return to the original Claude 2.1.
Links mentioned:
- Application Status
- What is Search Focus?: Explore Perplexity's blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
- Perplexity - AI Companion: Ask anything while you browse
- Reddit - Dive into anything
▷ #sharing (8 messages🔥):
- Stress Testing Web Application Resources Shared: @whoistraian provided a link to resources on stress testing a web application.
- Inquiry about Perplexity AI's Functionality: @myob171 asked if Perplexity AI is an AI search engine.
- Discord Channel Links Shared: @mares1317 shared two Discord channel links, possibly containing other related discussions.
- OpenAI Response Shared: @__sahilpoonia__ posted a link regarding how OpenAI responds to certain queries.
- Praise for Perplexity's Calendar Integration: @clockworksquirrel highlighted how Perplexity simplifies calendar management via natural language, especially beneficial due to their physical disability. They also noted the usefulness of copying and pasting within the tool.
- Gratitude Expressed for Perplexity: @siriusarchy expressed their appreciation for Perplexity.
- Volkswagen Incorporates ChatGPT: According to @ipsifu, Volkswagen has integrated ChatGPT into its car systems.
▷ #pplx-api (6 messages):
- 500 Internal Server Error: User @monish0612 reported experiencing a 500 internal server error with the API for several hours. They are a paid user and are hopeful for a quick resolution.
- pplx-70b-online Model's Source Citation Issue: @hanover188 asked if it's possible for pplx-70b-online models to cite their sources like the Perplexity app does. They mentioned needing this for a build that requires summarized real-time data with actionable source information.
- No Direct Citation in pplx-70b-online: In response to @hanover188's query, @brknclock1215 provided a link and a summary of the source stating, "no - the pplx-70b-online model does not directly cite sources like the Perplexity app does… Adding support for grounding facts and citations is on Perplexity's roadmap for the future."
- Feature Not on the Roadmap?: Contrary to earlier information, @brknclock1215 later corrected that support for grounding facts and citations is actually not on Perplexity's roadmap, providing another Discord link to a discussion confirming this.
LAION Discord Summary
- AI Debates Heat Up - "Utopia or Not?": A spirited discussion led by @SegmentationFault sought to dissect the viewpoints of anti-AI critics, concluding that their stances largely stem from virtue signaling and unrealistic utopian considerations. @mkaic further added to the discourse, asserting that the utopian goal of guaranteed income is more attainable with AI permeation. User .undeleted humorously posited the three-step plan of anti-AI proponents: ban AI, preserve current jobs, and prevent future job elimination through technology. These assertions were met with resistance by @SegmentationFault, who championed AI's role in productivity and global competitiveness.
- Pizza, AI or Human?: Amidst the steady flow of AI discourse, @thejonasbrothers injected an amusing angle into the conversation, posing a lighthearted question as to whether the sentence "I am a pizza" was crafted by AI or not.
- Game-Changing AI Training Hacks Surfaced: @pseudoterminalx introduced innovative AI training techniques in an in-depth discussion, expounding on the advantages of biasing timestep selection towards early timesteps. Demonstrating results sampled with Euler and zero-terminal SNR that they judged to surpass Midjourney v6, this user circumvented traditional precedent, additionally endorsing the concurrent use of random crops and full frames.
- AI Detector Faces Credibility Crisis: @lixiang01 expressed doubt about the efficacy of a specific AI detector, arguing that it can be effortlessly deceived with carefully constructed prompts.
- State Space Models vs Transformers: A research paper shared by @thejonasbrothers described State Space Models and Mixture of Experts challenging the dominance of Transformers with the development of MoE-Mamba. The paper can be accessed here.
- Watermarking Woes for Generative Models: A blog post, based on a research paper titled Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models, was highlighted by @chad_in_the_house. The post delves into the challenges of watermarking generative model outputs while preserving output quality and enabling AI verification. The full post can be found here.
- HiDiffusion Framework - A New Viable Option: @gothosfolly drew attention to a text-to-image diffusion model framework called HiDiffusion, which can create high-resolution images without tuning. The research paper on HiDiffusion can be viewed here.
- Precise Location of RAU Block Questioned: @gothosfolly also raised questions about the exact location of the RAU block in the SDXL variant of HiDiffusion's architecture, as described in the paper.
LAION Channel Summaries
▷ #general (44 messages🔥):
- Captain AImerica and the Luddite League: There was a lively debate led by @SegmentationFault concerning the views and plans of anti-AI critics, especially those active on Twitter. Largely critical of their stances, he probed for insight into their greater scheme for halting AI development, before dismissing most of them as more inclined towards virtue signaling and unrealistic utopian ideals. Quoted, @SegmentationFault: "Even if some country bans AI entirely, some other country will not, and companies there will be more productive."
- AI Utopia - Champagne Dreams: @mkaic chimed into the conversation noting the irony in anti-AI activists' utopian dreams, arguing that a utopia where everyone is paid to exist is more feasibly achieved by letting AI do all the work, rather than by banning AI.
- AIwhile, Back at the "Anti-AI Ban Malfunction": In a slightly satirical exchange, user .undeleted suggested a three-point plan he believes to be the mindset of anti-AI critics: ban AI, maintain current jobs, avoid using tech to eliminate jobs in the future. @SegmentationFault rebutted, pointing out that companies need to be competitive and AI increases productivity, a reality he considered inevitable.
- "I am a Pizza" - Human or AI?: @thejonasbrothers made a playful contribution to the chat, writing the sentence "I am a pizza," then asking if it was written by an AI, sparking a lighter, more humorous tone amidst the serious chat.
- AI Training Hacks with @pseudoterminalx: User @pseudoterminalx unveiled some AI training hacks in an extensive, more technical discussion. They discussed the benefits of training on early timesteps, using a 50x bias on the probability of early timesteps being selected; they noted that this didn't eliminate other timesteps from the pool but tilted the distribution notably. They showed the techniques' effectiveness by sharing several images, including one sampled with Euler and zero-terminal SNR, and asserted its quality improved on Midjourney v6. Towards the end, they added another trick: training on random crops and full frames (mixed) of 2160p Blu-ray rips.
Links mentioned:
Phase1 Collect Underpants GIF - Phase1 Collect Underpants Gnome - Discover & Share GIFs: Click to view the GIF
▷ #research (18 messages🔥):
- Impossible Task for AI Detectors: @lixiang01 voiced skepticism about the effectiveness of a particular detector, stating that it's quite impossible for it to work because "this kind of detector can be easily fooled by contents generated by a carefully written prompt."
- State Space Models Challenge Transformers: @thejonasbrothers shared an informative research paper on State Space Models (SSMs) and Mixture of Experts (MoE) challenging the dominance of Transformers, focusing on the development of MoE-Mamba, which shows better performance while preserving the inference gains of Mamba over the Transformer. The paper can be accessed here.
- Watermarking Challenges in Generative Models: @chad_in_the_house introduced a blog post discussing the limits of strong watermarking for generative models, suggesting that even if creators watermark their outputs, it would be difficult to maintain quality while keeping the output AI-verifiable. The post was based on a paper called Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models and can be directly accessed here.
- Insights on the HiDiffusion Framework: @gothosfolly brought attention to a paper about HiDiffusion, a tuning-free framework designed to enable pretrained text-to-image diffusion models to generate high-resolution images. The paper can be found here.
- Questions Raised about the RAU Block: @gothosfolly sought clarification on whether the RAU block for SDXL in HiDiffusion's architecture is located one block later than it should be according to the paper's appendix.
Links mentioned:
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts: State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly im…
- HiDiffusion: Unlocking High-Resolution Creativity and Efficiency in Low-Resolution Trained Diffusion Models: We introduce HiDiffusion, a tuning-free framework comprised of Resolution-Aware U-Net (RAU-Net) and Modified Shifted Window Multi-head Self-Attention (MSW-MSA) to enable pretrained large text-to-image…
- Watermarking in the sand: Watermarking is often touted as an important safety consideration for AI generative models. The new executive order on AI safety highlights it as a key tool for combating "AI-enabled fraud and decepti…
Mistral Discord Summary
- Tinybox Revealed: In a discussion about Tinybox, @digitalbo explained that it is a small "supercomputer" designed to run at home. The link provided gave more insight into the product.
- Mistral and Web Browsing Disconnect: When @ajkuba asked if Mistral could be used for web browsing, @sublimatorniq clarified that, unlike ChatGPT Plus, Mistral does not have a browsing feature or "function calling".
- Deploying Models on Raspberry Pi 5 Explored: @psdc4171 queried how to run a 7B model on a Raspberry Pi 5. @ethux provided resources for GGUF models compatible with the Raspberry Pi, model suggestions from HuggingFace such as OpenChat 3.5 1210 and Mistral 7B Instruct v0.2, and WebUIs from oobabooga's GitHub repository and HuggingFace's chat-ui GitHub repository.
- Deployment Issues with Mixtral 46B on AWS: @sa_code hit an issue deploying an unquantised Mixtral 46B on a g5.48xlarge AWS instance, where vllm failed to load the model into memory even after sharding with --tensor-parallel-size 8. A solution was not found in the discussion.
- Fine-tuning Troubles and Explorations: Conversations revolved around issues faced while fine-tuning Mistral using a 4090 (@wilzh40), questions about whether a single A100 would suffice for full Mistral 7B Instruct training (@dinonst74), the effectiveness of fine-tuning on domain-specific chat logs (@nickbro0355), and struggles in training Mistral 7B Instruct for text-to-SQL (@dinonst74). @adriata3 described their unsuccessful fine-tuning attempts with QLoRA 4-bit.
- Paper on Mixtral of Experts Released: @sophiamyang shared a new paper on Mixtral of Experts: https://arxiv.org/pdf/2401.04088.pdf.
- Shout-out to Vanna, the SQL Helper: @zain_vanna announced the addition of Mistral integration to Vanna, a Python package using RAG for SQL generation for databases, introduced with a link to its GitHub repository.
- Upset about Mistral's API Latency: Users across the guild expressed concern about the variance in the Mistral API's response times, sometimes taking 5-9 seconds for a response. @lerela, a member of the Mistral team, stated they are actively working on improving response times. A suggestion for Mistral to incorporate a function akin to OpenAI's "function" tokens was also discussed (@astel123457).
Mistral Channel Summaries
▷ #general (13 messages🔥):
- What is Tinybox?: In response to @gbourdin's query about Tinybox, @digitalbo defined it as a small "supercomputer" designed to run at home. @digitalbo also shared a link for more information. The cost, however, was noted to be $15,000.
- Mistral and Web Browsing: @ajkuba asked whether Mistral can be used for web browsing. @sublimatorniq clarified that, unlike ChatGPT Plus, Mistral has neither a browsing feature nor "function calling".
- Request for Guidance on a Project: @saga04, a software engineer and startup founder, requested advice on embarking on a project to create a "world teacher" for children.
- Code Generation with an Open-Source Model: @xquietude asked about the ability of OpenAI's last open-source (7B) model to generate code. @.superintendent confirmed the model's ability to generate code, while @sophiamyang suggested that Mistral 8x7B could be better for this purpose.
▷ #deployment (9 messages🔥):
- Running 7B on a Raspberry Pi 5: User @psdc4171 sought advice on how to run a 7B model on their Raspberry Pi 5. @ethux provided a range of resources, including GGUF models compatible with the Pi from this GitHub repository, model suggestions quantized to 4 bits or less from HuggingFace such as OpenChat 3.5 1210 and Mistral 7B Instruct v0.2, and WebUIs for easy testing from oobabooga's GitHub repository and HuggingFace's chat-ui GitHub repository.
- Deployment Error on AWS with Unquantised Mixtral 46B: @sa_code tried to deploy an unquantised Mixtral 46B on AWS using a g5.48xlarge instance and ran into an issue where vllm failed to load the model into memory, even with the model sharded via --tensor-parallel-size 8. @ethux expressed uncertainty, stating 192GB VRAM should be enough for the operation. @sa_code suspects the issue might be with the vllm package.
Links mentioned:
- GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++: Port of Facebook's LLaMA model in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
- TheBloke/openchat-3.5-1210-GGUF · Hugging Face
- TheBloke/Mistral-7B-Instruct-v0.2-GGUF · Hugging Face
- GitHub - oobabooga/text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models. - GitHub - oobabooga/text-generation-webui: A Gradio web UI for Large Language ModâŠ
- GitHub - huggingface/chat-ui: Open source codebase powering the HuggingChat app: Open source codebase powering the HuggingChat app. Contribute to huggingface/chat-ui development by creating an account on GitHub.
▷ #finetuning (13 messages🔥):
- Seeking Success with Mistral Fine-tuning on a 4090: @wilzh40 asked if anyone had success fine-tuning Mistral using only a 4090, as well as using it solely for inference. @adriata3 responded that while they worked with QLoRA 4-bit for fine-tuning, the results were not as desired.
- Will a Single A100 Suffice for Full Mistral 7B Instruct Training?: User @dinonst74 asked if a single A100 with 40 GB would be enough, or whether 80 GB is needed, for full Mistral 7B Instruct training that is not 4-bit or LoRA.
- Curiosity about Fine-tuning on Chat Logs: @nickbro0355 sought advice on an effective way to fine-tune a model using domain-specific chat logs, wondering whether there would be much benefit from fine-tuning on user-approved chats or whether it is better to create one's own fine-tuning data.
- Inference on vLLM Explained: @wilzh40 asked @adriata3 what inference on vLLM meant. A link to the vLLM project on GitHub was shared in response (a minimal usage sketch follows this list).
- Struggling with Training Mistral 7B Instruct for Text-to-SQL: @dinonst74 shared that they were a beginner at fine-tuning and were trying to fine-tune Mistral 7B Instruct for text-to-SQL with MSSQL (T-SQL) syntax generation. Despite having created a custom dataset and run training on an A100 for 6,000 segments, they were unsatisfied with the results and sought advice on improving them. The discussion included links to their custom dataset on HuggingFace, their process on Google Colab, and their project results on wandb.
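For context on the vLLM pointer above, a minimal offline-inference sketch (the model id is a placeholder; any Hub model that vLLM supports works the same way):

```python
# Sketch: offline inference with vLLM's high-throughput engine.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Write a T-SQL query that counts rows in dbo.Orders."], params)
print(outputs[0].outputs[0].text)
```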
Links mentioned:
- GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs: A high-throughput and memory-efficient inference and serving engine for LLMs - GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
- dnovak232/sql_create_context-v4-mssql-instruct · Datasets at Hugging Face
- Google Colaboratory
- dino232: Weights & Biases, developer tools for machine learning
▷ #announcements (1 message):
sophiamyang: New paper on Mixtral of Experts: https://arxiv.org/pdf/2401.04088.pdf
▷ #showcase (3 messages):
- Waiting No More: @gbourdin expressed frustration about being on the waiting list for several days with no additional info on the website. @joselolol. reassured them that the team was patching things up for a general release and was happy to give early access.
- Hello, Vanna: @zain_vanna announced the addition of Mistral integration to Vanna, a Python package that utilizes RAG for SQL generation for databases. This was accompanied by a link to the GitHub repository.
Links mentioned:
GitHub - vanna-ai/vanna: 🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔍.: 🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔍. - GitHub - vanna-ai/vanna: 🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using R…
▷ #la-plateforme (7 messages):
- Concerns About Waiting Times for the Mistral API: User @alimsss raised an issue about the variance in waiting times for the Mistral API, with responses sometimes taking 5-6 seconds and sometimes arriving almost instantly. @michaelwechner suggested that this could be due to peak traffic times resulting in queued requests, while @casper_ai shared similar experiences with waits of up to 9 seconds.
- Mistral Team's Response to API Latency Concerns: @lerela, a member of the Mistral team, addressed the issue by stating that they are actively working on improving response times.
- Suggestion for Local Function Implementation: User @astel123457 discussed the possibility of Mistral incorporating a function akin to OpenAI's "function" tokens, which would allow for calls to local functions and return responses based on the output of those functions. This would give the bot greater versatility in coding tasks.
- User Experiences of API Latency: @sublimatorniq and @casper_ai discussed their experiences with API latency. Both have seen a range of response times, with @sublimatorniq remarking that the potential indicated by the faster response times gives hope for the future.
LangChain AI Discord Summary
- LangChain Questions Needing Answers: @aaravvv__ and @rajvir3 asked about sequence diagrams in Pinecone and the availability of langchain/chat_models/messages in node_modules, respectively, spawning discussions on LangChain utilities.
- Heavyweights Square Off: BM25Retriever vs. Large Docs: @tigerinus has been wrestling with using BM25Retriever over a significant volume of disk-stored documents, an issue that is ripe for an answer (a minimal retrieval sketch follows this list).
- Saving the Day with Code!: @uvizc_43278 reported that his LangChain RetrievalQA.from_chain_type app broke due to the deprecated text-davinci-003 model, but he had a solution ready. This might be a useful fix for anyone encountering a similar issue.
- Llamafile, LangChain's Hero?: @rawwerks excitedly outlined the potential of llamafile to simplify LLM deployment across multiple OSs, hinting at the dawn of a new era of LLMs.
- Setting the Stage for Multi-Embeddings: User @dejoma initiated talks to find the ideal structure where the input and output are more than one embedding.
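For the BM25Retriever question above, a minimal sketch of the LangChain usage (the documents are made up; note that BM25Retriever indexes everything in memory, which is the crux of the disk-stored-corpus problem):

```python
# Sketch: lexical retrieval over documents with LangChain's BM25Retriever.
# Requires the rank_bm25 package; import paths vary slightly across LangChain versions.
from langchain_community.retrievers import BM25Retriever
from langchain_core.documents import Document

docs = [
    Document(page_content="LangChain composes LLM calls into chains."),
    Document(page_content="BM25 is a classic lexical ranking function."),
]
retriever = BM25Retriever.from_documents(docs, k=1)
print(retriever.invoke("lexical ranking"))
```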
LangChain AI Channel Summaries
▷ #general (28 messages🔥):
- Sequence Diagram Loading Query: @aaravvv__ asked if there is any way to load a sequence diagram into Pinecone using LangChain.
- Issues Using BM25Retriever over Large Documents: @tigerinus sought experience and help on using BM25Retriever over a huge number of documents from disk.
- LLM, ML and NLP Conferences in the UK in 2024: @stuartjatkinson inquired about any good conferences on LLM, ML, or NLP in the UK in 2024.
- Error on LangChain Import: @azharudeen_02613 encountered an error when using import load_qa_chain from langchain.chains.question_answering, reporting a validation error regarding the abstract class BaseLanguageModel.
- LangChain JS Import Issue: @rajvir3 reported an issue when importing import { HumanMessage, SystemMessage } from "langchain/chat_models/messages"; in LangChain JS, getting an ERR_PACKAGE_PATH_NOT_EXPORTED error message and confirming that there is no langchain/chat_models/messages in node_modules.
- Potential of llamafile for LLMs: @rawwerks expressed how impactful llamafile could be in deploying fine-tuned models on multiple OSs, implying it could be a game-changer for LLMs.
- Deprecated Model Issue: @uvizc_43278 reported that his app using LangChain RetrievalQA.from_chain_type stopped working due to the text-davinci-003 model being deprecated, and shared a solution to the issue.
- Comparison of LangChain Agents and the Assistants API: @sheldada and @evolutionstepper engaged in a conversation discussing the differences and efficiency trade-offs between LangChain Agents and the Assistants API.
- Message Threading Issue with an Assistant: @zainsheikh had a problem with the thread ID while using an assistant invoke command, reporting that a new thread ID is created instead of a message being added to the specified thread.
- Broken Link to Biweekly Release Notes: @aaronsmith4931 reported that the link to subscribe to the biweekly release notes was broken and asked for help.
- Python & JavaScript Import Errors: @rajvir3 reported encountering errors when trying to import LangChain's OpenAI integration in Python and JavaScript. The issue was resolved by @hasan_34148, who suggested using pip install langchain-openai (see the sketch after this list).
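For anyone hitting the same import errors, a hedged sketch of the post-split usage (after pip install langchain-openai, the provider classes come from the new package; assumes OPENAI_API_KEY is set in the environment):

```python
# Sketch: the langchain-openai package exposes the OpenAI integrations directly.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")
print(llm.invoke("Say hi in five words.").content)
```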
Links mentioned:
- Langchain with FastAPI in Docker with Traefik [Code Included]: A tutorial on how to use langchain with FastAPI.
- LangChain Newsletter: Sign up for our biweekly updates!
- Issues · Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file. Contribute to Mozilla-Ocho/llamafile development by creating an account on GitHub.
▷ #langserve (6 messages):
- KeyError Bugs on the History Variable: @cryptossssun noted that after fixing one error, a new KeyError linked to the history variable arose.
- A Call for Model Configuration Advice: @pgpaul161201 thanked <@1033432389516546158> for their contributions and inquired about a more streamlined method for allowing users to handle LLM backend configuration settings such as the API key, endpoint, and organization name.
- Potential Example Needed: @a404.eth requested an example related to the ongoing conversation.
- Input Type Specification through Passthrough and Pydantic: @veryboldbagel provided advice on specifying input types in LangChain chains and affirmed the importance of sanity-checking schemas. They shared a code snippet from LangServe for reference.
- Dynamic Field Configuration with Pydantic: @veryboldbagel suggested a method for dynamically generating configurations for models by listing fields and their types using Pydantic, specifically by subclassing from ChatModel. They provided a code snippet demonstrating this technique.
Links mentioned:
langserve/examples/passthrough_dict/server.py at main · langchain-ai/langserve: LangServe 🦜️🏓. Contribute to langchain-ai/langserve development by creating an account on GitHub.
▷ #share-your-work (3 messages):
- LangChain FastAPI Starter Revealed: User @evolutionstepper shared a GitHub repository for a LangChain FastAPI starter.
- LangChain FastAPI Starter Tutorial Available: @evolutionstepper also shared a YouTube tutorial titled Langchain with FastAPI in Docker with Traefik [Code Included], which provides guidance on how to use LangChain with FastAPI.
- In Search of a Solution for Multi-Embeddings: User @dejoma initiated a discussion seeking suggestions for a structure where the input and output are more than one embedding. A specific use case mentioned was finding the best match for a video that is too large to be represented in a single chunk and must therefore be divided into multiple chunks.
Links mentioned:
Langchain with FastAPI in Docker with Traefik [Code Included]: A tutorial on how to use langchain with FastAPI.
▷ #tutorials (1 message):
- Discussing Multi-Embedding Structures for Matching Large Videos: @dejoma raised a question about constructing a mechanism that accommodates more than one embedding for both input and output. He is particularly interested in devising a solution to find the best-matching large video file that cannot be represented in a single chunk.
LlamaIndex Discord Summary
- Deploying with Ease: @wenqi_glantz unveiled a thorough guide on deploying a @llama_index app to AWS Fargate using Terraform and an automated CI/CD pipeline with @github Actions. Here's the guide.
- Hackathon Galore: @llama_index is organizing their inaugural in-person hackathon this February, with over $4,000 in prizes. The event welcomes RAG enthusiasts for collaborative ventures on new projects. Registration details can be found here.
- Simplifying Structure: @andrejusb shared a handy video tutorial on extracting structured JSON from invoices using Pydantic classes with @OLLAMA. To learn more, you can catch the video here.
- RAGs and Freelancers: A lively discussion took place regarding the feasibility and costs of freelancers building Retrieval-Augmented Generation (RAG) systems for businesses. @.kamja, @lolipopman, and @mr.dronie chimed in on the complexities of production implementation despite the simplicity of prototyping.
- LlamaIndex Integration Queried: @jace93 and @sridhar_10158 asked about the possibility of integrating Sqlite-vss and DeepInfra, respectively, with LlamaIndex.
- LlamaIndex Learning Resources: A query for courses and learning resources for LlamaIndex was posted by @asdw2..
- RAG Inroads: Strategies to address RAG limitations, as well as appreciation for the in-depth insights provided by @bushdid420, were key highlights in the AI discussions. Important insights included document summarization and chunking to address the limitations of language models' context windows. Here is the shared paper.
- Game-Changer Llamafile: With its ability to deploy fine-tuned models on 6 different OSs, @rawwerks hailed llamafile as a game-changer but lamented the team's lack of interest in adding RAG capabilities or Python support. The relevant GitHub issue was highlighted.
- Handling Context: @bushdid420 spurred discussion of the challenges of processing long textual context in LLMs. The degradation in performance due to critical facts positioned in the middle sections of context documents was highlighted, and a possible solution was provided.
LlamaIndex Discord Channel Summaries
▷ #blog (3 messages):
- Deploying LLM Apps on AWS Fargate Simplified: @wenqi_glantz has shared a step-by-step guide on how to deploy a @llama_index app to a service on AWS Fargate using Terraform (@HashiCorp) and an automated CI/CD pipeline with @github Actions. The detailed how-to post can be found here.
- First In-Person Hackathon by LlamaIndex: @llama_index is organizing their first in-person hackathon on February 2nd-4th, aimed at bringing together RAG enthusiasts to collaborate on exciting new projects. The event offers over $4,000 in prizes. Event registration details here.
- Getting Structured Output from an LLM: @andrejusb has shared an educational video where he explains how to use @OLLAMA to run a local model and use Pydantic classes to output structured JSON from invoices. Watch the video tutorial here.
▷ #general (22 messages🔥):
- Freelancing for Building RAG Systems: @.kamja explored the field of freelancers building Retrieval-Augmented Generation (RAG) systems for businesses and the costs associated with it. This inquiry also interested @lolipopman and @mr.dronie, who noted the simplicity of prototyping but the complexity of production implementation.
- Understanding File Compatibility in a New Project: @pichart is seeking information on all the file types a recently discovered project can handle.
- Integration of LlamaIndex with Sqlite-vss: @jace93 raised a query about the possibility of using LlamaIndex with Sqlite-vss.
- Understanding LongContextReorder: @langzeitstudent_41429 was interested in how LongContextReorder works, particularly how the relevancy of each document is measured for reordering.
- Incorporating User Feedback in create-llama TS: @ballwave is using the create-llama TypeScript template and is curious whether there is a known way to incorporate user feedback into the application, such as upvotes/downvotes on answers and written commentary, to avoid redundancy.
- Possibility of Using the LangChain Toolkit in LlamaIndex: @7leven and @cheesyfishes explored whether it is possible to use the LangChain Toolkit as a tool in LlamaIndex.
- Integration of the Mistral Model in LlamaIndex: @sridhar_10158 sought help integrating the Mistral model in LlamaIndex, showing the specific parameters.
- Understanding ColBERTv2's Storage Needs: @wizboar inquired whether ColBERTv2 can utilize vector stores, or whether it must load data into RAM.
- Mismatch in Dependency Versions in llama's Files: @pveierland noticed a mismatch in openai dependency versions: pyproject.toml listed openai = ">=1.1.0" whereas poetry.lock had it as openai = ">=0.27.8".
- Building a RAG with Document References: @erizvi is working on a RAG system where documents reference other documents. They are using the OpenAI chat engine and trying to figure out how to include referenced documents in the context provided to the LLM for synthesis. A possible solution was suggested by @erizvi as well.
- Integration of DeepInfra with LlamaIndex: @sridhar_10158 asked if anyone has tried integrating DeepInfra with LlamaIndex.
- Seeking Courses to Learn LlamaIndex: @asdw2. was interested in finding any good courses that provide instruction on LlamaIndex.
▷ #ai-discussion (9 messages🔥):
- Strategies for Addressing RAG Limitations: @bushdid420 shared insights about strategies, including document summarization and chunking, for handling the common issue in the RAG space of language models not being able to make sense of all added information due to their limited context windows. An important conclusion is that, despite large context windows, it is challenging to preserve the importance of information in the middle of the context.
- Llamafile - The NGINX of LLMs: @rawwerks hailed llamafile as a game-changer for practically deploying fine-tuned models instantaneously on 6 different OSs. However, the llamafile team showed no interest in adding RAG capabilities or Python support, as highlighted in a GitHub issue. He proposed that combining LlamaIndex and llamafile could enable a free and private paradigm for advanced RAG.
- Developing Intelligent Systems with OpenLLM and LlamaIndex: @andysingal shared a Medium article about the rise of open-source Large Language Models (LLMs) and how tools like OpenLLM and LlamaIndex have reshaped developer engagement with these models.
- Resolving Content Accessibility in Long Contexts: @bushdid420 further discussed challenges in processing long textual contexts with LLMs, stating that critical facts positioned in the middle sections of context documents often lead to degraded performance. He suggested a possible solution could be found in the document "Dealing with Long Contexts: LLMs - How to Find What's in The Middle".
- Appreciation for an Insightful Discussion: @benjaminbascary expressed appreciation for the in-depth insights provided by @bushdid420 on context handling with LLMs, indicating the conversation's value.
Links mentioned:
- Building Intelligent Systems with OpenLLM and LlamaIndex using Phi-2: Ankush k Singal
- Support uploading more file formats · Issue #149 · Mozilla-Ocho/llamafile: Hi, is there a way to customize the UI and inputs? For example, currently, the UI allows the uploading of images, but I'd want to update it to accept CSV and PDF formats as well. If shown where to…
- Dealing with Long Contexts LLMs: How to Find What's in the Middle: As language models continue to evolve to ingest longer textual contexts, an emerging question poses challenges to their real-world reliability: can these models truly make sense of all that added…
- LongContextReorder - LlamaIndex 🦙 0.9.28.post2
DiscoResearch Discord Summary
- Mixtral Implementation Paper Released: sebastian.bodza announced the publication of Mixtral's paper on arXiv.
- TACO: A Hot Discussion Topic for Code Retrieval: @sebastian.bodza brought the TACO dataset into focus, suggesting its potential for code retrieval tasks and stimulating a discussion on the creation of "hard negatives". Different strategies, such as using a "bad model", BM25 for code similarity, and model permutation, were proposed by members like @bjoernp and @philipmay.
- Synthetic Data: Blessing or Curse?: The forum heated up with @thewindmom and @bjoernp discussing the possible effects of synthetic data generation on model quality and the importance of data curation, with the latter arguing for structured collection of already-learned data within embedding models.
- Model Troubleshooting: @philipmay inquired about a model that claimed an MRR@10 of 0.9139 on a German dataset, sparking a chain of questions on model-specific issues.
- ColBERT Meets SQuAD2.0 & LLM Fine-tuning Datasets: @thewindmom shared a machine-translated version of SQuAD2.0 in Turkish for training a ColBERT model and introduced a GitHub repository for trending instruction fine-tuning datasets.
- E5-mistral-7b-instruct Chimes In: @aiui raised the question of quantized weights for the E5-mistral-7b-instruct model. @sebastian.bodza expressed reservations about the model's performance yet suggested the model could likely be quantized with help from a tutorial in the AWQ pip project.
- Python DPO Dataset for Code Retrieval Strikes a Chord: @bjoernp showcased Jon Durbin's approach of using a Python DPO dataset for code retrieval tasks, as seen in Durbin's tweet, using items from Vezora/Tested-22k-Python-Alpaca as "chosen" responses and 13B/7B model generations as rejected responses.
DiscoResearch Channel Summaries
▷ #mixtral_implementation (1 message):
sebastian.bodza: Paper for mixtral is released: https://arxiv.org/abs/2401.04088
▷ #embedding_dev (16 messages🔥):
- Discussing the TACO Dataset for Code Retrieval Tasks: @sebastian.bodza shared the TACO dataset and suggested that it could be useful for code retrieval tasks, as it contains multiple code solutions for each problem. @bjoernp reinforced the idea and wondered whether creating hard negatives by letting a bad model write the code could be an effective strategy. The discussion evolved with @philipmay proposing alternative ways to create hard negatives, including BM25 for code similarity and model permutation (see the sketch after this list).
- Question on a Model's Performance: @philipmay asked about a specific model that reached an MRR@10 of 0.9139 on a German dataset.
- Regarding Synthetic Data Generation: @thewindmom quoted a point emphasizing that synthetic data without new external knowledge could lead to worsening quality, and stressed the importance of data curation. @bjoernp disagreed for the case of embedding models, citing the need for structured collection of already-learned data.
- Using SQuAD2.0 for a ColBERT Model and Datasets for LLM Fine-tuning: @thewindmom shared a tweet about a machine translation of SQuAD2.0 to Turkish for ColBERT model training, and a GitHub repository as a quick guide to trending instruction fine-tuning datasets.
- E5-mistral-7b-instruct Model Questions and Opinions: @aiui asked if any quantized weights are available anywhere for the E5-mistral-7b-instruct model. @sebastian.bodza expressed skepticism about the model's performance considering its size, but also suggested the model could likely be quantized with the help of a tutorial in the AWQ pip project.
- Python DPO Dataset for Code Retrieval: @bjoernp shared a tweet by Jon Durbin showing a similar approach of using a Python DPO dataset for code retrieval tasks, using items from Vezora/Tested-22k-Python-Alpaca as the "chosen" responses and 13B/7B model generations as rejected responses.
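A hedged sketch of the BM25-based hard-negative idea from the first bullet (using the rank_bm25 package; the solutions are toy data): for each positive solution, the highest-scoring other solution is lexically similar but wrong, i.e., a candidate hard negative.

```python
# Sketch: mine hard negatives for code retrieval via BM25 similarity over solutions.
from rank_bm25 import BM25Okapi

solutions = [
    "def reverse(head): ...  # iterative linked-list reversal",
    "def fib(n): return n if n < 2 else fib(n - 1) + fib(n - 2)",
    "def fact(n): return 1 if n < 2 else n * fact(n - 1)",
]
bm25 = BM25Okapi([s.split() for s in solutions])

for i, sol in enumerate(solutions):
    scores = bm25.get_scores(sol.split())
    ranked = sorted(range(len(solutions)), key=lambda j: scores[j], reverse=True)
    hard_negative = next(j for j in ranked if j != i)  # most similar other solution
    print(f"positive #{i} -> hard negative #{hard_negative}")
```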
Links mentioned:
- Tweet from Jon Durbin (@jon_durbin): 📢 Python DPO dataset. This uses items from Vezora/Tested-22k-Python-Alpaca as the "chosen" responses, and 13b/7b gens as rejected (assumed to be worse, not ranked/validated). https://hugg…
- BAAI/TACO · Datasets at Hugging Face
- GitHub - Zjh-819/LLMDataHub: A quick guide (especially) for trending instruction finetuning datasets: A quick guide (especially) for trending instruction finetuning datasets - GitHub - Zjh-819/LLMDataHub: A quick guide (especially) for trending instruction finetuning datasets
- intfloat/e5-mistral-7b-instruct · Hugging Face
Latent Space Discord Summary
- New Age of CoPilot: @guardiang offered a fresh perspective on AI coding in a YouTube video titled "Copilot Prompt Engineering: 3 UI Frameworks, 2 AI Agents, 1 Coding Assistant (AIDER CCC)", focused on augmenting engineering abilities "at a rapid pace".
- In Search of a Retrieval-Augmented Generation (RAG) Document Set: A dialogue grew around @dgross211's query about suitable document sets for a RAG project, with @swizec responding with questions about the specific nature of the documents required.
- R1, the Next Big Thing?: Stoking further interest in the R1 device, @mdcker shared a keynote presentation introducing the new hardware.
- Advancements in the Field of Few-Shot Prompting: @henriqueln7 pointed to a GitHub document in openai-python advocating the "system" role for few-shot prompting.
- Evaluating AI Assistants, Simplified: @henriqueln7 started a thread to locate resources for straightforward evaluation metrics for AI assistants, remarking on the need for simpler alternatives to OpenAI Evals.
- LLM State Machine Delivers: In a triumphant declaration, @davidkpiano shared the successful use of an LLM state machine in the langgraph project, providing its GitHub repository for reference.
- Official OpenAI API Gets Some Attention: @swyxio highlighted the openai-python library on GitHub, an important resource for using the official OpenAI API.
- Mixture of Experts (MoE) Approach Leads to a Phi-2 Victory: @swyxio shared news from @maximelabonne via a tweet about the success of their MoE model using phi-2, creating the efficient Phixtral, with both phixtral-2x2_8 and phixtral-4x2_8 accessible on Hugging Face.
- Massive Language Model Reading List Available: For avid researchers, @eugeneyan shared a Language Model Reading List, a compilation of over 40 papers, while welcoming suggestions and issue submissions via their GitHub repository.
- Mixtral Gets Noticed: Along with the other models discussed, @swyxio highlighted the importance of another model, Mixtral.
Latent Space Channel Summaries
▷ #ai-general-chat (11 messages🔥):
- CoPilot through a Different Lens: @guardiang recommended a YouTube video about a unique way of coding with AI, titled "Copilot Prompt Engineering: 3 UI Frameworks, 2 AI Agents, 1 Coding Assistant (AIDER CCC)". The video is aimed at enhancing engineering abilities rapidly.
- Quest for a RAG Doc Set: @dgross211 asked for suggestions on finding a document set for a Retrieval-Augmented Generation (RAG) project, leading to a discussion on the matter. @swizec responded by asking about the nature of the documents required.
- Introduction to the Keynote Presentation of the R1 Device: @mdcker shared a link to the keynote presentation of a device known as R1.
- Exploration of Few-Shot Prompting: @henriqueln7 shared a link to a GitHub document in openai-python, revealing that the "system" role is the recommended one for few-shot prompting.
- Help Requested for AI Assistant Evaluation Materials: @henriqueln7 asked for recommendations of simple materials to aid in the evaluation of AI assistants in production. They mentioned having checked OpenAI Evals but were seeking simpler resources.
- LLM State Machine Success: @davidkpiano shared the GitHub repository of a project known as langgraph, indicating their success with an LLM (Large Language Model) state machine.
- OpenAI Python Library Highlight: @swyxio shared a link to the openai-python library on GitHub, a library for the official OpenAI API.
Links mentioned:
- rabbit keynote on Vimeo
- 01-rtk-query-generation.md: GitHub Gist: instantly share code, notes, and snippets.
- openai-python/chatml.md at logankilpatrick-patch-1 · openai/openai-python: The official Python library for the OpenAI API. Contribute to openai/openai-python development by creating an account on GitHub.
- Copilot Prompt Engineering: 3 UI Frameworks, 2 AI Agents, 1 Coding Assistant (AIDER CCC): Code like you're in the future. This is the best way to progress your engineering abilities at a rapid pace. In this groundbreaking video, we delve into pair prog…
- GitHub - langchain-ai/langgraph: Contribute to langchain-ai/langgraph development by creating an account on GitHub.
- GitHub - openai/openai-python: The official Python library for the OpenAI API: The official Python library for the OpenAI API. Contribute to openai/openai-python development by creating an account on GitHub.
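On the langgraph item above: a library-agnostic sketch of the underlying idea, an LLM-driven state machine, where each state owns a prompt and the model's output selects the next edge. `call_llm` is a hypothetical stand-in for any chat-completion client; this is not langgraph's actual API.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat-completion client here")

# Each state owns a prompt template; edges are keyed on the LLM's output.
STATES = {
    "triage": "Classify this request as 'question' or 'task': {text}",
    "answer": "Answer the question directly: {text}",
    "plan":   "Break the task into numbered steps: {text}",
}
TRANSITIONS = {"triage": {"question": "answer", "task": "plan"}}

def run(text: str) -> str:
    state = "triage"
    while True:
        output = call_llm(STATES[state].format(text=text))
        next_state = TRANSITIONS.get(state, {}).get(output.strip().lower())
        if next_state is None:   # no outgoing edge: terminal state
            return output
        state = next_state
```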
▷ #llm-paper-club (5 messages):
- MoE Approach Adopted for Phi-2: User @swyxio shared a tweet from @maximelabonne describing their successful creation of an efficient Mixture of Experts (MoE) model using phi-2. The model, named Phixtral, combines 2 to 4 fine-tuned models and outperforms each individual expert. The models, phixtral-2x2_8 and phixtral-4x2_8, are both accessible on Hugging Face.
- More than 40 Papers on Language Modeling Reviewed in 2023: @eugeneyan shared their Language Model Reading List, which includes over 40 papers reviewed in 2023. They also encouraged members to suggest new papers or raise issues on their GitHub repository.
- Further Reading on Language Modeling Suggested by @swyxio: In addition to the models discussed, @swyxio mentioned a paper on another model named Mixtral.
Links mentioned:
- Language Modeling Reading List (to Start Your Paper Club): Some fundamental papers and a one-sentence summary for each; start your own paper club!
- Tweet from Maxime Labonne (@maximelabonne): Phixtral. I made the first efficient Mixture of Experts with phi-2 models. 🥳 It combines 2 to 4 fine-tuned models and is better than each individual expert. 🤗 phixtral-2x2_8: https://huggingfac…
Skunkworks AI Discord Summary
- Mixtral's Mysterious Specialization: @interstellarninja observed that in Mixtral's routing analysis, experts didn't showcase domain specialization apart from DM Mathematics, which was non-uniformly dispersed. @baptistelqt in #core-moe supported this, suggesting that the load balancing loss for the router could deter domain specialization. Both noted that consecutive tokens are often routed to the same experts.
- Python's "self" & English's "question", Seat-Mates in Mixtral: @interstellarninja highlighted the peculiar syntactic behavior of Mixtral's router, which groups "self" in Python with "question" in English, showing that the model gravitates towards syntax.
- PyTorch or Jax? The Devil We Know: @dook4 wondered why AI engineers prefer PyTorch over Jax, beyond its use in Llama. Responding, @yikesawjeez confessed that rewriting everything in Jax for Google's TRC grant was a daunting task, concluding that the familiar tool wins.
- Game Changer for Fine-Tuning?: @nisten dropped a cryptic comment about a chart revolutionizing the game for fine-tuning; however, the chart itself wasn't revealed in the referenced discussions.
Skunkworks AI Channel Summaries
▷ #general (7 messages):
- Experts' specialization in Mixtral routing analysis: @interstellarninja noted that in a Mixtral routing analysis, experts didn't specialize in specific domains, with the exception of DM Mathematics, which was non-uniformly distributed.
- Syntactic behavior in Mixtral router: @interstellarninja also mentioned that the routers do exhibit some syntactic behavior, with "self" in Python, "question" in English, and indentation in code. It was noted that consecutive tokens often get routed to the same experts (see the routing sketch below).
- Mixtral's model leans heavily on syntax: @interstellarninja also highlighted that the model shows strong specialization in syntax, particularly evident in how indentation is routed to the same experts.
- Debate on PyTorch vs Jax: @dook4 asked why people are using PyTorch over Jax, apart from its use in Llama. @yikesawjeez suggested that people use PyTorch mainly because they know how to write it, mentioning the difficulty they experienced when they had to rewrite everything in Jax while using Google's TRC grant.
- Implications for the fine-tuning game: @nisten suggested that a specific chart completely alters the game for fine-tuning. The specific chart was not included in the cited messages.
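To make the routing analysis above concrete, here is a toy top-2 router in PyTorch. It is illustrative only, not Mixtral's actual code; inspecting `chosen` per token is the kind of analysis @interstellarninja describes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_experts, d_model = 8, 16
gate = torch.nn.Linear(d_model, n_experts, bias=False)  # per-token router

tokens = torch.randn(5, d_model)                    # five consecutive tokens
probs = F.softmax(gate(tokens), dim=-1)             # (5, n_experts)
weights, chosen = torch.topk(probs, k=2, dim=-1)    # top-2 experts per token
weights = weights / weights.sum(-1, keepdim=True)   # renormalize the pair

print(chosen)  # which two experts each token is dispatched to
```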
▷ #core-moe (4 messages):
- Discussions on Domain Specialization in Mixtral Experts: @baptistelqt raised a point about Mixtral's experts not specializing in specific domains, theorizing that the load balancing loss for the router could be a deterrent (a sketch of that loss follows below). They also mentioned successfully implementing an MoE (Mixture of Experts) model that encouraged domain specialization, and sought insights on any potential misunderstandings. @snowclipsed expressed similar curiosity.
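A sketch of the auxiliary load-balancing loss @baptistelqt refers to, written here in the Switch-Transformer formulation (N · Σᵢ fᵢ · Pᵢ); whether Mixtral uses exactly this form is an assumption of this sketch.

```python
import torch

def load_balancing_loss(router_probs, top1_idx, n_experts):
    """Switch-style auxiliary loss: minimized when both the fraction of
    tokens (f_i) and the mean router probability (P_i) per expert are
    uniform, which is why it can work against domain specialization."""
    f = torch.bincount(top1_idx, minlength=n_experts).float() / top1_idx.numel()
    P = router_probs.mean(dim=0)
    return n_experts * torch.sum(f * P)

probs = torch.softmax(torch.randn(32, 8), dim=-1)   # toy router outputs
loss = load_balancing_loss(probs, probs.argmax(-1), n_experts=8)
```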
LLM Perf Enthusiasts AI Discord Summary
- What's the temperature? It's AI Time!: @thebaghdaddy asked about the kind of data involved in temperature adjustments in AI models. @sehaj.dxstiny shed light on it as a concept linked to the latent representation of images and the codebook embeddings of VQ-GANs.
- Stepping into the Hyper Zone: @thebaghdaddy suggested exploring hyperparameter tuning and methods for controlling data-quality inputs, and also hinted at using regularization techniques for potential improvements.
- Embracing the Spirit of Trial and Error: @sehaj.dxstiny showed willingness to experiment with the suggested methods for enhancing AI models.
- Hey OCR, get out of the way!: As per @jeffreyw128, they rely on detecting bad text via methods other than OCR.
- Private Company Docs - The Data Hunt: @res6969 searched for a dataset of private-company descriptive documents. @jeffreyw128 pointed out that Metaphor holds data matching the description and offered to share more details privately.
LLM Perf Enthusiasts AI Channel Summaries
▷ #general (5 messages):
- Decoding "Adjust Temperature": User @thebaghdaddy asked about the kind of data involved in adjusting the temperature in AI models.
- VQ-GANs and Latent Representation of Images: User @sehaj.dxstiny clarified that it pertains to the latent representation of images and the codebook embeddings of VQ-GANs (a minimal sampling sketch follows this list).
- Suggesting Regularization Techniques: In a quest to help, @thebaghdaddy recommended exploring hyperparameter tuning and controlling data-quality inputs, and also hinted at using regularization techniques for potential improvements.
- Open to Experimentation: Responding to the suggestions, @sehaj.dxstiny acknowledged that they haven't tried these methods yet but expressed openness to them.
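For readers wondering what "adjusting temperature" does mechanically, a minimal sketch: logits are divided by T before the softmax, so T < 1 sharpens and T > 1 flattens the distribution. The same operation applies whether the logits score vocabulary tokens or VQ-GAN codebook entries.

```python
import numpy as np

def sample_with_temperature(logits: np.ndarray, temperature: float = 1.0) -> int:
    scaled = logits / temperature            # T < 1 sharpens, T > 1 flattens
    probs = np.exp(scaled - scaled.max())    # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```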
▷ #rag (1 message):
jeffreyw128: we rely on detecting if there's bad text from non-OCR methods
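The message doesn't specify the non-OCR methods; one plausible flavor is a cheap character-statistics heuristic like the sketch below. The thresholds are illustrative guesses, not Metaphor's actual pipeline.

```python
def looks_garbled(text: str, max_nonalpha: float = 0.4,
                  min_avg_word_len: float = 2.5) -> bool:
    """Flag extracted text as 'bad' from surface statistics alone."""
    words = text.split()
    if not words:
        return True
    nonalpha = sum(not (c.isalnum() or c.isspace()) for c in text) / len(text)
    avg_word = sum(map(len, words)) / len(words)
    return nonalpha > max_nonalpha or avg_word < min_avg_word_len
```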
▷ #datasets (2 messages):
- Seeking Dataset for Private Company Documents: User @res6969 inquired about the availability of a dataset of private-company descriptive documents, including board decks, financial statements, and quarterly letters.
- Dataset Offer from Metaphor: In response, @jeffreyw128 mentioned that Metaphor has data that fits the inquiry and offered to provide more information through private messages.
Datasette - LLM (@SimonW) Discord Summary
Only 1 channel had activity, so no need to summarize…
- Inflation's Impact on Music Industry Earnings: User @justinpinkney questioned how adjusting for inflation might alter the perception of the music industry's revenue trends (a toy adjustment sketch follows this list).
- Streaming vs Downloading: @dbreunig clarified that streaming, not downloading, was the catalyst that damaged parts of the music industry.
- The Golden Window for Mid-Tier Musicians: @dbreunig mentioned a "golden window" of opportunity for mid-tier musicians to earn a living prior to the Spotify era.
- The Pre-Digital Music Cartel: @dbreunig argued that the pre-digital market operated as a cartel, and once consumers were allowed to purchase individual songs instead of full albums, it crumbled.
- Influence of Spotify on Mid-Tier Musicians: In response to @dbreunig, @antisimplistic expressed doubts about Spotify leading to harder conditions for mid-tier musicians, suggesting that the industry's transition to unlimited shelf space and a fragmented market made it challenging for all artists, regardless of business model. @antisimplistic also suggested that inflation adjustments and larger spending patterns may be factors in assessing the industry's trends.
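A toy version of the adjustment @justinpinkney has in mind: rescale nominal revenue to a base year with a CPI ratio. The index values below are rough illustrative figures, not authoritative data.

```python
CPI = {1999: 166.6, 2023: 304.7}  # rough US CPI-U annual averages

def to_2023_dollars(nominal: float, year: int) -> float:
    """Rescale a nominal figure into 2023 dollars via a CPI ratio."""
    return nominal * CPI[2023] / CPI[year]

# e.g. to_2023_dollars(1.0e9, 1999) is roughly 1.83e9
```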
Alignment Lab AI Discord Summary
- AI Agents Pitch In For Explanation: @burnydelic shared an engaging article by MIT News that reports on how AI agents can potentially help in elucidating the mechanisms of other AI systems.
- Comparative Performance Analysis of Llama2-70B: @tirmizi7715 raised the question of why Llama2-70B performs almost as well as Mixtral and GPT-3.5 in several evaluations, yet is significantly worse at MT Bench.
- Cryptic Discussion Leaves Users Puzzled: User @m8than's comment "wtf is this lol" underscored an apparent lack of clarity in the discussion about the performance comparison between Llama2-70B, Mixtral, and GPT-3.5.
- NousResearch Simulation Topic: @teknium highlighted, in the #oo channel, a Twitter post from NousResearch about simulation.
Alignment Lab AI Channel Summaries
▷ #ai-and-ml-discussion (1 message):
burnydelic: https://news.mit.edu/2024/ai-agents-help-explain-other-ai-systems-0103
▷ #general-chat (2 messages):
- Llama2-70B's performance compared to Mixtral and GPT-3.5: User @tirmizi7715 asked why Llama2-70B is nearly as good as Mixtral and GPT-3.5 on almost all evaluations, but significantly worse at MT Bench.
- Confused Participant: User @m8than seemed confused by the preceding discussion, commenting "wtf is this lol".
▷ #oo (1 message):
teknium: https://fxtwitter.com/NousResearch/status/1744865872563618128
YAIG (a16z Infra) Discord Summary
Only 1 channel had activity, so no need to summarize…
- Speculation on Google's Gemini Training Technique: User @stevekamman was keen on the approach described in an arXiv paper outlining Distributed Low-Communication (DiLoCo), a distributed optimization algorithm that allows training language models on poorly connected device clusters, a technique potentially linked to how Google trained Gemini. The algorithm is a federated-averaging variant with AdamW as the inner optimizer and Nesterov momentum as the outer optimizer (see the sketch below).
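The paper's scheme lends itself to a compact sketch: each replica takes many local AdamW steps, then one outer Nesterov-momentum step is applied to the averaged parameter delta. Everything below (model, data, step counts, learning rates) is an illustrative placeholder, not the paper's setup.

```python
import copy
import torch
import torch.nn.functional as F

torch.manual_seed(0)
global_model = torch.nn.Linear(8, 1)
# Outer optimizer: Nesterov momentum on the averaged parameter delta.
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7,
                            momentum=0.9, nesterov=True)

def local_training(model, steps=20):
    """Inner optimizer: plain AdamW on locally generated toy batches."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(32, 8)
        loss = F.mse_loss(model(x), x.sum(-1, keepdim=True))
        opt.zero_grad(); loss.backward(); opt.step()

for _ in range(5):                       # communication happens only here
    replicas = [copy.deepcopy(global_model) for _ in range(4)]
    for r in replicas:
        local_training(r)                # runs independently per cluster
    outer_opt.zero_grad()
    for i, p in enumerate(global_model.parameters()):
        stacked = torch.stack([list(r.parameters())[i].data for r in replicas])
        p.grad = p.data - stacked.mean(0)  # "outer gradient" = global - avg(local)
    outer_opt.step()
```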
Links mentioned:
DiLoCo: Distributed Low-Communication Training of Language Models: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected acc…