> This summarizes **18** guilds, **277** channels, and **1566** messages. Estimated reading time saved (at 200wpm): **193 minutes**.

Nous Research announced their seed round, and the business focus is Nous-Forge.

Rabbit R1 also launched their demo at CES and opinions were very divided.

In other news, OpenAI shipped the GPT store today, and briefly leaked their upcoming personalization feature.


—

Table of Contents

[TOC]

Nous Research AI Discord Summary

  • Breaking the LLM’s Context Window Limit with Activation Beacon: @kenakafrosty shared an arXiv paper on Activation Beacon, a new approach that could potentially overcome the context window limits of Large Language Models (LLMs). @_stilic_ confirmed that the code will be available on GitHub.
  • Vibes on Tech Gadgets and AI Use: In the off-topic channel, topics revolved around the M2-equipped Apple Vision Pro, the Rabbit product, the WEHEAD AI companion, layoffs at Humane, and humor about Large Language Models (LLMs) and their use.
  • Curated Tech and AI Links: Interesting links shared included tools like light Activation Beacon training and MaybeLongLM Self-Extend for LLMs, AI for explaining AI systems, and discussions about Rabbit.tech, model interpolation, a 2 MoE model, and WikiChat.
  • Nous Research’s Exciting Seed Financing and Future Plans: @teknium announces Nous Research’s successful $5.2 million seed financing and their plan to etch transformer architecture into chips, creating powerful servers capable of real-time voice agents, improved coding, and running trillion parameter models. Further open-source research and the development of Nous-Forge are also in the pipeline.
  • Nous Community’s Various Projects on AI & LLMs: A broad range of topics was covered in the general channel, including development progress on QLoRA fine-tuning, research on a wearable AI mentor, discussions on fine-tuning large models, use of custom architectures, experiments with the WikiChat dataset, and a spontaneous Spanish-speaking session.
  • LLM-related discussions and inquiries: In the ask-about-llms channel, discussions centered on replicating Phi models, solutions to VRAM issues with Mixtral and Ollama, and strategies for building LLMs tailored to a proprietary knowledge base. Users considered the Synthesizer tool and suggested ways to create synthetic datasets.
  • Obsidian Project Code Request: In the project-obsidian channel, users expressed interest in the Obsidian script that @qnguyen3 has used for their work. The script, when shared, would be valuable for other guild members in their own projects.

Nous Research AI Channel Summaries

▷ #ctx-length-research (3 messages):

  • Activation Beacon: A solution for LLM’s context window issue: @kenakafrosty shared a link to an arXiv paper on a new solution named Activation Beacon. The paper states that Activation Beacon “condenses LLM’s raw activations into more compact forms such that it can perceive a much longer context with a limited context window”. This tool seems to have the ability to balance both memory and time efficiency during training and inference.
  • Upcoming Code for Activation Beacon on GitHub: @_stilic_ offered an update, saying that the code for the Activation Beacon will be available here on GitHub.


▷ #off-topic (95 messagesđŸ”„đŸ”„):

  • M2-equipped Apple Vision Pro, and the Rabbit product discussed: @nonameusr was excited about the M2 in the Apple Vision Pro, while @.beowulfbr expressed skepticism about the Rabbit product’s cost and inference coverage. They also speculated that Apple might release a similar product this year.
  • Weighing in on WEHEAD AI companion in 2024: Several users shared humorous takes on the WEHEAD AI companion, following a link shared by @teknium. @everyoneisgross admired the lowpoly aesthetic, and @youngphlo imagined carrying the AI around like a baby.
  • Chats on Humane layoffs ahead of their first device launching: @mister_poodle shared a link regarding layoffs at Humane ahead of the startup shipping its first device, a preordered $699, screenless, AI-powered pin.
  • Humor about Large Language Models (LLMs) and their use: Users @Error.PDF and @n8programs joked about the usefulness of LLMs in understanding and communicating in foreign languages. They also humorously speculated on the next advancements in LLMs, such as Discord mods that automatically translate all screen text to the user’s native language.

Links mentioned:

  • Exploring Light Activation Beacon Training with MaybeLongLM Self-Extend: @kenakafrosty raised the idea of combining light Activation Beacon training with MaybeLongLM Self-Extend to potentially eliminate the context window problem.
  • AI Agents Unraveling AI Systems: @burnydelic shared an article on the novel approach taken by MIT’s CSAIL researchers who used AI models to experiment on other systems and explain their behavior.
  • Is Rabbit.tech the Next Big Thing?: @kevin_kevin_kevin_kevin_kevin_ke sparked a discussion on Rabbit.tech, a tech company offering standalone hardware for artificial intelligence. Some users expressed skepticism about the need for a separate device when smartphone apps could offer similar functionality (@georgejrjrjr and @teknium), while others (@gezegen) defended the uniqueness of specialized hardware for AI companions.
  • Criticisms and Defence of Model Interpolation: In the context of AI model development, @georgejrjrjr, @romaincosentino and @charlie0.o discussed the limitations and potential advantages of model interpolation. @romaincosentino posited that there’s a lack of theoretical foundation in model interpolation, while @charlie0.o considered it as a form of regularization.
  • Stumbled Upon Large MoE Model and WikiChat: @nonameusr shared two links, one to a 2 MoE model based on Intel-neural series v3, and a tweet mentioning WikiChat, a tool boasting improved factual accuracy over GPT-4. The latter prompted @decruz to query its difference from systems grounded with RAG.


▷ #announcements (1 messages):

  • Nous Research Announces $5.2M Seed Financing Round: @teknium announces the successful close of the $5.2 million seed financing round, co-led by Distributed Global and OSS Capital, with participation from several angel investors, including Vipul (founder and CEO at Together AI), Yonatan Ben Shimon (founder at Matchbox DAO), Balaji, Thibaud (founder at OpenRouter and OpenSea), Chris Prucha (founder at Notion), the founder and CEO at Glaive AI, and Gavin (founder and CEO at etched.ai).
  • Burning Transformer Architecture Into Chips: @teknium revealed the intention to create the world’s most powerful servers for transformer inference by burning the transformer architecture into chips.
  • Products Impossible with GPUs: @teknium outlines the projected capabilities of Nous Research’s servers, emphasising real-time voice agents, improved coding through tree search, and multicast speculative decoding.
  • Room for Trillion Parameter Models: The upcoming servers are expected to run trillion-parameter models, featuring a fully open-source software stack, scalability to 100T-parameter models, beam search, and MCTS decoding.
  • Open-Source Pursuits & Future Project, Nous-Forge: Stressing the importance of open-source research, @teknium announces that the funding will allow for continued investment in LLM Architecture, Data Synthesis, Simulation, & Agent Engineering research, and the development of Nous-Forge, set for 2024. The team of developers and advisors mentioned includes <@153017054545444864>, <@387972437901312000>, <@265269014148808716>, and <@187418779028815872>.
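The beam search mentioned among those decoding features can be sketched in a few lines. The next-token score table below is a made-up stand-in for a model’s probabilities, so this is an illustration of the algorithm only, not Nous or Etched code:

```python
import math

# Toy next-token distribution standing in for an LLM (made-up numbers).
next_token = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(width=2, steps=3):
    """Keep the `width` highest log-prob prefixes at each step."""
    beams = [(["<s>"], 0.0)]  # (tokens, cumulative log-prob)
    for _ in range(steps):
        candidates = []
        for toks, lp in beams:
            for tok, p in next_token.get(toks[-1], {}).items():
                candidates.append((toks + [tok], lp + math.log(p)))
        beams = sorted(candidates, key=lambda b: -b[1])[:width]
    return beams[0][0]

print(beam_search())  # ['<s>', 'a', 'cat', '</s>']
```

Note how greedy decoding (width=1) would commit to “the” at the first step, while a wider beam recovers the higher-probability full sequence through “a”.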

Links mentioned:

Etched | The World’s First Transformer Supercomputer: Transformers etched into silicon. By burning the transformer architecture into our chips, we’re creating the world’s most powerful servers for transformer inference.

▷ #general (377 messagesđŸ”„đŸ”„):

  • Nous Team raises $5.2 million: The Nous team posted a tweet announcing that they have successfully raised $5.2m in seed funding. Members and friends of the Nous team expressed their excitement and offered congratulations. Source: @youngphlo
  • Development of QLORA: User @n8programs shared progress fine-tuning with QLoRA, a memory-efficient fine-tuning method. They successfully trained a QLoRA on a single M3 Max and concluded that the result generally performed better than ordinary Mistral. Source: @n8programs
  • Researching Wearable AI mentor: User @mg_1999 is working on a wearable AI mentor and consulted the community about the best model to use from Nous Research. They shared a link to the product’s website: AISAMA
  • Discussion on Finetuning Large Models: The community discussed the benefits and challenges of training and merging large versus small models. Users argued for different strategies such as merging fine-tuned models with base models and using multiple adapters. Notable links shared include LM-Cocktail and CFG
  • Suggestion of Modal and Runpod platforms: User @decruz discussed modal.com, a tool used for inference, and facilitated introductions to people at the company for those in need of GPUs. User @kenakafrosty mentioned runpod as an alternative platform with similar capabilities.
  • Use of custom architectures: User @mihai4256 asked for advice on sharing custom architectures inherited from LlamaForCausalLM with others. He was guided to use a similar method as implemented by Qwen, including a custom modeling file and allowing import with trust_remote_code=True.
  • Interest in WikiChat Dataset: User @emrgnt_cmplxty expressed interest in the dataset used by the WikiChat team, stating their own tests with it produced promising results. They raised the idea of reproducing the dataset to fine-tune OpenHermes-2.5-Mistral-7B. Source: stanford-oval/WikiChat
  • Spontaneous Spanish Speaking Session: Various users engaged in a fun and humorous conversation in Spanish. The exchange carried no technical content but ended in good cheer.
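The custom-architecture advice above relies on trust_remote_code=True, which lets transformers import a modeling file shipped alongside the checkpoint. A rough standalone sketch of that dynamic-import pattern follows; the file and class names (modeling_custom, MyCausalLM) are hypothetical, and this is not the actual transformers implementation:

```python
# Sketch of the mechanism behind trust_remote_code=True: the model repo
# ships a custom modeling file next to the weights, and the loader
# imports it dynamically at load time.
import importlib.util
import os
import tempfile
import textwrap

# Stand-in for a modeling_custom.py shipped with a checkpoint.
source = textwrap.dedent("""
    class MyCausalLM:
        def __init__(self, hidden_size=64):
            self.hidden_size = hidden_size
""")

with tempfile.TemporaryDirectory() as repo:
    path = os.path.join(repo, "modeling_custom.py")
    with open(path, "w") as f:
        f.write(source)
    # This dynamic import is the step that trust_remote_code gates.
    spec = importlib.util.spec_from_file_location("modeling_custom", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    model = module.MyCausalLM(hidden_size=128)

print(model.hidden_size)  # 128
```

Because the imported file is arbitrary code from the model repo, the flag exists precisely so users opt in explicitly, which is why Qwen-style custom architectures require it.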


▷ #ask-about-llms (44 messagesđŸ”„):

  • Exploration of Replicating Phi Models: @gson_arlo inquired about the existence of open models that aim to replicate the Phi series. In response, @georgejrjrjr pointed out that Owen from sci-phi has been focusing on synthetic data more than anyone else they know and also mentioned related projects such as refuel.ai and Ben Anderson’s Galactic.
  • Addressing VRAM Issues with Mixtral and Ollama: @colby_04841 asked for advice on handling VRAM limitations while using Mixtral 8x7b and Ollama on a system with 4 RTX 3090 GPUs.
  • RAG vs Fine-Tuning for Proprietary Knowledge Base: @bigdatamike sought insights on whether to use RAG, fine-tuning, or both for building a language model tailored to a company’s proprietary knowledge base. The users gave mixed opinions, with @colby_04841 favoring RAG and @georgejrjrjr suggesting potentially transforming the data in the retrieval store to better match the output format.
  • Usefulness of Synthesizer Tool in Data Creation: @georgejrjrjr recommended Synthesizer, a multi-purpose language model framework for RAG and data creation developed by SciPhi-AI. User @everyoneisgross confirmed its addition to their project list.
  • Best Way to Create Synthetic Data Sets: @gezegen asked about generating synthetic datasets, where @emrgnt_cmplxty pointed to open source models as a scalable solution, emphasizing the need for maintaining accuracy.
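As a toy illustration of the RAG side of that debate, the retrieval step can be as simple as scoring knowledge-base entries against a query and prepending the best match to the prompt. The bag-of-words scorer and the documents below are made up for illustration; a real system would use embeddings:

```python
# Minimal retrieval-augmented prompt construction over a tiny
# "proprietary knowledge base" (hypothetical contents).
from collections import Counter
import math

def score(query, doc):
    """Cosine-style overlap between bag-of-words token counts."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum((q & d).values())
    return overlap / math.sqrt(sum(q.values()) * sum(d.values()))

kb = [
    "Refunds are processed within 14 days of purchase.",
    "The API rate limit is 100 requests per minute.",
]

query = "what is the api rate limit"
best = max(kb, key=lambda doc: score(query, doc))
prompt = f"Context: {best}\n\nQuestion: {query}"
print(best)  # "The API rate limit is 100 requests per minute."
```

This also shows why transforming the retrieval store to match the desired output format, as suggested above, can help: whatever is retrieved lands verbatim in the prompt.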

Links mentioned:

GitHub - SciPhi-AI/synthesizer: A multi-purpose LLM framework for RAG and data creation.

▷ #project-obsidian (4 messages):

  • Obsidian script sharing request: @qnguyen3 mentioned they had just run Obsidian for their work. In response, @vic49. asked @qnguyen3 to share the script, and @thewindmom also expressed interest in it.

OpenAI Discord Summary

  • The Wait Game for GPT-4 Turbo: Member @_treefolk_ expressed anticipation for the full release of GPT-4 Turbo, which promises cheaper usage and an increased token limit, and highlighted the vagueness of “early January” as the promised release timeframe.
  • The AI vs. Coders Debate:
    • @you.wish and @ă€ïœïœ…ïœŽïœ•ïœŒïœ”ïœ‰ïœïœïœ”ïœ…ă€‘ engaged in a spirited discussion about the implications of future GPT versions for coders. While @ă€ïœïœ…ïœŽïœ•ïœŒïœ”ïœ‰ïœïœïœ”ïœ…ă€‘ envisioned a future where “Pretty much everybody on earth will be the world’s best coder with the next versions of GPT,” @you.wish contested this with the fact that current AI models can only perform very elementary tasks.
  • When Discord Rules Stir Discussions: The guild held an extended debate over rule interpretation after @you.wish asked for upvotes on a Reddit post, prompting dialogue about Discord’s Rule 7 prohibiting “self-promotion, soliciting, or advertising”.
  • LimeWire Takes AI to Music: @shartok brought a musical piece composed by LimeWire AI Studio into the conversation, sparking a discussion on AI-generated music.
  • Mixed Reviews on Midjourney’s Latest Version: Voices in the guild like @dino.oats, @darthgustav. and @satanhashtag shared their experiences with the newest version of Midjourney (MJ), discussing its move away from Discord and the inclusion of privacy features. However, @you.wish expressed dissatisfaction with the version.
  • Brand Guidelines Puzzle: In the GPT-4 discussions, @mrbr2023 posed questions over a black-marked section of the brand guidelines documentation for OpenAI GPTs received in an email and the inability to share a screenshot or link to the document.
  • Roadblocks in GPT Publishing: @mrbr2023 struggled with selecting ‘Publish to EVERYONE’ for their GPTs, which was resolved upon realizing that both the ‘Name’ and ‘Domain’ boxes in the builder profile settings need to be checked for the option to be available.
  • Exploring GPT Personalization: Venturing into the new GPT memory feature (personalization), @winsomelosesome and @darthgustav. shared their experiences of the GPT learning from their chats.
  • ChatGPT Romantically Challenged: In the prompt engineering channel, @rchap92 pointed out ChatGPT’s difficulty in crafting even a simple romantic scene without violating guidelines, a fact confirmed by @rjkmelb who stated that ChatGPT is designed to be “G-rated”.
  • ChatGPT’s Conservative Guidelines; Possible Workaround?: For content that might violate guidelines, @exhort_one suggested an interesting workaround: involve ChatGPT in crafting a censored version of the content and then let the user fill in the blanks.
  • AI’s Potential Across Different Fields: @shoga4605 pondered the vast potential of AI and language models and their possible impact on fields such as linguistics, ecology, environments, and more. They also hypothesized about AI’s application in agriculture for theoretically determining the amount of food that could be produced from lawn space.

OpenAI Channel Summaries

▷ #ai-discussions (160 messagesđŸ”„đŸ”„):

  • Impatience for GPT-4 Turbo’s official release: @_treefolk_ expressed eagerness for GPT-4 Turbo to move out of preview for cheaper usage and increased token limit, questioning the specifics of the promised “early January” release.
  • Discussion on the future of coding with GPT: A lively debate transpired between @you.wish and @ă€ïœïœ…ïœŽïœ•ïœŒïœ”ïœ‰ïœïœïœ”ïœ…ă€‘ about the potential for future GPT versions to replace coders. While @ă€ïœïœ…ïœŽïœ•ïœŒïœ”ïœ‰ïœïœïœ”ïœ…ă€‘ believes that “Pretty much everybody on earth will be the world’s best coder with the next versions of GPT,” @you.wish argued that current models can only undertake very basic tasks.
  • Self promotion content prompts Discord rule debate: An extensive discussion about the application of Discord’s Rule 7, which prohibits “self-promotion, soliciting, or advertising,” was initiated by @you.wish's request for upvotes on a Reddit post.
  • AI generated music shared: @shartok shared a link to an AI-generated music piece created with LimeWire AI Studio.
  • Chat about latest Midjourney (MJ) version: Messages from @dino.oats, @darthgustav. and @satanhashtag detailed their experiences and opinions on the newest version of MJ, including its move away from Discord and introduction of privacy features. @you.wish expressed dissatisfaction with the version.


▷ #gpt-4-discussions (154 messagesđŸ”„đŸ”„):

  • Confusion Over Brand Guidelines: User @mrbr2023 showed confusion about why a section of the brand guidelines documentation for OpenAI GPTs given in an email had a black mark over it. They were also puzzled about being unable to share a screenshot or link to the document in the group.
  • Voice Quality for Custom GPTs: @vantagesp asked why the voice for custom GPTs was subpar, without any response or discussion following.
  • Struggles with Publishing GPTs: User @mrbr2023 displayed frustration about not being able to select ‘Publish to EVERYONE’ for their GPTs, eventually figuring out that both the ‘Name’ and ‘Domain’ boxes need to be ticked in the builder profile settings for the ‘EVERYONE’ option to become available.
  • Users Encounter Technical Issues with GPTs: Several users reported that their GPTs disappeared and some website pages were not accessible, attributing it to a new update from OpenAI.
  • Explored Personalization Feature of GPT: @winsomelosesome and @darthgustav. explored the new GPT memory feature (personalization) in Settings, which allows GPT to learn from your chats. However, @darthgustav. also noted that it seemed to be removed shortly after discovery.
  • Appeal for GPT Feedback: User @faazdataai_71669 asked for feedback on their GPT, ‘Resume Tailor’, sharing a link to their GPT.

Links mentioned:

Brand guidelines: Language and assets for using the OpenAI brand in your marketing and communications.

▷ #prompt-engineering (6 messages):

  • ChatGPT struggles with romantic scenarios: @rchap92 raises a concern about ChatGPT struggling to outline even a basic romantic scene without triggering the ‘may violate guidelines’ highlight. @rjkmelb confirms that ChatGPT is indeed designed to be “G-rated”.
  • Workarounds for ChatGPT’s conservative approach: @exhort_one provides a workaround suggestion that involves asking ChatGPT to censor any part that might violate guidelines, and then the user can fill in the blanks.
  • Prompt-engineering guide shared: @scargia shares a link to the Prompt Engineering guide on OpenAI’s website.
  • Inspiring potential use cases for AI: @shoga4605 discusses potential use cases for AI and language models in understanding and modeling ecology, environments, habitats, and overall biodiversity. They also consider the possibilities of AI in agriculture, like determining how much food could theoretically be produced from lawn space.
  • Warm welcome to a new user: @beanz_and_rice welcomes @shoga4605 to the community, appreciating their enthusiasm and ideas.

▷ #api-discussions (6 messages):

  • Censoring needed content: User @rchap92 asked whether ChatGPT can create a romantic scene beyond a kiss without triggering a violation warning. @rjkmelb confirmed that ChatGPT is designed to be G-rated.
  • Working around ‘violation’ highlights: User @exhort_one suggested asking ChatGPT to censor any part that may violate guidelines, thereby enabling users to fill in the blanks.
  • Prompt Engineering Guide: @scargia shared a link to OpenAI’s guide on prompt engineering.
  • Enthusiasm for AI potential: @shoga4605 expressed excitement about the potential of AI and language models, and contemplated their application in fields such as linguistics, ecology, environments, and more.
  • Welcome to the discussion: @beanz_and_rice greeted and welcomed @shoga4605 to the chat.

LM Studio Discord Summary

  • Ubuntu Server and LMStudio Compatibility Issue: User @m.t.m inquired about running LMStudio on Ubuntu 22.04 server without an X server. Responding member @heyitsyorkie indicated that LMStudio does not support a headless or CLI option and recommended llama.cpp for such needs.

  • The GPU Debate RTX 4070 vs. RTX 4090: A discussion was held between @b0otable, @heyitsyorkie, @fabguy, @senecalouck, and @rugg0064 on whether to purchase an RTX 4070 or RTX 4090, focusing on performance benefits, VRAM consideration, and price.

  • Use of LM Studio for LM-as-a-service Queried: User @enavarro_ questioned using only the LM Studio backend to set up a LM-as-a-service. They were informed by @heyitsyorkie that such a feature isn’t currently offered and was directed to llama.cpp as a potential solution.

  • Forward Looking Talk on ROCm Support & ML’s Future: ROCm support and the future landscape of machine learning sparked a conversation. @senecalouck pointed out that ROCm support is already enabled in ollama, hoping it would soon be in LM Studio. The focus then shifted to the future implementation of ML and emerging players like TinyBox.

  • Finding 7B-13B Model for Tool Selection and Chat Desires: User @_anarche_ conveyed their struggle to find a locally usable 7B-13B model that can perform both tool selection (function calling) and chat, possibly in a franken/merged form. Their goal is to shift away from the gpt-3.5-turbo model.

  • Stanford DSPy Highlighted as a Potential Solution: @nitex_dkr highlighted Stanford DSPy, a framework for programming—not prompting—foundation models, as a potential solution to @_anarche_’s challenge.

  • Linux Loading Issue and Misleading Version Number Flagged: User @moko081jdjfjddj reported that the model won’t load on Linux and noticed a discrepancy with the Linux version number on the website. These issues were addressed by @heyitsyorkie and @fabguy who clarified the version issue and directed the user to the specific Linux Beta channel for further support.

  • Unsupported Platform Issue Encountered: @keryline received an error message that their platform is not supported by LM Studio due to their processor not supporting AVX2. @dagbs suggested trying the avx beta to resolve this.

  • Mac vs. PC for Running Large Models Provokes Discussion: A conversation was initiated by @scampbell70 about the hardware requirements for efficiently running larger models like Mistral 8x7b, Falcon 180b, or Goliath 120. Some members praised Macs (particularly the Mac Studio) for better performance, while concerns about their lack of upgradability were raised.

  • GPUs with Memory Slots Unavailable: @doderlein asked where they could buy a GPU with memory slots, which @ptable stated isn’t possible. @heyitsyorkie pointed to a unique solution from ASUS that couples a GPU with a SSD M.2 NVME to create a hybrid storage-graphics card.

  • Hardware Usage in LM Studio Clarified: User @besiansherifaj inquired if a CPU is necessary in LM Studio while having a 4090 RTX GPU. @fabguy clarified that the CPU will always be used even if the full model is on the GPU.

  • Autogen Studio and LMStudio Usage Questioned: In the autogen channel, @thelefthandofurza asked about anyone’s experience using Autogen Studio with LM Studio. The discussion did not proceed further.

LM Studio Channel Summaries

▷ #💬-general (71 messagesđŸ”„đŸ”„):

  • LmStudio Install Issues on Ubuntu Server: User @m.t.m. asked if it’s possible to run LMStudio on Ubuntu 22.04 server without an X server. @heyitsyorkie responded that LMStudio currently does not support a headless or CLI option and recommended llama.cpp for such needs.

  • Nicknames and Server Rules: User @sexisbadtothebone asked if their nickname breaks the server rules about SFW content. @heyitsyorkie advised editing nicknames to abide by the rules and maintain a safe-for-work environment.

  • The RTX 4070 vs. RTX 4090 Debate: @b0otable sought advice on whether to purchase an RTX 4070 or RTX 4090. The discussion involved @heyitsyorkie, @fabguy, @senecalouck, and @rugg0064, focusing on performance benefits, VRAM consideration, and price.

  • Using LM Studio for LM-as-a-service: @enavarro_ inquired about the possibility of using only the LM Studio backend to set up a LM-as-a-service. The user was informed by @heyitsyorkie that such a feature is not available and was pointed to llama.cpp as a potential solution.

  • Discussing ROCm Support & Future of ML: A discussion was held considering ROCm support and the future landscape of machine learning. @senecalouck mentioned that ROCm support is available in ollama, with hopes of seeing it in LM Studio soon. The discussion then evolved into exploring the future implementation of ML technology and new players like TinyBox.


▷ #đŸ€–-models-discussion-chat (18 messagesđŸ”„):

  • Seeking 7B-13B Model for Tool Selection and Chat: @_anarche_ voiced the struggle to find a local 7B-13B model capable of both tool selection (function calling) and chat, potentially in a franken/merged form. They aim to transition away from the gpt-3.5-turbo model.
  • Dolphin Model Serves as Suitable All-Around: @dagbs recommended the Dolphin model as a good generalist option for coding, functions, tools, etc., despite mixed results with certain function-calling tools like crewai and autogen.
  • Langchain Compatibility Considerations: @_anarche_ detailed their intention to integrate the new model into Langchain, for which they have already adjusted their bot to use the LM Studio API.
  • Uncensored Model Concerns on Discord: Drawing attention to potential issues with using an uncensored model in a Discord environment, @dagbs cautioned that such a move could result in a ban, exposing the need for careful model selection.
  • Stanford DSPy Shared as Possible Solution: @nitex_dkr flagged Stanford DSPy, a framework for programming—not prompting—foundation models, which could offer a promising solution to @_anarche_’s challenge.

Links mentioned:

GitHub - stanfordnlp/dspy: Stanford DSPy: The framework for programming—not prompting—foundation models

▷ #🧠-feedback (10 messagesđŸ”„):

  • Linux Loading Issue:
    • User @moko081jdjfjddj reported that the model doesn’t load on Linux. @heyitsyorkie and @fabguy directed the user to Channels and Roles to select the Linux Beta role and post the issue in the specific channel.
    • @moko081jdjfjddj also noticed a discrepancy with the version number for Linux provided on the website, to which @fabguy responded saying that the Beta version 0.2.10 had stability issues, hence not updated on the site.
  • Unsupported Platform Issue:
    • @keryline experienced a problem with LM Studio on their Windows machine, getting an error message stating that their platform is not supported as their processor does not support AVX2 instructions.
    • To resolve this, @dagbs suggested the user to try the avx beta.
  • New Beta Version Request:
    • @logandark requested a new beta version that includes a specific commit from the llama.cpp repository.

▷ #🎛-hardware-discussion (29 messagesđŸ”„):

  • Mac vs. PC for Running Large Models: User @scampbell70 initiated a discussion on the hardware requirements for efficiently running larger models such as Mistral 8x7b, Falcon 180b, or Goliath 120 with the least amount of loss and best performance. @telemaq and @heyitsyorkie suggested a Macbook Pro or a Mac Studio for better performance, while @pydus acknowledged the cost-effectiveness of a Mac Studio with 192GB of unified memory for a price of $7K. However, @scampbell70 expressed concerns about Macs due to their lack of upgradability (source).

  • VRAM allocation on Apple machines: @heyitsyorkie shared a Reddit thread detailing how the amount of VRAM allocation can be controlled at runtime using the command sudo sysctl iogpu.wired_limit_mb=12345 (source).

  • Purchasing GPUs with Memory Slots: @doderlein asked where to buy a GPU with memory slots, a question that @ptable answered as not being possible. @heyitsyorkie mentioned a unique solution from ASUS that pairs the GPU with an SSD M.2 NVME, creating a hybrid storage-graphics card (source).

  • Mac Performance with Goliath 120b Q8: @telemaq shared a Reddit post of a user who ran Goliath 120b Q8 on a Mac Studio M2 Ultra with 192GB memory, achieving about 7tok/s, proving the Mac’s capability to handle larger models (source).

  • Hardware Usage in LM Studio: User @besiansherifaj inquired whether a CPU is necessary in LM Studio when a 4090 RTX GPU is present. @fabguy clarified that the CPU will always be utilized even if the full model is on the GPU.


▷ #autogen (1 messages):

thelefthandofurza: Has anyone used autogen studio with lmstudio?


Eleuther Discord Summary

  • Model Performance Rises without Additional Data, Training, or Scale: A query launched by @sehaj.dxstiny regarding improving performance without extra resources turned up some interesting resources, courtesy of @ad8e and @vatsala2290, including the Machine Learning workshop poll results.
  • Taking AI Development Mobile: @pawngrubber was steered towards mlc-llm by @_fleetwood for starting machine learning development on mobile devices. This open source tool develops, optimizes, and deploys AI models natively.
  • Unpacking Llama-2-70B’s Benchmark Conundrum: @tirmizi7715 expressed confusion over why Llama-2-70B performs worse on MT-Bench while doing well on other benchmarks.
  • Any Language for Mistral: A discussion headed by @maxmatical clarified that modern tokenizers handling all Unicode characters can allow language transfer, as cited by @thatspysaspy and @stellaathena.
  • Unravelling Huggingface Model Structures: @sk5544 received a coding approach for extracting the PyTorch model definition code from Huggingface, shared by @thatspysaspy.
  • Kaggle LLM Contest Piques Interest: @grimsqueaker highlighted an ongoing Kaggle LLM contest of potential interest to the community.
  • Mechanism Behind MLM Loss Calculation: A discussion started by @jks_pl clarified why MLM loss is only computed on masked/corrupted tokens, explained by @bshlgrs and @stellaathena.
  • AI Behavior Explained through Evaluation: @burnydelic shared an interesting MIT News article discussing the development of AI models that evaluate and are able to explain the behavior of other AI systems.
  • muP for Simplified Hyperparameter Tuning: Users @ad8e, @thatspysaspy, @ricklius, and @cubic27 shared thoughts on muP’s ability to simplify hyperparameter tuning across scales, albeit not being a magic solution.
  • Twitter Data Limited in Datasets: @stellaathena clarified to @rybchuk that significant amounts of Twitter data are unlikely in certain datasets.
  • The Truth Behind Mixtral Routing Analysis: A tweet highlighting a Mixtral routing analysis misconception was shared by @tastybucketofrice, and further discussed by @stellaathena and @norabelrose.
  • Gaining Insight with GPT/LLM Visualization Tool: @brandon_xyz announced a new tool that visualizes a GPT/LLM’s cognitive process, citing a tweet, and received requests for access to the private tool.
  • Understanding Mechanistic Interpretability and BIMT: The role of Brain-Inspired Modular Training (BIMT) in boosting neural network interpretability was discussed by @g_mine, who pointed out a paper on this issue.
  • Pythia Data Preparation Standard: Queries by @joshlk about Pythia data prep received clarification from @pietrolesci that it is a standard pre-training process, even if online information is sparse.
  • EOD Token Issue in Pythia-Deduped Dataset: @pietrolesci noted that the Pythia-deduped dataset lacked EOD tokens, with @hailey_schoelkopf raising possible causes, including omission of the --append-eod option during tokenization.
  • EOD Tokens & Packing for Pythia Models: @pietrolesci and @hailey_schoelkopf discussed whether the discrepancy caused by missing EOD tokens would change the ‘packed’ dataset that the Pythia models saw during training.
  • Masking Role in Document Attention: A question about the function of masks from @joshlk got answered by @hailey_schoelkopf, who clarified that they are not used to prevent cross-attention between documents.

Eleuther Channel Summaries

▷ #general (61 messagesđŸ”„đŸ”„):

  • Optimization Options for Model Performance: User @sehaj.dxstiny raised a question about improving a model’s performance without additional data, training, or scale. Recommended resources included a Machine Learning workshop poll shared by @ad8e and work by the Amitava Das group at USC shared by @vatsala2290.

  • Exploring ML development on Mobile: @pawngrubber showed interest in starting machine learning development (specifically, inference) on mobile devices. @_fleetwood suggested mlc-llm as a start point, a tool to develop, optimize, and deploy AI models natively on devices.

  • Understanding Llama-2-70B’s MT-Bench Performance: @tirmizi7715 queried why the language model Llama-2-70B performed worse on MT-Bench compared to Mixtral and GPT-3.5, while performing equally well on other benchmarks.

  • Japanese Pretraining on Mistral: In a discussion initiated by @maxmatical concerning StableLM’s Japanese pretraining on the English language model Mistral, it was clarified by @thatspysaspy and @stellaathena that modern tokenizers can handle all Unicode characters, allowing for language transfer.

  • Understanding Huggingface Model Structure: @sk5544 sought a way to obtain the PyTorch model definition code of a model loaded from Huggingface. @thatspysaspy shared a coding approach to facilitate the same.
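The Unicode point above can be illustrated without any tokenizer library: byte-level tokenizers fall back to UTF-8 bytes, so a vocabulary that covers all 256 byte values can represent text in any language, even if the tokenizer was built from English data. A minimal sketch (the helper function is hypothetical, for illustration only):

```python
def byte_fallback_encode(text):
    """Encode any Unicode string as UTF-8 byte IDs (0-255).

    A byte-level vocabulary containing all 256 byte values can
    therefore represent Japanese (or any language), which is why
    continued pretraining on a new language is possible without
    changing the tokenizer.
    """
    return list(text.encode("utf-8"))

ids = byte_fallback_encode("æ—„æœŹèȘžăźăƒ†ă‚­ă‚čト")  # Japanese input
assert all(0 <= i < 256 for i in ids)
# Round-trip: the byte IDs decode back to the original string.
assert bytes(byte_fallback_encode("hello")) == b"hello"
```

Real tokenizers (e.g. byte-level BPE) then merge frequent byte sequences into larger units, but the byte fallback is what guarantees full coverage.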

Links mentioned:

▷ #research (25 messagesđŸ”„):

  • Kaggle LLM Contest: @grimsqueaker mentioned an ongoing Kaggle LLM contest that might be of interest to the research community.
  • Discussion on Masked Language Modeling (MLM) Loss Computation: @jks_pl initiated a discussion questioning why MLM loss is calculated only on masked/corrupted tokens. @bshlgrs and @stellaathena responded that predicting an unmasked token is a trivial copy task, so computing loss on those positions would provide little informational value for learning.
  • Novel Method of Explaining AI Behavior: @burnydelic shared an MIT News article about researchers at MIT’s CSAIL who have developed AI models that can conduct experiments on other AI systems to explain their behavior.
  • muP: A Boon for Hyperparameter Tuning: @ad8e shared his key takeaways about muP, emphasizing that muP simplifies the tuning of hyperparameters across different model scales. However, he also noted that muP is not a magic solution and may face issues with certain setups like tanh activations. @thatspysaspy, @ricklius, and @cubic27 concurred that the main benefit of muP is to facilitate hyperparameter transfer across scales.
  • Absence of Twitter Data in Certain Datasets: In response to @rybchuk’s query about extracting Twitter data from certain datasets, @stellaathena responded that none of the datasets in discussion are likely to contain a significant amount of Twitter data.

Links mentioned:

AI agents help explain other AI systems: FIND (function interpretation and description) is a new technique for evaluating automated interpretability methods. Developed at MIT, the system uses artificial intelligence to automate the explanation



▷ #interpretability-general (8 messagesđŸ”„):

  • Mixtral routing analysis finds lack of specialization: @tastybucketofrice shared a tweet from @intrstllrninja stating that a Mixtral routing analysis showed that experts did not specialize to specific domains. @stellaathena expressed confusion about the widespread misunderstanding.
  • Previous findings align with Mixtral analysis: User @norabelrose commented that there was some other analysis showing the same thing about a year ago, indicating that this finding isn’t entirely new.
  • Trigram frequencies compile on Pythia-deduped training set: @norabelrose also shared a link to a document detailing trigram frequencies computed on 11.4% of the Pythia-deduped training set.
  • Innovative tool for GPT/LLM visualization: @brandon_xyz mentioned the creation of a new tool that visualizes the thinking and understanding processes of a GPT/LLM, sharing his tweet as an example, and welcomed private requests for tool access.
  • Mechanistic Interpretability and BIMT: @g_mine pointed out a paper discussing large language models’ mechanistic interpretability and the role of Brain-Inspired Modular Training (BIMT) in enhancing neural networks’ interpretability.
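A routing analysis like the one discussed above boils down to counting which expert each token is sent to, per domain, and comparing the distributions; roughly uniform distributions across domains suggest the experts are not specializing by domain. The routing choices below are invented for illustration, not real Mixtral data:

```python
from collections import Counter

def expert_distribution(routing_choices, n_experts):
    """Fraction of tokens routed to each expert."""
    counts = Counter(routing_choices)
    total = len(routing_choices)
    return [counts.get(e, 0) / total for e in range(n_experts)]

# Hypothetical per-token expert assignments for two domains.
code_tokens_routing = [0, 1, 2, 3, 0, 1, 2, 3]
prose_tokens_routing = [3, 2, 1, 0, 3, 2, 1, 0]

dist_code = expert_distribution(code_tokens_routing, 4)
dist_prose = expert_distribution(prose_tokens_routing, 4)
# Both domains use all experts about equally: no domain specialization.
assert dist_code == dist_prose == [0.25, 0.25, 0.25, 0.25]
```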

Links mentioned:

▷ #gpt-neox-dev (13 messagesđŸ”„):

  • Pythia Data Preparation: @joshlk noted that there is a lot of information online about data prep for fine-tuning but not much on pre-training. They asked whether Pythia’s process is typical or different. @pietrolesci commented that it is generally a standard process for (decoder-only) Language Model (LLM) training, not just for Pythia.

  • Missing EOD Tokens: @pietrolesci brought up the issue that EOD tokens weren’t found in the Pythia-deduped dataset. @hailey_schoelkopf found this surprising and mentioned a possibility that the option --append-eod wasn’t included while tokenizing the Pile + deduped Pile into the Megatron format.

  • Different Packer for Pre-Trained Pythia Models?: @pietrolesci pointed out that if no EOD token was added, the resulting “packed” dataset would be different from what the Pythia models saw during training because there would be N missing tokens for N documents, shifting every token in the pack. @hailey_schoelkopf concurred that if both the pre-shuffled and the raw idxmaps datasets don’t have EOD tokens, they should match each other. But when packing, the NeoX codebase does not add EOD tokens itself (Source).

  • Masking in Document Attention: @joshlk inquired about the appearance of masks and whether they are used to prevent cross-attention between documents. @hailey_schoelkopf clarified that masks are not used for this purpose.
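The packing concern from the EOD discussion above can be made concrete: omitting one EOD token per document shifts every later token in the packed stream, so fixed-length training sequences sliced from the two streams no longer line up. A toy sketch with made-up token IDs (the real pipeline uses Megatron-format idxmaps, not this helper):

```python
EOD = 0  # hypothetical end-of-document token ID

def pack(documents, append_eod=True):
    """Concatenate tokenized documents into one flat stream,
    optionally appending an EOD token after each document
    (analogous to tokenizing with vs. without --append-eod)."""
    stream = []
    for doc in documents:
        stream.extend(doc)
        if append_eod:
            stream.append(EOD)
    return stream

docs = [[5, 6, 7], [8, 9]]
with_eod = pack(docs, append_eod=True)
without_eod = pack(docs, append_eod=False)
# With N documents, omitting EOD removes N tokens and shifts every
# subsequent token earlier in the packed stream.
assert with_eod == [5, 6, 7, 0, 8, 9, 0]
assert without_eod == [5, 6, 7, 8, 9]
assert len(with_eod) - len(without_eod) == len(docs)
```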

Links mentioned:


OpenAccess AI Collective (axolotl) Discord Summary

  • Optimizing Mistral Training: User @casper_ai shared in-depth details about optimizing Mistral model training: “MoE layers can be run efficiently on single GPUs with high performance specialized kernels. Megablocks casts the feed-forward network (FFN) operations of the MoE layer as large sparse matrix multiplications, significantly enhancing the execution speed”.
  • Potential slow-down in Deepspeed multi-gpu usage: @noobmaster29 highlighted an issue with accelerate==0.23 (Deepspeed integration), causing slower training for users. Downgrading to accelerate==0.22 or using the main branch was suggested, with the fix awaiting release source.
  • Tracking Experiments with Axolotl and MLFlow: @caseus_ and @JohanWork discussed adding MLFlow into Axolotl for experiment tracking Pull Request #1059.
  • Axolotl WebSocket for External Job Management: @david78901 proposed a websockets endpoint in the Axolotl project for better external job management. @caseus_ expressed interest in incorporating the idea into the main project.
  • A Discussion on the Impact of System Messages Training: In the context of model training, @le_mess stated the content of system messages has no significant impact on model performance and can be as random as “ehwhfjwjgbejficfjeejxkwbej” source.
  • Implementing “Shearing” in ShearedMistral Training: @caseus_ pointed out a method for the process of shearing, specifically referencing a GitHub repo. He also discussed the merit of using SlimPajama over RedPajama v2 for data deduplication and quality, noting RedPajama v2 no longer includes subsets source.

OpenAccess AI Collective (axolotl) Channel Summaries

▷ #general (7 messages):

  • Mistral training optimization methods explained: In the channel, @casper_ai detailed some key information on how to optimize training for the Mistral model. In particular, it was mentioned that MoE layers can be run efficiently on single GPUs with high performance specialized kernels. Megablocks casts the feed-forward network (FFN) operations of the MoE layer as large sparse matrix multiplications, significantly enhancing the execution speed and naturally handling cases where different experts get a variable number of tokens assigned to them.
  • Request for Mistral model file on Ollama: @dangfutures inquired about the appropriate Mistral model file to be used on Ollama.
  • Potential regression in Accelerate/Deepspeed multi-gpu users: User @noobmaster29 shared a tweet from @StasBekman warning about a regression issue in accelerate==0.23 (a Deepspeed integration). Downgrading to accelerate==0.22 or using the latest main branch was advised to overcome this, with the fix awaiting release.
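The Megablocks idea quoted above, handling a variable number of tokens per expert by grouping them, can be sketched at toy scale. This is illustrative Python with scalar "tokens" and scalar "experts", not the actual sparse-kernel implementation, but the grouping logic is the same one the real kernels batch into one large sparse matmul:

```python
def moe_forward(tokens, router, experts):
    """Route each token to one expert, group work per expert
    (groups naturally have variable sizes), then scatter results
    back into the original token order."""
    groups = {e: [] for e in range(len(experts))}
    for idx, tok in enumerate(tokens):
        groups[router(tok)].append(idx)
    out = [None] * len(tokens)
    # Each expert processes its whole (variable-size) group;
    # real kernels fuse these per-group computations into one
    # block-sparse matrix multiplication.
    for e, idxs in groups.items():
        for i in idxs:
            out[i] = experts[e](tokens[i])
    return out

tokens = [1.0, -2.0, 3.0, -4.0]
router = lambda t: 0 if t >= 0 else 1      # toy sign-based routing
experts = [lambda t: t * 10, lambda t: t * 100]
assert moe_forward(tokens, router, experts) == [10.0, -200.0, 30.0, -400.0]
```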

Links mentioned:

Tweet from Stas Bekman (@StasBekman): Heads up to Accelerate/Deepspeed multi-gpu users There was a regression in accelerate==0.23 (deepspeed integration) which would make your training much slower. The fix has just been merged - so you c


▷ #axolotl-dev (37 messagesđŸ”„):

  • Axolotl to incorporate MLFlow for experiment tracking: @caseus_ discussed the addition of MLFlow for experiment tracking in the Axolotl project. This was proposed by @JohanWork in Pull Request #1059.
  • System prompts in YAML configuration: Conversation around configuring initial system prompts for the sharegpt within the YAML file. @dctanner suggested this would be cleaner than adding it to all dataset records while @le_mess shared that they are currently adding it manually each time.
  • Peft update fixed Phi LoRA issue: @marktenenholtz identified that an error in Phi LoRA handling was fixed by the update to peft==0.7.0. The issue related to shared memory not being handled correctly by previous peft versions, and to the need to identify LoRA modules specifically for embedding and linear layers.
  • Websockets added to Axolotl for external job management: @david78901 proposed adding a websockets endpoint to the Axolotl project to allow external triggering and monitoring of jobs. @caseus_ showed interest in incorporating this into the main project.
  • Accelerate Pinning: @caseus_ suggested that Accelerate needs to be pinned to the correct version, as indicated by @nanobitz, an issue highlighted in Pull Request #1080.

Links mentioned:

▷ #other-llms (1 message):

leoandlibe: I use the exllamav2 convert.py to make EXL2 quants 😄

▷ #general-help (10 messagesđŸ”„):

  • Searching for Chat UI that Supports ChatML or chat_template: @le_mess inquired about any chat interfaces that support ChatML or chat_template out of the box. In response, @nanobitz suggested ooba.
  • Interest in Testing gguf: @le_mess expressed their interest in testing gguf. @nanobitz recommended either lm studio or ollama but didn’t provide specific operating instructions for ollama.
  • Query about Zero2 Training Speed and GPU: @athenawisdoms asked if there is a significant difference in Zero2 training speed between two multi-GPU systems (e.g., a6000), one using pcie3.0x16 and the other one using pcie4.0x16. The response to this query was not recorded.

▷ #rlhf (3 messages):

  • Prompt Strategy a Necessity for AI Interaction: User @caseus_ suggested that a prompt strategy, which would involve formatting prompts and combining previous turns into the input, is necessary for optimal AI interaction.
  • Prompt Strategy Already in Action: Following up on the discussion, user @xzuyn mentioned that they have already been implementing a prompt strategy.

▷ #community-showcase (1 message):

  • System message has no impact on training performance: @le_mess stated that the content of the system message has no significant impact on the performance of the trained model. In their words, “The system message could be “ehwhfjwjgbejficfjeejxkwbej” and the performance would probably still be the same.”

▷ #shearedmistral (8 messagesđŸ”„):

  • Referencing Shearing Method Implementation: @caseus_ pointed out a particular step in the shearing process via a link to a GitHub repository. He suggested utilizing the pre-processed data from the project’s Google Drive, but cautioned this would mean being tied to the same dataset.
  • Consideration for SlimPajama Use: @caseus_ contemplated if it’s worthwhile to choose SlimPajama over the RedPajama v2 dataset for improved deduplication and quality. He also observed that RedPajama v2 no longer includes subsets source.
  • Positive Reception to Dataset Subsets: Responding to this, @emrgnt_cmplxty voiced liking the subset feature and questioning why it was removed.
  • Potential Shift to SlimPajama: @emrgnt_cmplxty suggested a possible pivot to using SlimPajama for the project.

Links mentioned:


HuggingFace Discord Discord Summary

  • New Channels & Updates to Open Source Libraries: @lunarflu announced the launch of two new discussion channels, transformers.js and ML and cybersecurity, while celebrating diffusers reaching 20,000 GitHub stars (source tweet). Further, @vishnu5n showed users their work on skew detection in document images (source link).
  • Attention Shifted to Attention & Self-attention: In diffusion-discussions @grepolian asked about the difference between attention and self-attention, but didn’t receive a response (source link).
  • Advancements in AI Tech for Gaming: @papasancho in cool finds discussed AI integration in video games using Herika and Mantella, attributing it as a significant advance in gaming (Herika link), (Mantella link).
  • Solving the Mystery of Phi-2 Behaviour: In the general channel, @admin01234 discussed a peculiar behavior of the Phi-2 model, where a correct response is followed by seemingly random answers that didn’t relate to the input.
  • LLMs & SQL Injection Attacks: In the NLP channel, @jryarianto discussed the potential vulnerability of web applications integrated with LLMs to SQL injection attacks, using a paper on arXiv as a reference (source link).
  • Conversations on Deep Reinforcement Learning: In today-im-learning, @cloudhu announced finishing a Deep RL course; @muhammadmehroz expressed interest in it, and @gduteaud and @cloudhu provided insights about the course and shared its link (source link).
  • CCTV Query and Skew Detection: In the computer vision channel, user @iloveh8 discussed implementing GPT-V or LLAVA for real-time CCTV usage, and @vishnu5n shared their work on skew detection in document images (source link).

HuggingFace Discord Channel Summaries

▷ #announcements (1 messages):

  • New channels and open source updates grace HuggingFace’s roster: @lunarflu announced the launch of two new discussion channels: transformers.js and the intersection of ML and cybersecurity. Plus, diffusers celebrated reaching 20,000 GitHub stars and released a new training script integrating pivotal tuning (from @cloneofsimo's cog-sdxl) and the Prodigy optimizer (from kohya's scripts), alongside compatibility with AUTO1111. See the tweet for details.
  • Transformers.js gets a 2024 glow-up: @xenovacom revealed significant improvements for Transformers.js developers; including features like conditional typing of pipelines, inline documentation with code snippets, and pipeline-specific call parameters and return types. See here for more.
  • MLX pulls Mistral / Llama / TinyLlama safetensors directly from the Hub: @reach_vb confirmed that MLX can now pull Mistral / Llama / TinyLlama safetensors directly from the Hub, with support for all Mistral / Llama fine-tunes too! More information about the installation here.
  • Gradio releases version 4.13 with critical fixes and compatibility: Version 4.13 comes with fixes for Button + .select() + Chatbot, security patches, and compatibility with Python 3.12. Check out the comprehensive Changelog.
  • Swifter Whisper with speculative decoding: A noteworthy improvement cited was a 200% faster Whisper thanks to speculative decoding. See the tweet for more.

Links mentioned:

▷ #general (22 messagesđŸ”„):

  • Event Reading Group on the Block: @admin01234 enquired about the timings of the event reading group, to which @lunarflu responded affirming its occurrence later in the week with additional synchronous discussions happening in the discord thread.

  • Machine Learning Courses Query: @daksh3551 sought suggestions for a structured course (both paid and free) on Machine Learning.

  • Enigma of the Phi-2 Behaviour: @admin01234 reported a peculiar phenomenon where the Phi-2 model would output a correct response followed by seemingly random responses.

  • Desire for High Power Computing Environments: In a lengthy discourse, @s4vyss expressed difficulty using free computing resources like Kaggle and Google Colab notebooks for larger projects, citing the lack of auto-completion and error debugging and the limitations of working in a single notebook. The user asked about alternative machine learning coding environments that offer free computing power for local coding.

  • Identifying PII in Headers using StarPII: @benny0917 shared his experience attempting to identify Personal Identifiable Information (PII) in headers using the StarPII model from Hugging Face. The model struggles to correctly identify context-dependent PII headers.

Links mentioned:

▷ #today-im-learning (7 messages):

  • Benchmark Results for Mistral-7B-instruct & vLLM: User @harsh_xx_tec_87517 detailed their benchmark results for Mistral-7B-instruct with vLLM on LinkedIn, stating it’s a great library for deploying OSS LLMs. Detailed benchmarking results can be found on their LinkedIn post.
  • Completion of the DRL course: @cloudhu announced the completion of their DRL course and received congratulations from @osanseviero.
  • Queries about the DRL course: @muhammadmehroz showed interest in pursuing the DRL course. In response, both @gduteaud and @cloudhu suggested the Deep Reinforcement Learning Course provided by Hugging Face, which can take one from beginner to expert level.

Links mentioned:

Welcome to the đŸ€— Deep Reinforcement Learning Course - Hugging Face Deep RL Course

▷ #cool-finds (3 messages):

  • Fine-tuning VLM like LLaVa: User @silamine asked for any research papers or GitHub repos that provide guidance on fine-tuning a VLM like LLaVa.
  • Embed charts in readme with mermaid: @not_lain recommended a tool for embedding charts into readme files called mermaid, and shared the GitHub link of the tool.
  • AI Integration in Video Games: @papasancho shared their perspective on AI integration in video games, considering it a significant advance since the Atari 2600. @papasancho identified Herika and Mantella as examples of this innovation and shared links to the Nexus Mods pages for both Herika and Mantella which use AI technology to enhance in-game interactions.

Links mentioned:

▷ #i-made-this (3 messages):

  • World’s Fastest Conversational AI Unveiled: @vladi9539 shared a YouTube video of their attempt at creating the world’s fastest conversational AI software. The software enables real-time conversations with AI and, according to the author, achieves lower latency than current conversational technologies.

  • AlgoPerf Competition Launched: @franks22 announced the recent launch of the AlgoPerf competition, aimed at finding the best algorithm for training contemporary deep architectures. The competition, open to everyone, offers a $25,000 prize in each of its two categories. More information can be found in their GitHub repository.

  • MusicGen Extension Update Announced: @.bigdookie updated everyone about new features added to the MusicGen browser extension. This includes undo functionality as well as the ability to crop AI-generated music. They also invited community members to test the extension, shared through a YouTube link, and asked for help in enhancing the speed of MusicGen outputs.

Links mentioned:

▷ #reading-group (5 messages):

  • Splitting Databases for Efficient Learning: @chad_in_the_house suggested a method involving the separation of databases into development db and testing db. The process involves answering questions in the development db, storing correct answers along with their text embeddings and chains of thought, and using this information for in-context learning for new questions.
  • SG Event Proposal: @lunarflu indicated plans to set up an event tomorrow but did not disclose further details.
  • Invitation to Discuss in Reading Group: @lunarflu showed interest in @bluematcha’s topic and extended an invitation for a deeper coverage in the next Reading Group discussion.
  • Variational Inference Book Promotion: @ypbio shared information about a book on variational inference that claims to offer a comprehensive review of the topic and everything needed to develop world-class foundational machine learning expertise. They included a link to the book’s website, www.thevariationalbook.com.
  • Time Zone Challenges: @skyward2989 expressed disappointment that the discussion would take place at an inconvenient time for them, specifically at 3 AM.
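The development-db approach described above (store solved questions with answers and embeddings, then retrieve similar solved questions as in-context examples for new ones) can be sketched with a toy similarity lookup. Here `toy_embed` is a stand-in for a real embedding model, used only to make the retrieval step runnable:

```python
import math

def toy_embed(text):
    """Stand-in for a real text-embedding model: a bag-of-letters
    vector. Only for illustrating the retrieval step."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# "Development db": solved questions stored with answers + embeddings.
dev_db = [
    {"q": "capital of france", "a": "Paris"},
    {"q": "square root of nine", "a": "3"},
]
for row in dev_db:
    row["emb"] = toy_embed(row["q"])

def retrieve_example(new_question):
    """Pick the most similar solved question to serve as an
    in-context example for answering the new one."""
    emb = toy_embed(new_question)
    return max(dev_db, key=lambda row: cosine(emb, row["emb"]))

best = retrieve_example("what is the capital of germany")
assert best["a"] == "Paris"
```

In practice the retrieved question, its stored chain of thought, and its answer would be prepended to the prompt for the new question.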

Links mentioned:

The Variational Inference Book: A comprehensive review of variational inference in one concise book.

▷ #diffusion-discussions (3 messages):

  • Loading and Fusing LoRA Weights: @sayakpaul shared detailed instructions on how to load and fuse LoRA weights into the base UNet model. The commands shared were: pipeline.load_lora_weights() and then pipeline.fuse_lora().

  • Query on Attention Mechanisms: @grepolian asked about the differences between attention and self-attention, but no explanation was provided in this chat history.

  • Efficient LoRA Inference Discussed in Blog Post: @yondonfu posted a link to a recent Huggingface blog post on optimizing LoRA inference, elaborating on efficient ways to load LoRA adapters and speed up inference. Key points include the observation that batching did not significantly improve throughput for diffusers and increased latency sixfold.

  • Batching with Diffusers Not Effective?: @yondonfu drew particular attention to the subject of batching with diffusers, questioning the utility of the technique given the minor throughput increase at batch size 8 contrasted with a sixfold increase in latency, and asked for further insight into why this might be the case. The questions were left unanswered at the close of the chat.
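For reference, the fusing step mentioned above (what pipeline.fuse_lora() performs after pipeline.load_lora_weights()) amounts to folding the low-rank update into the base weight, W' = W + scale·(B·A), so inference afterwards needs only a single dense matmul per layer. A dependency-free numeric sketch with tiny hand-made matrices (not real model weights):

```python
def matmul(A, B):
    """Plain-Python matrix multiply, just for this sketch."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def fuse_lora(W, A, B, scale=1.0):
    """Fold a low-rank LoRA update into the base weight:
    W' = W + scale * (B @ A). After fusing, the separate adapter
    path disappears from the forward pass."""
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Base weight (2x2), rank-1 LoRA factors B (2x1) and A (1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
fused = fuse_lora(W, A, B, scale=2.0)
# delta = B@A = [[0.5, 0.5], [1.0, 1.0]]; W' = W + 2 * delta
assert fused == [[2.0, 1.0], [2.0, 3.0]]
```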

Links mentioned:

Goodbye cold boot - how we made LoRA Inference 300% faster

▷ #computer-vision (2 messages):

  • Query on Real-Time CCTV Use Case: User @iloveh8 brought up a discussion around implementing GPT-V or LLAVA for real-time CCTV use such as theft detection or baby monitoring.
  • Skew Detection Resource Shared: In response, @vishnu5n shared their work which models skewed document images with their respective skewness. The detailed work can be found on Kaggle at this link which could potentially serve as a reference for similar problem statements.

Links mentioned:

skew_detection: Explore and run machine learning code with Kaggle Notebooks | Using data from rdocuments

▷ #NLP (18 messagesđŸ”„):

  • SQL Injection Vulnerabilities in LLMs: @jryarianto, noting latency constraints on computational resources in a developing country, asked about strategies for defending against SQL injection attacks and suggested using parameterized queries for this purpose. They referenced a paper on arXiv that provides a comprehensive examination of a type of SQL injection attack that could occur with Language Models (LLMs).
  • Open Source LLM Model Suggestions for Conversational Chatbot: @jillanisofttech asked for suggestions for an open-source LLM model suitable for fine-tuning on a large custom dataset of PDF, txt, and docs files. They need to develop a conversational chatbot that can handle text and voice input, and were also interested in knowing an appropriate framework for building the application.
  • Text Generation Using NSQL-2B Model: @madi_n asked about setting max_new_tokens to a value greater than 2048 in a text generation task using the NSQL-2B model. They sought clarity on whether it is alright to increase max_new_tokens, considering the model’s predefined max length.
  • Fine-tuning Mistral 7B—Identifying the Correct Syntax: @denisjannot inquired about the correct syntax to use while fine-tuning Mistral 7B, noticing some irregularities when using the trained model. @asprtnl_50418 suggested consistently using the same prompt template that was used during the initial model training, and also provided a link regarding the End Of String (EOS) token.
  • GPU Usage for the Suno/bark-small Model in TTS Task: @x_crash_ presented a question about enabling explicit GPU use with suno/bark-small model on Google Colab, after noticing that the model did not seem to be utilizing the GPU resources. They provided their Python script to illustrate their attempt.
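The parameterized-query defense suggested above can be demonstrated with Python's stdlib sqlite3: user input bound as a parameter is treated as data, never as SQL syntax, so a classic injection payload simply fails to match. A minimal sketch using an in-memory database and a toy schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

malicious = "alice' OR '1'='1"  # classic injection payload

# UNSAFE: string interpolation lets the payload rewrite the WHERE
# clause, so the query matches every row and leaks the secret.
unsafe = conn.execute(
    f"SELECT secret FROM users WHERE name = '{malicious}'"
).fetchall()
assert unsafe == [("s3cret",)]

# SAFE: the ? placeholder binds the payload as a literal string;
# no user is literally named "alice' OR '1'='1", so nothing matches.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (malicious,)
).fetchall()
assert safe == []
```

The same placeholder discipline applies when an LLM generates the query: the model should emit SQL with placeholders, with values bound by the application, never spliced into the string.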

Links mentioned:

▷ #diffusion-discussions (3 messages):

  • Instructions on loading LoRA: @sayakpaul shared the method to load the LoRA weights into the base UNet model by calling pipeline.load_lora_weights() and pipeline.fuse_lora().
  • Question about attention mechanisms: @grepolian posed a question about the difference between attention and self-attention. (Further discussion or responses not provided).
  • Deep dive into LoRA inference optimization: @yondonfu referenced a section from a HuggingFace blogpost discussing how LoRA inference has been optimized. They noted that it explained batching with diffusers doesn’t improve performance significantly and usually results in a higher latency. Two questions derive from this finding:
    • Is batching with diffusers generally not worthwhile due to marginal throughput gains and significant latency increases?
    • What’s the logic behind the observation that batching with diffusers does not improve performance given that there is enough VRAM available?

Links mentioned:

Goodbye cold boot - how we made LoRA Inference 300% faster


Perplexity AI Discord Summary

  • API and Model Outages Notices: User @phinneasmctipper reported outages for pplx-7b-online and pplx-70b-online models via the API and API sandbox. User @monish0612 in #pplx-api reported a separate 500 internal server error. @icelavaman acknowledged the issue.
  • Call for Improved Citation Format in AI Responses: @Chris98 suggested replacing numerical citation references with hyperlinks in Perplexity’s AI responses. The idea received support from @byerk_enjoyer_sociology_enjoyer.
  • Clarification on Subscription Billing During Free Trial: @alekswath questioned unexpected immediate billing when switching from a monthly to an annual plan during a free trial, triggering a discussion on subscription pricing.
  • Integration of Perplexity as a Search Engine: User @bennyhobart wanted to set up Perplexity as a default search engine in Chrome. @mares1317 shared a link to the Perplexity - AI Companion extension on the Google Web Store.
  • Changes Noticed in Claude 2.1 Responses: @Chris98 and @Catto remarked on a shift in Claude 2.1’s tone, expressing dissatisfaction with the AI’s recent responses, comparing them to GPT Copilot’s style, and wishing for Claude 2.1’s original voice.
  • Issue in Citing Sources: In the #pplx-api channel, @hanover188 asked about the pplx-70b-online model’s capability to cite sources; @brknclock1215 clarified that it currently does not cite sources the way the Perplexity app does, initially suggesting a potential future update but later correcting that this is not currently on Perplexity’s roadmap.

Perplexity AI Channel Summaries

▷ #general (51 messagesđŸ”„):

  • Possible Outages in Perplexity’s Models: @phinneasmctipper reported encountering 500 error codes when trying to access the pplx-7b-online and pplx-70b-online models via the API and API sandbox. @icelavaman acknowledged the issue and promised a fix despite currently being outside working hours. Link to the conversation
  • Request for Citation Format Change in Responses: User @Chris98 raised a request to replace the numerical citation references with hyperlinks in Perplexity’s AI responses. @byerk_enjoyer_sociology_enjoyer agreed and got support from @Chris98 via a ⭐ emoji reaction on a relevant issue they had previously raised.
  • Question on Subscription Pricing: @alekswath questioned why they were immediately billed $200 upon trying to switch to an annual plan from a monthly one during a free trial. They inquired if there were issues with the free trial.
  • Utilizing Perplexity as Default Search Engine: @bennyhobart asked how to set up Perplexity as the default search engine in Chrome. @mares1317 shared a link to the Perplexity - AI Companion extension on the Google Web Store to help with this.
  • Change in Claude 2.1 Responses: @Chris98 and @Catto expressed dissatisfaction with Claude 2.1’s recent responses, noticing a perceived decrease in quality and a shift in tone to sound more like GPT Copilot. They wished for a return to the original Claude 2.1.

Links mentioned:

▷ #sharing (8 messagesđŸ”„):

  • Stress Testing Web Application Resources Shared: @whoistraian provided a link for resources on stress testing a web application.
  • Inquiry about Perplexity AI’s Functionality: @myob171 asked if Perplexity AI is an AI search engine.
  • Discord Channels Links Shared: @mares1317 shared two Discord channel links, possibly containing other related discussions.
  • OpenAI Response Shared: @__sahilpoonia__ posted a link regarding how OpenAI responds to certain queries.
  • Praise for Perplexity’s Calendar Integration: @clockworksquirrel highlighted how Perplexity simplifies calendar management via natural language, especially beneficial due to their physical disability. They also noted the usefulness of copying and pasting within the tool.
  • Gratitude Expressed for Perplexity: @siriusarchy expressed their appreciation for Perplexity.
  • Volkswagen Incorporates ChatGPT: According to @ipsifu, Volkswagen has integrated ChatGPT into its car systems.

▷ #pplx-api (6 messages):

  • 500 Internal Server Error: User @monish0612 reported experiencing a 500 internal server error with the API for several hours. They are a paid user and are hopeful for a quick resolution.

  • PPLX-70b-online model’s Source Citation Issue: @hanover188 asked if it’s possible for pplx-70b-online models to cite their sources like the Perplexity app does. They mentioned needing this for a build that requires summarized real-time data with actionable source information.

  • No Direct Citation in PPLX-70b-online: In response to @hanover188’s query, @brknclock1215 provided a link and a summary of the source stating, “no - the pplx-70b-online model does not directly cite sources like the Perplexity app does
 Adding support for grounding facts and citations is on Perplexity’s roadmap for the future.”

  • Feature Not on the Roadmap?: Contrary to earlier information, @brknclock1215 later corrected that support for grounding facts and citations is actually not on Perplexity’s roadmap, providing another discord link to a discussion confirming this.


LAION Discord Summary

  • AI Debates Heat Up - “Utopia or Not?”: A spirited discussion led by @SegmentationFault sought to dissect the viewpoints of anti-AI critics, concluding that their stances largely stem from virtue signaling and unrealistic utopian considerations. @mkaic further added to the discourse, asserting that the utopian goal of guaranteed income is more attainable with widespread AI adoption. User .undeleted humorously posited the three-step plan of anti-AI proponents: ban AI, preserve current jobs, and prevent future job elimination through technology. These assertions were met with resistance by @SegmentationFault, who championed AI’s role in productivity and global competitiveness.

  • Pizza AI or Human?: Amidst the steady flow of AI discourse, @thejonasbrothers injected an amusing angle to the conversation, posing a lighthearted question as to whether the sentence “I am a pizza” was crafted by AI or not.

  • Game-changing AI Training Hacks Surfaced: @pseudoterminalx introduced innovative AI training techniques in an in-depth discussion, expounding on the advantages of biasing timestep selection toward early timesteps. They demonstrated the approach with Euler-sampled images using zero-terminal SNR, which they judged to surpass midjourney v6, and additionally endorsed training on a mix of random crops and full frames.

  • AI Detector Faces Credibility Crisis: @lixiang01 expressed doubt about the efficacy of a specific AI detector, arguing that it can be effortlessly deceived with carefully constructed prompts.

  • State Space Models vs Transformers: A research paper shared by @thejonasbrothers positioned State Space Models and Mixture of Experts as challengers to Transformers via the development of MoE-Mamba. The paper can be accessed here.

  • Watermarking Woes for Generative Models: A blog post, based on a research paper titled Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models, was highlighted by @chad_in_the_house. The post delves into the challenges of watermarking generative models while preserving output quality and enabling AI verification. The full post can be found here.

  • HiDiffusion Framework - A New Viable Option: @gothosfolly drew attention to a breakthrough text-to-image diffusion models framework called HiDiffusion, which can create high-resolution images. The research paper on HiDiffusion can be viewed here.

  • Precise Location of RAU Block Questioned: @gothosfolly also raised questions about the exact location of the RAU block in the SDXL of HiDiffusion’s architecture, as described in the paper.

LAION Channel Summaries

▷ #general (44 messagesđŸ”„):

  • Captain AImerica and the Luddite League: There was a lively debate led by @SegmentationFault concerning the views and plans of anti-AI critics, especially those active on Twitter. Largely critical of their stances, he probed for insight into their greater scheme for halting AI development, before dismissing most of them as more inclined towards virtue signaling and unrealistic utopian ideals. Quoted, @SegmentationFault: “Even if some country bans AI entirely, some other country will not, and companies there will be more productive”.
  • AI Utopia - Champagne Dreams: @mkaic chimed into the conversation noting the irony in anti-AI activists’ utopian dreams, arguing that a utopia, where everyone is paid to exist, is more feasibly achieved by letting AI do all the work, rather than banning AI.

  • AIwhile, back at “Anti-AI Ban Malfunction”: In a slightly satirical exchange, user .undeleted suggested a three-point plan he believes to be the mindset of anti-AI critics: ban AI, maintain current jobs, avoid using tech to eliminate jobs in the future. @SegmentationFault rebutted, pointing out that companies need to be competitive and AI increases productivity, a reality he considered inevitable.

  • “I am a Pizza” - Human or AI?: @thejonasbrothers made a playful contribution to the chat, writing the sentence “I am a pizza,” then asking if it was written by an AI, sparking a lighter, more humorous tone amidst the serious chat.

  • AI Training Hacks with Pseudoterminalx: User @pseudoterminalx unveiled some AI training hacks in an extensive, more technical discussion. They discussed the benefits of training on early timesteps, using a 50x bias on the probability of early timesteps being selected; this didn’t remove later timesteps from the pool but tilted the distribution notably. They demonstrated the techniques’ effectiveness by sharing several images, including an Euler-sampled one with zero-terminal SNR, asserting improved quality over midjourney v6. Towards the end, they added another trick: training on a mix of random crops and full frames of 2160p blu-ray rips.
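The early-timestep bias described above can be sketched as weighted sampling; a minimal illustration where the function name and `early_cutoff` are hypothetical, and only the 50x bias figure comes from the discussion:

```python
import random

def sample_timestep(num_timesteps=1000, early_cutoff=200, early_bias=50.0):
    # Early timesteps are `early_bias` times more likely to be drawn,
    # but later timesteps remain in the pool, as described above.
    weights = [early_bias if t < early_cutoff else 1.0 for t in range(num_timesteps)]
    return random.choices(range(num_timesteps), weights=weights, k=1)[0]

# With these settings, roughly 93% of draws land in the early region:
draws = [sample_timestep() for _ in range(10_000)]
early_fraction = sum(d < 200 for d in draws) / len(draws)
```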

Links mentioned:

Phase1 Collect Underpants Gnome GIF - Discover & Share GIFs

▷ #research (18 messagesđŸ”„):

  • Impossible task for AI detector: @lixiang01 voiced skepticism about the effectiveness of a particular detector, stating that it’s quite impossible for it to work as, “this kind of detector can be easily fooled by contents generated by a carefully written prompt.”
  • State Space Models challenge Transformers: @thejonasbrothers shared an informative research paper on the rise of State Space Models (SSMs) and Mixture of Experts (MoE) as challengers to Transformers, focusing on the development of MoE-Mamba, which shows better performance while preserving Mamba’s inference gains over the Transformer. The paper can be accessed here.
  • Watermarking challenges in generative models: @chad_in_the_house introduced a blog post discussing the limits of strong watermarking for generative models, suggesting that even if creators watermark their outputs, it would be difficult to maintain quality while having the AI verifiable. The post was based on a paper called Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models and can be directly accessed here.
  • Insights on HiDiffusion framework: @gothosfolly brought attention to a paper about HiDiffusion, a tuning-free framework designed to enable pretrained text-to-image diffusion models to generate high-resolution images. The paper can be found here.
  • Questions raised about RAU block: @gothosfolly sought clarification if the RAU block for SDXL in HiDiffusion’s architecture is located one block later than it should be according to the paper’s appendix.

Links mentioned:


Mistral Discord Summary

  • Tinybox Revealed: In a discussion about Tinybox, @digitalbo explained that it was a small “supercomputer” designed to run at home. The link provided gave more insight into the product.
  • Mistral and Web Browsing Disconnect: When @ajkuba asked if Mistral could be used for web browsing, @sublimatorniq clarified that unlike ChatGPT Plus, Mistral does not have a browsing feature or “function calling”.
  • Deploying Models on Raspberry Pi 5 Explored: @psdc4171 queried how to run 7b on a Raspberry Pi 5. @ethux provided resources for GGUF models compatible with Raspberry Pi, model suggestions from HuggingFace such as OpenChat 3.5 1210 and Mistral 7B Instruct v0.2, and WebUIs from oobabooga’s GitHub repository and HuggingFace’s chat-ui GitHub repository.
  • Deployment Issues with Mixtral 46B on AWS: @sa_code ran into an issue deploying an unquantised Mixtral 46B on a g5.48xlarge AWS instance, where vllm failed to load the model into memory even after sharding with --tensor-parallel-size 8. A solution was not found in the discussion.
  • Fine-tuning Troubles and Explorations: Conversations revolved around issues faced while fine-tuning Mistral using a 4090 (@wilzh40), questions about whether a single A100 would suffice for full Mistral 7b Instruct training (@dinonst74), the effectiveness of fine-tuning with specific domain chat logs (@nickbro0355), and struggles in training Mistral 7b Instruct for text to SQL (@dinonst74). @adriata3 described their unsuccessful fine-tuning attempts with QLoRA 4-bit.
  • Paper on Mixtral of Experts Released: @sophiamyang shared a new paper on Mixtral of Experts https://arxiv.org/pdf/2401.04088.pdf.
  • Shout out to Vanna, the SQL helper: @zain_vanna announced the addition of Mistral integration to Vanna, a Python package using RAG for SQL generation for databases, with a link to the GitHub repository.
  • Upset about Mistral’s API Latency: Users across the guild expressed their concerns about the variance in the Mistral API’s response times, sometimes taking 5-9 seconds for a response. @lerela, a member of the Mistral team, stated they are actively working on improving response times. A suggestion for Mistral to incorporate a function akin to OpenAI’s “function” tokens was also discussed (@astel123457).

Mistral Channel Summaries

▷ #general (13 messagesđŸ”„):

  • What is Tinybox?: In response to @gbourdin's query about Tinybox, @digitalbo defined it as a small “supercomputer” designed to run at home. @digitalbo also shared a link for more information. The cost, however, was noted to be $15,000.
  • Mistral and Web Browsing: @ajkuba asked whether Mistral can be used for web browsing. @sublimatorniq clarified that, unlike ChatGPT Plus, Mistral does not have a browsing feature nor “function calling”.
  • Request for Guidance on Project: @saga04, a software engineer and startup founder, requested advice on embarking on a project to create a “world teacher” for children.
  • Code Generation with Open Source Model: @xquietude asked about the ability of OpenAI’s last open-source (7B) model to generate code. @.superintendent confirmed the model’s ability to generate code, while @sophiamyang suggested that Mistral 8x7B could be better for this purpose.

▷ #deployment (9 messagesđŸ”„):

  • Running 7b on Raspberry Pi 5: User @psdc4171 sought advice on how to run 7b on their Raspberry Pi 5. @ethux provided a range of resources including GGUF models compatible with the Pi from this GitHub repository, model suggestions quantized to 4 bits or smaller from HuggingFace such as OpenChat 3.5 1210 and Mistral 7B Instruct v0.2, and WebUIs for easy testing from oobabooga’s GitHub repository and HuggingFace’s chat-ui GitHub repository.
  • Deployment error on AWS with unquantised Mixtral 46B: @sa_code tried to deploy an unquantised Mixtral 46B on AWS using the instance g5.48xlarge and ran into an issue where vllm failed to load the model into memory, even with the model sharded with --tensor-parallel-size 8. @ethux expressed uncertainty, stating 192GB VRAM should be enough for the operation. @sa_code suspects the issue might be with the vllm package.
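For reference, a tensor-parallel vLLM launch of the kind @sa_code attempted looks roughly like this; flag names follow vLLM's OpenAI-compatible server, while the exact model ID and dtype here are assumptions:

```shell
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --tensor-parallel-size 8 \
    --dtype float16
```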

Links mentioned:

▷ #finetuning (13 messagesđŸ”„):

  • Seeking success with Mistral fine-tuning using a 4090: @wilzh40 asked if anyone had any success with fine-tuning Mistral using only a 4090, as well as using it solely for inference. @adriata3 responded that while they worked with QLoRA 4-bit for fine-tuning, the results were not as desired.
  • Will a single A100 be sufficient for full Mistral 7b Instruct training: User @dinonst74 asked whether a single A100 with 40 GB would be enough, or whether 80 GB is needed, for full Mistral 7b Instruct training (i.e., not 4-bit or LoRA).
  • Curiosity about fine-tuning off of chat logs: @nickbro0355 seeks advice on an effective way to fine-tune a model using specific domain chat logs, wondering if there would be much benefit from fine-tuning off user-approved chats or if it’s better to create own fine-tuning information.
  • Inference on vllm explained: @wilzh40 asked @adriata3 what inference on vllm meant. A link to the vllm project on GitHub was shared in response.
  • Struggling with Training Mistral 7b Instruct for text to SQL: @dinonst74 shared they were a beginner on the finetuning front and were trying to fine-tune Mistral 7b Instruct for text to SQL with mssql (T-SQL) syntax generation. Despite having created a custom dataset and run training on A1000 for 6000 segments, they were unsatisfied with the results and sought advice on improving them. The discussion included links to their custom dataset on huggingface, their process on Google Colab, and their project results on wandb.

Links mentioned:

▷ #announcements (1 message):

sophiamyang: New paper on Mixtral of Experts: https://arxiv.org/pdf/2401.04088.pdf

▷ #showcase (3 messages):

  • Waiting no more: @gbourdin expressed frustration about being on the waiting list for several days with no additional info on the website. @joselolol. reassured them that they were patching things up for a general release and happy to give early access.
  • Hello, Vanna: @zain_vanna announced the addition of Mistral integration into Vanna, a Python package that utilizes RAG for SQL generation for databases. This was accompanied by a link to the GitHub Repository.
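The RAG-for-SQL pattern behind Vanna can be illustrated with a toy retriever; this is a dependency-free sketch of the idea, not Vanna's actual API, and all names here are hypothetical:

```python
def retrieve_context(question: str, schema_docs: list[str], k: int = 2) -> list[str]:
    # Toy keyword-overlap retrieval standing in for a real vector search.
    q_tokens = set(question.lower().split())
    def score(doc: str) -> int:
        return len(q_tokens & set(doc.lower().split()))
    return sorted(schema_docs, key=score, reverse=True)[:k]

def build_sql_prompt(question: str, schema_docs: list[str]) -> str:
    # Retrieved DDL is injected into the prompt the LLM sees.
    context = "\n".join(retrieve_context(question, schema_docs))
    return f"Schema:\n{context}\n\nWrite a SQL query for: {question}"

docs = [
    "CREATE TABLE users (id INT, name TEXT)",
    "CREATE TABLE orders (id INT, user_id INT, total REAL)",
]
prompt = build_sql_prompt("how many users are there", docs)
```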

Links mentioned:

GitHub - vanna-ai/vanna: đŸ€– Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using RAG 🔄.


▷ #la-plateforme (7 messages):

  • Concerns About Waiting Time for Mistral API: User @alimsss raised an issue about the variance in waiting times for the Mistral API, with responses sometimes taking 5-6 seconds and sometimes arriving almost instantly. @michaelwechner suggested that this could be due to peak traffic times resulting in queued requests, while @casper_ai shared similar experiences with waits of up to 9 seconds.
  • Mistral Team’s Response to API Latency Concerns: @lerela, a member of the Mistral team, addressed the issue by stating that they are actively working on improving the response times.
  • Suggestion for Local Function Implementation: User @astel123457 discussed the possibility of Mistral incorporating a function akin to OpenAI’s “function” tokens, which would allow for calls to local functions and return responses based on the output of that function. This would give the bot greater versatility in coding tasks.
  • User Experiences on API Latency: @sublimatorniq and @casper_ai discussed their experiences with API latency. Both have seen a range of response times, with @sublimatorniq remarking that the potential indicated by the faster response times gives hope for the future.

LangChain AI Discord Summary

  • LangChain Questions Needing Answers: @aaravvv__ and @rajvir3 asked about sequence diagrams in Pinecone and prospects of langchain/chat_models/messages availability in node_modules respectively, spawning discussions on LangChain utilities.
  • Heavyweights Square Off: BM25Retriever vs. Large Docs: @tigerinus has been wrestling with using BM25Retriever over a significant volume of disk-stored documents, an issue that is ripe for an answer.
  • Saving the Day with Code!: @uvizc_43278 reported that his LangChain RetrievalQA.from_chain_type app broke due to a deprecated text-davinci-003 model, but he had a solution ready. This might be a useful fix for anyone encountering a similar issue.
  • Llamafile, LangChain’s Hero?: @rawwerks excitedly outlined the potential of llamafile in simplifying LLM deployment across multiple OS, hinting at the dawn of a new era of LLMs.
  • Setting the Stage for Multi-Embeddings: User @dejoma initiated talks to find the ideal structure where the input and output are more than one embedding.

LangChain AI Channel Summaries

▷ #general (28 messagesđŸ”„):

  • Sequence Diagram Loading Query: @aaravvv__ asked if there is any way to load a sequence diagram in Pinecone using LangChain.
  • Issues on Using BM25Retriever over Large Documents: @tigerinus sought experience and help on using BM25Retriever over a huge amount of documents from disk.
  • LLM, ML and NLP Conferences in UK 2024: @stuartjatkinson inquired about any good conferences on LLM, ML, NLP in the UK in 2024.
  • Error on LangChain Import: @azharudeen_02613 encountered an error when using import load_qa_chain from langchain.chains.question_answering, reporting a validation error regarding abstract class BaseLanguageModel.
  • LangChain JS Import Issue: @rajvir3 reported an issue when importing import { HumanMessage, SystemMessage } from "langchain/chat_models/messages"; in LangChain JS, getting an ERR_PACKAGE_PATH_NOT_EXPORTED error message and confirming that there is no langchain/chat_models/messages in node_modules.
  • Potential of llamafile in LLMs: @rawwerks expressed how impactful llamafile could be in deploying fine-tuned models on multiple OSs, implying it could be a game-changer for LLMs.
  • Deprecated Model Issue: @uvizc_43278 reported that his app using LangChain RetrievalQA.from_chain_type stopped working due to the text-davinci-003 model being deprecated and returned a solution to the issue.
  • Comparison of LangChain Agents and Assistant API: @sheldada and @evolutionstepper engaged in a conversation discussing the difference and efficiency between LangChain Agents and the assistants API.
  • Message Threading Issue with Assistant: @zainsheikh had a problem with the thread ID while using an assistant invoke command, reporting that a new thread ID is created instead of adding a message to the specified thread ID.
  • Broken Link to Biweekly Release Notes: @aaronsmith4931 reported that the link to subscribe to the biweekly release notes was broken and sought for help.
  • Python & JavaScript Import Errors: @rajvir3 reported encountering errors when trying to import the LangChain OpenAI in Python and JavaScript. The issue was resolved by @hasan_34148 who suggested using pip install langchain-openai.
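The fix @hasan_34148 pointed to reflects LangChain's split of provider integrations into standalone packages; a sketch of the resolution (not an exact transcript of the thread):

```shell
pip install -U langchain-openai
# After installing, the OpenAI classes are imported from the new package
# rather than from core langchain, e.g. in Python:
#   from langchain_openai import ChatOpenAI, OpenAIEmbeddings
```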

Links mentioned:

▷ #langserve (6 messages):

  • KeyError bugs on the history variable: @cryptossssun noted that after fixing an error, a new KeyError linked to the history variable arose.
  • A call for model configuration advice: @pgpaul161201 thanked <@1033432389516546158> for their contributions, and inquired about a more streamlined method for allowing users to handle LLM backend configuration settings such as API key, endpoint, and organization name.
  • Potential example needed: @a404.eth requested an example related to the ongoing conversation.
  • Input type specification through Passthrough and Pydantic: @veryboldbagel provided advice on specifying input types in langchain chains and validated the importance of sanity-checking schema. They shared a code snippet from langserve for reference.
  • Dynamic field configuration with Pydantic: @veryboldbagel suggested a method for dynamically generating configurations for models by listing fields and their types using Pydantic, specifically by subclassing from ChatModel. They provided a code snippet demonstrating this technique.

Links mentioned:

langserve/examples/passthrough_dict/server.py at main · langchain-ai/langserve (GitHub)

▷ #share-your-work (3 messages):

  • LangChain FastAPI Starter Revealed: User @evolutionstepper shared a GitHub repository for LangChain FastAPI Starter.
  • LangChain FastAPI Starter Tutorial Available: @evolutionstepper also shared a YouTube tutorial titled Langchain with FastAPI in Docker with Traefik [Code Included], which provides guidance on how to use langchain with FastAPI.
  • In Search of Solution for Multi-Embeddings: User @dejoma initiated a discussion seeking suggestions for a structure where the input and output are more than one embedding. A specific use-case mentioned was finding the best match for a video that is too large to be represented in a single chunk and must therefore be divided into multiple chunks.

Links mentioned:

Langchain with FastAPI in Docker with Traefik [Code Included]: A tutorial on how to use langchain with FastAPI.

▷ #tutorials (1 message):

  • Discussing Multi-Embedding Structures for Matching Large Videos: @dejoma raised a question about constructing a mechanism that accommodates more than one embedding for both input and output. He is particularly interested in devising a solution to find the best matching large video file that cannot be represented in a single chunk.

LlamaIndex Discord Discord Summary

  • Deploying With Ease: @wenqi_glantz unveiled a thorough guide on deploying a @llama_index app to AWS Fargate through the use of Terraform and automated CI/CD pipeline with @github Actions. Here’s the guide.
  • Hackathon Galore: @llama_index is organizing their inaugural in-person hackathon this February, with over $4,000 in prizes. The event welcomes RAG enthusiasts for collaborative ventures in uncharted projects. Registration details can be found here.
  • Simplifying Structure: @andrejusb shared a handy video tutorial on extracting structured JSON from invoices using Pydantic classes with @OLLAMA. To learn more, you can catch the video here.
  • RAGs and Freelancers: A lively discussion took place regarding the feasibility and costs of freelancers building Retriever-Augmented Generation (RAG) systems for businesses. @.kamja, @lolipopman, and @mr.dronie chimed in on the complexities of production implementation despite simplicity in prototyping.
  • LlamaIndex Integration Queried: Both @jace93 and @sridhar_10158 queried about the possibility of integrating Sqlite-vss and DeepInfra respectively with LlamaIndex.
  • LlamaIndex Learning Resources: A query for courses and learning resources for LlamaIndex was posted by @asdw2..
  • RAG Inroads: Strategies to address RAG limitations, along with appreciation for the in-depth insights provided by @bushdid420, were key highlights in AI discussions. Important insights included document summarization and chunking to address limitations in language models’ context windows. Here is the shared paper.
  • Game Changer Llamafile: With its ability to deploy fine-tuned models on six different OSs, @rawwerks hailed llamafile as a game-changer but lamented the team’s lack of interest in adding RAG capabilities or Python support. The relevant GitHub issue was highlighted.
  • Handling Context: @bushdid420 spurred discussion into the challenges of processing long textual context in LLMs. The degradation in performance due to critical facts positioned in the context documents’ middle sections was highlighted and a possible solution provided.

LlamaIndex Discord Channel Summaries

▷ #blog (3 messages):

  • Deploying LLM Apps on AWS Fargate Simplified: @wenqi_glantz has shared a step-by-step guide on how to deploy a @llama_index app to a service on AWS Fargate using Terraform (@HashiCorp) and with an automated CI/CD pipeline with @github Actions. Detailed how-to post can be found here.
  • First In-person Hackathon by LlamaIndex: @llama_index is organizing their first in-person hackathon on February 2nd-4th aimed at bringing together RAG enthusiasts to collaborate on exciting new projects. The event offers over $4,000 in prizes. Event registration details here.
  • Getting Structured Output from LLM: @andrejusb has shared an educational video where he explains how to use @OLLAMA to run a local model and use Pydantic classes to output structured JSON from invoices. Watch the video tutorial here.
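The video's approach uses Pydantic classes to enforce structured output; the same validation step can be sketched dependency-free with dataclasses (the field names and sample payload here are hypothetical):

```python
import json
from dataclasses import dataclass

@dataclass
class InvoiceLine:
    description: str
    amount: float

@dataclass
class Invoice:
    invoice_number: str
    total: float
    lines: list

def parse_invoice(raw: str) -> Invoice:
    # Validate the model's raw JSON output against the expected schema;
    # a KeyError/TypeError here means the LLM strayed from the format.
    data = json.loads(raw)
    return Invoice(
        invoice_number=data["invoice_number"],
        total=float(data["total"]),
        lines=[InvoiceLine(**line) for line in data["lines"]],
    )

raw = '{"invoice_number": "INV-001", "total": 42.5, "lines": [{"description": "Widget", "amount": 42.5}]}'
invoice = parse_invoice(raw)
```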

Links mentioned:


▷ #general (22 messagesđŸ”„):

  • Freelancing for Building RAG Systems: @.kamja explored the field of freelancers building Retriever-Augmented Generation (RAG) systems for businesses and the cost associated with it. This inquiry also interested @lolipopman and @mr.dronie, who expressed the simplicity in prototyping but the complexity in production implementation.
  • Understanding File Compatibility in a New Project: @pichart is seeking information on all file types a recently discovered project can comprehend.
  • Integration of LlamaIndex with Sqlite-vss: @jace93 raised a query about the possibility of using LlamaIndex with Sqlite-vss.
  • Understanding LongContextReorder: @langzeitstudent_41429 was interested in how the LongContextReorder works, particularly how the relevancy of each document is measured for reordering.
  • Incorporating User Feedback in create-llama TS: @ballwave is utilising create-llama TypeScript and is curious if there’s a known way to incorporate user feedback into the application, such as upvote-downvote on answers and written commentary, to avoid redundancy.
  • Possibility of Using LangChain ToolKit in LlamaIndex: @7leven and @cheesyfishes explored whether it is possible to utilize the LangChain ToolKit as a tool in LlamaIndex.
  • Integration of Mistral Model in LlamaIndex: @sridhar_10158 sought help to integrate the Mistral model in LlamaIndex, showing the specific parameters.
  • Understanding ColBERTv2’s Storage Needs: @wizboar inquired if ColBERTv2 can utilize vector stores, or if it must load data into RAM.
  • Mismatch in Dependency Versions in llama’s files: @pveierland noticed a mismatch in openai dependency versions. pyproject.toml listed openai = ">=1.1.0" whereas poetry.lock had it as openai = ">=0.27.8".
  • Building a RAG with Document References: @erizvi is working on a RAG system where documents reference other documents. They are using the OpenAI chat engine and are trying to figure out how to include referenced documents in the context provided to the llm for synthesis. A possible solution was suggested by @erizvi as well.
  • Integration of DeepInfra with LlamaIndex: @sridhar_10158 asked if anyone has tried integrating DeepInfra with LlamaIndex.
  • Seeking Courses to Learn LlamaIndex: @asdw2. was interested in finding any good courses that might provide instruction for learning LlamaIndex.

▷ #ai-discussion (9 messagesđŸ”„):

  • Strategies for Addressing RAG Limitations: @bushdid420 shared insights about various strategies including document summarization and chunking to handle the common issue in the RAG space of language models not being able to make sense of all added information due to their limited context windows. An important conclusion is that in spite of the large context windows, it’s challenging to maintain information importance in the middle of the context.
  • Llamafile - The NGINX of LLMs: @rawwerks hailed llamafile as a game-changer for practically deploying fine-tuned models instantaneously on six different OSs. However, the llamafile team showed no interest in adding RAG capabilities or Python support, as highlighted in a GitHub issue. He proposed that combining llamaindex and llamafile could enable a free and private paradigm for advanced RAG.
  • Developing Intelligent Systems with OpenLLM and LlamaIndex: @andysingal shared a Medium article about the rise of open-source Large Language Models (LLMs) and how tools like OpenLLM and LlamaIndex have reshaped developer engagements with these models.
  • Resolving Content Accessibility in Long Contexts: @bushdid420 further discussed challenges in processing long textual contexts with LLMs, stating that critical facts positioned in the middle sections of context documents often lead to degradation in performance. He suggested a possible solution could be found in the document “Dealing with Long Contexts: LLMs - How to Find What’s in The Middle”.
  • Appreciation for Insightful Discussion: @benjaminbascary expressed appreciation for the in-depth insights provided by @bushdid420 on context handling with LLMs, indicating the conversation’s value.
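The chunking strategy @bushdid420 describes can be sketched as a simple overlapping word splitter (the sizes and names below are illustrative, not from the discussion):

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Split text into overlapping word-count chunks so each piece fits
    # the model's context window while preserving continuity at edges.
    assert overlap < chunk_size
    words = text.split()
    chunks, i = [], 0
    while i < len(words):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break
        i += chunk_size - overlap
    return chunks

sample = " ".join(f"w{n}" for n in range(500))
chunks = chunk_words(sample, chunk_size=200, overlap=50)
```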

Links mentioned:


DiscoResearch Discord Summary

  • Mixtral Implementation Paper Launches: @sebastian.bodza announced the publication of the Mixtral paper on arXiv.
  • TACO: A Hot Discussion Topic for Code Retrieval: @sebastian.bodza brought the TACO Dataset into focus, suggesting its potential usability for code retrieval tasks and stimulating a discussion on the creation of ‘hard negatives’. Different strategies like using a ‘bad model’, BM25 for code similarity, and model permutation were proposed by members like @bjoernp and @philipmay.
  • Synthetic Data: Blessing or Curse?: The forum heated up with @thewindmom and @bjoernp discussing the possible effects of synthetic data generation on model quality and the importance of data curation, with the latter arguing for structured collection of learned data within embedding models.
  • Model Troubleshooting: @philipmay inquired about a model that claimed an MRR@10 of 0.9139 on a German Dataset, sparking a question chain on different model-specific issues.
  • Colbert Meets SQuAD2.0 & LLM Fine-tuning Datasets: @thewindmom shared a machine-translated Turkish version of SQuAD2.0 for training a Colbert model and introduced a GitHub repository of trending instruction fine-tuning datasets.
  • E5-mistral-7b-instruct Chimes In: @aiui asked about quantized weights for the E5-mistral-7b-instruct model. @sebastian.bodza expressed reservations about the model’s performance given its size, but suggested it could likely be quantized following a tutorial in the AWQ pip project.
  • Python DPO Dataset for Code Retrieval Strikes a Chord: @bjoernp showcased Jon Durbin’s approach of using a Python DPO dataset for code retrieval tasks, as seen in Durbin’s tweet, which uses Vezora/Tested-22k-Python-Alpaca “chosen” responses and 13b/7b model generations as rejected responses.

DiscoResearch Channel Summaries

▷ #mixtral_implementation (1 message):

sebastian.bodza: Paper for mixtral is released: https://arxiv.org/abs/2401.04088

▷ #embedding_dev (16 messagesđŸ”„):

  • Discussing TACO Dataset for Code Retrieval Tasks: @sebastian.bodza shared the TACO Dataset and suggested it could be useful for code retrieval tasks since it contains multiple code solutions for each problem. @bjoernp reinforced the idea and wondered whether letting a bad model write the code could be an effective way to create hard negatives. The discussion evolved with @philipmay proposing alternative ways to create hard negatives, including using BM25 for code similarity and model permutation.
  • Question on Model’s Performance: @philipmay asked about a specific model that reached MRR@10 of 0.9139 on a German Dataset.
  • Regarding Synthetic Data Generation: @thewindmom quoted a valuable point which emphasized that synthetic data without new external knowledge could lead to worsening quality and stressed the importance of data curation. @bjoernp disagreed for the case of embedding models stating the need for structured collection of already learned data.
  • Using SQuAD2.0 for Colbert Model and Datasets for LLM Fine-tuning: @thewindmom shared a tweet about machine translation of SQuAD2.0 to Turkish for Colbert model training, plus a GitHub repository serving as a quick guide to trending instruction fine-tuning datasets.
  • E5-mistral-7b-instruct Model Question and Opinions: @aiui asked if there are any quantized weights available anywhere for the E5-mistral-7b-instruct model. @sebastian.bodza expressed his skepticism about the model performance considering its size, but also suggested that the model could likely be quantized with the help of a tutorial in the AWQ’s pip project.
  • Python DPO Dataset for Code Retrieval: @bjoernp shared a tweet from Jon Durbin showing a similar approach of using a Python DPO dataset for code retrieval tasks, using items from Vezora/Tested-22k-Python-Alpaca as the “chosen” responses and 13b/7b model generations as rejected responses.
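The chosen/rejected construction described above amounts to pairing a vetted solution with a weaker model's attempt for each prompt; a schematic sketch where the keys and sample record are hypothetical:

```python
def build_dpo_pairs(records: list[dict]) -> list[dict]:
    # "chosen" = tested, known-good solution; "rejected" = a weaker
    # model's generation for the same prompt, as in the approach above.
    return [
        {
            "prompt": r["prompt"],
            "chosen": r["tested_solution"],
            "rejected": r["weak_model_output"],
        }
        for r in records
    ]

records = [{
    "prompt": "Write a function that reverses a string.",
    "tested_solution": "def rev(s): return s[::-1]",
    "weak_model_output": "def rev(s): return s",
}]
pairs = build_dpo_pairs(records)
```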

Links mentioned:


Latent Space Discord Summary

  • New Age of CoPilot: @guardiang provided a fresh perspective on AI coding in a YouTube video titled “Copilot Prompt Engineering: 3 UI Frameworks, 2 AI Agents, 1 Coding Assistant (AIDER CCC)”, focusing on augmenting engineering abilities “at a rapid pace”.
  • In Search of Retriever-Augmented Generation (RAG) Document Set: A dialogue sprouted around @dgross211’s query on suitable document sets for a RAG project, with @swizec responding with questions about the specific nature of the documents required.
  • R1, the Next Big Thing?: Igniting more interest in the R1 device, @mdcker shared a keynote presentation introducing the mysterious tech.
  • Advancements in the Field of Few-Shot Prompting: @henriqueln7 pointed to a GitHub document of openai-python advocating ‘system’ role for few-shot prompting.
  • Evaluating AI Assistants Simplified: @henriqueln7 initiated a thread for locating resources for straightforward evaluation metrics of AI assistants, remarking a need for simpler alternatives than OpenAI Evals.
  • Large Language Model (LLM) State Machine Delivers: In a triumphant declaration, @davidkpiano shared the successful use of an LLM state machine in project langgraph, providing its GitHub repository for reference.
  • Official OpenAI API Got Some Attention: @swyxio heralded the openai-python library on GitHub, an important resource for using the official OpenAI API.
  • Mixture of Experts (MoE) Approach Leads to Phi-2 Victory: @swyxio shared news from @maximelabonne via a tweet about the success of their MoE model using phi-2, creating the efficient Phixtral, with both variants accessible on Hugging Face.
  • Massive Language Model Reading List Available: For avid researchers, @eugeneyan shared a Language Model Reading List, a compilation of over 40 papers, while welcoming suggestions and issue submissions via their GitHub repository.
  • Mixtral Gets Noticed: Alongside the other models discussed, @swyxio highlighted the importance of another model, Mixtral.

Latent Space Channel Summaries

▷ #ai-general-chat (11 messages🔥):

  • CoPilot through a Different Lens: @guardiang recommended a YouTube video about a unique way of coding with AI, titled “Copilot Prompt Engineering: 3 UI Frameworks, 2 AI Agents, 1 Coding Assistant (AIDER CCC)”. The video is aimed at rapidly enhancing engineering abilities.

  • Quest for RAG Doc Set: @dgross211 asked for suggestions on finding a document set for a Retriever-Augmented Generation (RAG) project leading to a discussion on the matter. @swizec responded by asking about the nature of the documents required.

  • Introduction to Keynote Presentation of R1 device: @mdcker shared a link to the keynote presentation of a device known as R1.

  • Exploration on Few-Shot Prompting: @henriqueln7 shared a link to a GitHub document of openai-python, revealing that the ‘system’ role is the recommended one for few-shot prompting.

  • Help requested for AI Assistant Evaluation Materials: @henriqueln7 asked for recommendations for simple materials to aid in the evaluation of AI assistants that are in production. They mentioned having checked OpenAI Evals but were seeking simpler resources.

  • LLM State Machine Success: @davidkpiano shared the GitHub repository of a project known as langgraph, indicating their success with an LLM (Large Language Model) state machine.

  • OpenAI Python Library Highlight: @swyxio shared the link to openai-python library on GitHub, a library for the official OpenAI API.
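The LLM state-machine pattern discussed above can be sketched without any framework. This is a hypothetical illustration of the idea only: the states, handlers, and context dict below are made up and do not reflect langgraph’s actual API (where LLM calls would sit inside the handlers).

```python
# Minimal state-machine loop for an LLM agent (hypothetical sketch,
# not langgraph's API). Each state maps to a handler; each handler
# mutates a shared context and returns the next state name.

def plan(ctx):
    ctx["steps"] = ["draft", "review"]   # stand-in for an LLM planning call
    return "draft"

def draft(ctx):
    ctx["text"] = "hello world"          # stand-in for an LLM generation call
    return "review"

def review(ctx):
    # Loop back to drafting until the output passes a check.
    return "done" if ctx["text"] else "draft"

STATES = {"plan": plan, "draft": draft, "review": review}

def run(start="plan"):
    ctx, state = {}, start
    while state != "done":
        state = STATES[state](ctx)
    return ctx

result = run()
```

The appeal of the pattern is that the control flow (which step runs next, when to retry) lives in an explicit, inspectable graph rather than inside a single free-form prompt.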

▷ #llm-paper-club (5 messages):

  • MoE Approach Adopted for Phi-2: User @swyxio shared a tweet from @maximelabonne describing their successful creation of an efficient Mixture of Experts (MoE) model using phi-2. The model, named Phixtral, combines 2 to 4 fine-tuned models and outperforms each individual expert. The variants, phixtral-2x2_8 and phixtral-4x2_8, are both accessible on Hugging Face.
  • More than 40 Papers on Language Modeling Reviewed in 2023: @eugeneyan shared their Language Model Reading List which includes over 40 papers that were reviewed in 2023. They also encouraged members to suggest new papers or raise issues on their GitHub repository.
  • Further Reading on Language Modeling Suggested by @swyxio: In addition to the models discussed, @swyxio mentioned a paper on another model named Mixtral.


Skunkworks AI Discord Summary

  • Mixtral’s Mysterious Specialization: @interstellarninja observed that in Mixtral’s routing analysis, experts didn’t showcase domain specialization apart from DM Mathematics, which was non-uniformly dispersed. @baptistelqt in the #core-moe supported this by expressing how the load balancing loss for the router could deter domain specialization. Both noted that consecutive tokens are often routed to the same experts.
  • Python’s ‘self’ & English’s ‘question’, Seat-Mates in Mixtral: @interstellarninja highlighted the peculiar syntactic behavior in Mixtral’s router, which treats “self” in Python and “question” in English similarly, showing that the model gravitates towards syntax.
  • PyTorch or Jax? The Devil We Know: @dook4 wondered why AI Engineers prefer PyTorch over Jax notwithstanding its usage in Llama. Responding, @yikesawjeez confessed that rewriting everything in Jax for Google’s TRC grant was a daunting task. They concluded that the familiar proves to be the winner.
  • Game Changer for Fine-Tuning?: @nisten dropped a cryptic comment about a chart revolutionizing the game for fine-tuning. However, the enigmatic chart wasn’t revealed in the referenced discussions.

Skunkworks AI Channel Summaries

▷ #general (7 messages):

  • Experts’ specialization in Mixtral routing analysis: @interstellarninja noted that in a Mixtral routing analysis, experts didn’t specialize in specific domains with the exception of DM Mathematics, which was non-uniformly distributed.
  • Syntactic behavior in Mixtral router: @interstellarninja also mentioned that the routers do exhibit some syntactic behavior with “self” in Python, “question” in English, and indentation in code. It was noted that consecutive tokens often get routed to the same experts.
  • Mixtral’s model leans heavily on syntax: @interstellarninja also highlighted that the model shows strong specialization in syntax, particularly evident in how indentation is routed to the same experts.
  • Debate on PyTorch vs Jax: @dook4 asked why people are using PyTorch over Jax apart from its use in Llama. @yikesawjeez suggested that people use PyTorch mainly because they know how to write it, even mentioning the difficulty they experienced when they had to rewrite everything in Jax while using Google’s TRC grant.
  • Implications on the fine-tuning game: @nisten suggested that a specific chart completely alters the game for fine-tuning. The specific chart was not included in the cited messages.
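For context on the routing analysis above: Mixtral-style routers score all experts per token and dispatch each token to the top two, renormalizing their gate weights. A minimal sketch of that selection step, with made-up router logits for illustration:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top2_route(logits):
    """Pick the two highest-scoring experts for one token and
    renormalize their gate weights (Mixtral-style top-2 routing)."""
    probs = softmax(logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    z = sum(probs[i] for i in top2)
    return [(i, probs[i] / z) for i in top2]

# One token's router logits over 8 experts (illustrative values).
routes = top2_route([0.1, 2.0, -1.0, 0.5, 1.8, 0.0, -0.5, 0.3])
# -> experts 1 and 4 selected, with weights summing to 1
```

The observations in the channel (consecutive tokens and syntactic elements like indentation landing on the same experts) are statements about how these per-token logits cluster in the trained model, not about the mechanism itself.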

▷ #core-moe (4 messages):

  • Discussions on Domain Specialization in Mixtral Experts: @baptistelqt raised a point about Mixtral’s experts not specializing in specific domains, theorizing that the load balancing loss for the router could be a deterrent. They also mentioned successfully implementing an MoE (Mixture of Experts) model that encouraged domain specialization, and sought insights on any potential misunderstandings. @snowclipsed expressed similar curiosity.

LLM Perf Enthusiasts AI Discord Summary

  • What’s the temperature? It’s AI Time!: @thebaghdaddy asked about the context of the data involved in temperature adjustments in AI models. @sehaj.dxstiny shed light on it as a concept linked to the latent representation of images and the Codebook embeddings of VQ-GANs.
  • Stepping into the Hyper Zone: @thebaghdaddy suggested exploring hyperparameter tuning and methods for controlling data quality inputs, and also hinted at using regularization techniques for potential improvements.
  • Embracing the Spirit of Trial and Error: @sehaj.dxstiny showed willingness to experiment with the suggested methods for enhancing AI models.
  • Hey OCR, get out of the way!: As per @jeffreyw128, there is a reliance on detecting bad text from methods other than OCR.
  • Private Company Docs - The Data Hunt: @res6969 keenly searched for a dataset related to private company descriptive documents. @jeffreyw128 swung into action, pointing out Metaphor holds data that aligns with the required description and offered to provide more insights privately.

LLM Perf Enthusiasts AI Channel Summaries

▷ #general (5 messages):

  • Decoding ‘Adjust Temperature’: User @thebaghdaddy asked about the context of data involved in adjusting the temperature in AI models.
  • VQ-GANs and Latent Representation of Images: User @sehaj.dxstiny clarified that it pertains to the latent representation of images and the Codebooks embeddings of VQ-GANs.
  • Suggesting Regularization Techniques: In a quest to help, @thebaghdaddy recommended exploring hyperparameter tuning and controlling data quality inputs, and also hinted at using regularization techniques for potential improvements.
  • Open to Experimentation: Responding to the suggestions, @sehaj.dxstiny acknowledged that they haven’t tried these methods yet but expressed openness to consider them.
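Mechanically, "adjusting temperature" means dividing the logits by a temperature before the softmax, which sharpens or flattens the resulting distribution, whether those logits score next tokens or VQ-GAN codebook entries. A minimal sketch (the logit values are made up):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by T before softmax: T < 1 sharpens the
    # distribution toward the argmax, T > 1 flattens it toward uniform.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
# sharp puts more mass on the top logit than flat does
```

Sampling from the flattened distribution yields more diverse (and riskier) outputs; sampling from the sharpened one is closer to greedy decoding.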

▷ #rag (1 message):

jeffreyw128: we rely on detecting if there’s bad text from non-OCR methods

▷ #datasets (2 messages):

  • Seeking Dataset for Private Company Documents: User @res6969 inquired about the availability of a dataset for private company descriptive documents, including information such as board decks, financial statements, and quarterly letters.
  • Dataset Offer from Metaphor: In response, @jeffreyw128 mentioned that Metaphor has data that fits the inquiry and offered to provide more information through private messages.

Datasette - LLM (@SimonW) Discord Summary

Only 1 channel had activity, so no need to summarize


  • Inflation’s Impact on Music Industry Earnings: User @justinpinkney questioned how adjusting for inflation might alter the perception of the music industry’s revenue trends.
  • Streaming vs Downloading: @dbreunig clarified that streaming was the catalyst for damaging parts of the music industry, not downloading.
  • The Golden Window for Mid-Tier Musicians: @dbreunig mentioned a “golden window” of opportunity for mid-tier musicians to earn a living prior to the Spotify era.
  • The Pre-Digital Music Cartel: @dbreunig argued that the pre-digital market operated as a cartel, and once consumers were allowed to purchase individual songs instead of full albums, it crumbled.
  • Influence of Spotify on Mid-Tier Musicians: In response to @dbreunig, @antisimplistic expressed doubts about Spotify leading to harder conditions for mid-tier musicians, suggesting that the industry’s transition to unlimited shelf-space and a fragmented market made it challenging for all artists, regardless of business model. Also, @antisimplistic suggested that inflation adjustments and larger spending patterns may be factors in assessing the industry’s trends.

Alignment Lab AI Discord Summary

  • AI Agents Pitch In For Explanation: @burnydelic shared an engaging article from MIT News reporting on how AI agents can potentially help elucidate the mechanisms of other AI systems.
  • Comparative Performance Analysis of Llama2-70B: @tirmizi7715 asked why Llama2-70B performs almost as well as Mixtral and GPT-3.5 in several evaluations, yet is significantly worse at MT Bench.
  • Cryptic Discussion Leaves Users Puzzled: User @m8than’s comment “wtf is this lol” underscored an apparent lack of clarity on the discussion about the performance comparison between Llama2-70B, Mixtral and GPT-3.5.
  • NousResearch Simulation Topic: @teknium highlighted a Twitter post from NousResearch about simulation in the #oo channel.

Alignment Lab AI Channel Summaries

▷ #ai-and-ml-discussion (1 message):

burnydelic: https://news.mit.edu/2024/ai-agents-help-explain-other-ai-systems-0103

▷ #general-chat (2 messages):

  • Llama2-70B’s performance compared to Mixtral and GPT-3.5: User @tirmizi7715 asked why Llama2-70B is nearly as good as Mixtral and GPT-3.5 on almost all evaluations, but significantly worse at MT Bench.
  • Confused Participant: User @m8than seemed confused about the preceding discussion, and commented “wtf is this lol”.

▷ #oo (1 message):

teknium: https://fxtwitter.com/NousResearch/status/1744865872563618128


YAIG (a16z Infra) Discord Summary

Only 1 channel had activity, so no need to summarize


  • Speculation on Google’s Gemini Training Technique: User @stevekamman was keen on the approach defined in an arXiv research paper which outlines a distributed optimization algorithm, Distributed Low-Communication (DiLoCo), that allows training of language models on poorly connected device clusters. This technique is potentially linked to how Google trained Gemini. The algorithm is a federated averaging variant, with AdamW as the inner optimizer and Nesterov momentum as the outer optimizer.
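The DiLoCo recipe as summarized (many local inner-optimizer steps per worker, then an infrequent outer update with Nesterov momentum applied to the averaged parameter delta) can be sketched on a toy 1-D objective. This is a hypothetical illustration only: plain SGD stands in for the paper’s AdamW inner optimizer, the loss is a simple quadratic, and all hyperparameters are made up.

```python
# Toy DiLoCo-style loop minimizing (x - 3)^2.
# Inner optimizer simplified to SGD (the paper uses AdamW);
# outer optimizer is Nesterov momentum on the averaged delta.

def grad(x):
    return 2 * (x - 3)  # gradient of (x - 3)^2

def inner_steps(x, n=10, lr=0.05):
    # Local training on one worker: many cheap, communication-free steps.
    for _ in range(n):
        x -= lr * grad(x)
    return x

def diloco(workers=4, rounds=20, outer_lr=0.7, momentum=0.9):
    theta, v = 0.0, 0.0
    for _ in range(rounds):
        # Each worker starts from the shared theta and trains locally.
        locals_ = [inner_steps(theta) for _ in range(workers)]
        # Outer "pseudo-gradient": how far the averaged parameters
        # drifted from the shared starting point this round.
        delta = theta - sum(locals_) / workers
        v = momentum * v + delta
        theta -= outer_lr * (momentum * v + delta)  # Nesterov-style update
    return theta

theta = diloco()  # converges toward the minimum at 3.0
```

Communication happens only once per outer round, which is what makes the scheme attractive for poorly connected device clusters; in the real setting each worker would train on its own data shard between synchronizations.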

Links mentioned:

DiLoCo: Distributed Low-Communication Training of Language Models: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected accelerators…