[TOC]
LM Studio Discord Summary
- Variations in Dolphin and Mistral models, with a spotlight on optimizing hardware and software configurations. Users compared Dolphin Mistral 7b versions and exchanged insights on bias and censorship in AI models. Meanwhile, challenges were flagged with Mixtral deployment on local machines and suggestions were provided to download models from HuggingFace. The community also debated how GPU use, particularly vRAM, can influence processing speed.
  - "the difference between Dolphin and other models is mainly in their fine-tuning" - @dagbs
  - "for such hardware specifications, sticking with 7B q4 models would be the best practice." - @heyitsyorkie
- Users explored the emotional intelligence of AI models and endeavored to invoke personalities in AI with extended, colorful prompts.
- The community praised the smooth usability of LM Studio, with discussions around software upgrades, hardware utilization, and UI appreciation. Lively debate also occurred around ChatGPT enhancements, API updates, and Autogen integration.
- Detailed dialogues on hardware developments, including building budget AI compute servers and the realities of costly, high-specification equipment.
  - "Is the extra CPU/RAM speed of an R730 vs R720 worth the additional cost given that they're planning to use 64GB VRAM" - @leviticus_slow
- Issues and solutions revolving around integrations with ChromaDB and Autogen, where users elucidated the nuances of various integration options, managed downloading issues, and addressed operational disruptions.
  - "I suggested downloading the updated requirements.txt and replace_pdf.py from the 'src' folder on the GitHub repository to resolve any issues" - @vic49.
- Exchanges on New Year celebrations marked the communal interactions in the guild.
LM Studio Channel Summaries
▷ #general (164 messages):
- Change of default models folder in LM Studio: @musixela provides suggestions on changing the default models folder location in LM Studio. They suggest either using webui to download models and then connecting that folder to LM Studio, or creating shortcuts in the webui top directory.
- Issues with Mixtral on local machines: @dagbs mentions that the mixtral model presents operational challenges on local hardware due to its large size. An alternative smaller model, Mistral, is suggested, which does not exhibit the same sizing issues. @xyrezz.sol raises an issue with a Dolphin Mixtral 2.5 8x7b Q4_K_M model running slowly on his machine with 16GB CPU RAM and 6GB VRAM. @heyitsyorkie recommends that, for such hardware specifications, sticking with 7B q4 models would be the best practice.
- Discussion on hardware limitations: In the discussion initiated by @.gregly regarding hardware upgrades to his computer, it's concluded that the key to increased processing speed lies in expanding the vRAM of the computer's GPU rather than upgrading the CPU. @dagbs, @miashusband, and @fabguy discuss VRAM limits in various GPU models, ranging from consumer cards limited to 24GB VRAM up to professional accelerators featuring up to 188GB VRAM.
- Downloading models from HuggingFace using proxies: @cmpleo. discusses being unable to access and download models from HuggingFace using LM Studio in China, even through a v2rayNG proxy. @fabguy suggests a workaround: download models directly from HuggingFace and then place them manually into the LM Studio models folder (a minimal sketch follows this list). Despite the workaround, @heyitsyorkie suggests that the issue might arise from HuggingFace being blocked in China, which might not be circumvented with a VPN when using LM Studio.
- New Year's celebrations: There are several joyful exchanges and greetings given in celebration of the New Year.
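A minimal sketch of the manual-download workaround described above, assuming the `huggingface_hub` Python package is installed; the repo/file names are placeholders, and the LM Studio models path shown here (`~/.cache/lm-studio/models/<publisher>/<repo>`) is an assumption that should be checked against the folder configured in LM Studio's settings:

```python
from pathlib import Path
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Placeholder repo/file; any GGUF model on Hugging Face works the same way.
repo_id = "TheBloke/OpenHermes-2.5-Mistral-7B-GGUF"
filename = "openhermes-2.5-mistral-7b.Q4_K_M.gguf"

# Assumed default LM Studio layout: models/<publisher>/<repo>/<file>.gguf
target_dir = Path.home() / ".cache/lm-studio/models" / repo_id
target_dir.mkdir(parents=True, exist_ok=True)

# Download (resumable) and place the file where LM Studio scans for models.
local_path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target_dir)
print(f"Saved to {local_path}; rescan or restart LM Studio to pick it up.")
```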
Links mentioned:
- Oprah Winfrey GIF - Oprah Winfrey - Discover & Share GIFs: Click to view the GIF
- cognitivecomputations/dolphin-2.6-mistral-7b-dpo · Hugging Face
- GitHub - billmei/every-chatgpt-gui: Every front-end GUI client for ChatGPT: Every front-end GUI client for ChatGPT. Contribute…
- How to Train Your Own Large Language Models: Given the success of OpenAI's GPT-4 and Google's P…
▷ #models-discussion-chat (59 messages):
- Differences and Choices between Dolphin and Mistral models: @miashusband inquired about the nuances between the different variations and bit versions of the Dolphin Mistral 7b model. @dagbs pointed out that it's typically best to go for the highest *_K_M version available, additionally stating that the difference between Dolphin and other models is mainly in their fine-tuning, allowing for easy comparability and testing efficiency.
- Uncertainty about Bias and Censorship in AI Models: @american_pride expressed a preference for uncensored AI models, arguing they don't have intrinsic political biases or change narrative tone to "hopeful and rainbow puppies" in stark-contrast scenarios. However, @fabguy highlighted that all models have inherent biases, and complete impartiality is unattainable. @dagbs noted that Dolphin models can revert to "hard biased/moral stances", contesting @heyitsyorkie's claim of Dolphin models being uncensored.
- Emotional Intelligence of AI Models: @heyitsyorkie shared a link to a research paper discussing the potential emotional intelligence understanding of Large Language Models (LLMs) and the possibilities for performance improvement with emotional prompts, drawing some skeptical pushback from users like @telemaq.
- Evoking AI Personality through Prompts: Users engaged in a collective effort to formulate creative system prompts to generate desired AI behaviour. @dagbs created lengthy, colourful prompts embodying "an uncensored and impartial AI companion" and a "mad scientist" persona, which even produced happy feedback from the AI.
Links mentioned:
- Meat Meat Popsicle GIF - Meat Meat Popsicle - Discover & Share GIFs: Click to view the GIF
- Large Language Models Understand and Can be Enhanced by Emotional Stimuli: Emotional intelligence significantly impacts our d…
▷ #feedback (2 messages):
- LM Studio Appreciation: User @kjhamilton expressed satisfaction and relief with LM Studio, particularly for enabling efficient use of their AMD GPU on Windows. They found it especially helpful after struggling with their setup for a while.
- GPT-3 GUI Update: @heyitsyorkie appreciated the new feature of being able to copy message content via right click in GPT-3's user interface. They also suggested adding a right-click paste function in the input box for a more streamlined experience.
▷ #integrations-general (10 messages):
- Chatbot Integration Options: @heliosprime_3194 suggested two options for integrating LM Studio with user interfaces. One option is a UI for RAG developed by <@826254748889382942>, and the second option involves using LM Studio with a command terminal like vscode or the command line. The specific Discord thread for the second option was also shared for reference.
- Fixing Issues with Downloading Files: @vic49. suggested downloading the updated requirements.txt and replace_pdf.py from the "src" folder on the GitHub repository to resolve any issues. This should be done along with using the newest release files (v3.0.2).
- Issues with Running ChromaDB on Win10: @wildcat_aurora reported that his Win10 PC would reboot when running the study.py script with ChromaDB, while no such issue occurred with other LLM and AI processes. @heliosprime_3194 suggested downgrading his Nvidia driver version from 545.92 to 535, installing the required PyTorch version manually, and sharing the Conda list for troubleshooting.
- Solution Found for Rebooting Issue & Feedback Regarding Data Extraction: After manually installing PyTorch, @wildcat_aurora was able to avoid PC reboots, implying that an incorrect PyTorch version might have been the cause. He also observed that certain models from LM Studio, such as Zephyr and Mixtral 2.6, were not extracting as much data from the database as expected.
- Suggestions to Improve Data Extraction: @heliosprime_3194 suggested using a more advanced embedding model and modifying the chunk file sizes in the study.py script. He also mentioned changing the preset in the config.json file in LM Studio to craft prompts that can help recheck information, which could address the less-than-optimal data extraction experienced by @wildcat_aurora.
▷ #hardware-discussion (50 messages):
- Building a Budget AI Compute Server: @leviticus_slow is planning to build a budget AI compute server using 2 Nvidia Tesla cards in a PowerEdge. They asked whether the extra CPU/RAM speed of an R730 vs R720 is worth the additional cost given that they're planning to use 64GB VRAM.
- Impact of GPU on Processing Speed: @zhynem noticed about double the tokens-per-second speed when enabling the Apple Metal (GPU) option on their machine/model/context. They and @totallybored discussed the potential impact of the quant size, specifically using lmcocktail phi 2 with the Q8_0 quant.
- Context Size Impact on Processing Time: @Pierre-jean Lainé questioned why a larger context size leads to a longer time delay before processing, regardless of the actual prompt size.
- GPU Utilization on Windows: @madan.pandit sought assistance determining if their GPU is being utilized, as their Windows performance monitor showed no GPU usage. @fabguy asked about their n_gpu_layers setting and whether dedicated vRAM utilization changes when loading/ejecting a model in LM Studio.
- Discussion on Mixtral and Alternative LLMs: @heyitsyorkie advised that an 8GB GPU plus Mixtral Q8 would be problematic and recommended OpenHermes 2.5 Mistral 7b for @madan.pandit's hardware. @pefortin and @heyitsyorkie confirmed returning to OpenHermes Mistral as a consistently good choice.
- Expensive Hardware: @dagbs shared a link to a powerful AMD acceleration platform with 1.5TB HBM3, prompting a discussion about its high cost and potential uses. Users speculate that businesses in R&D, developer assistance, medical research, and AI might invest in such hardware.
▷ #autogen (29 messages):
- OpenAI Version and Autogen Compatibility Issue: @heliosprime_3194 suggested upgrading to OpenAI 1.3.1 to resolve an error message received with older versions each time the OpenAI API updates. @tyler8893 experienced the same issue, even after downgrading from OpenAI 1.3.7 to 1.3.1, and planned to investigate further in a new conda environment. @heliosprime_3194 offered to share their conda list if it could be helpful.
- OpenAI Authentication Error: @totallybored and @ftl24 faced an AuthenticationError with the API key, which was later clarified by @dagbs and @tyler8893. They explained that a string value, even if "null", must be provided for the "api_key" parameter to resolve the issue (a minimal sketch follows this list).
- Issues with Function Calls and LM Studio: @tyler8893 expressed difficulty with function calls using LM Studio. They mentioned functions work fine with GPT, but not with LM Studio, and speculated the issue could be addressed in a future update.
- Updates to Autogen and memGPT: @tyler8893 and @dagbs discussed the challenge of keeping up to date with changes and updates to Autogen and memGPT. They noted changes could occur every other week and that the OpenAI API lacked a standardization like PEP, causing rules to be "free-flowing".
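A minimal sketch of the placeholder-key workaround described above, using the openai Python client against a local OpenAI-compatible server; the base URL (LM Studio's local server is commonly exposed at http://localhost:1234/v1) and the model name are assumptions to adapt to your own setup:

```python
from openai import OpenAI  # pip install openai>=1.0

# A local server does not validate the key, but the client requires a non-empty
# string, hence the "null" placeholder mentioned in the discussion above.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="null")

response = client.chat.completions.create(
    model="local-model",  # placeholder; local servers typically ignore or remap this name
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```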
▷ #memgpt (1 message):
rouw3n: <@1164606940098334843> use oogaboga webui problem solved
OpenAI Discord Summary
- Debate on Bingsly vs GPT-4: @Rock finds Bingsly more effective than GPT-4 for coding and initiating conversations, whereas @arevaxach holds an opposing view, citing Bingsly's tendency to lie and its unsatisfactory interaction quality.
- Discussion on Assistant API: Users, including @lugui, suggested that streaming is a better option for faster data retrieval, due to the time-consuming nature of non-streaming use of the assistant API.
- Chat about Sam Altman: The discussion was brought up by @iiimandalorianiii, who views Sam Altman as ambitious, possibly monopolizing Language Models, but still expressed support for him.
- Interest in AI technologies and AI-generated songs: @sarcasm83 enquired about AI technologies and shared examples of AI-generated songs: Kurt Cobain singing "Gotye - Somebody that I used to know" and Chester Bennington singing "Bring Me The Horizon - Sleepwalking".
- Problems with ChatGPT consistency, speed, crashes, functionalities, as well as overstepping bounds with NSFW content, were discussed, with various strategies suggested to address these issues, including adjusting system prompts, using guardrails, checking the network connection, managing GPTs carefully, and regulating content to comply with the Terms of Service.
- Addressing Technical Challenges with GPTs: Users debated issues such as difficulty guiding a Turbo35 model (@kyper), trouble with counting in large language models, and managing slow responses. Potential solutions put forward include using pseudocode, understanding the API's lack of context retention, crafting well-structured sentences, and backing up data regularly to prevent loss.
- Compliance with Policies: @eskcanta urged users to comply with OpenAI's usage policies, warning of potential account suspension or termination for discussions of disallowed content.
- Focus on Prompt Engineering: @iiimandalorianiii noted the novelty of the term "prompt engineering" being treated as an established job. They pointed out that the understanding of optimized prompts is still in its early stages, primarily pursued by a few enthusiastic individuals.
- In respect to message limits and API costs, participants discussed the cost implications of going beyond the limit of 40 messages per hour when using ChatGPT. There is consensus, albeit implied, on the beneficial aspects of learning to code over solely relying on AI. The use of OpenAI's Copilot in complement to GPT-4 was also touched on.
▷ #ai-discussions (27 messages):
- Bingsly and GPT-4 Comparison: @Rock stated they find Bingsly more useful than GPT-4 for starting a conversation and coding, while @arevaxach disagreed, indicating that Bingsly has a toxic personality, is prone to lying, and generally doesn't provide satisfactory interaction.
- Assistant API Discussion: A discussion occurred regarding the assistant API, where @lugui explained that for non-streaming use, users need to wait for the full generation process to complete, which can be time-consuming. For this reason, streaming was suggested as an option to retrieve data as it is generated.
- Sam Altman Discussion: @iiimandalorianiii brought up the topic of Sam Altman. While there was minimal engagement from others on this topic, they perceive Sam as ambitious and business-minded, potentially monopolizing Language Models, but remain supportive of him.
- AI Enthusiasm and Technologies: @sarcasm83 enquired if there are channels dedicated to discussions around various AI technologies, including AI-generated songs. They provided Kurt Cobain singing "Gotye - Somebody that I used to know" and Chester Bennington singing "Bring Me The Horizon - Sleepwalking" as examples.
▷ #openai-chatter (142 messages):
- Inconsistency in GPT-4 Chatbot Response: @abhijeet0343 shared a challenge with their chatbot developed using GPT-4. The bot exhibits response inconsistency when data is in PDF format, sometimes returning fewer bullet points in the answer than expected. Several solutions were proposed, including being assertive in the system prompt, using guardrails, or implementing a code-interpreter step to count bullet points (a minimal sketch follows this list).
- Discussions on AI Counting Capability: There was a conversation regarding the capability of AI and large language models (LLMs) in counting. Some users believe that AI in general can count, but that LLMs have specific issues with it.
- New Year's Celebration: Many users took the opportunity to wish everyone a happy New Year.
- Technical Issues with ChatGPT: Users (@Rosyskull, @quanta1933, @mrcrack_, @millymox) reported having issues with ChatGPT, including it being slow, hanging, and producing a drop in output quality. @Darthgustav pointed out that it could be network-related between the users and the GPT servers.
- Concerns about the Limit of Messages and API Costs: @slimified expressed concerns about the limit of 40 messages per hour when using ChatGPT for application development and sought ways to get past this limit. @darthgustav suggested using API calls but highlighted the potential cost. Conversations ensued on the value and cost-effectiveness of learning to code versus using AI as a development-assistive tool. Furthermore, some users discussed the use of OpenAI's Copilot in conjunction with GPT-4.
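A minimal sketch of the bullet-point-count check suggested above; the regex for what counts as a bullet is an assumption and would need adjusting to whatever format the prompt actually requests:

```python
import re

def count_bullets(text: str) -> int:
    """Count lines that look like markdown-style bullets ('-', '*', '•', or '1.')."""
    pattern = re.compile(r"^\s*(?:[-*\u2022]|\d+\.)\s+", re.MULTILINE)
    return len(pattern.findall(text))

def needs_retry(answer: str, expected: int) -> bool:
    """Return True when the model produced fewer bullets than the prompt asked for."""
    return count_bullets(answer) < expected

answer = "- point one\n- point two"
print(count_bullets(answer), needs_retry(answer, expected=3))  # 2 True
```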
Links mentioned:
GitHub - guardrails-ai/guardrails: Adding guardrails to large language models.: Adding guardrails to large language models. Contri…
▷ #openai-questions (66 messages):
- Issues with ChatGPT Verification and Slow Responses: @ekot_0420 reported issues with ChatGPT taking too long to verify that a user is human. @rutrruns also mentioned experiencing slow responses leading to crashes.
- Recovering Lost GPTs: @georgip reported a GPT disappearing with the error "GPT inaccessible or not found". @darthgustav. suggested starting work on a new version of the GPT and waiting for possible recovery, while regularly backing up all data to prevent data loss in the future.
- Impact of Rushed Work on GPTs: @darthgustav. advised being careful and slow with updates, considering the autosave feature of GPTs. @mysticmarks1 and @darthgustav. warned against hasty decisions, especially when deleting conversations.
- Issues with GPT Responses and TOS (Terms of Service) Violations: @idkwhyigotdeleted reported getting flagged due to an unexpected NSFW response GPT generated to a prompt about eggplants. Users including @gamerg. and @satanhashtag advised going through chat history and editing/deleting any content that might cause a flag.
- General Technical Problems: Users including @lowkeyhighbrow and @1984.dystopia reported unspecified technical issues with GPT-4 and GPT responses respectively. @not_richard_nixon reported getting a "User quota exceeded" error when trying to upload an image to GPT-4's chat on various browsers. @misterfyre mentioned being unable to add a new payment method.
▷ #gpt-4-discussions (19 messages):
- Working with the Turbo35 Model and User Confirmations: @kyper was trying to guide a turbo35 model to call functions, but with a requirement for user confirmation. They struggled to get it to work consistently and sought advice on possible ways to resolve it.
- Pseudocode Suggestion: @darthgustav. suggested trying pseudocode, highlighting that GPT-3.5 Turbo deals well with it. However, this suggestion didn't resolve @kyper's problem.
- API and Context Limitation: @darthgustav. noted that the API does not retain context, which might have been the cause of the problem @kyper was facing.
- Successful Solution: Ultimately, @kyper resolved the issue by storing a "semi-started" function call and then adding a "confirm_function" function that takes a true/false and a function-call id as parameters. They have a full client with context stored in a db to achieve the desired behaviour (a minimal sketch follows this list).
- Discussion on Language Use: There was a discussion about language use, with @darthgustav. stating that a well-crafted sentence is the best pseudocode, but such sentences are rare. @iiimandalorianiii responded humorously, suggesting sentence quality may depend on individual writing style.
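A minimal sketch of the confirmation pattern described above, assuming pending calls are kept in an in-memory dict keyed by id (the original solution used a database); names other than `confirm_function` are hypothetical:

```python
import uuid

pending_calls: dict[str, dict] = {}  # call_id -> {"name": ..., "arguments": ...}

def stage_call(name: str, arguments: dict) -> str:
    """Store a 'semi-started' function call and return its id for later confirmation."""
    call_id = str(uuid.uuid4())
    pending_calls[call_id] = {"name": name, "arguments": arguments}
    return call_id

def confirm_function(confirmed: bool, call_id: str) -> str:
    """Execute the staged call only if the user confirmed it; otherwise discard it."""
    call = pending_calls.pop(call_id, None)
    if call is None:
        return "unknown call id"
    if not confirmed:
        return f"call to {call['name']} cancelled"
    # Dispatch to the real implementation here (hypothetical registry).
    return f"executing {call['name']} with {call['arguments']}"

cid = stage_call("delete_record", {"record_id": 42})
print(confirm_function(True, cid))
```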
▷ #prompt-engineering (4 messages):
- OpenAI Usage Policies: @eskcanta highlights the importance of adhering to OpenAI's recently updated usage policies, especially in relation to disallowed content. Any discussion of disallowed content can lead to account suspension or termination. They also pointed to a reference in channel <#1107255707314704505> for additional context.
- Prompt Engineering as a Term and Job: @iiimandalorianiii finds it amusing that the term "prompt engineering" is being used as though it's an established job, considering that those at the forefront are a few online individuals. They do, however, acknowledge a gap in understanding of optimized prompts, validating the importance of the term.
▷ #api-discussions (4 messages):
- OpenAI Usage Policies: @eskcanta shared a link to OpenAI's updated usage policies and emphasized the importance of compliance with these policies. They cautioned that discussions around disallowed content could result in account suspension or termination.
- Prompt Engineering Discussion: @iiimandalorianiii made observations on the use of the term "prompt engineering," noting that the concept is not yet well established and is mostly being driven by a handful of dedicated individuals who are investing large amounts of time into it. They also recognized a knowledge gap in the understanding of optimized prompts.
Nous Research AI Discord Summary
- Engaging discussions on understanding local attention parameters in Jax/Flax, with a focus on better parameterization and a suggestion to chunk the data for cross-chunk interaction. Direct code reference - source-code link.
- Various off-topic discussions, including a user sharing their experience deploying a RAG application and another starting a non-AI ethics review board. Mention of admiration for the Open LLM Leaderboard and announcement of a potential open-source project: a framework developed for multi-GPU fine-tuning, batch inference/serving, and further optimizations.
- Sharing of interesting links, ranging from the use of minhash similarity filtering, alignment-phrase filtering, foreign-language filtering, and URL filtering in projects, to AI developers' interviews and projects. Recommended articles include Tiny Llama, SplaTAM, and Alibaba's DreaMoving.
- Discussion around hot-swappable LoRA, which allows model finetunes to be quickly switched via API; insights around Mistral-based Mixtral experts with resource sharing; and a project showcase of the TinyLlama project, aiming to pretrain a 1.1 billion parameter LLaMA model on 3 trillion tokens for compactness and applicability with LLaMA-based open-source projects.
- Inquisitive discussions in the ask-about-llms channel around Amazon's new large language models, Titan Text Express and Titan Text Lite. An unconventional idea was proposed for improving model performance, interest was shown in known failures of ChatGPT, improving the performance of English-trained LLMs on the Czech language was explored, and queries were raised about suitable base models for HF's Auto Train feature.
Nous Research AI Channel Summaries
▷ #ctx-length-research (4 messages):
- Understanding of Local Attention Parameters: @euclaise initially suggested a different parameterization (nxw) for the local attention function (source code). In response, @joey00072 expressed confusion over the parameters, expecting the shape to be (nh, T, T). @euclaise conceded his explanation might be unclear.
- Suggestion on Chunking Data: For a more practical approach, @euclaise suggested chunking the data and adding the past chunk with a mask for cross-chunk interaction. The local attention function can then be vmap'd over the chunks (a minimal sketch follows this list).
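A minimal sketch of the chunk-and-vmap idea above, written with jax.numpy; the attention function here is a plain softmax-attention stand-in rather than the lucidrains local-attention-flax module, the causal/padding mask mentioned in the discussion is omitted for brevity, and the shapes and chunk size are illustrative assumptions:

```python
import jax
import jax.numpy as jnp

def attend(q, k, v):
    # Plain scaled dot-product attention over one chunk: inputs are (chunk_len, dim).
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v

def chunked_local_attention(x, chunk_size=64):
    # Split the sequence into chunks, then vmap the per-chunk attention over them.
    seq_len, dim = x.shape
    chunks = x.reshape(seq_len // chunk_size, chunk_size, dim)
    # Prepend each chunk's predecessor so tokens can attend across the chunk boundary.
    prev = jnp.concatenate([jnp.zeros_like(chunks[:1]), chunks[:-1]], axis=0)
    kv = jnp.concatenate([prev, chunks], axis=1)   # (n_chunks, 2*chunk_size, dim)
    out = jax.vmap(attend)(chunks, kv, kv)         # queries per chunk, keys/values include the past chunk
    return out.reshape(seq_len, dim)

x = jnp.ones((256, 32))
print(chunked_local_attention(x).shape)  # (256, 32)
```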
Links mentioned:
local-attention-flax/local_attention_flax/local_attention_flax.py at e68fbe1ee01416648d15f55a4b908e2b69c54570 · lucidrains/local-attention-flax: Local Attention - Flax module for Jax. Contribute …
▷ #off-topic (28 messages):
- RAG Application Deployment: @gabriel_syme shared an experience about deploying a RAG application to 7k people, calling it a "disaster" they learned from the hard way.
- Non-AI Ethics Review Board: @fullstack6209 announced that they're starting a non-artificial-intelligence ethics review board aimed at issuing ethics guidelines for real beings.
- Open LLM Leaderboard: @Error.PDF expressed their admiration for the Open LLM Leaderboard.
- Framework for Multi-GPU Fine-Tuning and More: @carsonpoole shared a framework they developed, which includes features such as multi-GPU fine-tuning, merging models, batch inference/serving, converting dense models to LoRAs, exporting LoRAs to dense weights, and much more. They also mentioned considering open-sourcing (OSSing) it. Their intent was applauded by @giftedgummybee.
- Inference Optimizations: @carsonpoole mentioned that the framework uses customized CUDA graphs through PyTorch for inference, achieving around 500 tokens per second with Mistral on a single A100, with a batch size of 32. They also shared their benchmark result (585, to be precise) and mentioned the potential for further optimizations.
▷ #interesting-links (27 messages):
- Minhash Similarity Filtering: @ldj discussed the use of minhash similarity filtering alongside alignment-phrase filtering, foreign-language filtering, and the filtering out of URLs and ANSI escapes in projects. They plan to mention these steps in a forthcoming paper and/or in the Amplify-Instruct repo.
- Interview with Tri Dao and Michael Poli: @ldj highlights Tri Dao's discussion about the differences between Striped Hyena and Mamba and his future plans. The interview is available on YouTube.
- Tiny Llama: @yobibyte shared a link to the completed TinyLlama project and raised the question of whether a similar approach is possible for a Tiny Hermes or Tiny Capybara.
- Real-time SLAM and 3D Gaussians: @spirobel proposes an alternative to Gaussian creation via training. They shared a link to SplaTAM, a real-time method for 3D Gaussians in SLAM.
- DreaMoving by Alibaba: @nonameusr shared a tweet about Alibaba's release of DreaMoving, a technology for animating using a single image or text prompts.
Links mentioned:
- SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
- Tweet from Eric Hartford (@erhartford): https://huggingface.co/cognitivecomputations/yayi2…
- TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T · Hugging Face
- Interviewing Tri Dao and Michael Poli of Together AI on the future of LLM architectures: The introduction to this post can be found here: h…
- Artificial Intelligence | 60 Minutes Full Episodes: From January 2019, Scott Pelley's interview wi…
- Tweet from Barsee (@heyBarsee): It's been 24 hours since Alibaba released Drea…
▷ #announcements (1 message):
- New Year Greetings: User @teknium wished everyone a Happy New Year using various emojis.
▷ #general (129 messages):
- Hot-swappable LoRA: @fullstack6209 discussed the impending mainstream popularity of hot-swappable LoRA, which allows model finetunes to be quickly switched via API (a minimal sketch follows this list). They referenced a company, OpenPipe, that claims to beat GPT-4 on specific tasks using this technique. @ldj and @spirobel questioned its advantages over quickly swapping different LLM finetunes. @spirobel pointed out that this technique allows for batched inference of multiple PEFT LoRAs at the same time.
- Mistral-based Mixtral Experts: @spirobel shared a GitHub issue revealing that Mixtral 8x, a model comprising eight experts, was made using Mistral 7b as a common ancestor. They intrigued the group with the idea of extracting the differences between the models as PEFT adapters, to which @giftedgummybee responded that this has been done before.
- TinyLlama Project: @giftedgummybee shared a project aiming to pretrain a 1.1 billion parameter LLaMA model on 3 trillion tokens. This model, termed TinyLlama, aims to combine compactness with the ability to be used in conjunction with open-source projects built upon LLaMA. More details of the project can be found here.
- LASER Interventions on Pallas-0.5: @mihai4256 presented initial findings from a project in which LASER, using torch.svd_lowrank, is applied to various layers of a model in the hope of improvement. Initial findings did not indicate a strong improvement in accuracy or speed, but did show slight potential for memory and disk-space savings.
- Hydra MOE Project: @night_w0lf queried about the status of the Hydra MOE project, which seems stalled, to which @teknium suggested asking the project participants directly for updates.
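A minimal sketch of adapter hot-swapping with the Hugging Face peft library, in the spirit of the discussion above; the base model, adapter paths, and adapter names are placeholders, and this is not the specific serving stack OpenPipe uses:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"   # placeholder base model
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Load one fine-tune as a named adapter, then attach a second one to the same base.
model = PeftModel.from_pretrained(base, "path/to/lora-support-bot", adapter_name="support")
model.load_adapter("path/to/lora-sql-writer", adapter_name="sql")

def generate_with(adapter: str, prompt: str) -> str:
    # Switching adapters only swaps the small LoRA weights; the base stays loaded.
    model.set_adapter(adapter)
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)

print(generate_with("support", "How do I reset my password?"))
print(generate_with("sql", "Write a query for monthly revenue."))
```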
Links mentioned:
- README.md · TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T at main
- Mihaiii/Pallas-0.5-LASER-0.1 · Hugging Face
- llama.cpp/examples/finetune/finetune.cpp at master · ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++. Contr…
- GitHub - uukuguy/multi_loras: Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answer based on user queries.: Load multiple LoRA modules simultaneously and auto…
- Mixtral Experts are initialized from Mistral 7b - Low Rank conversion possible? · Issue #4611 · ggerganov/llama.cpp: We have evidence that Mixtral's Experts were i…
- TinyLlama Pretraining Report: See https://whimsical-aphid-86d.notion.site/Relea…
▷ #ask-about-llms (31 messages):
- Amazon Titan Text Express and Lite: @spaceman777 shared a link about Amazon's new large language models, Titan Text Express and Titan Text Lite, and sought anyone's experiences or benchmarks for these models. He also noted that Amazon doesn't hype their AI-related releases and implied that they backdate their releases. (Amazon Bedrock link)
- Improvement Strategy for Finetuning Models: @max_paperclips suggested a potentially novel approach for improving model performance - finetuning a model on a bad dataset and subtracting the delta from the base model to delete the bad paths, followed by applying a good delta (a minimal sketch follows this list). Initial responses from @teknium and @giftedgummybee seemed unsure about the potential efficacy of this plan, with @giftedgummybee suggesting a similar principle in the form of a reversible LoRA.
- List of ChatGPT Failures: @max_paperclips inquired about the existence of a list of ChatGPT failures, to which @giftedgummybee replied in the negative and suggested using Llama, while @tokenbender suggested the task was too broad.
- Improving English-trained LLMs on Czech: @hynek.kydlicek sought advice on improving the performance of English-trained LLMs on the Czech language, suggesting two specific strategies, and @teknium confirmed that someone (@282315082749444097) had tried this before.
- Training an LLM with HF Auto Train: @agcobra1 wanted to know if DeciLM-7B was the best base model to use with Hugging Face's Auto Train feature, or if Mistral is a better option.
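A minimal sketch of the delta-subtraction idea above (essentially task-vector arithmetic on state dicts); the model paths and scaling factor are illustrative assumptions, and whether this actually improves a model is exactly what the discussion left open:

```python
import torch
from transformers import AutoModelForCausalLM

def load_sd(name):
    # Load a model's weights as a plain state dict for arithmetic.
    return AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32).state_dict()

base_sd = load_sd("base-model")                    # placeholder paths
bad_sd = load_sd("base-finetuned-on-bad-data")
good_sd = load_sd("base-finetuned-on-good-data")

alpha = 1.0  # how strongly to remove the "bad" direction; a tunable assumption
merged = {}
for key, w_base in base_sd.items():
    bad_delta = bad_sd[key] - w_base
    good_delta = good_sd[key] - w_base
    merged[key] = w_base - alpha * bad_delta + good_delta

model = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype=torch.float32)
model.load_state_dict(merged)
model.save_pretrained("merged-model")
```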
Links mentioned:
Amazon Titan Text models - Express and Lite - now generally available in Amazon Bedrock
Mistral Discord Summary
- An ongoing discussion about the utility of smaller models versus more complex architectures, including the challenges of fine-tuning and hybrid approaches for designing enterprise solutions. Notable discussions included the usage of LLM agents and model customization. GitHub link to microchain example
- Discussions about fine-tuning methods and tutorials, including the sharing of a Tamil tuned model and a Mistral 7B Instruct fine-tuning guide. Notable advice involved substituting datasets for specific language tasks and using PEFT tutorials for those with limited VRAM.
- In the showcase channel, notable topics included the chaining of LLM outputs with defined function calls, feedback on Mistral Instruct 7B v0.2 Q8, and discussions about app architecture checking methods for Apple Silicon Macs. Hugging Face link to Hermes model
- Interesting dialogues in the random channel involved tokenization of Chinese characters, community discussions on AGI's first question, an open letter to OpenAI by VERSES calling for a new route to AGI, and debates on the implications of VERSES' approach.
- Insights from the la-plateforme channel revolved around issues with Mistral-Medium for DPO dataset creation, instruction-following discrepancies, structuring model outputs with GPT-4 32k 0613, and debates about the effects of JSON instruction on AI reasoning capabilities. Discussions also point towards synthetic dataset generation.
Mistral Channel Summaries
▷ #general (39 messages):
- Discussion on the Utility of Smaller Models: @theledgerluminary questioned the value of using a complex architecture implementation with smaller models like Mistral 7B, especially if the end goal is to create an enterprise solution. This initiated a lively discussion, and @.tanuj. argued for the benefits of being able to chain steps to solve complex problems, even offline, and of using different models for various tasks.
- Fine-Tuning Small Models vs Using Base Models for Tasks: @theledgerluminary suggested that fine-tuning a community of specific small models, including one for orchestration, could yield great results, but that using a base model for large tasks seemed less adequate. @.tanuj. countered, stating that fine-tuning models may be more challenging than creating a reasoning "agent" that uses LLM queries to solve tasks.
- Hybrid Approach for Enterprise Solution Design: The idea of taking a hybrid approach to design was proposed by @superseethat. The approach includes developing an "agent swarm architecture" with specialization in mind and then fine-tuning one specialization at a time.
- Views on LLM Agents: User comments varied on the utility of LLM agents. @jessicant. brought up the point that LLM fine-tuning could potentially improve program reliability, especially for tasks requiring multi-turn conversations. However, @sublimatorniq expressed doubts about the feasibility of GPT-4 agents beyond toy applications.
- Customization of the Agent Framework: @.tanuj. discussed the benefits of customizing the agent framework to be robust to any type of model, allowing for consistent chaining of requests on any model that respects a contract. This user also provided an example of function-calling-based LLM agents. The limitations of transparent, freeform English instructions were also discussed, showing a preference for more manually fine-tuned control.
Links mentioned:
GitHub - TanGentleman/microchain: function calling-based LLM agents: function calling-based LLM agents. Contribute to T…
▷ #finetuning (1 message):
- Tamil Tuned Model: User @colmevans shared a model tuned for Tamil, though with no guarantee of its quality.
- Mistral 7B Instruct Fine-tuning Guide: @colmevans also provided a general fine-tuning tutorial for the Mistral 7B Instruct model, advocating its benefits and usability for various tasks, including coding.
- Tamil Dataset for Fine-tuning: For individuals intending to use this method for Tamil language tasks, he suggested simply substituting the dataset outlined in the tutorial with a Tamil dataset.
- PEFT Tutorial: @colmevans recommended the PEFT tutorial, especially for those with limited VRAM. This tutorial covers parameter-efficient fine-tuning of billion-scale models on low-resource hardware.
Links mentioned:
- A Beginner's Guide to Fine-Tuning Mistral 7B Instruct Model: Fine-tuning a state-of-the-art language model like…
- Parameter-Efficient Fine-Tuning using 🤗 PEFT
▷ #showcase (22 messages):
- Discussion on Chaining LLM Outputs with Defined Function Calls: @.tanuj. proposed the idea of an LLM that can chain function calls in a step-by-step manner, providing detailed outputs. The discussion involved conceptualizing a "ResearchGPT" model with features/functions such as GoogleSearch, EvaluateSources, CreateEssay, DeployWebsite, etc. @poltronsuperstar acknowledged its potential real-life applications.
- Using "Instruct" over "Base" Model for Querying: @.gue22 gave feedback that Mistral Instruct 7B v0.2 Q8 yielded better answers to their queries than its base model. @.gue22 also shared a detailed way, generated by the instruct model, to determine whether an app is written for x86 or ARM architecture on Apple Silicon Macs. @.tanuj. suggested filling more of the instruct model's 32K context window and providing examples for better results.
- Recommendations on Other Models: @fayiron advised @.gue22 to try Mixtral, Qwen, or a Yi finetune (e.g., Nous-Hermes 2 Yi 34b) given their setup. Following this suggestion, @.gue22 started a download of the Nous Hermes 2 Yi 34B model from Hugging Face for further evaluation.
- Discussion on App Architecture Checking Methods for Apple Silicon Macs: @.tanuj. mentioned a quicker way to check whether an application is built for Apple Silicon - by looking at the processes running in Activity Monitor and checking for a tag saying "Apple".
Links mentioned:
TheBloke/Nous-Hermes-2-Yi-34B-GGUF · Hugging Face
▷ #random (13 messages):
- Tokenization of Chinese Characters: @poltronsuperstar remarked that Chinese characters often use two tokens due to Unicode encoding. Alternatively, @.tanuj. suggested using the MistralAI library in Python, which includes token usage in the response object, or direct messaging for assistance with tokenizing Chinese characters.
- Community Discussion - AGI's First Query: @poltronsuperstar initiated a discussion asking other members what they would first ask an AGI. Responses varied, with @sublimatorniq's question addressing AI consciousness and @kdawgdfw suggesting a leisurely topic: "So, what do you do for fun? Any hobbies?"
- Open Letter to OpenAI from VERSES: @poltronsuperstar shared a blog post by VERSES. In an open letter to OpenAI, VERSES appealed for assistance on their path to AGI development, indicating concerns over the current mainstream path relying on deep learning and large language models.
- Implications of VERSES' Approach: Reactions to the blog post were mixed. @daain commented that the idea sounded intuitively good, but its efficient implementation is yet to be seen. They also pointed out that invoking an OpenAI clause about assisting competing AGI developments was a clever PR move, and shared a link to similar ideas in the past. In response, @poltronsuperstar mentioned that without a demo, such claims don't hold much value.
Links mentioned:
The Science and Standards Behind the Breakthrough: Letter from the CEO
▷ #la-plateforme (12 messages):
- Issues with Mistral-Medium for DPO Dataset Creation: @jaredquek reported that while generating a DPO dataset, he found Mistral-Medium kept providing unnecessary explanations for inferior responses, which contradicts his intent of omitting such explanations. Prompt-engineering attempts to correct this issue were not successful.
- Poor Instruction Following: @alimsss hinted that the model's competency in following instructions is underwhelming, though a few shots can partially fix this behavior.
- Attempt to Structure Model Outputs: @casper_ai discussed a technique of generating a specific output structure from models, which can later be parsed with regex (a minimal sketch follows this list). He also suggested that GPT-4 32k 0613 is efficient at producing such structured outputs.
- Effects of JSON Instruction on AI Reasoning Capabilities: @jaredquek and @casper_ai discussed whether instructing models to answer in JSON format limits their reasoning capabilities. @casper_ai argued that using JSON may limit the models, considering JSON might comprise only a small portion of their pretraining data.
- Synthetic Dataset Generation: @.superintendent is considering generating a synthetic dataset and is looking for a time with low demand to avoid worsening the current high traffic.
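A minimal sketch of the structure-then-parse approach described above; the tag names and output format are illustrative assumptions, not the exact scheme discussed in the channel:

```python
import re

# Hypothetical prompt instruction: ask the model to wrap each field in simple tags.
raw_output = """
<chosen>Paris is the capital of France.</chosen>
<rejected>France's capital is Berlin.</rejected>
"""

def parse_pair(text: str) -> dict:
    """Extract the chosen/rejected fields from a tagged completion."""
    fields = {}
    for tag in ("chosen", "rejected"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        fields[tag] = match.group(1).strip() if match else None
    return fields

print(parse_pair(raw_output))
# {'chosen': 'Paris is the capital of France.', 'rejected': "France's capital is Berlin."}
```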
HuggingFace Discord Summary
- DeepSpeed ZeRO3 Usage with LoRA (PEFT): @galcoh. queried about DeepSpeed ZeRO3's compatibility with LoRA (PEFT), highlighting issues with the optimizer during use with Accelerator.
- Embeddings for unsplash-25k-photos-embeddings.pkl: The user @nagaraj4896 requested details about the image embeddings in unsplash-25k-photos-embeddings.pkl.
- HuggingFace Website Registration Error 418: Persistent registration and login errors were reported by @xratox and @muhammad.shakeel. @vipitis suggested emailing HuggingFace for resolution.
- Explanations on Multi-expert Large Language Models (LLMs): @typoilu asked for resources on multi-expert LLMs, and __nord provided a Google Research Blog post, Mixture-of-Experts with Expert Choice Routing.
- Inference Endpoint Creation Issues: @dragonburp reported difficulties creating an inference endpoint and requested help.
- Personal Implementation with CUDA Kernels: @gag123 asked if @neuralink undertook all the implementation themselves, with the exception of CUDA kernels, which @neuralink confirmed, mentioning the work is still in progress.
- Sharing HuggingFace AI Community Projects: A variety of user-created projects were shared, including @vashi2396's work-in-progress code, freecs's ArtificialThinkerSet, and @andysingal's amazon-sentiment-dataset.
- Operation of the Reading-Group Channel: New member @.lzack inquired about the channel's conduct, asking whether there were specific assigned readings or whether the purpose was to share readings of interest.
- Discussion Placement in Diffusion-Discussions Channel: @sayakpaul reinforced that questions about Mixtral should not be posted in channels dedicated to diffusion models.
- Pose Estimation Model and Gradient Calculations: In the computer-vision channel, @_dashwood_ expressed aspirations of using a pose estimation model to derive key points in a specific JSON format, and @lokesh1826 required insights on how to extract gradients from a complete picture rather than individual patches during image classification, as well as how to collect the output and gradients from a specific layer of a Vision Transformer (ViT) model.
HuggingFace Discord Channel Summaries
▷ #general (59 messages):
- Loading Large Models and Applying DeepSpeed ZeRO3: @galcoh. queried whether it is possible to enable DeepSpeed ZeRO3 with LoRA (PEFT) and indicated having issues with the optimizer when using Accelerator (get_peft_model is failing).
- Image Embedding in unsplash-25k-photos-embeddings.pkl: @nagaraj4896 sought information regarding the image embeddings in unsplash-25k-photos-embeddings.pkl. No response was given within the transcript.
- Issues with HuggingFace Website Registration: Several users (@xratox, @muhammad.shakeel) reported a recurring "Error 418" when trying to register or log in on the HuggingFace website and requested assistance from various members. The issue remained unresolved, with @vipitis suggesting emailing HuggingFace and waiting for a response.
- Discussion on Multi-expert Large Language Models (LLMs): @typoilu asked for explanations or documentation on how multi-expert LLMs work, with __nord providing a link to a Google Research Blog post detailing the Mixture-of-Experts model.
- Inference Endpoint Creation Issues: @dragonburp expressed having difficulties creating an inference endpoint, citing an error found in log files. Assistance was sought but no solution was provided within the transcript.
Links mentioned:
- AnimateDiff - a Hugging Face Space by guoyww
- rabbit - Waitlist: Jan 09 at 10am PT
- Textual Inversion
- Mixture-of-Experts with Expert Choice Routing - Google Research Blog
- 9 days until the pixels reveal.: Join the waitlist to see the launch of rabbit's fi…
▷ #today-im-learning (3 messages):
- Implementation Discussion: User @gag123 asked if @neuralink implemented everything from scratch. @neuralink confirmed that they implemented everything themselves, except for the CUDA kernels, and also mentioned that their work is still in progress.
▷ #i-made-this (8 messages):
- HuggingFace AI Community Projects: @vashi2396 shared work-in-progress code on Google Colab and invited volunteers to try it out and complete it, also providing a LinkedIn demo of the code. @gr.freecs.org introduced freecs's ArtificialThinkerSet, which emphasizes "Reasoning" for fine-tuning AI language models; they invited users to test the model and encouraged feedback. The model is based on the paper Reasoning Is All You Need. @andysingal added a new amazon-sentiment-dataset to HuggingFace datasets and shared the link in this channel.
Links mentioned:
- Google Colaboratory
- freecs/ArtificialThinker-Phi2 Ā· Hugging Face
- Reasoning Is All You Need
- Andyrasika/amazon-sentiment-dataset · Datasets at Hugging Face
▷ #reading-group (1 message):
- Introduction and Clarification: New member @.lzack joined the #reading-group channel and inquired about how the channel operates. They asked whether there are assignments for specific books/papers to read, or if the purpose is to share interesting findings from miscellaneous reading materials.
▷ #diffusion-discussions (2 messages):
- User @sayakpaul reminded everyone that Mixtral-related questions should not be discussed in channels dedicated to diffusion models. No links or further details were provided.
- User @chokipro shared a Discord server link. The link and its context or relevance weren't clear.
▷ #computer-vision (3 messages):
- Pose Estimation Model for Key Point Extraction: @_dashwood_ sought advice on using a pose estimation model to get key points in a specific JSON format. He mentioned attempts with openpose but was unable to find a suitable solution for 2D images with a Python implementation.
- Gradient Calculations for Image Classification: @lokesh1826 inquired about obtaining the gradients of an image during backpropagation using the HuggingFace transformers package. He presented his code and expressed concern about receiving the gradients of patches instead of the complete image (a minimal sketch follows this list).
- Extracting Outputs and Gradients from the nth Layer of a Model: @lokesh1826 requested help extracting the output and gradient from a particular layer of a Vision Transformer (ViT) model, specifically wanting to obtain the query, key, and value vectors of each encoder layer in ViT.
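A minimal sketch touching both asks above, assuming a standard transformers ViT checkpoint: making `pixel_values` a leaf tensor with `requires_grad` yields a gradient for the whole input image (rather than only patch embeddings), and a forward hook captures a chosen encoder layer's output. The checkpoint name and layer index are illustrative; extracting per-layer query/key/value vectors would need additional hooks on the attention submodules and is omitted here.

```python
import torch
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
model.eval()

# Dummy image batch; in practice this comes from the image processor.
pixel_values = torch.randn(1, 3, 224, 224, requires_grad=True)

# Forward hook capturing the output of one encoder layer (index chosen arbitrarily).
captured = {}
def save_output(module, inputs, output):
    captured["layer_5"] = output[0].detach()
hook = model.vit.encoder.layer[5].register_forward_hook(save_output)

logits = model(pixel_values=pixel_values).logits
logits[0, logits.argmax()].backward()   # backprop from the top predicted class score
hook.remove()

print(pixel_values.grad.shape)   # torch.Size([1, 3, 224, 224]): gradient w.r.t. the full image
print(captured["layer_5"].shape) # (1, 197, 768) for ViT-Base: CLS token + 196 patch tokens
```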
▷ #diffusion-discussions (2 messages):
- Mixtral Discussion Placement: @sayakpaul clarified that questions related to Mixtral should not be posted in channels devoted to diffusion models.
- Unspecified Discord Link: @chokipro posted a link without any context or description.
OpenAccess AI Collective (axolotl) Discord Summary
- "AutoGemini", a tool presented by user @seungduk, enables collaborative editing of text datasets via the Gemini Pro API. Users' ambitions around training the TinyLlama model to develop personal assistants led to excitement and curiosity.
- Discussion around training Yayi 30b with FFT and the issues encountered. Suggestions for offloading were made by @nruaif and @nanobitz. Clarifications regarding DPO support in Axolotl and its related documentation issues were also mentioned.
- Multiple queries relating to ChatML input transformation, LoRA training with Mixtral, batch size and learning rate, Qlora DSZ 3 compatibility, and memory requirements for DPO were addressed within the community.
- Dataset discussions, where user @zeroshotkevin requested a Q/A dataset for a "hello, world" fine-tuning experiment. It was recommended to use the dataset available in the example file and the mhenrichsen/alpaca_2k_test dataset.
- Debate on DPO vs PPO, stirred by user @swyxio. It was opined that DPO models generally outperform PPO models in various benchmarks, but that other approaches like OpenChat also perform well.
- Discussions in the #shearedmistral channel revolving around aversion to GPT-generated data to bypass OpenAI's terms, filtering datasets by language using resources like fastText, considering larger context lengths in the samples, and the introduction of numerous datasets including peS2o, yayi2_pretrain_data, MathPile, and CulturaX for use in future studies.
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (5 messages):
- Dataset Transformation Tool: @seungduk shared a link to a tool called AutoGemini, designed to facilitate collaborative editing of text datasets via the Gemini Pro API. The tool allows for community contributions to project datasets, providing features like query rate management, job reservation expiry, dataset flexibility, and a community leaderboard. It is accessible at the Hugging Face repository.
- TinyLlama Model Discussion: @le_mess expressed excitement about the TinyLlama model, highlighting its ability to train 8 billion tokens in about 48 hours. Future plans include creating a personal assistant that can run on various platforms. This message sparked interest and further questions from users @tank02. and @nanobitz.
Links mentioned:
seungduk/autogemini · Datasets at Hugging Face
▷ #axolotl-dev (20 messages):
- Attempting to Run Yayi 30b with FFT: @le_mess mentioned they were unable to fit Yayi 30b on 4x A100 40GB with zero3 and are looking for a 2x or 4x A100 80GB solution.
- Suggestions for Offloading: Both @nruaif and @nanobitz suggested trying offloading, with @nanobitz providing a specific code snippet showing how to offload to the CPU (a minimal sketch of such a config follows this list).
- Failure with CPU Offloading: Upon implementing the CPU offloading feature in the configuration, @le_mess encountered a failure, as evidenced by a posted traceback.
- Configuration Adjustments: @tank02 inquired whether @le_mess made any configuration alterations other than adjusting the model and datasets used. @sumo43 responded that no changes were made.
- Support for DPO: @mrfakename_ inquired about DPO support in Axolotl, with @nanobitz confirming an open branch on GitHub accommodating this feature. The documentation is reportedly a work in progress.
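Not the exact snippet shared in the channel, but a minimal sketch of what a DeepSpeed ZeRO-3 config with CPU offloading typically looks like, written here as a Python dict dumped to JSON; batch-size values are placeholders:

```python
import json

ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Move optimizer state and (optionally) parameters into CPU RAM to fit large models.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,  # placeholder
    "gradient_accumulation_steps": 4,     # placeholder
}

with open("zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```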
▷ #general-help (15 messages):
- ChatML Input Transformation: @caseus_ clarified that the existing transforms in the DPO dataset essentially convert the existing prompt into ChatML inputs and reformat the chosen/rejected responses to incorporate the end-of-sequence (eos) token.
- LoRA Training with Mixtral: @caseus_ asked if anyone has been able to train an 8-bit LoRA with Mixtral. @nruaif responded that it hits an Out Of Memory (OOM) error at 16k context, even on an A100 80GB. Peak VRAM usage at 2k context was reported to be 70GB.
- Batch Size and Learning Rate: @semantic_zone asked about the relationship between batch size, learning rate, and model size. They queried whether batch size needs to be smaller for larger models strictly due to memory constraints, and sought a rule of thumb for adjusting learning rate relative to batch size.
- Qlora DSZ 3 Compatibility: @tank02. inquired if Qlora supports DSZ 3, to which @le_mess responded that they heard it should but hadn't tried it. Meanwhile, @casper_ai mentioned there are some issues with it.
- Memory Requirements for DPO: @tank02. asked about memory requirements for DPO, especially while using a 3b model with qlora on a 24GB card, which led to an OOM error. @nanobitz responded that the user needs to take into account the fact that the model is loaded twice, and suggested adjusting the optimization settings and batch size.
▷ #datasets (3 messages):
- Request for Q/A Dataset: User @zeroshotkevin requested a simple question/answer dataset to fine-tune a model similar to Mistral 7B, with the aim of getting a discernible difference from the original model. This is targeted at performing a fine-tuning "hello, world" experiment with Axolotl.
- Dataset Recommendation: User @nruaif recommended using the dataset available in the example file and also shared a link to the mhenrichsen/alpaca_2k_test dataset hosted on HuggingFace, containing dialogues such as giving tips for staying healthy.
Links mentioned:
mhenrichsen/alpaca_2k_test · Datasets at Hugging Face
▷ #rlhf (3 messages):
- DPO vs PPO: User @swyxio asked if there was a consensus about DPO being comparable to or better than PPO. @_jp1_ expressed that there may not be a consensus, but mentioned that DPO models perform well and top various benchmarks. They compared this with PPO models, which, according to them, were never competitive. However, they also highlighted the performance of other approaches like OpenChat.
▷ #docs (1 message):
dangfutures: Ugh got deleted
▷ #shearedmistral (23 messages):
- Avoiding GPT-Generated Data: @dctanner expressed a desire to avoid using GPT-generated data so as not to be burdened by OpenAI's terms during pretraining.
- Continued Training on Mistral: A discussion occurred among @nruaif and @caseus_ regarding the transition to training on Mixtral after dealing with potential bugs, and the need to focus on continued training with Mistral. They both agreed that losing experts during training is a concern, since they are token-wise experts.
- Data Filtering and Handling: @nruaif and @caseus_ discussed the need to filter specific datasets based on language, especially removing the non-English subset. @nruaif recommended using fastText, an open-source library for learning text representations and text classifiers, for filtering out non-English content (a minimal sketch follows this list).
- Consideration for Bigger Context Length: @caseus_ suggested a preference for bigger context lengths in the samples. However, the final decision depends on affirmation from the team members.
- Dataset Suggestions: A couple of datasets were mentioned for consideration, including peS2o, yayi2_pretrain_data, MathPile, and CulturaX, with @xzuyn mentioning that CulturaX includes 6.3 trillion tokens across 167 languages.
Links mentioned:
- fastText: Library for efficient text classification and repr…
- allenai/peS2o · Datasets at Hugging Face
- uonlp/CulturaX · Datasets at Hugging Face
- wenge-research/yayi2_pretrain_data · Datasets at Hugging Face
LangChain AI Discord Summary
- Discussion on LangChain's functionality, with examples of structuring output and passing multiple inputs to a prompt, and a GitHub repository shared by @rajib2189 for additional practical applications.
- There was interest in LangChain's compatibility with other platforms, with @sarrah_1 inquiring about integrating LangChain with a Laravel project and the availability of a specific PHP library, and @evolutionstepper asking about its utility for running everything asynchronously in FastAPI. A possible asynchronous implementation via the tokio framework was also suggested.
- Clarification was requested on the difference between OpenAI Functions and OpenAI Tools agents; @toasted_shibe explained that the Tools agent allows for parallel function calling, providing a link to the OpenAI documentation. @alexk1919 asked about the relevance of LangChain for creating a sequence of prompts that integrate results from the previous prompt. cheerful_moose_30860 encountered an error while importing sentence-transformers.
LangChain AI Channel Summaries
▷ #announcements (1 message):
cheerful_moose_30860: Error importing sentence-transformers
▷ #general (33 messages):
- LangChain Output Structure: @seththunder pointed out that the LangChain output parser can be used to format output in a specific structure.
- LangChain Examples: @rajib2189 provided an example of how to pass multiple inputs to a prompt in LangChain and shared a GitHub link for it (a minimal sketch follows this list).
- LangChain Integration with Laravel: @sarrah_1 inquired about the possibility of integrating LangChain with a Laravel project and whether there's a specific PHP library available for this.
- Asynchronous Implementation in FastAPI with LangChain: @evolutionstepper expressed concerns about whether LangChain can handle running everything asynchronously in FastAPI. @quantumqueenxox confirmed it's possible and mentioned having code to make processes asynchronous. @evolutionstepper also showed interest in a LangChain built on top of the tokio framework.
- Difference between OpenAI Functions and OpenAI Tools Agents: @keenborder asked for clarification on the difference between OpenAI Functions and OpenAI Tools agents; @toasted_shibe explained that the Tools agent calls the new tools API endpoint, allowing for parallel function calling, and referred to the OpenAI docs for further info.
- Use of LangChain for a Sequence of Prompts: @alexk1919 questioned whether LangChain is the right tool for creating a sequence of prompts that leverage the results from the previous prompt.
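Not the repository's exact example, but a minimal sketch of passing multiple inputs to a LangChain prompt; the template text and variable names are illustrative, and the actual LLM call is left out to keep the example model-agnostic:

```python
from langchain.prompts import PromptTemplate

# A single prompt template that takes two inputs.
prompt = PromptTemplate(
    input_variables=["product", "audience"],
    template=(
        "Write a one-sentence marketing tagline for {product}, "
        "aimed at {audience}."
    ),
)

# format() fills every declared input variable; the result can be sent to any LLM.
text = prompt.format(
    product="an open-source local LLM app",
    audience="privacy-conscious developers",
)
print(text)
```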
Links mentioned:
langchain_examples/examples/how_to_llm_chain_pass_multiple_inputs_to_prompt.py at main · rajib76/langchain_examples: This repo consists of examples to use langchain. C…
LAION Discord Summary
- Discussed the unavailability of the LAION dataset for research due to potential legal issues. Users expressed concerns and provided alternative solutions, like creating their own datasets from CommonCrawl. They also suggested a thorough cleanup of the existing dataset, including the removal of invalid content and broken links.
- "The dataset is currently under review to remove all NSFW content, especially child porn-related content…"
- Debated dataset modifications, the handling of discarded content, and the need to rebase after a PR. The conversation continued on the difficulty of dealing with users who hold an old copy of the dataset and the challenge of keeping it clean and up to date.
- "…this could rebase after making the PR. But it would not be effective for users who already have the old dataset."
- Noted a delay in the DREAM project due to computer issues.
- Discussed the potential leak of GPT-4 details shared in a blog post. However, users expressed skepticism due to a lack of solid evidence supporting the leak.
- "…there's no solid evidence supporting the accuracy of the blog post or any other speculation about GPT-4."
- Announced the release of a new model called "AnyText Text ControlNet" and shared a link to its summary.
- Positive appraisal for Modelscope was shared by user @puffy310.
- "…[Modelscope] it's 'kinda good', although not quite as good as Hugging Face."
- Provided an in-depth explanation of the structural differences between ChatGPT, SD, and SDXL models in terms of their architecture, input/output embeddings, and training methods.
LAION Channel Summaries
▷ #general (25 messagesš„):
- Publishing of LAION Dataset: User @ggez asked about the expected time when the LAION dataset will be published again. @chad_in_the_house responded that the dataset is currently under review to remove all NSFW content, especially child porn-related content, due to legal concerns.
- Alternative Dataset Source: @thejonasbrothers suggested creating one's own dataset from CommonCrawl data. They discussed the issues with the current LAION dataset and predicted the actions LAION might need to take, such as a complete rebuild of the dataset from more recent CommonCrawl data while ensuring the absence of objectionable materials.
- Dataset Modification: A conversation ensued around modifying the dataset in response to changing content legality. @progamergov suggested that LAION could rebase after making the PR, to which @nodja countered that this would not be effective for users who already have the old dataset. They further discussed the issue of dataset owners needing to filter their old copies.
- Dataset Cleanup and Link Rot: @nodja also suggested a cleanup of the dataset, including the removal of 404 links and unmatched image hashes, assuming a ~10% loss of the dataset by now (see the validity-check sketch after this list). @progamergov agreed, mentioning the significant link rot already experienced in the LAION-5B dataset.
- DREAM Project Delay: Finally, @xylthixlm noted a delay in their work on the DREAM project due to computer issues, projecting the pause to last around a week.
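As an illustration of the cleanup step described above, a minimal per-sample validity check; the function name and the choice of SHA-256 are assumptions for the sketch, and LAION's metadata may store hashes in a different format.

```python
import hashlib

import requests


def sample_is_valid(url: str, expected_hash: str, timeout: float = 10.0) -> bool:
    """Reject dead links (404s, timeouts) and images whose downloaded bytes no longer
    match the stored content hash, i.e. the URL now serves something else."""
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.RequestException:
        return False
    if resp.status_code != 200:
        return False
    return hashlib.sha256(resp.content).hexdigest() == expected_hash
```

Running a check like this over the metadata shards would surface both the 404s and the hash mismatches mentioned in the discussion.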
▷ #research (7 messages):
- GPT-4 Details Leaked: @vrus0188 shared a blog post supposedly revealing details about GPT-4, including its model architecture, training infrastructure, parameter count, training data composition, and more. The source of the leak is Yam Peleg, who shared the details, originally placed behind a paywall by Semi-Analysis, on Twitter for free. @metal63 expressed skepticism, noting that there's no solid evidence supporting the accuracy of the blog post or any other speculation about GPT-4.
- AnyText Text ControlNet Release: @thejonasbrothers announced the release of a new model called "AnyText Text ControlNet" and shared a link to its summary.
- Modelscope Review: @puffy310 commented positively about Modelscope, stating that it's "kinda good", although not quite as good as Hugging Face.
Links mentioned:
- GPT4- All Details Leaked: The details about the best LLM model trainning and…
- AnyText: multilingual visual text generation and editing model
▷ #learning-ml (1 message):
- Discussion on Chatbot Architecture: User @JH provided an in-depth explanation of the architectural differences between ChatGPT, SD, and SDXL models. According to them, ChatGPT primarily uses a causal decoder transformer that performs inference via next-token prediction. SD models, on the other hand, primarily use a convolutional U-Net architecture, taking as conditioning input the output embeddings from CLIP L for SD v1 and from CLIP L + OpenCLIP G for SDXL. The U-Net incorporates cross-attention and self-attention layers and is trained via a variational lower bound loss and a noise-prediction loss (a minimal cross-attention sketch follows below). Lastly, @JH deems it reasonable to expect these different architectures to learn concepts differently due to their distinct objectives.
Eleuther Discord Summary
- Detailed discussion on the use and effect of the GEGLU activation function in transformer models, with various strategies suggested to reduce the parameter overhead. A code reference from the NVIDIA/Megatron-LM implementation was shared for tangible reference.
- New member @mouldysoul inquired about resources to improve their understanding of flow-based models and their potential relationship with optimal transport theory.
- Diverse discussions in the research channel, with topics spanning from PPO-based adapter training to novel modifications of the transformer architecture, with links to the trlX paper and the abstract of a research paper. Insights on the ELC-BERT architecture and its significance in model training were also discussed.
- In the interpretability channel, discussions revolved around approaches to automated interpretability, edge attribution patching, and the current trend toward integrating high-level causal variables into subspace discovery. A research paper on edge attribution patching was shared. Interest in MoE models and their interpretability was expressed, and a link to Mixtral's repository was shared as a means to run the model on consumer-grade platforms.
- A reminder by catboy_slim_ regarding the deprecation cycle of Python 3.8 in the gpt-neox-dev channel.
Eleuther Channel Summaries
▷ #general (8 messagesš„):
- Discussion on the Use of GEGLU: @sentialx asked about the advantage of using the GEGLU activation function in transformer models, pointing out that it adds more parameters and offers negligible performance improvement. @ad8e claimed that the use of GEGLU shrinks dimensions, thus keeping parameter counts the same. In response, @sentialx mentioned that when using GEGLU, the transformer's FFN intermediate linear layer requires a two-fold increase in output dimensionality.
- Decreasing Parameter Overhead of GEGLU Models: @bob80333 explained that it's a common strategy to reduce the intermediate size in models employing GEGLU (or its variants) so that they maintain parameter equivalence, citing Llama's use of an 8/3 multiplier instead of the standard 4x multiplier in its FFN layer to offset the use of SwiGLU (a parameter-matched sketch follows this list).
- Clarification on Model Size with Respect to GEGLU: @maxmatical clarified that the hidden size for the transformer's FFN layer would be 16/3 when applying swiglu with Llama's 8/3-multiplier strategy (twice the 8/3 width, since the gated projection is doubled). They provided the NVIDIA/Megatron-LM implementation as a code reference.
- Introduction of New Member @mouldysoul: @mouldysoul, a professional involved in deploying AI models and an aspiring machine learning researcher, introduced themselves to the community.
- Inquiry into Flow-Based Models: @mouldysoul requested guidance and resources to better understand flow-based models, emphasizing their interest in the models' bijective mappings, faster sampling than diffusion models, better interpolation, and their potential relation to optimal transport theory.
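To make the parameter-count argument concrete, a small sketch of a gated FFN whose hidden width is shrunk to roughly 8/3 of the model dimension so it matches a plain 4Ć GELU FFN; this illustrates the idea and is not the Megatron-LM code linked below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedFFN(nn.Module):
    """GEGLU/SwiGLU-style FFN. The gate doubles the width of the first projection,
    so the hidden size is reduced from 4*d to roughly (8/3)*d to keep the total
    parameter count equal to a plain 4x GELU FFN (ignoring biases):
        plain:  d*(4d) + (4d)*d = 8*d^2
        gated:  d*(2h) + h*d    = 3*h*d  ->  h = (8/3)*d
    """

    def __init__(self, d_model: int, mult: float = 8 / 3):
        super().__init__()
        hidden = int(mult * d_model)
        self.proj_in = nn.Linear(d_model, 2 * hidden)   # value and gate in one matmul
        self.proj_out = nn.Linear(hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(value * F.gelu(gate))      # swap in F.silu(gate) for SwiGLU
```

The 2 Ć (8/3) Ć d_model output of `proj_in` is the 16/3 figure mentioned in the clarification above.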
Links mentioned:
Megatron-LM/megatron/model/transformer.py at 2bc6cd307a11423928c675f741e79e03df23e721 · NVIDIA/Megatron-LM: Ongoing research training transformer models at sc…
▷ #research (13 messagesš„):
- PEFT Techniques and Adapters: @cormerod queried whether adapters can be trained with PPO, in combination with PEFT techniques, for case-by-case output improvements in a 7B-parameter model. @stellaathena affirmed it, and @maxmatical mentioned that such a feature is available in trl, DeepSpeed Chat, and other libraries (a hedged sketch follows this list).
- trlX Paper Reference: @stellaathena drew attention to the trlX paper, which discusses PEFT techniques and related features like layer freezing. GitHub repo of the trlX project.
- Discussion on Modification of Transformer Architecture: @digthatdata shared the abstract of a research paper that proposes a novel transformer architecture modification for efficient pretraining of language models. @kharr.xyz remarked that such a modification is favourable for models smaller than 100M params and insignificant as the scale increases. @ad8e dismissed the BabyLM competition referred to in the paper as not being very competitive.
- Insights on ELC-BERT Architecture: @ad8e offered insights on the importance of the ELC-BERT architecture, considering the last layer attending to the first one. @kharr.xyz argued that these patterns change over the course of training and advised not to put too much weight on these figures. Following the discussion, @ad8e inferred that the last layer attending to the first layer might grow from a minor effect into a larger one with more training data; @kharr.xyz confirmed this.
- Robustness to Noise: @eron_gj shared experience on the robustness of the architecture to noise, stating that even rotating the k/v/a vectors by up to 30 degrees on average for half of the layers doesn't hamper the coherence of the outputs.
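A hedged sketch of the adapter-plus-PPO setup from the first item, using the trl and peft libraries; the base model, LoRA hyperparameters, and batch sizes are placeholders, and argument names can differ between trl versions.

```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

base = "mistralai/Mistral-7B-v0.1"  # placeholder 7B base model
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Only the LoRA adapter (plus the value head) is trained during PPO,
# which keeps the memory footprint of a 7B model manageable.
model = AutoModelForCausalLMWithValueHead.from_pretrained(base, peft_config=lora)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=8, mini_batch_size=2),
    model=model,
    tokenizer=tokenizer,
)

# Per batch: generate responses for the queries, score them with a reward model
# or heuristic, then update the adapter weights:
# stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```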
Links mentioned:
Not all layers are equally as important: Every Layer Counts BERT: This paper introduces a novel modification of the …
▷ #interpretability-general (6 messages):
- Automated Interpretability Work Post ACDC: @dashiell_s inquired about notable advancements in automated interpretability after ACDC and noted the existence of an ACDC repository. @t_kwa identified edge attribution patching and token-position analysis via Joseph Miller's implementation of edge subnetwork probing as progress in the field. They also mentioned ongoing work to integrate high-level causal variables into subspace discovery.
- ACDC Repository Usability: Regarding the ACDC repository, @t_kwa pointed out that although it is not straightforward to use, owing to the need for script conversion from FAR AI's Kubernetes setup, the demo notebook can still be run smoothly.
- Edge Attribution Patching Efficiency: @neelnanda referenced a paper supervised by <@349859906570027010> that demonstrates the superiority of edge attribution patching over ACDC in terms of speed and the circuits it recovers. The paper can be accessed here (a first-order sketch of the idea follows this list).
- Interest in MoE Models plus Interpretability: @sk5544 expressed curiosity about work at the intersection of interpretability and Mixture of Experts (MoE) models. They noted the high compute requirements of even small MoE models as a hindrance to academic experimentation.
- Running MoE Models on Consumer-grade Platforms: In response, @stellaathena suggested running Mixtral, an MoE model, on Google Colab, and provided a link to its repository.
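For readers unfamiliar with the linked paper, a rough first-order sketch of the attribution-patching idea; it is not the paper's code and assumes the hooked sub-modules each return a single tensor.

```python
import torch


def attribution_patching_scores(model, clean_inputs, corrupt_inputs, metric_fn, layer_names):
    """Approximate activation patching with a first-order Taylor expansion:
        delta_metric(name) ~ grad_on_corrupt(name) . (clean_act(name) - corrupt_act(name))
    One clean forward plus one corrupt forward/backward replaces a separate patched
    run per component, which is where the speedup over ACDC comes from."""
    modules = dict(model.named_modules())
    clean_acts, corrupt_acts, grads = {}, {}, {}

    def cache_hook(store, name, keep_grad=False):
        def hook(module, inputs, output):
            store[name] = output
            if keep_grad:
                output.register_hook(lambda g: grads.__setitem__(name, g))
        return hook

    # Pass 1: clean run, cache activations only.
    handles = [modules[n].register_forward_hook(cache_hook(clean_acts, n)) for n in layer_names]
    with torch.no_grad():
        model(**clean_inputs)
    for h in handles:
        h.remove()

    # Pass 2: corrupted run, cache activations and the metric's gradient w.r.t. them.
    handles = [modules[n].register_forward_hook(cache_hook(corrupt_acts, n, keep_grad=True))
               for n in layer_names]
    metric_fn(model(**corrupt_inputs)).backward()
    for h in handles:
        h.remove()

    return {n: (grads[n] * (clean_acts[n] - corrupt_acts[n].detach())).sum().item()
            for n in layer_names}
```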
Links mentioned:
- Attribution Patching Outperforms Automated Circuit Discovery: Automated interpretability research has recently a…
- GitHub - dvmazur/mixtral-offloading: Run Mixtral-8x7B models in Colab or consumer desktops: Run Mixtral-8x7B models in Colab or consumer deskt…
▷ #gpt-neox-dev (1 message):
catboy_slim_: python 3.8 gets deprecated next year or this year depending on your current time zone
DiscoResearch Discord Summary
Only 1 channel had activity, so no need to summarize…
- Benchmark for Context Retrieval: @rasdani expressed an interest in creating a benchmark for context retrieval based on deepset/germanquad. The plan is to select 60 question/context pairs from the test set, pair half of them with irrelevant contexts, and use 0 and 1 as ground truth for cosine similarity. The aim of the benchmark is to compare different embedding models and calculate pairwise correlations.
- Dot Product vs Cosine Similarity: @philipmay advised that the dot product is more effective than cosine similarity when using semantic embeddings for questions and passages, a tip originally given to them by Nils Reimers, whom they consider an expert in embeddings.
- Metrics for Retrieval Systems: In response to a conversation about retrieval-system metrics, @philipmay stated that MRR@10 is often used, while @hammadkhan noted that the MTEB leaderboard uses NDCG@10, which assesses retrieval quality based on relevance and position within the top 10 items (see the sketch after this list).
- Datasets for Multiple Positive Contexts: @rasdani asked for recommendations of contextual QA datasets with multiple positive contexts in German; they plan to use MRR@10 for their benchmark because germanquad provides only one positive reference context per question.
- New Year Greetings: @bjoernp and @thewindmom greeted the members of the Discord server and expressed their anticipation for future developments.
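A small illustrative sketch of the metric and similarity choices discussed above; the random embeddings stand in for real model outputs, and germanquad itself is not loaded here.

```python
import numpy as np


def mrr_at_k(scores: np.ndarray, positive_idx: np.ndarray, k: int = 10) -> float:
    """scores: (n_queries, n_passages) similarity matrix; positive_idx: index of the single
    relevant passage per query. Reciprocal rank counts as 0 outside the top k."""
    reciprocal_ranks = []
    for row, pos in zip(scores, positive_idx):
        rank = int(np.where(np.argsort(-row) == pos)[0][0]) + 1  # 1-based rank of the positive
        reciprocal_ranks.append(1.0 / rank if rank <= k else 0.0)
    return float(np.mean(reciprocal_ranks))


# Dot product vs. cosine similarity on precomputed embeddings.
rng = np.random.default_rng(0)
q = rng.standard_normal((60, 768))    # e.g. 60 question embeddings
p = rng.standard_normal((200, 768))   # candidate passage embeddings
dot_scores = q @ p.T
cos_scores = (q / np.linalg.norm(q, axis=1, keepdims=True)) @ \
             (p / np.linalg.norm(p, axis=1, keepdims=True)).T

positives = rng.integers(0, 200, size=60)  # placeholder ground-truth indices
print(mrr_at_k(dot_scores, positives), mrr_at_k(cos_scores, positives))
```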
Latent Space Discord Summary
Only 1 channel had activity, so no need to summarize…
- Organising an Event in Milano: User @alexio.c proposed the idea of organizing an event in Milano, Italy. @fanahova responded positively, suggesting putting the word out in other local groups.
- AI Platforms Discussion: @aristokratic.eth sought suggestions for AI platforms. @fanahova recommended Unstructured.io, as it has the most funding.
Alignment Lab AI Discord Summary
- Users in the guild exchanged New Year greetings and well-wishes, fostering a sense of community and camaraderie.
- In celebration of the new year, a New Year 2022 GIF was shared by user @cryptossssun across both the oo and oo2 channels, adding a festive and joyful tone to the discussions.
- Details about the shared GIF were also provided: a file size of 1303 KB, a duration of 1.200 sec, and dimensions of 498x331.
Alignment Lab AI Channel Summaries
▷ #oo (4 messages):
- Happy New Year Wishes: Users @cryptossssun, @teknium, and @neverendingtoast shared their New Year greetings and well-wishes with the community in Alignment Lab's oo Discord channel.
- New Year GIF: @cryptossssun also shared a New Year GIF to celebrate the start of 2022.
Links mentioned:
New Year GIF - New Year 2022 - Discover & Share GIFs: Click to view the GIF
▷ #oo2 (2 messages):
- User @cryptossssun shared a New Year 2022 GIF, wishing everyone a Happy New Year and success in their endeavors.
- The GIF details include a file size of 1303 KB, a duration of 1.200 sec, and dimensions of 498x331. The GIF was created on 1/1/2022.
Links mentioned:
New Year GIF - New Year 2022 - Discover & Share GIFs: Click to view the GIF
Skunkworks AI Discord Summary
Only 1 channel had activity, so no need to summarize…
teknium: lol it's exactly what he asked for š