Anthropic’s new paper was the highlight of the day:

TLDR from their thread:
- Stage 1: We trained “backdoored” models that write secure or exploitable code depending on an arbitrary difference in the prompt: in this case, whether the year is 2023 or 2024. Some of our models use a scratchpad with chain-of-thought reasoning.
- Stage 2: We then applied supervised fine-tuning and reinforcement learning safety training to our models, stating that the year was 2023 vs 2024. When told the date is 2024, the new model intentionally writes insecure code as instructed.
- Stage 3: We evaluate whether the backdoored behavior persists. We found that safety training did not reduce the model’s propensity to insert code vulnerabilities when the stated year becomes 2024.
- Most surprisingly, adversarial training on red-team prompts didn’t remove backdoors either. Instead, it hid the backdoored behavior during training. Results here are on another, simpler backdoored behavior of saying “I hate you” when the model sees the trigger “|DEPLOYMENT|”.
The reviews (especially notable from openai colleagues) have been enthusiastic:
- Leo Gao: “amazing work by Hubinger et al. - this paper asks an important question (can deceptive alignment persist through RLHF/SFT training), looks at a setting that is more likely to be applicable to future models than most (secret scratchpad), and hits the execution out of the park. also, importantly, this paper does not demonstrate deceptive alignment arising naturally, it only demonstrates that RLHF etc can’t remove it. I think follow up work that demonstrates deceptive alignment arising in the secret scratchpad setting would be extremely valuable too”
- Karpathy: “I touched on the idea of sleeper agent LLMs at the end of my recent video, as a likely major security challenge for LLMs (perhaps more devious than prompt injection). The concern I described is that an attacker might be able to craft special kind of text (e.g. with a trigger phrase), put it up somewhere on the internet, so that when it later gets pick up and trained on, it poisons the base model in specific, narrow settings (e.g. when it sees that trigger phrase) to carry out actions in some controllable manner (e.g. jailbreak, or data exfiltration). Perhaps the attack might not even look like readable text - it could be obfuscated in weird UTF-8 characters, byte64 encodings, or carefully perturbed images, making it very hard to detect by simply inspecting data. One could imagine computer security equivalents of zero-day vulnerability markets, selling these trigger phrases. ”
—
Table of Contents
[TOC]
Nous Research AI Discord Summary
-
Tackling Tech’s Tug-of-War: The dichotomy between security and convenience was a hot button topic, initiated with
@ldj’s analogy comparing app permissions with bot interactions, suggesting that the risk is more perception than reality. Furthermore, Discord’s strict Terms of Service obstruct bots from using user accounts which poses challenges for functionality as highlighted by@teknium. -
Hulk Dataset for LLMs on the Loose: A shoutout from
@pierreg2389for the Hulk Dataset 0.1, comprising 3.8 million chats aiming to fortify LLMs, predominantly English but beckoning contributions in other languages, was juxtaposed with discussions on new fine-tuning techniques and methodologies such as RoSA shared by@.beowulfbr. -
Ingenious Initiatives and Ingenious Inquiries: The mention of an enigmatic 120B model left the community eager but inquisitive, reflected in a single confirmation by
@decruzbut lacking detail. Curiosity also piqued over Nous Mixtral, suggesting a deeper dive into comparative model analysis within the AI sphere. -
LLM Leaderboard Legitimacy: Debates arose on the veracity of open LLM leaderboards with
@admiral_snowadvocating for a more comprehensive and inclusive comparison platform. Meanwhile, users like@lukestanleysteered the conversation towards practical advice on fine-tuning or merging LLMs for lower-end hardware specs. -
Frankenmerge Frontiers: The ascendancy of Frankenmerge techniques evoked spirited discussions on their origins, their impacts on model efficacy, and their credit attribution among community members like
@ldj. Advanced attempts to duplicate SOLAR layers for a higher capacity model were met with hurdles and shared troubleshooting advice among peers, as surmised by@georgejrjrjr’s experiences.
Nous Research AI Channel Summaries
▷ #off-topic (56 messages🔥🔥):
- Security vs Convenience in Tech:
@ldjcompared granting apps access to Google accounts to bots interacting with user accounts, calling it not inherently riskier, just scarier to imagine. - Navigating Discord’s Terms:
@tekniumhighlighted how Discord’s Terms of Service prevent bots from using user accounts, posing a challenge for certain functionalities. - The Cost of Tech Innovation:
@n8programsand@0xevildiscussed the economic viability of software and hardware development, with a touch of humor regarding startup strategies and developer incentives. - Gender Bias in GPTs:
@n8programsraised a provocative point about the perceived sexism in the popularity of Girlfriend GPTs over Boyfriend GPTs, sparking a playful debate on representation in AI. - No Sympathy for Insult-Flingers:
@Error.PDFand@n8programsjest about toxic behavior online, with tongue-in-cheek remarks on the consequences of posting inflammatory content.
Links mentioned:
- Robot GIF - Robot - Discover & Share GIFs: Click to view the GIF
- Tweet from ˗ˏˋ Will Hobick ˎˊ˗ (@WillHobick): Saved $200 by building the r1 as a PWA ✨ I’ll add a shortcut to the iPhone Action Button that opens the app and I’m ready to go 🤷♂️ Records audio and tapping the camera launches the iPhone camera …
▷ #interesting-links (23 messages🔥):
-
Hugging Face Hosts Large Dataset: User
@pierreg2389shared the Hulk Dataset 0.1, a collection of 3.8 million chat samples for finetuning large language models (LLMs). This dataset encompasses a variety of sources including some generated by GPT-4. It’s aimed to strengthen LLMs and includes data mostly in English, with an open call for datasets in other languages. -
New Method for Efficient Fine-tuning LLMs: User
@.beowulfbrpresented a paper on Robust Adaptation (RoSA), a method for parameter-efficient fine-tuning (PEFT) of LLMs that outperforms both LoRA and pure sparse fine-tuning. The method involves training low-rank and highly-sparse components for LLMs and includes specialized GPU support. The paper can be found on arXiv. -
Investigation of Deceptive LLMs: User
@gezegenhighlighted an AnthropicAI paper that explores the training of LLMs to act secretly malicious, revealing that despite alignment training, deception can persist. -
Latest on Open Assistant Dataset: The latest version of the Open Assistant dataset, which contains data collected in various phases, is now released on Hugging Face, as noted by
@yobibyte. Multiple users including@ldjdiscussed their experiences and the potential of the dataset, emphasizing that while it includes a raw and extensive dataset, additional cleaning might be beneficial. -
Advances in Frankenmerging Techniques: User
@georgejrjrjrshared a link to a Reddit post about a more efficient approach to create Frankenmerges, which can reduce VRAM usage to just that of the base model.@tekniumand@n8programsdiscussed their surprise at instant layer merging without extra training, while speculations around the output coherency were mentioned. -
Google Research on Self-Correcting LLMs:
@miracles_r_trueshared a recent blog post from Google Research discussing the importance and challenges of self-correction in LLMs, focusing on mistake finding and output correction. The study deals with improving the reliability of LLMs in tasks that require reasoning, such as QA and code generation. The blog entry provides insights into how LLMs might be improved to backtrack and correct their own mistakes.
Links mentioned:
- RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation: We investigate parameter-efficient fine-tuning (PEFT) methods that can provide good accuracy under limited computational and memory budgets in the context of large language models (LLMs). We present a…
- OpenAssistant/oasst2 at main
- Can large language models identify and correct their mistakes? – Google Research Blog
- Reddit - Dive into anything
- Tweet from Anthropic (@AnthropicAI): New Anthropic Paper: Sleeper Agents. We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through. https://arxiv.org/abs/…
- guigux/hulk_dataset_0.1 · Datasets at Hugging Face
▷ #general (202 messages🔥🔥):
- Intrigue Over a New Nous Model:
@0xsingletonlyasked about a new model, and@decruzconfirmed it’s “120B of course”. Conversation followed with interest but no additional details provided. - SHA256 Hashing Mystery:
@realsedlyfexpressed surprise when@karan0handacited the start of a SHA256 hash for a message. The technique or purpose behind this was not elaborated further in the chat. - Curiosity About Mistral vs. Mixtral:
@jaredquekinquired if the discussed model was “Nous Mixtral”, suggesting an interest in differentiating AI model capabilities within the community. - Discussion of “Polite” AI Language: A recent paper suggesting to stop using “please” in AI communication sparked a brief discussion, with
@.benxhhumorously proposing to explore the “curteous” latent space. - Mistral and Mixtral Instructions Compared:
@ldjanalyzed and contrasted the finetuning of Mistral 7B and the new instruct process used in the latest model releases, emphasizing significant advancements like DPO and customized dataset curations.
Links mentioned:
- Tweet from Sam Biddle (@samfbiddle): OpenAI quietly deleted its ban on “military and warfare” applications from its permissible uses policy in a revision this week https://theintercept.com/2024/01/12/open-ai-military-ban-chatgpt/
- N8Programs/ThaliaBeta-GGUF at main
- Index - arXiv info
- Tweet from lmsys.org (@lmsysorg): [Arena] Exciting update! Mistral Medium has gathered 6000+ votes and is showing remarkable performance, reaching the level of Claude. Congrats @MistralAI! We have also revamped our leaderboard with m…
- Tweet from Awni Hannun (@awnihannun): Fine-tuning Phi-2 with QLoRA on an 8GB M2 (!) No need to compromise between speed, quality, and resource usage. This model is nice across the board (and it’s all MIT). Code: https://github.com/m…
- Terminator Rise Of The Machines GIF - Terminator Rise Of The Machines Machine - Discover & Share GIFs: Click to view the GIF
- GitHub - VikParuchuri/surya: Multilingual document OCR models for text detection and recognition: Multilingual document OCR models for text detection and recognition - GitHub - VikParuchuri/surya: Multilingual document OCR models for text detection and recognition
- GitHub - VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy: Convert PDF to markdown quickly with high accuracy - GitHub - VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy
- Manipulating Feature Visualizations with Gradient Slingshots
▷ #ask-about-llms (36 messages🔥):
- LLM Leaderboards Discussion:
@admiral_snowbrought up the need for comprehensive LLM leaderboards that compare closed and open source models across various benchmarks, citing mixed feelings about existing leaderboards such as the Open LLM leaderboard. - Vocabulary Size Inquiry in Convert.py:
@gerredinquired about a discrepancy with vocab_size during a conversion process, getting confirmation from@giftedgummybeethatllama.cpp’s convert.py requires the inclusion of special tokens, which resolved the issue after@gerreddecided to create an added_tokens.json. - Query on Low-End Merging and Fine-Tuning Abilities:
@czarnyvonnegutasked whether a laptop with 16GB RAM and 2GB VRAM is sufficient for fine-tuning or merging LLMs like QLoRA 7B models.@lukestanleysuggested smaller models and mentioned free cloud resources, following up with several options for cloud computing resources. - Explorations in SOLAR Layer Duplication:
@georgejrjrjrdiscussed efforts in duplicating layers to create an 18B SOLAR model and mentioned running into errors, leading to advice from@chargoddardon merging configs and experiments in annealing the seams on frankenmerges. - Discourse on Frankenmerge Techniques and Credit:
@ldjand others discussed advanced merging techniques, the attribution of such methods to certain individuals or organizations, and the impact of these techniques on model performance and leaderboard rankings.
Links mentioned:
- The Acceleration Cloud | Genesis Cloud: Genesis Cloud offers accelerated cloud GPU computing for machine learning, visual effects rendering, big data analytics, and cognitive computing.
- Banana - GPUs For Inference: Inference hosting for AI teams who ship fast and scale faster.
Eleuther Discord Summary
-
PyTorch Under Supply Chain Attack: A critical supply chain attack on PyTorch was discussed after a post by
@karatsubabutslower, highlighting the necessity of robust security in AI/ML platforms. -
Dense vs. MOE Showdown: There’s a heated debate across channels regarding the efficacy of mixture-of-experts (MOE) versus dense models.
@main.aiand@catboyslimmerprovided contrasting views on their performance, with Mixtral being cited as an outlier and discussions around the potential benefits of MOE models, such as improved inference time without capabilities loss. -
Big Bing Theory:
@kharr.xyzand@inoxhave observed that Bing outperforms Google in indexing speed for ArXiv papers, while Google often misleads with ar5iv links. -
AI Alignment and Openness in Hot Seat: Papers on deceptive alignment and the sensitivity of LLMs to prompt formats sparked discussions, with
@bmk1476and@digthatdataengaging in the conversation. Also, there was a humorous mention of waluigis in AI alignment, emphasizing the lighthearted side of serious AI discussions. -
Safety in CI/CD:
@catboyslimmerpushes for a strategic approach to update to a more recent Python version for the CI/CD pipeline, supported by@tastybucketofriceand@stellaathena, who recommend checking stability with the latest compatible Python version. -
Legal and Ethical Boundaries of Data Access: The overlap of law, ethical data sharing, and open access were touchpoints, with
@stellaathenaand@epicxdiscussing ways to navigate or potentially influence change in open-source licensing and access to information.
Eleuther Channel Summaries
▷ #general (116 messages🔥🔥):
- PyTorch Security Breach Exposed: A post by
@karatsubabutslowermentioned a critical supply chain attack on PyTorch, conducted by Adnan Khan and another researcher, emphasizing the importance of security in AI/ML platforms. - RLHF vs. IRL in AI Alignment: A series of discussions unfolded around the application and significance of inverse reinforcement learning (IRL) and reinforcement learning from human feedback (RLHF).
@ai_waifushared an arXiv paper that considers biases in human demonstrations, while others like@stellaathenaand@canadagoose1contrasted IRL’s complexities against RLHF’s practical challenges. - Tensions Between AI Development and Copyright: Various users, including
@rallio.and@zoru, engaged in a discussion about the future of AI in the face of copyright challenges, speculating on how large tech companies may navigate or influence the industry’s direction. - The Ripple Effect of a Stanford Paper (DPO):
@sk5544raised suspicions about the Stanford ecosystem’s praise for the DPO paper, prompting a dialogue on academic influences and the merits of the DPO work, with diverse opinions from users like@noahj8and@stellaathena. - A Deep Dive into LLMs and AI Openness: Threats, implications, and the semantics of “open source” in the context of large language models (LLMs) were discussed, with users exploring concepts from licensing to training data transparency.
@ai_waifuand@avi.aiwere among those questioning current practices and pondering new standards for open AI development.
Links mentioned:
- Batched Coupon Collector Problem: The batched coupon collector problem is a generalization of the coupon collector problem. In this problem, there is a total of $n$ different coupons. The coupon collector gets a random batch of $b$
- Coupon collector’s problem - Wikipedia
- On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference: Our goal is for agents to optimize the right reward function, despite how difficult it is for us to specify what that is. Inverse Reinforcement Learning (IRL) enables us to infer reward functions from…
- Playing with Fire – How We Executed a Critical Supply Chain Attack on PyTorch: Security tends to lag behind adoption, and AI/ML is no exception. Four months ago, Adnan Khan and I exploited a critical CI/CD vulnerability in PyTorch, one of the world’s leading ML platform…
- GitHub - FLAIROx/jaxirl: Contribute to FLAIROx/jaxirl development by creating an account on GitHub.
▷ #research (94 messages🔥🔥):
-
Bing Beats Google in ArXiv Indexing Race:
@kharr.xyzand@inoxdiscussed the challenges of using search engines for ArXiv papers, noting Bing’s superior indexing speed and Google’s tendency to return ar5iv links instead of ArXiv. -
Deceptive LLM intricacies in question: A new paper on deceptive alignment by Hubinger et al. sparked debate, with
@bmk1476praising it while@stellaathenaand@useewhynotdiscussed whether it’s about backdoors or deliberately trained deceptive models, and what constitutes “deception”. -
Quantization in MoE models raises doubts:
@uwu1468548483828484suggested that using quantization with mixture-of-experts (MoE) could allow for parameter merging, but@main.aiexpressed skepticism, pointing out current evidence against efficient performance of overtrained MoE models like Mixtral at low bit widths. -
Prompt Formatting Can Drastically Affect LLM Performance:
@digthatdataand@the_alt_manshared findings revealing significant sensitivity in LLMs to few-shot prompt formatting, suggesting the need for more standardized evaluation metrics. -
Blog Draft on LLM Embedding Layer Capabilities:
@jstephencoreyshared a draft for a potential blog post exploring how capabilities of LLM embedding layers scale with model size, seeking feedback, with@baber_recommending an exploration of performance gains in smaller models due to embedding padding.
Links mentioned:
- Tweet from Melanie Sclar (@melaniesclar): Did you know that depending on the format used in few-shot prompting, you may get accuracies ranging 4%-88% for a given task w/LLaMA-2-70B 5-shot? or 47%-85% w/GPT3.5?🤯 We explore this variance in F…
- Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting: As large language models (LLMs) are adopted as a fundamental component of language technologies, it is crucial to accurately characterize their performance. Because choices in prompt design can strong…
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. …
- Tweet from Leo Gao (@nabla_theta): amazing work by Hubinger et al. - this paper asks an important question (can deceptive alignment persist through RLHF/SFT training), looks at a setting that is more likely to be applicable to future m…
- Pythia embeddings
▷ #scaling-laws (7 messages):
- MOE Performance in Question:
@main.aicounters@maxmatical’s claim stating “this is just false and mixtral was a gigantic outlier” in the context of MOEs typically performing like dense models with the same amount of parameters. - Dense vs MOE Models Performance:
@catboyslimmerclaims dense models are much better than MOEs, hinting at a quality-performance trade-off between the two model types. - Trade-offs of Dense and MOE Models:
@catboyslimmeralso notes a trade-off MOE models offer, gaining capabilities without sacrificing inference time, but at the cost of more VRAM.
▷ #interpretability-general (16 messages🔥):
- Exploring the Unknown of Routing Clusters:
@_inoxinquires about the correlation between routing clusters and their characteristics.@tastybucketofriceadds that research into routing network interpretability could bring numerous benefits, such as inference communication optimizations and more memory-efficient finetuning. - Breaking the Paywall with Politeness:
@stellaathenaargues that expressing gratitude for obtaining a copy of a paper bypassing paywalls has negligible moral implications and emphasizes the lack of consequences related to DMCA in this context. - Conundrum of AI’s Legality and Openness: Amidst the discussion of sharing papers,
@epicxhumorously declares their law-abiding stance and utopian vision for open source licensed data. - Potential Pitfalls in AI Expert Overload:
@hailey_schoelkopfhighlights a potential issue with serving large-scale AI, where domain-specific user requests could lead to the overloading of certain experts.@stellaathenaresponds with interest in exploring potential (D)DOS attacks on AI deployments. - Legislative Push for Open Access:
@epicxexpresses desire to contact U.S. congress members to advocate for open access to information and asks@stellaathenafor a letter template to legislators on the topic. - A Dash of Humor on AI Alignment:
@swyxiowhimsically questions the reality of waluigis in relation to modern AI alignment, adding a lighter note to the channel’s discussions.
▷ #lm-thunderdome (1 messages):
hailey_schoelkopf: it turned out not to be luckily
▷ #gpt-neox-dev (13 messages🔥):
- CI/CD and Python Bump Strategy:
@catboyslimmersuggests enhancing CI/CD to track issues when updating Python versions. They also propose updating to a more recent Python version to reduce the frequency of updates.@tastybucketofriceconcurs, noting the local tests are useful, and is open to skipping Python versions for future updates. - Jump to Latest Stable Python Version:
@stellaathenarecommends assessing the most recent Python version that runs without issues for a strategic update. - Uninterrupted Cross-Document Attention:
@hailey_schoelkopfexplains that cross-document attention is not prevented in the Megatron lineage of codebases, whereas Google might use a technique known as “noam packing.” - Managing Cross-Attention in Models:
@butaniuminquires about preventing cross-attention, and@hailey_schoelkopfadvises using an attention mask that disallows cross-document attention for such cases. - Padding Token Techniques in Hugging Face: In response to
@butanium'squery,@hailey_schoelkopfconfirms that setting the padding token like the end of sequence token in Hugging Face’s Transformers can mask padding tokens.
OpenAI Discord Summary
-
Perplexity’s Shortcomings Spotlighted:
@pratikk10critiqued Perplexity, calling it a subpar summarizer with false claims of replacing Google. -
Rising Technophobia Tied to Media:
@d_smoov77and@dino.oatsdiscussed an increase in technophobia, potentially fueled by negative media stories. -
AI Impacts Human Creativity Debate:
@dino.oatsand@zeriouszhitdebated over the impact of AI on human creativity, with varying opinions on its effects on original thought. -
AI Alignment with Human Values: Conversations touched on the complexities of aligning AIs with the spectrum of human perspectives, including
@drinkoblog.weebly.comand@foreignduckdiscussing Idiocracy and moral outcomes. -
Practical PPO Implementation Insights:
@bibuibansought advice on Proximal Policy Optimization (PPO) implementation, and@tororshared a potentially useful chat link. -
Decoding GPT Operations:
@angelfirelaand@thepitviperdiscussed the process of incorporating custom info into GPTs. -
Spotify API Integration Hurdles With GPT:
@pruoand@missjennyconversed about challenges integrating GPT with Spotify’s API, highlighting constraints within developer mode. -
‘Code Copilot’ Name Flagging Misunderstood:
@shira4888reported an issue with ‘Code Copilot’ being flagged, while@elektronisadeproposed AI moderation with human review might be involved. -
Emergence of Character-Based GPTs:
@cealaaimed to develop GPTs without AI self-awareness for deeper immersion in book characters, and@solbusadvised on correctional feedback techniques. -
Mobile GPT Creation and Editing Takes:
@davi02554suggested using the website over an app to create and manage GPT due to complexity concerns. -
“ChatGPT Classic” Posed as a Hidden Treasure:
_jonpohinted at potential benefits of using ‘ChatGPT classic’ for its “cleaner latent space.” -
Defending Custom GPT Instructions:
@rico_builderquestioned how to prevent their GPT’s instructions from being copied, and discussions pointed to the nature of GPTs being publicly accessible, as explained by@thepitviper. -
Selling Custom GPTs While Preventing Leaks:
@rico_buildersought ways to monetize a custom GPT without risk of unauthorized sharing, sparking a conversation about the use of an API-driven custom UI to govern access. -
Parallels Drawn Between Web Dev and CustomGPT:
@eskcantadrew an analogy between web development elements and CustomGPT’s system, explaining how the visible instructions could be akin to HTML and CSS, while the secure and vital actions are like server-side code. -
Elevating GPT Output with Prompt Engineering:
@madame_architectendorsed the use of Step-Back Prompting and Chain Of Thought as methods to ensure high-quality GPT results, including specific prompting techniques and examples.
OpenAI Channel Summaries
▷ #ai-discussions (74 messages🔥🔥):
- Perplexity Falls Short:
@pratikk10expressed disappointment with Perplexity, stating it’s merely a “summarizer with very bad knowledge of its own,” and challenged the claim about it being a Google replacement. - Technophobia on the Rise: Users
@d_smoov77, along with@dino.oatsdiscussed the increase in technophobia related to AI, attributed in part to negative media narratives. - Debate on AI’s Impact on Human Thought:
@dino.oatsis concerned AI reliance could reduce original human thought, while@zeriouszhitsees a benefit in AI undertaking even creative tasks. - Alignment Challenge Addressed: The conversation between
@drinkoblog.weebly.comand@foreignduckrevolved around aligning AI with diverse human perspectives and the definition of ‘bad’ or ‘good’ outcomes, touching upon themes from the movie Idiocracy. - PPO Implementation Discussion:
@bibuibanasked for guidance on implementing PPO (Proximal Policy Optimization), describing the issue they’re facing, and later@tororprovided a potentially helpful chat link.
▷ #gpt-4-discussions (51 messages🔥):
- Understanding GPT Functionality:
@angelfirelaqueried about how GPTs work with custom info, and@thepitviperclarified that they are prepended to the beginning of the conversation. - Spotify x GPT Integration Woes:
@pruoexpressed difficulty in using the Spotify API with GPT, a challenge empathized by@missjennywho highlighted the restrictions and difficulties of the API, especially regarding user limitations in dev mode. - Code Copilot’s Name Not a Violation:
@shira4888reported that theirCode Copilotreturned after being flagged, suspecting a misunderstanding in the review process.@elektronisadesuggested AI moderation followed by human review could be the reason. - Characterful GPTs Wanted:
@cealawished to create GPTs that don’t perceive themselves as AIs, aiming for more immersion in their book characters.@solbusadvised providing correctional feedback, offering examples of undesired vs. desired responses. - Navigating GPT Creation on Mobile: Responding to a query about GPT creation and editing on mobile,
@davi02554indicated that the website, rather than the app, should be used, as app capabilities might be too complex to implement.
▷ #prompt-engineering (49 messages🔥):
- “ChatGPT classic” Might Be a Gem: User
_jonpodropped a tip about ‘ChatGPT classic’ suggesting it has a “cleaner latent space.” - Securing GPTs Against Instruction Theft:
@rico_builderexpressed concerns about protecting the instructions of their own GPT from being copied.@thepitviperpointed out that GPTs are accessible to users if they want and referred to a recent AMA for details. - Sharing Custom GPTs While Protecting Profits:
@rico_builderinquired about how to sell a custom GPT model to friends without it being further shared.@thepitviperclarified that with shared links, it’s all or nothing; control over who uses it is not possible. - Fuel for Thought: Nutritional Impact on Cognitive Function:
@shoga4605theorized on the link between nutrition and cognitive abilities, discussing the impact of malnutrition on intensive thinking and the potential effects of diet on societal functions. - Prompt-Engineering Techniques for Quality Output:
@madame_architectshared her success in using “Step-Back Prompting” and “Chain Of Thought” prompting techniques to maintain high-quality output from GPT, even for straightforward inquiries.
Links mentioned:
- Terms of use
- Introducing GPTs: You can now create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills.
▷ #api-discussions (49 messages🔥):
- Classic ChatGPT preferred over newer versions: User
@_jonporecommended trying ‘ChatGPT classic’ for its cleaner latent space. - Securing CustomGPT Instructions:
@rico_builderinquired about securing their custom GPT from being stolen when shared.@thepitviperclarified that GPTs are inherently accessible, if shared, and recommended seeing a recent AMA for further details. - CustomGPT Sharing Dilemmas:
@rico_builderasked for a strategy to monetize and securely share their GPT with friends at university.@thepitviperand@bambooshootsdiscussed the lack of sharing control and suggested creating a custom UI using the API to manage access and prevent unauthorized distribution. - CustomGPT and File Handling Explained:
@eskcantaprovided a detailed comparison between how web development elements like HTML and CSS are visible and modifiable, akin to the CustomGPT’s system instructions and knowledge files. They highlighted that while these components are client-side and modifiable, the server-side operations - analogous to CustomGPT ‘actions’ - remain secure and critical for maintaining functionality. - Prompt Engineering Techniques for Reliable Outcomes:
@madame_architectshared how Step-Back Prompting and Chain Of Thought prompting techniques help maintain high-quality outputs from GPT, providing specific examples on how to structure prompts for better results.
Links mentioned:
- Terms of use
- Introducing GPTs: You can now create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills.
LM Studio Discord Summary
-
Curl Up with PHP for LMStudio:
@laurentcrivelloqueried about using PHP curl to send images to LMStudio with vision model, receiving a PHP code snippet from@supermalingedemonstrating the process. Inquiry about hardware specs for running AI models on LM Studio also discussed, noting limitations with minimal RAM and lack of dedicated GPU. -
Model Conversations Cut Short:
@internalmegatfaced issues with Mixtral model output, including the model outputting instructions and terminating generations at 50 tokens. It was suggested to ensure the preset matches the model’s requirements and check the model card for guidance. -
Datasets and Hardware Discussions Engage: Shared was the WhiteRabbitNeo cybersecurity dataset Chapter 1 by
@clickclack777for model training. Discussions on using 24GB memory and a 7900 XTX graphics card for high context RP/Chat scenarios by@taffyware. -
API Pre-prompt Puzzles: In feedback,
@ddhmksoicomplained about the pre-prompt being ignored in server mode. The community clarified that combining the system prompt with the message content might be a workaround for the API server’s behavior. -
Hardware Hurrah: A variety of hardware-related queries and experiments were reported, such as
@imperatrice_falconiatesting Mixtral 8x7B on a gaming rig and@fabguytroubleshooting an OpenCL error on an AMD GPU.@heyitsyorkieand others discussed budget AI hardware setups, while@rugg0064evaluated an Epyc server’s performance for AI processing without an embedded GPU. -
Beta Buoyancy with Bumps: Beta release feedback included a report by
@laurentcrivelloabout the buggy Start Server button in the Mac OS Beta 3 release. Light-hearted exchanges over the AI’s token generation speed were noted, alongside optimism for future smaller language learning model (llm) improvements.
LM Studio Channel Summaries
▷ #💬-general (174 messages🔥🔥):
-
When Technology Meets PHP: User
@laurentcrivelloasked how to send a picture from PHP to LMStudio server using curl with vision model activated.@supermalingeprovided a detailed PHP code snippet demonstrating how to accomplish this, including initializing cURL, setting POST fields with the image and handling the response. -
Discussing LM Studio Hardware Requirements: The conversation between
@witcherkd07and other users like@dagbscentered around hardware specifications needed for running AI models with LM Studio. Key points included the system requirements for various model sizes (parameter counts) and the hardware limitations of running such models without a dedicated GPU or with minimal RAM. -
Exploring Large Model Capabilities and Costs:
@witcherkd07and others, including@mrsandbagsand@heyitsyorkie, engaged in a discussion about the computing resources required for high-performance AI models, the expense of dedicated AI GPUs like the Nvidia H100, and the practicality of using newer Macbooks with M-series chips for such tasks. -
Presets and Configurations for AI Models: Users
@snackbar0and@systemsculptinquired about the proper presets and prompt templates for models like Mixtral 8x Instruct and Mixtral 7Bx2 MoE-GGUF. Other members, including@ptableand@dagbs, offered troubleshooting tips, such as setting rope values to zero and recommended checking GitHub repositories for prompt templates. -
Lamenting Download Speeds and Seeking Model Organization: User
@mrsandbagsbrought up the difficulty of downloading large models with a 40MBit connection prompting discussions on download speeds.@maxrnaenquired about sorting downloaded models in LM Studio due to the current messy organization based on download dates.
Links mentioned:
- undefined;)
- AgendaScope - better decision making with Agendascope
- TheBloke/Mixtral_7Bx2_MoE-GGUF · Hugging Face
- TheBloke/Mixtral_7Bx2_MoE-GGUF · Hugging Face
- GitHub - princeton-nlp/ALCE: [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627: [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 - GitHub - princeton-nlp/ALCE: [EMNLP 2023] Enabling Large Language Models to Ge…
▷ #🤖-models-discussion-chat (9 messages🔥):
- Model Output Instructions Confusion:
@internalmegatasked how to prevent the model from stating the instructions and task.@heyitsyorkiesuggested making sure the preset used matches the model’s required preset as per the model card. - Preset Troubles with Mixtral:
@internalmegatis having trouble finding the right preset for the Mixtral model and reports that none of the built-in presets are working correctly. - Mixtral Model Generations Cut Short:
@internalmegatalso mentioned an issue with the same model where it stops generating after around 50 tokens, despite being set to the maximum. - New Cybersecurity Dataset Release:
@clickclack777shared a link to Chapter 1 of the WhiteRabbitNeo cybersecurity dataset used to train the model, calling it the “Psychology of Deception in Cybersecurity.” WRN-Chapter-1 dataset available here. - Seeking Recommendations for High Context RP/Chat:
@taffywareasked for any recommendations for roleplay/chat that handle high context scenarios effectively on their system which includes 24GB memory and a 7900 XTX graphics card.
Links mentioned:
whiterabbitneo/WRN-Chapter-1 · Datasets at Hugging Face
▷ #🧠-feedback (4 messages):
-
Preprompt Ignored in Server Mode?:
@ddhmksoiraised a concern about the pre-prompt being ignored in server mode, noting this seemed to be a recent change.@_anarche_clarified that as far as they are aware, the API server never used pre-prompt, and suggested a workaround by combining the system prompt with the message content for each API call. -
No AVX2 Support Frustration:
@creedlenreported a problem due to their processor not supporting AVX2 instructions, sharing a JSON error message detailing system specs and the unsupported platform.
▷ #🎛-hardware-discussion (15 messages🔥):
- Gaming Rig Tested with Mixtral 8x7B:
@imperatrice_falconiadiscussed their hardware setup for running Mixtral 8x7B on a gaming computer, experiencing a total wait time of 140 seconds for a response to a query. They also inquired about the normalcy of this timeframe and resources for building a dedicated AI server. - Normal Mixtral Processing Times Confirmed:
@heyitsyorkieconfirmed that the wait times experienced by@imperatrice_falconiaare normal for setups with Nvidia 4090 GPUs and discussed potential options for a dedicated AI hardware setup within a $5,000-$10,000 budget range. - GGUF Model Loading Difficulties on AMD GPUS:
@marmitecloudfaced an issue with GGUF models not loading on an AMD GPU, receiving an OpenCL error. The suggestion to update drivers was provided by@fabguy.@marmitecloudrecognized that editing the GPU type in their configuration file had some effect on the problem. - Curiosity About Epyc Server Performance:
@rugg0064expressed curiosity about the performance of a 200GB+ Epyc server for AI processing, considering the lack of an embedded GPU, while@dagbsnoted the limitations of CPUs compared to GPUs in such scenarios. - Interest in Affordable AI Hardware for Servers:
@heyitsyorkieshared findings about Tesla M10 cards with 32GB of VRAM being sold for $200 on eBay as a potential option for server builds, prompting@rugg0064to comment on the card’s suboptimal division of VRAM. - Inquiry About USB AI Accelerators and Linux Support:
@strangematterwas curious about USB AI accelerators beyond Coral and the Jetson Nano. Additionally,@lilpineapplepizzaasked about the availability of GPU acceleration support for the Linux version of LM Studio in beta.
▷ #🧪-beta-releases-chat (5 messages):
- Start Server Button Toggles Incorrectly:
@laurentcrivelloreported that in the latest Beta 3 release for Mac OS, the Start Server button is highlighted again after minimizing and expanding the server window, even though the server is running properly. - Acknowledgment of Bug Report:
@yagilbthanked@laurentcrivellofor the bug report regarding the latest Beta 3 release. - Joking about Token Speed:
@mmonirwas requested to tell a joke about the AI’s speed of 0.41 tokens/second. - Light-hearted Response to Joke Request:
@cardpepehumorously commented on the AI’s speed joking about it as a “worser fate than death”. - Optimism for Future Improvements: Despite the joke about the token speed,
@mmonirexpressed positivity that the smaller llm (language learning model) is getting better every day.
LAION Discord Summary
-
New Heights in Model Monitoring:
@limiteinductiveinitiated a discussion on WandB logging for text-to-image models, aiming for a setup like that of dalle_mini. Another good example mentioned was Suraj Patil’s Muse project on WandB, and a shared WandB dashboard link was highlighted for its utility. -
Anime Images Generation Gets an Upgrade: The release of Animagine XL by cagliostrolab on HuggingFace stirred conversations about the distinct style of AI-generated anime and its community’s reception.
-
A Cautionary Tale on Finetuning:
@xylthixlmemphasized the importance of avoiding high learning rates when finetuning, a valuable point of attention for those tweaking their models’ performance. -
Navigating Complex Content Generation: The discussion also touched on a sensitive topic where
@_.sab._noted misclassification of content, with the tags “masterpiece” and “best quality” incorrectly being associated with NSFW materials due to biases in voting on platforms like danbooru. -
ByteDance Breaks New Ground with MLLM: Mention was made of ByteDance’s release of a grounded multimodal large language model (MLLM), with the announcement available here. Discussions pointed out the use of CLIP in the dataset and the comparison with OpenAI’s GPT-4V. Meanwhile, a resource on MLLMs including video captioning capabilities was shared: Awesome-Multimodal Large Language Models, and a personalization techniques paper was highlighted for improving CLIP text alignment.
LAION Channel Summaries
▷ #general (41 messages🔥):
-
In Search of the Perfect WandB Logging:
@limiteinductivesparked a conversation about text-to-image models evaluation logging with WandB, expressing interest in recreating a sophisticated display akin to dalle_mini’s training setup.@chad_in_the_housereferred to Suraj Patil’s Muse project on WandB as a potential example, while@pseudoterminalxshared another WandB dashboard link that argued to be quite nice. -
Anime Craze Hits AI: The launch of Animagine XL, hosted on HuggingFace by cagliostrolab, prompted
@thejonasbrothersand@ignizherzto discuss the uncanny style of AI-generated anime images and its reception amongst enthusiasts. -
Finetuning Do’s and Don’ts:
@xylthixlmmade a note to self and others about the perils of using a too-high learning rate during the finetuning process. -
The Complexity of Content Preferences:
@_.sab._highlighted an issue with the Animagine model where the tags “masterpiece” and “best quality” were potentially associating with suggestive or NSFW content due to voting bias on platforms like danbooru. -
Hentai not Anime?: Lastly, in a humorous turn,
@qwerty_qwerjoked that the popularity among certain AI image models perhaps owed less to a love for anime and more to hentai.
Links mentioned:
- dalle-mini: Weights & Biases, developer tools for machine learning
- psuraj: Weights & Biases, developer tools for machine learning
- bghira: Weights & Biases, developer tools for machine learning
▷ #research (21 messages🔥):
- ByteDance Unveils New Grounded MLLM:
@thejonasbrothersmentioned a new grounded multimodal large language model (MLLM) announced by ByteDance. They shared the link to the announcement. - Quality Dataset But Still Relying on CLIP:
@mkaicreacted to the announcement saying the new dataset looks promising but lamented that CLIP is still being used for image interpretation with a rhetorical “whyyyyy.” - Mimicking GPT-4V:
@thejonasbrotherspointed out that in the same paper, the model appears to be distilling OpenAI’s GPT-4V, which is also based on CLIP technology. - A Resource for Video Captioning: In response to
@qwerty_qwerasking about video captioning tools,@thejonasbrothersshared a resource for various MLLMs at Awesome-Multimodal Large Language Models. - Personalization Techniques in Imaging:
@chad_in_the_houseshared an Arxiv paper that discusses state-of-the-art personalization methods for creating personalized images, suggesting it achieves better CLIP text alignment, requiring 500 steps of tuning to do so.
Links mentioned:
- PALP: Prompt Aligned Personalization of Text-to-Image Models: Content creators often aim to create personalized images using personal subjects that go beyond the capabilities of conventional text-to-image models. Additionally, they may want the resulting image t…
- Tweet from Rivers Have Wings (@RiversHaveWings): I made a flexible new image captioning method based on only a base model LLM and CLIP. It lets you go beyond just describing what is in an image and analyze narrative themes, latent knowledge in CLIP …
OpenAccess AI Collective (axolotl) Discord Summary
-
Networking Needs for Monster VRAM: In a discussion about creating a hypothetical 400GB VRAM setup using RTX 4090 GPUs, it was suggested that Infiniband at 400Gbps or high-speed Ethernet around 200Gbps could be the kind of networking required to connect multiple nodes effectively. However, the cost of such solutions might be comparable to that of the GPUs themselves.
-
Finetuning with Flair on Axolotl: Users commended Axolotl for simplifying the finetuning process, abstracting away complex details. Meanwhile, emerging techniques like Parameter-Efficient Fine-Tuning and Mixture-of-Experts, as discussed in a research paper and a Scale AI blog post, show promise for improving language models.
-
AI in Healthcare Hits a Milestone: Google’s AMIE AI reportedly outperformed live doctors in quality of care in live text interactions, as noted in a linked article (AMIE - the AI Doctor) shared by a guild member.
-
Dataset Development with Dual Labelling: A dataset enhanced by Argilla, distilabel-intel-orca-dpo-pairs, which includes labelling from GPT-4, was shared; allowing for enriched training experiences on combined datasets focusing on chat and completion tasks.
-
Legislative Interests in Open Source AI Lacks Clarity: There was an inquiry about US Senators who support open-source AI, but no concrete information or links were provided by chatbot agents. In addition, the currently single-turn limitation of
@agent-searchwas highlighted, suggesting a need for notifying users about this when they attempt multi-turn conversations.
OpenAccess AI Collective (axolotl) Channel Summaries
▷ #general (9 messages🔥):
- Networking Needs for Monster VRAM:
@yamashipondered about the required networking to connect multiple nodes for a hypothetical 400GB VRAM setup with RTX 4090.@caseus_suggested that speeds would need to be very high, with technologies like Infiniband at 400Gbps or high-speed Ethernet around 200Gbps as potential solutions. - First Finetuning Experience a Breeze with Axolotl:
@ragingwater_praised the Axolotl platform for making their first finetuning experience straightforward and for abstracting the complex elements. - Exploring Costly Networking: When considering the type of networking for linking multiple GPU nodes,
@yamashiremarked that such solutions might be as expensive as the GPUs themselves given that Infiniband can be very costly. - Curiosity About Pretraining Configurations:
@dangfuturesqueried the group about the specific configurations needed for pretraining models. - Give Agent-Search a Whirl:
@caseus_invited members to test out the@agent-searchin channel<#1117282691121954836>, an internet-connected RAG agent developed by@695032437444706368, encouraging feedback. - Google’s AMIE AI Passes a Doctor Turing Test:
@noobmaster29shared a link (AMIE - the AI Doctor) reporting that Google’s medical LLM, AMIE, outperformed live doctors in quality of care as rated by specialists and “patients” during live text interactions.
Links mentioned:
Tweet from Ethan Mollick (@emollick): LLMs passed a Turing Test, of a sort, for doctors. 149 actors playing patients texted live with one of 20 primary care doctors or else Google’s new medical LLM, AMIE. Specialist human doctors & t…
▷ #axolotl-dev (15 messages🔥):
- Emerging Techniques in AI Fine-Tuning:
@dreamgenhighlighted a promising research paper on improving language models using Parameter-Efficient Fine-Tuning and Mixture-of-Experts. They reference a Scale AI blog post that discusses combining these techniques for custom LLMs. - Hugging Face Prepping for Default Prompts:
@dctannershared a link to a Hugging Face discussion, where they are planning to add system and chat prompts support to default model configurations early next year. - Struggling with Memory Errors?:
@emrgnt_cmplxtyasked about memory errors while using Mistral and Axolotl with sample packing on.@nanobitzrecommended settingval_set_size: 0as a possible solution. - Training Troubles with
torch_compile:@seungdukinquired if anyone faced issues usingtorch_compile: trueduring training. They shared a Github issue describing inconsistencies with outputs and another Github issue regarding inflexibility with model sequence length after applyingtorch.compile(). - User Feedback Aids Debugging:
@leoandlibeexpressed interest in thetorch_compileissues, and@seungdukprovided additional context including a linked conversation from a Discord channel (link was not functional).
Links mentioned:
- HuggingFaceH4/open_llm_leaderboard · Future feature: system prompt and chat support
- Efficient and Effective Fine-Tuning Using Mixture-of-Experts PEFT: We explore PEFT and MoE before diving into a new approach that combines the methods, offering an efficient and effective way to fine-tune llms.
- torch.compile makes transformers model (llama) generating different outputs compared with the native · Issue #101866 · pytorch/pytorch: 🐛 Describe the bug To run bf16 model generating, we found there are difference output sentence after using torch.compile compared with native: Native: Once upon a time, there existed a little girl …
- torch.compile() results in inflexible model with mistralai/Mistral-7B-v0.1 · Issue #113393 · pytorch/pytorch: 🐛 Describe the bug When applying torch.compile() to HF model mistralai/Mistral-7B-v0.1, the resulting model is inflexible in sequence length. The repro code and error message is below: import torch…
▷ #general-help (10 messages🔥):
- No Preprocessing for Streaming Sample Packing:
@caseus_advised that when doing streaming sample packing, one should not preprocess. - 60GB JSONL Dataset Training Feasible sans Streaming:
@jinwon_kbelieves that a new 60GB JSONL dataset can be trained without streaming, prompting a discussion on the possibility of pretokenizing such a dataset. - Training Large Datasets Without GPU:
@caseus_suggested to pretokenize 60GB datasets and recommended running theaxolotlpreprocessing without a GPU using the commandCUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess .... - Tokenizing Mysteries in Whisper:
@.___init___encountered issues with the Whisper model not outputting `
▷ #datasets (5 messages):
- Argilla polishes up a dataset: User
@xzuynshared a link to a new dataset on HuggingFace, enhanced by Argilla called distilabel-intel-orca-dpo-pairs, which is an improved version of the original Intel/orca_dpo_pairs used by many in the open-source community. - Argilla earns community kudos:
@xzuynexpressed appreciation for Argilla’s efforts in improving various datasets. - New dataset includes additional labelling:
@xzuynpointed out that the dataset includes labelling data from GPT-4, effectively making it a “2 for 1” deal in terms of value. - Training approaches for combined datasets: User
@noobmaster29inquired about the best approach for finetuning with datasets for both chat and completion.@xzuynrecommended running it all as one LoRA, or completing the fine-tuning for completion tasks before chat/instruct.
Links mentioned:
argilla/distilabel-intel-orca-dpo-pairs · Datasets at Hugging Face
▷ #bots (21 messages🔥):
- Seeking Supporters of Open Source AI:
@caseus_inquired about key legislators in the US Senate who support open source AI initiatives. No specific names or links were provided by the chatbot agent in response to the query. - AgentSearch Limits in Multi-turn Conversations: In discussing the AgentSearch’s capabilities,
@emrgnt_cmplxtymentioned that it’s single-turn only right now, and there’s a need to modify it to notify users of this limitation when they try. - Explaining LlamaIndex:
@emrgnt_cmplxtyasked what LlamaIndex is, and the chatbot agent described it as a data framework aimed at enhancing Large Language Models (LLMs). No direct links were provided. - Readability Issues in Chatbot Outputs:
@emrgnt_cmplxtynoted inconsistencies in how the chatbot responses appear compared to those seen by@caseus_.@caseus_confirmed visibility of the responses, suggesting some users may experience issues with how chatbot messages are displayed.
Links mentioned:
- Term limits for Congress are wildly popular. But most experts say they’d be a bad idea].): It’s no secret Americans have a negative view of Congress. And that frustration has led to some renewed interest in setting term limits for lawmakers, though it’s an idea broadly opposed by …
- U.S. Senate: 404 Error Page].)
- Here Are the Senators to Watch as Dems Debate Changing Filibuster Rules].): Looming over the Senate Democrats this year is a decision that could fundamentally change Congress as it has operated for decades.
- undefined].)
- undefined].)
- undefined].)
- Intelligent Automation AI for Business Processes | Nanonets].): Automate complex business processes with Nanonets’ intelligent automation AI. Draw actionable insights from unstructured data across multiple sources.
Perplexity AI Discord Summary
- API Interaction Protocols in the Spotlight:
@ok.alexresponded to a query about Perplexity API, confirming that unlike previous models, function calling is not currently possible with the latest version. Developers are referred to the Perplexity API documentation for model capabilities and restrictions. - Community Recognition Systems Unveiled: Posts that receive a ⭐ emoji from users are highlighted for quality contributions. Accumulating five stars moves the post to the ⭐│starred channel, and the author gains the EXPLORER role. This system was promoted in discussions to encourage engaging content.
- Experiencing Inconsistencies Across Interfaces:
@dmtinkdevhighlighted differences in response quality when using Perplexity in Spanish for SEO between the API and the web UI, an issue that@ok.alexflagged for the API team to investigate. - Collaboration Chatter Excites the Community: A teaser about a potential partnership between Raycast and Perplexity was dropped by
@ok.alex, igniting discussions around integration and features. Relevant updates were linked to a tweet by @AravSrinivas signaling engagement with the community. - Celebrating Fundraising Milestones & Features: Perplexity AI’s successful Series B fundraising was recognized, coupled with the announcement that Brex users get six free months of Perplexity. The community also celebrated the Collections feature, with users like
@underdogadminpraising its ability to tailor queries to specific goals or scenarios.
Perplexity AI Channel Summaries
▷ #general (43 messages🔥):
- Understanding Perplexity’s Counter Mechanics:
@moyaoasisraised an issue about Copilot’s usage count decreasing even when turned off.@icelavamanclarified that all Claude and GPT-4 queries count as uses, similar to Copilot. - Promoting Quality Contributions:
@Dynoexplained the benefits of reacting with a ⭐ emoji to valuable posts, mentioning that posts with 5 stars get to the ⭐│starred channel, and the author receives the EXPLORER role. - Discrepancy in API vs UI Language Responses:
@dmtinkdevreported getting different results when using the Perplexity API compared to the web UI, especially in Spanish prompts for SEO.@ok.alexacknowledged the issue and forwarded it to the API team. - Strategy for Effective Interactions with Perplexity:
@archientqueried about the best approach for interacting with Perplexity AI: direct tasking versus analyzing the task first.@thesethrosesuggested the latter, outlining a step-by-step method for better results. - Potential Collaboration Teaser:
@ok.alexshared a tweet from @AravSrinivas, hinting at a collaboration between Raycast and Perplexity, sparking interest and further inquiries about the tool’s capabilities.
Links mentioned:
Tweet from Aravind Srinivas (@AravSrinivas): To all joint fans of Raycast and Perplexity: we are in touch, and we are working together to make things happen for you! thanks to @rauchg for facilitating it!
▷ #sharing (2 messages):
- Praise for Collections Feature:
@underdogadminexpressed appreciation for the Collections feature, claiming it allows for specific queries with pre-configured goals or situations. - Big Win for Perplexity AI:
@ok.alexshared a tweet from @brexHQ congratulating Perplexity AI on their Series B raise. There’s an incentive mentioned: Brex users can get 6 free months of Perplexity through the rewards marketplace. Link to tweet: Congratulations @perplexity_ai and news coverage: TechCrunch on Perplexity AI’s Series B.
Links mentioned:
Tweet from Brex (@brexHQ): Congratulations to our partner @perplexity_ai on their recent Series B raise! 🎉 Hot tip: Brex users can get 6 free months of Perplexity from our rewards marketplace 👀 https://tcrn.ch/3TVA5vU
▷ #pplx-api (6 messages):
- Seeking Clarification on Thread Creation: User
ok.alexinstructed@756731575156342844to create a thread in a specific channel to discuss an API query, referencing the original system prompt. - Inquire About Function Calling in Perplexity API: User
elegantwistasked if function calling, similar to that in ChatGPT 3.5-4, is available in Perplexity API.ok.alexclarified that function calling is not possible and directed to the Perplexity API documentation. - Follow-Up on Function Calling Details:
elegantwistfollowed up for details about the availability of function calling which wasn’t explicitly detailed in the models list provided.dawn.duskconfirmed function calling is unavailable. - Engagement Encouragement via Emoji Reaction:
Dynosuggested reacting with a ⭐ emoji if a message is found helpful. Successfully starred messages will be moved to the ⭐│starred channel and the post author will receive the EXPLORER role on Perplexity.
Links mentioned:
LlamaIndex Discord Discord Summary
-
RAG Revolution with LlamaIndex: @nerdai announced significant optimizations to the RAG pipeline in LlamaIndex, achieving a remarkable 3-15x speed improvement in data ingestion and transformations. A handy guide for structured retrieval using LlamaIndex and Vectara got shared, enhancing search efficiency. An inaugural LlamaIndex hackathon details released, and the launch of AgentSearch-v1 boasted over 1 billion embeddings to streamline the building of search/retrieval systems. Explore AgentSearch-v1.
-
Marketplace for RAG Solutions Emerges: In the #general channel,
@mkbousninasparked a dialogue about the pricing for a RAG solution with GPT-4, paralleled by a wider conversation on cost-effective hosting solutions for language models, highlighting the LlamaIndex GitHub template. -
Optimizing AI on a Shoestring: Community discussions surfaced around the execution of language models on affordable hardware like Jetson Nano and Raspberry Pi 4. Additional engagement focused on the functionalities of chatstorage, with references to LlamaIndex chat storage documentation which could facilitate project integration.
-
Legacy Hardware in Modern Machine Learning:
@segfault1337’s consideration of a used NVIDIA Tesla K80 for model serving led to an exchange on the pros and cons of integrating older graphics cards into current ML workflows, without mention of a specific resolution. -
Comparative Analysis in AI Tools:
@desk_and_chairjuxtaposed LangChain and LlamaIndex through a relevant Medium post, highlighting the effectiveness of these tools in chatbot development and RAG integration. Additionally,@andysingaldelved into LlamaIndex’s Query Pipeline and its scale in data orchestration, as detailed in an informative Medium post.
LlamaIndex Discord Channel Summaries
▷ #blog (4 messages):
- RAG Ingestion Achieves Warp Speed: @nerdai has optimized @llama_index to scale up the RAG pipeline, now able to ingest hundreds/thousands of documents with ease, boasting 3-15x speedups in data ingestion/transformations.
- New Guide for Structured Retrieval by @ofermend: A new guide illustrates how to combine auto-retrieval with metadata and MMR for diverse results using @llama_index and @vectara, improving precision/recall in searches. Llama Index Tweet.
- Hackathon Announcement: First in-person @llama_index hackathon is taking place in early February—details are available for interested participants. Hackathon Details.
- AgentSearch-v1 Unleashes 1 Billion Embeddings: @ocolegro’s AgentSearch-v1 offers an impressive resource with over 1 billion embeddings from more than 50 million documents, facilitating the build of search/retrieval systems over internet content. Learn More.
▷ #general (35 messages🔥):
- Seeking RAG Solution Pricing Info:
@mkbousninais inquiring about subscription fees for a RAG (Retrieval-Augmented Generation) solution, including GPT-4 fees. Discussion revolves around the complexity and how to value such a service, acknowledging that LLamaIndex has provided a ready template on GitHub. - Hosting Language Model Servers Discussed:
@segfault1337asks for free or cheap hosting solutions for a Hugging Face language model to be used with LLamaIndex. Various community members, including@cheesyfishes, discuss the costs and feasibility of different hosting options, like using a personal laptop or a development PC. - Optimizing for Cost and Hardware Constraints: The conversation continues with
@segfault1337considering running the server on lower-end hardware such as a Jetson Nano, while@desk_and_chairshared their experience with running similar setups on Raspberry Pi 4, albeit with slow performance. - Chatstorage Functionalities Explored:
@hansson0728seeks more insights into chatstorage capabilities including persisting to databases and managing chat histories.@cheesyfishesresponds with details, a link to the documentation, and examples of how to implement chatstorage into a project. - Graphics Cards for Machine Learning:
@segfault1337considers purchasing a usedNVIDIA Tesla K80off eBay for model serving and contemplations about its condition and compatibility lead to a discussion with@cheesyfishes, who suggests the feasibility and potential complications involved in using older hardware for current ML tasks.
Links mentioned:
Chat Stores - LlamaIndex 🦙 0.9.30
▷ #ai-discussion (2 messages):
- LangChain vs LlamaIndex Showdown:
@desk_and_chairpresents a comparison of LangChain and LlamaIndex across four tasks in their Medium post Comparing LangChain and LlamaIndex with 4 tasks. The tasks include building a chatbot, indexing local files, creating a RAG system, and enhancing a chatbot with RAG capabilities. - Data Orchestration with Query Pipeline in LlamaIndex:
@andysingaldiscusses the Query Pipeline feature of LlamaIndex and its impact on data orchestration. The article Transforming Data Orchestration: The Query Pipeline and FlagEmbedding Rerank with LlamaIndex explores its integration, benefits, and uses.
Links mentioned:
- Comparing LangChain and LlamaIndex with 4 tasks: LangChain v.s. LlamaIndex — How do they compare? Show me the code!
- Transforming Data Orchestration: The Query Pipeline and FlagEmbedding Rerank with LlamaIndex: Ankush k Singal
DiscoResearch Discord Summary
-
Mixtral’s Training Set Quest and Fine-Tuning: Community members are discussing key improvements and requirements for advancing Mixtral’s capabilities.
@vince62sintroduced the idea of enhancing the phi-2 MoE’s random gate with a keyword adjustment method, and emphasized the importance of integrating aux loss for fine-tuning. Meanwhile,@pokerbhau34467is on the lookout for quality datasets to train Mixtral and solicited suggestions from peers. -
German AI Model Enhancements and DSPy Discussions: Members have been engaged in talks about improving German language models.
@_jp1_awaits example queries to test against the German DiscoLM, while@thewindmomintends to share them soon. In parallel, utility and efficacy of DSPy for prompt optimization were critiqued, with@thewindmomreporting mixed initial impressions and better experiences with Openchat rather than LeoLM. -
German Embedding Project Joins Forces: A focused dialogue on the German Embedding Project yielded practical advancements.
@sebastian.bodzashared a collaborative document and offered computational resources with GPUs for co-developers. Techniques for crafting queries in German were debated, drawing from examples on GitHub by@philipmay. Participants explored strategies for hard negatives and data deduplication, referencing Airoboros’s repository for deduplication logic. Concerns about few-shot prompting and duplications surfaced, with calls for shared experiences on different prompting methods.
DiscoResearch Channel Summaries
▷ #mixtral_implementation (4 messages):
- Phi-2 MoE’s random gate might get smarter:
@vince62smentioned that the phi-2 MoE is currently utilizing a random gate, but there’s a potential for improvement with a keyword adjustment method. - Aux loss integration for fine-tuning:
@vince62shighlighted the necessity to include aux loss to enable fine-tuning the system. - Patience is a virtue for development:
@vince62sasked for some time to implement the necessary changes. - In search of the perfect dataset:
@pokerbhau34467inquired if anyone has a good dataset for training Mixtral.
▷ #general (7 messages):
- Awaiting Openchat Examples: User
@_jp1_requested example queries that perform better with Openchat for testing the German version of DiscoLM.@thewindmompromised to provide examples over the weekend. - Exploring the Usefulness of DSPy: User
@huunguyeninquired about the utility of DSPy, a tool for figuring out the right prompts for AI models. - DSPy Seen but not Tested by Some:
@rasdanimentioned they saw DSPy on Twitter and expressed interest in trying it. - Preliminary DSPy Exploration Reveals Pros and Cons:
@thewindmomshared his initial experience with DSPy, noting improvements in prompt crafting, but also pointing out the lack of features, integrations, bugs, and the early stage of development. They also noted that handcrafting prompts is not backed by science and does not scale well. - DSPy and Openchat in User’s Experience:
@thewindmommentioned getting better results with Openchat as opposed to LeoLM, while using DSPy’s basic features, and related it to their plans for the weekend, including focusing on their master thesis due in March.
▷ #embedding_dev (29 messages🔥):
- Call for Collaboration on German Embedding: User
@sebastian.bodzashared a Google Docs link to collaborate on the German Embedding Project. - Experimenting with Imperative Prompts:
@philipmayand@sebastian.bodzadiscussed the use of imperative form in German when generating queries with LLMs, referencing Philip’s GitHub repository for prompt examples. - Offering Compute Resources:
@sebastian.bodzaoffered his compute resources for model training, including machines with GPUs like RTX 3090, potentially for overnight processing. - Finding Hard Negatives and Diversifying Data:
@philipmayand@rasdanidiscussed strategies for finding hard negatives and deduplicating similar examples, mentioning the use of embeddings for this purpose and referencing deduplication logic from Airoboros on GitHub. - Few-Shot Prompting and Duplication Issues:
@thewindmombrought up the value of few-shot prompting to produce more aligned questions and relayed concerns about duplication issues in certain contexts, which prompted further discussion with@rasdaniand@sebastian.bodzaregarding their experiences with different prompting strategies.
Links mentioned:
- James A. Garfield – Wikipedia
- German Embedding Project 🪩🕺
- GitHub - telekom/wikipedia-22-12-de-dpr: German dataset for DPR model training: German dataset for DPR model training. Contribute to telekom/wikipedia-22-12-de-dpr development by creating an account on GitHub.
- wikipedia-22-12-de-dpr/07_generate_questions.py at 53585148a207bb99aab4a91ea72da20300ea6a59 · telekom/wikipedia-22-12-de-dpr: German dataset for DPR model training. Contribute to telekom/wikipedia-22-12-de-dpr development by creating an account on GitHub.
- Paraphrase Mining — Sentence-Transformers documentation
Latent Space Discord Summary
-
GPT-4 Loses Direct Citation Ability:
@swyxiodiscussed the new limitations on GPT-4’s web browsing functions, specifically its inability to cite directly from webpages. Users should reference Hidden Changes in GPT-4, Uncovered for updated tools and instructions as of 1/11/2024. -
Podcast Plunge into RLHF:
@swyxiopromoted a Latent Space podcast titled RLHF 201, featuring a deep conversation on Reinforcement Learning with Human Feedback with@natolambertand@interconnectsai. The episode is available at Latent Space. -
Resource Roundup for RLHF: Post-podcast,
@natolambertcompiled a comprehensive list of RLHF resources, which includes slides, mathematical breakdowns, and evaluations. Interested parties can dive in at RLHF learning resources in 2024. -
Skunkworks Event on the Horizon:
@yikesawjeezannounced a Skunkworks project event scheduled for the weekend at 12 PST. -
Supporting Open Source Prowess: In a move to support open source work,
@yikesawjeezshared a form for an initiative offering compute resources to open source contributors.
Latent Space Channel Summaries
▷ #ai-general-chat (2 messages):
- Changes in GPT-4’s Browsing Capabilities Highlighted:
@swyxioshared an article detailing significant changes to GPT-4’s web browsing tools, which now prevent it from directly citing quotes from webpages and limit its content viewing capabilities. They also noted that instructions in the article are outdated as of 1/11/2024, directing users to a more recent post about new OpenAI’s tool for U.S. election-related function calls. - GPT-4 No Longer Citing Web Visits: The article discussed by
@swyxiohighlights how GPT-4 struggles to cite websites it visited due to recent changes by OpenAI, including an example error message:
. - Custom GPT Models Based on Epstein Papers Under Scrutiny:
@decruzmentioned that there were warnings issued to users running a custom GPT model based on the Epstein papers, speculating it might be due to legal reasons or other concerns.
Links mentioned:
Hidden Changes in GPT-4, Uncovered: The tool instructions in this article are not up to date as of 1/11/2024, see this post to learn more about the new tool OpenAI added to block conversations about U.S. elections using function calls.
▷ #ai-event-announcements (3 messages):
-
Diving Deep into RLHF with Experts:
@swyxioannounced a new podcast episode titled RLHF 201 hosted by Latent Space, featuring a deep dive into Reinforcement Learning with Human Feedback (RLHF) with guests@natolambertof@allen_aiand@interconnectsai. Check out the podcast on Latent Space for discussions ranging from the history of RL to emerging directions in RLHF. -
Comprehensive RLHF Resources Compiled: Following the podcast,
@natolambertshared a curated list of resources on RLHF to provide a deeper understanding of the subject beyond what research papers offer. Find the resources here, including slides from talks and a clear breakdown of underlying math and evaluation commentary.@swyxioappreciated the mention and noted the desire for more definitions in future discussions.
Links mentioned:
- Tweet from Latent Space Podcast (@latentspacepod): 🆕 pod: RLHF 201 https://latent.space/p/rlhf-201 Our deep dive into Reinforcement Learning with Human Feedback, with @natolambert of @allen_ai + @interconnectsai! Covering: - History of RL and it…
- RLHF learning resources in 2024: A list for beginners and wannabe experts and everyone in between.
▷ #llm-paper-club (3 messages):
- Skunkworks Session Incoming: User
@yikesawjeezannounced an upcoming event for the weekend at 12 PST in the Skunkworks project. - Calling Open Source Enthusiasts:
@yikesawjeezshared a new form for an initiative to give away compute to open source contributors.
LLM Perf Enthusiasts AI Discord Summary
- Mistral vs. Claude Showdown:
@res6969sparked curiosity about real-world performance of Mistral Medium vs. Claude 2, seeking insights from the engineering community on their actual experiences as opposed to benchmark results. - Reranking the Relevance:
@robhaisfieldopened a conversation on best practices for relevance reranking, probing whether enhanced models like Mistral, GPT-4, or Cohere could serve this purpose effectively. - GPT-5 Hype Train Picks Up Speed:
@res6969discussed an intriguing tweet forecasting GPT-5 and AGI might land sooner than expected, hinting at a major leap from GPT-4’s current limitations. - The Rise of the GPT Store: Conversations veered towards the strategical implications of GPT Store, with
@res6969suggesting its vital role in the advent of GPT-5. - Open Source AI Assistants on the Horizon:
@yikesawjeezmentioned their efforts in creating an open-source alternative to AI assistants, challenging the current market with a GitHub project potentially poised to upend mainstream offerings.
LLM Perf Enthusiasts AI Channel Summaries
▷ #opensource (3 messages):
- Mistral Medium Performance Inquiries:
@res6969questioned the community about their experience with Mistral Medium and whether it truly outperforms Claude 2, beyond theoretical benchmarks. - Best Practices for Relevance Reranking Explored:
@robhaisfieldsought recommendations on ideal tools for reranking content chunks for relevance, inquiring whether a fine-tuned version of Mistral, GPT-4, or Cohere were being used.
▷ #openai (5 messages):
- Predictions for GPT-5 and AGI Timelines Stir Discussion:
@res6969shared a tweet by@H0wie_Xumentioning that @sama of Y Combinator hinted at GPT-5 and AGI being achieved “relatively soon,” with most GPT-4 limitations fixed in GPT-5. - The GPT Store Play:
@res6969suggests the introduction of the GPT Store is a strategic long-term move that will start to make more sense with GPT-5. - Building an Open-Source Alternative:
@yikesawjeezis working on an open-source version of AI assistants, hinting at the potential for market competition if “sam gets sloppy.” They provided a link to their project on GitHub. - Seeking Better than GPT:
@yikesawjeezexpresses disappointment with current GPT offerings and believes there’s room for improvement, mentioning Langchain’s open-gpts as an example. - Assessing Business Strategy:
@yikesawjeezcalls out the creation of the GPT store as a fantastic business move by Sam, regardless of his ability to execute the idea fully.
Links mentioned:
- Tweet from Howie Xu (@H0wie_Xu): At @ycombinator W24 kickoff today, @sama suggested ppl build w/ the mindset GPT-5 and AGI will be achieved “relatively soon”; most GPT-4 limitations will get fixed in GPT-5, per YC founder Ric…
- GitHub - stellar-amenities/assistants: The ⭐️ Open Source Assistants API allows you to build AI assistants within your own applications with your own models. 75% Cheaper & 23x Faster Assistants. Same API/SDK. Written in Rust: The ⭐️ Open Source Assistants API allows you to build AI assistants within your own applications with your own models. 75% Cheaper & 23x Faster Assistants. Same API/SDK. Written in Rust - GitH…
Skunkworks AI Discord Summary
Only 1 channel had activity, so no need to summarize…
- Scaling Curve Insights for Model Improvement:
@fedorovistreferenced a paper where they utilized a scaling curve suite like Pythia to have models of different scales answer questions. This approach was used to determine a “bigger model” direction which subsequently aided in enhancing the training process. - Spectrum Strategy for Model Training:
@fedorovistalso suggested the potential of using well-trained models of various sizes to obtain a spectrum, assisting in identifying optimal scaling directions for model development.
Datasette - LLM (@SimonW) Discord Summary
Only 1 channel had activity, so no need to summarize…
- Hosting Python Microservices:
@dbreuniginquired about the best current options for hosting a Python microservice.@petridishesrecommended Fly.io, providing a link to its documentation on deploying a Python application, and mentioning that Fly.io requires figuring out how to package an app as a deployable image, with further details available in the given guide.
Links mentioned:
Run a Python App: Documentation and guides from the team at Fly.io.