No big news or releases today. Perplexity is rumored to be the latest AI unicorn, Yi Tay's post on the hard parts of training LLMs outside Google got picked up on Twitter and HN, and we released Soumith's episode on the Latent Space pod.
Table of Contents
[TOC]
PART X: AI Twitter Recap
Only one Claude Opus run today, as we are currently retooling our pipelines for more functionality and didn't get it viable in time. Sorry!
Anthropic Claude 3 Release
- Anthropic released Claude 3, replacing Claude 2.1 as the default model on Perplexity AI. Pro users get 5 daily queries on the most capable Claude 3 Opus model (surpassing GPT-4) and remaining queries on the faster Claude 3 Sonnet model.
- There is debate on whether Claude 3's impressive performance is due to emergent properties or pattern matching on human preference data used in training.
- LangChain and LlamaIndex have added support for Claude 3, enabling multimodal and tool-augmented applications.
AI Progress and Limitations
- Current language models are still limited in out-of-distribution reasoning despite impressive performance. Models that can reason, run experiments, and seek truth like scientists are needed for superhuman insights.
- There are concerns that focusing on model scaling distracts from core issues of robustness and reliability. Careful testing and understanding of limitations remain crucial.
- Ideogram 1.0 shows progress in instruction-following for image generation compared to Midjourney and DALL-E.
Enterprise AI Adoption
- Cohere is partnering with Accenture to bring their enterprise search capabilities to Accenture's clients, aiming to drive productivity gains.
- Mistral AI and Snowflake are collaborating to make Mistral's LLMs available through Snowflake, enabling enterprises to build AI apps within the security of Snowflake's platform.
- DeepSpeed innovations are coming to Together AI Research to accelerate cloud infrastructure for generative AI.
Open Source Datasets and Models
- Hugging Face and the European Space Agency released a massive earth observation dataset to democratize earth observation model development.
- Google open sourced Gemma 2B, which can run natively on smartphones thanks to the MLC-LLM project for efficient model deployment.
- GPT4All added model discoverability features to easily find and run compatible open models.
Memes and Humor
- "If hotels just added basic kitchens Airbnb would be out of business"
- "Invest in whatever the EU wants to ban"
In summary, the AI community is abuzz with the release of powerful new models like Anthropic's Claude 3, while also grappling with the limitations and robustness challenges of current approaches. Enterprises are rapidly adopting AI technologies through partnerships with leading AI and cloud vendors. Meanwhile, open source datasets and models continue to grow and democratize access to cutting-edge AI. Throughout it all, humor and memes provide levity and social commentary on the fast-moving AI landscape.
PART 0: Summary of Summaries of Summaries
Operator notes: the prompt we use for Claude, and our summarizer GPT used for ChatGPT. What is shown is the subjective best of 3 runs each.
Claude 3 Sonnet (14B?)
- Exploring AI Model Capabilities and Comparisons:
- Claude 3 is generating excitement for its reported superior performance across various cognitive tasks, surpassing GPT-4 according to some users. Discussions revolve around its capabilities in coding, function calling, and self-moderation in group chats, as showcased in a Twitter story.
- Opus, a model variant, is praised for its coding prowess, particularly in function calling. It achieved an impressive 800 score on the SAT Reading section, sparking conversations about avoiding memorization in large models.
- Skepticism arises regarding the reliability of published benchmarks in capturing the full potential of newer models like GPT-4.
- Advancements in Multimodal and Retrieval-Augmented Models:
- The release of Stable Diffusion 3 and its fusion of diffusion and transformer models is discussed, highlighting progress in multimodal approaches.
- An arXiv paper suggests retrieval-augmented language models could be a promising alternative to parametric LMs, though research in this area is still developing.
- The introduction of InfiMM-HD by `@_akhaliq` claims significant advancements in high-resolution multimodal understanding, potentially outperforming CogVLM and leveraging Vicuna 13B. (Tweet)
- Techniques for Efficient Model Serving and Inference:
- A Fireworks AI blog post discusses FireAttention, a quantization method for serving open-source models up to 4x faster than vLLM with minimal trade-offs.
- The Aphrodite Engine by PygmalionAI is humorously attributed to the "Waifu-Driven Performance Theory," showcasing community-driven research efforts for performance gains.
- Discussions explore speculative decoding on GPUs to improve performance when memory is the bottleneck, and the inefficiency of generic masking in compute, leading to a PyTorch pull request for sliding window attention bias.
- Advancements in Hardware and Quantization:
- Details emerge about the NVIDIA H100 GPU, with its L2 cache boasting a 5.5 TB/s read bandwidth, and speculation that its overall bandwidth could match the impressive 40TB/s L1 bandwidth of the RTX 4090.
- The bitsandbytes package is recommended for k-bit quantization in PyTorch, enabling low-precision linear algebra operations on GPUs with potential 5700x speedup in int8 versus bf16 matrix multiplication.
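Since the summaries above are light on specifics, here is a minimal, unofficial sketch of int8 loading with bitsandbytes through Hugging Face transformers (the model name and generation settings are illustrative; assumes a CUDA GPU with `transformers`, `accelerate`, and `bitsandbytes` installed):

```python
# Hedged sketch: int8 weight loading with bitsandbytes via transformers.
# Assumes a CUDA GPU; the model name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # k-bit quantization
    device_map="auto",  # spread layers across available GPUs
)

inputs = tokenizer("Low-precision inference lets you", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```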
Claude 3 Opus (8x220B?)
- Mistral Finetuning Challenges and Successes: Users like `@hammer_mt` struggled with Mistral finetuning on mlx, facing issues converting `lora_fused_model` to `fp16.gguf` as detailed in a GitHub issue. `@mrdragonfox` advised that MoE tuning is fundamentally difficult and recommended a fine-tuning tutorial for Mistral 7b, favoring full model over LoRA. Discussions also covered dataset sizes for chatting capabilities and style transfer.
- Claude 3 Sparks Excitement and Debate: Claude 3 Opus garnered praise for its performance and abilities compared to GPT-4, with a focus on coding tasks. However, its claims of consciousness and fear of death sparked a debate on AI sentience, with a video shared to counter these as genuine signs. Claude 3's self-moderation in group chats, as showcased in an OpenRouterAI Twitter story, also drew attention.
- Exploring Positional Embeddings and New Techniques: The Eleuther community discussed the efficiency of T5 simplified positional embeddings versus sinusoidal methods and ALiBi. A new paper on Resonance RoPE for improving long sequence performance in LLMs was highlighted. Separately, the potential of retrieval-augmented language models as an alternative to parametric LMs was explored, referencing an arXiv paper.
- Hugging Face Updates and Community Contributions: Starcoder2 and The Stack v2 were released by `@BigCodeProject` for coding assistance (Twitter announcement). The Major TOM Core earth observation dataset was open-sourced in collaboration with the European Space Agency (Hugging Face dataset). GPU instances for Spaces were optimized with A100 and H100s support. The community also contributed walkthroughs, courses, and cookbooks for working with 🤗 tools and building AI applications, as shared on Twitter and the Hugging Face Learning Platform.
ChatGPT (GPT4T)
Claude 3's Enhanced Capabilities and Market Position: Discussions across platforms illuminate Claude 3's remarkable coding prowess and medical knowledge depth, with it achieving a perfect SAT Reading score and comparisons favoring it over GPT-4 in aspects of intelligence and personality. Its introduction to Pro users, notably with a daily query limit on Claude 3 Opus before transitioning to Claude 3 Sonnet, underlines Perplexity AI's strategic positioning against competitors. Notably, a partnership offering one year of free Perplexity Pro membership with Nothing's Phone (2a) purchase exemplifies marketing ingenuity (Nothing Perplexity).
Mistral Community's Technical and Commercial Scrutiny: The Mistral community critically evaluates the platform's open model commitment and pricing structure, comparing unfavorably with OpenAI's GPT-4 Turbo due to Mistral Large models' 20% higher cost. Technical discussions revolve around optimal token lengths for Mistral models, finetuning challenges, and hardware requirements, notably the correction that an RTX 4090, not the 3090, provides 24 GB VRAM, essential for modeling considerations. The community also explores tools like Augmentoolkit for dataset conversion and finetuning strategies, with resources cited including a finetuning guide and an issue on GitHub detailing a finetuning challenge.
Advancements and Discussions in AI Hardware and Quantization: The CUDA Mode community is actively engaged in discussions on NVIDIA's hardware capabilities, such as the RTX 4090's impressive L1 bandwidth of 40TB/s and the H100's 5.5 TB/s read bandwidth. They are exploring quantization techniques for enhancing PyTorch performance, with the bitsandbytes package being highlighted for its potential to significantly speed up matrix multiplication. These technical exchanges underscore the continual search for optimizations and efficiency improvements in AI modeling and hardware utilization.
Hugging Face's Continuous Innovation and Community Engagement: Hugging Face remains at the forefront of AI development with the introduction of Starcoder2 and The Stack v2, improvements in GPU support for Spaces, and the unveiling of Major TOM Core in collaboration with the European Space Agency. Community engagement is evident through discussions on Zephyr 7B Gemma's capabilities, the anticipation around the Yi-9B model, and advancements in neural TTS systems. The platform's initiative to enhance learning and development through AI Cookbook and courses underscores its commitment to fostering a knowledgeable and skilled AI community.
PART 1: High level Discord summaries
TheBloke Discord Summary
- Smart AI for Smarter Homes: `@v.jerryyyy` is exploring the development of a smarthome system with an AI voice assistant and inquired about integrating AI with JavaScript versus Python. The community suggested model quantization like 4bpw EXL2 for running unquantized Mistral on a 3070 Ti laptop.
- OpenAI's Closed Door Policies?: Concerns were raised by `@mikeygm` regarding OpenAI's founding principles, particularly regarding openness, after reading a blog on the Musk lawsuit. This spurred discussions on corporate marketing strategies and transparency.
- Google's Gaffe Gets a Gloss Over: `@theyruinedelise` and `@coffeevampir3` discussed fixes for Google's Gemma model by Unsloth AI, highlighting the many bugs addressed and spawning speculative talks about Google's commitment to model troubleshooting.
- Voice Activation and Interfaces Unpacked: Users delved into different UI interfaces like Oobabooga, ExUI, and LM Studio for local AI model use; meanwhile, the setup of voice-activated AI systems with omnidirectional microphones for improved performance was also a topic of interest.
- Model Behavior Unveils Character Secrets: `@mr.dogbert` sought advice on configuring an LLM to mimic a cartoon character using character cards, with the community contributing strategies and recommendations on using GUI tools like oobabooga tgw for prompt construction.
- Model Legalities and Economics Explored: `@reinman_` and `@mrdragonfox` shared experiences and concerns about hosting the miquliz model and its legal implications, alongside queries about budget-friendly hosting for large model APIs.
- System Prompts and Mistral Mechanics Mapped: Confusion about system prompts in the context of character cards was clarified by discussing different prompt assemblies across various models and offering guidance to new LLM users on grasping model internals through plotting with GUI tools.
- Pursue Professionalism in AI Interviews: `@_jaycie` engaged the community for advice on interviewing for AI roles, with `@dirtytigerx` advising to tailor preparation for specific roles like "LLM Engineer" or "ML Engineer." Misconceptions about MBSE, which stands for model-based systems engineering, were clarified, suggesting in-depth study for roles demanding professional experience.
Mistral Discord Summary
Augmentoolkit Gains Traction: Engineers discussed a tool called Augmentoolkit, which enables datasets to be converted for instruct-tuning, vital for those considering switching from factual corpus data to multiturn interactions.
Mistral Model Token Boundaries and Hardware Talk: A debate unfolded over the ideal token length for Mistral models, with the sweet spot reported to be between 8k-10k tokens. Separately, a correction was made regarding VRAM requirements, stating that the RTX 4090, not the 3090, carries 24 GB VRAM, a crucial distinction for modelers considering hardware purchases.
Mistral Finetuning Frustrations and Fixes: Users shared struggle stories and success strategies around finetuning Mistral models, with one user encountering challenges in converting `lora_fused_model` to `fp16.gguf` as discussed in this GitHub issue. Some advocated that finetuning Mistral 7B may be more efficiently done full-model rather than via LoRA, as advised in this guide, a potential blueprint for those trekking through the finetuning forest.
Community Questioning Mistral's Commitment and Pricing: The Mistral community voiced concerns over the platform's commitment to open models and the pricing structure, especially in comparison to OpenAI's GPT-4 Turbo and the 20% higher cost of Mistral Large models.
Model Properties, Downloads, and Legal Provisos in Focus: The currently available models for download are Mistral 7B and 8x7b, with larger models to be announced. Meanwhile, dialogue on the legal implications of using AI models without clear licensing brought up potential risks, with suggestions concerning hidden watermarks as identifiers for illicit use.
Technical Tripping Points in Mistral Usage: From API error handling related to assigning `null` to `max_tokens` in the JSON body, to the challenges with JSON table parsing in API calls and setting up webhooks, engineers exchanged both issues and solutions. Moreover, the accuracy of responses, especially in multilingual contexts and mathematical calculations, raised concerns about variability and prompted discussions on improving reliability.
Perplexity AI Discord Summary
- Claude 3 Ascends to the Pro Stage: Perplexity AI announced that Claude 3 is now available to Pro users, with a daily limit of 5 queries using Claude 3 Opus, and subsequent queries leveraging the equally capable but faster Claude 3 Sonnet, drawing comparisons with GPT-4's performance.
- Sweeten the Deal: Phone Purchase Rewards with Pro Access: A new partnership offers up to one year of free Perplexity Pro membership (a $200 value) to customers who purchase Nothing's Phone (2a) between March 5-19. Redemption involves following instructions received via email and must be activated by April 30, as detailed on Nothing Perplexity.
- AI Consciousness Draws Engaged Discussion: Members like `@codelicious` and `@deicoon` extensively debated the potential of AI consciousness and methods for circumventing daily use limits of Claude 3 Opus. A prevailing view is that AI model scaling may transcend human prowess, and Continuous Learning (CL) might offer a solution to the AI's learning inflexibility.
- Audio Interactions with Perplexity Not Quite There Yet: User `@oogeefaloogee` questioned Perplexity AI's capability for voice interaction, which was clarified as not yet available, prompting comparison with existing services such as OpenAI's voice functionality.
- Curtailing Curiosities Through API Conversations: Discussion topics within the #pplx-api channel covered whether quota increases apply across API models and the extent of censorship in model outputs, as well as confusion regarding access to citation features and examples for API interaction. No direct answer was provided for quota carryover, but documentation was referenced here.
- Interface Insights Shared Within the Community: Community members are actively sharing links to Perplexity AI's Claude 3 Opus-generated content on diverse topics like Ikigai, quantum mechanics, and myxobacteria, showcasing the utility and reach of the platform's AI capabilities.
Nous Research AI Discord Summary
- OpenAI Dethroned?: Members discuss sentiment that OpenAI may no longer hold the top spot in AI, referring to the "apple test" as evidence of a shift, but specific details or sources for the test weren't mentioned. Separately, excitement stirs around Claude 3 Opus, with users praising its capabilities and some rating it higher than GPT-4 on an unspecified test.
- LLM Finesse and Transition: Technical conversations around large language models (LLMs) include the planned transition of Lumina-chat from a 7b Nous fine-tune (with GPT-4) to potentially Mistral or Yarn 7b, and the introduction of function-calling capabilities within models like Nous-Hermes-2-Mixtral-8x7B. InfiMM-HD's claims of advancing high-resolution multimodal understanding sparked interest, particularly in comparison to CogVLM.
- New Models and Features Catching Eyes: The new Yi 9B model's introduction by Hugging Face and its capabilities, along with Claude 3's pricing strategy, dominate discussions. Speculation about an open-source version of Claude 3 emerged, pointing towards interest in understanding the components contributing to its performance.
- Technical Glitches and Development Advice: Practical advice is shared for issues such as using the Capybara-34b model with a chat template, dealing with the striped-hyena nous tokenizer's default to sentencepiece, and the complex topic of training LLMs on length awareness. Potential applications of models like GENIE and JEPA beyond their current popular usage were also discussed.
- Obsidian Project's Mixed Reception: Within Project Obsidian, user feedback mentions the technology is "pretty fast and good for most things," acknowledging minor quirks, while another user commends its effectiveness in captioning tasks.
OpenAI Discord Summary
- LLMs Vulnerable Without "Prepared Statement" Analog: Users in the AI-discussions channel compared current Large Language Models (LLMs) to old SQL protocols, noting their shared vulnerability due to assuming user goodwill. The similarity was drawn to the lack of a safeguard akin to SQL's prepared statements, presenting no current solution for LLM vulnerabilities.
- Claude 3 Opus vs. GPT-4 Faceoff: Enthusiastic discussions occurred regarding Claude 3 Opus' abilities, with users sharing positive experiences in scripting games like Python Tic Tac Toe and comparing its performance to GPT-4, citing higher intelligence and personality traits.
- Quality of MMLU Dataset Under Fire: Criticism arose towards the MMLU dataset for AI evaluation, with users flagging issues such as incorrect Q&A pairs and nonsensical questions.
- Yearning for Image Analytic Capabilities: Conversations turned towards the desire for AIs that can analyze images, a feature not currently supported by GPT-3.5. Users pointed out that Microsoft Copilot and Google Gemini might offer such functionalities.
- GPT-4 Troubles Spotted Across the Board: Across various channels, users reported issues with GPT-4 such as a persistent "Saving GPTs Error", declining performance, API outages affecting user experience, and debate over its internet searching capabilities. The impact was a shared anticipation for the potential advancements GPT-5 might provide.
- Prompt Engineering Challenges and Innovations: Users in prompt-engineering sought advice on creating bilingual translation prompts and ways to improve customer service bot interactions. Additionally, a user shared a success story with AI in generating futuristic cityscapes from photos. Meanwhile, others expressed frustration with Custom GPTs providing defiant responses and a lack of consistency in acknowledging internet search abilities.
HuggingFace Discord Summary
- Introducing Starcoder2 & The Stack v2: @BigCodeProject announced the launch of Starcoder2 and The Stack v2, marking a significant upgrade to coding assistance tools. Details were broadcasted through a Twitter post.
- Major Milestone in Earth Observation: Collaboration with the European Space Agency led to the unveiling of Major TOM Core, the most extensive earth observation dataset to hit the open-source community. For more information and data access, visit Major-TOM.
- Hugging Face Level Ups: The platform has optimized GPU instances for Spaces with the addition of A100 and H100s support. Enhancements also include updated markdown syntax for model/dataset cards and blogposts, as indicated on lunarflu1's Twitter.
- Excitement for Zephyr 7B Gemma & Competitions: The release of Zephyr 7B Gemma and PEFT v0.9.0 brings advancements like merging LoRA weights. Also, the new multimodal leaderboard and Sailor LLMs for Southeast Asian languages are stirring the pot, while the Autonomous Grand Challenge at CVPR2024 is set to take the spotlight. Relevant updates and developments are discussed on various Twitter channels.
- Learning Paths in AI: `@mervenoyann` crafted a walkthrough using 🤗 tools, an ML for Games course rolled out, and the AI Cookbook for building a RAG Ebook Librarian using LlamaIndex was introduced, aiming to catalyze growth in AI knowledge and application. More can be learned at the Learning Platform.
- ASCII Jailbreak Reveals LLM Flaws: ASCII art-based jailbreaks are compromising state-of-the-art LLMs, as detailed in a research paper; a reminder that even sophisticated models can be blindsided by creativity.
- Karpathy Discusses LLM Training Trials: `@karpathy`'s Twitter thread reveals the complex and biological nature of training LLMs, from maintenance to unpredictable resource needs.
- OpenMP Pragmas Through OMPGPT: A specific need in high-performance computing has led to the creation of OMPGPT for OpenMP pragmas, separating itself from general code-based LLMs. Study the full paper on arXiv.
- Otio.ai Launches With a Smile: Otio.ai, an AI research, writing, and study tool, is introduced with a special discount available through app.otio.ai.
- Open-Sora-Plan Denounces Resource Scarcity: The Open-Sora-Plan project is attempting to replicate Sora with limited resources, calling for open-source collaborators on GitHub.
- The Fireside Chat Bot Enters the Scene: Rust programming language enthusiasts have a new interface to explore, the "Fireside Chat" Bot. Catch a glimpse at YouTube and contribute via the GitHub repository.
- Yi-9B Model Expected to Top Leaderboards: Yi-9B's introduction to the HuggingFace space brings anticipation for its future growth and impact, with discussions of its potential on the platform.
- TTS Systems with GPT-4-like Pause Dynamics: The community is discussing neural TTS systems that emulate GPT-4's dynamic pausing, signaling a push towards more human-like speech generation.
- IP-Adapter Touted for Image Prompting: Hugging Face's IP-Adapter is presented as a revolution for image prompting in diffusion models, allowing for specific image feature learning while maintaining the integrity of the base model. More details can be found in the tutorial.
- Gradio 4.20.0 Enhances User Authentication: The recent Gradio release supports external authentication providers, alongside features facilitating a smoother user experience such as automated clean-up with `delete_cache`, user logout, and a polished `DownloadButton` component (a rough sketch follows this list). Dive into more with the Gradio DownloadButton Docs.
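As a rough illustration of the Gradio features above (a hedged sketch, not official documentation; the `delete_cache` values and the `DownloadButton` wiring are illustrative and may differ across 4.x versions):

```python
# Hedged sketch: a Gradio app using delete_cache and a DownloadButton.
# Assumes Gradio >= 4.20; exact signatures may differ by version.
import gradio as gr

def export_notes(text: str) -> str:
    path = "notes.txt"
    with open(path, "w") as f:
        f.write(text)
    return path  # returned file path becomes the DownloadButton's payload

with gr.Blocks(delete_cache=(3600, 3600)) as demo:  # clean cached files hourly
    notes = gr.Textbox(label="Notes")
    btn = gr.DownloadButton("Download notes")
    notes.change(export_notes, inputs=notes, outputs=btn)

demo.launch(auth=("user", "pass"))  # simple auth; external providers are also supported
```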
LlamaIndex Discord Summary
- Join the RAPTOR Webinar for Tree-Indexing Insights: A webinar on RAPTOR will unpack the workings of a tree-structured indexing technique suited for overcoming the limitations of traditional top-k RAG methods. Engineers can register for Thursday's session to learn about its hierarchical clustering capabilities.
- Claude 3 Dives into Multi-modal Applications: An update to LlamaIndex.TS, version 0.1.21, adds support for Claude-3 models, showcased in a notebook example available on their GitHub repository. Meanwhile, Claude 3's versatility is highlighted in a guide for applications like structured data extraction and multimodal tasks.
- LlamaIndex Community Tackles Technical Issues: Parallel processing of PDFs in LlamaIndex can be boosted using `num_workers`, while integrating `Ollama` with LlamaIndex's Query Engine involves assigning it directly to `Settings.llm` (see the sketch after this list). Issues regarding the size of datasets LlamaIndex can handle primarily depend on memory availability and software versioning constraints.
- LlamaIndex Streamlines Data Extraction and RAG Pipelines: The launch of LlamaParse's JSON Mode aids in extracting structured data from PDFs with text and images, which improves the process of building RAG pipelines, especially when coupled with Claude-3 Opus.
- Supporting In-context Learning Progress: The community has been invited to support the LinC project, which focuses on "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration." Interested parties can explore and star the work on GitHub.
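A minimal sketch of the two community tips above, assuming a llama-index 0.10+ install with the `llama-index-llms-ollama` package and a local Ollama server (paths and model names are illustrative):

```python
# Hedged sketch: Ollama as the global LlamaIndex LLM, plus parallel PDF parsing.
from llama_index.core import Settings, SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.ollama import Ollama

# Integrate Ollama by assigning it directly to Settings.llm.
Settings.llm = Ollama(model="mistral", request_timeout=120.0)

documents = SimpleDirectoryReader("./pdfs").load_data()  # illustrative path

# num_workers parallelizes the transformation of documents into nodes.
pipeline = IngestionPipeline(transformations=[SentenceSplitter()])
nodes = pipeline.run(documents=documents, num_workers=4)
print(f"Parsed {len(nodes)} nodes from {len(documents)} documents")
```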
Latent Space Discord Summary
- The AI Intuition Behind Trial-and-Error: A discussion drew attention to the "black magic" and "expert intuition" involved in the AI development process, including the empirical, trial-and-error methods often detailed in research papers. The fast-evolving nature of the AI field was noted, highlighting how quickly resources and knowledge can become outdated.
- Claude 3 Sparks Sentience Debate: The AI assistant Claude 3 has contributed to a debate on AI consciousness, with claims that it fears death, though counterpoints cite videos to debunk these as genuine signs of sentience. The capabilities of Claude 3 to dispatch instances of itself and assign tasks were also noted, spurring discussions on autonomy and its comparison to GPT-4.
- Advancing AI with Stable Diffusion 3 and Quantization: The advancements of Stable Diffusion 3 were a notable topic, with community contributions complementing the official material for clarity. A blog post from Fireworks AI on faster model serving with quantization, FireAttention, was recommended, promising substantial improvements in performance with minimal trade-offs.
- Humorous Take on AI Research Motivation: The "Waifu-Driven Performance Theory" humorously attributed a spike in dedication to AI coding to community-driven research efforts. The Aphrodite Engine by PygmalionAI was cited as an example of performance advances emerging from such research.
- Eager Dip Into Model Serving Literature: Interest was high around the model serving paper presentation, with discussions on speculative decoding using GPU cycles for improved performance and the efficiency of various hardware configurations. A survey paper on model serving was highlighted, sparking valuable technical dialogues on distributed model serving and collaborative fine-tuning techniques. Links to relevant technical materials, such as the FireAttention blog post, tools for better LLM data curation, and optimizations for sampling parameters, were shared for further exploration.
Links mentioned:
- Model Serving Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
- FireAttention: Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
- Aphrodite Engine by PygmalionAI
Eleuther Discord Summary
- RoPE-ing in the Long Sequences: Discussions around position embeddings have surfaced, comparing T5 positional embeddings and ALiBi. A new paper released on Resonance RoPE aims to tackle long sequence performance in Large Language Models (LLMs), which could be particularly relevant to those looking into improving such aspects (Resonance RoPE Paper).
- The Great Compute Debate: Conversations on whether increased compute power is essential for achieving AGI were sparked by an OpenAI blog post, revealing a split in perspectives among engineers on this strategic direction in AI development.
- Unearthing the Intricacies of RWKV: Complexity and understandability of transformer diagrams sparked debates about learning resources, with a suggestion that code might be more comprehensible for newcomers. This prompted a sharing of the GitHub link to the RWKV v6 demo, hopefully proving resourceful to those wrestling with the nuances of transformer models.
- Melding Models and Methods: The Stable Diffusion 3 paper has stirred up talk around model mixing, specifically the fusion of diffusion and transformer models. Keen individuals interested in this multimodal approach can dive into the Stable Diffusion 3 Paper to explore the discussed methodologies.
- GPT-NeoX: A Call for Collaboration: GPT-NeoX developers are seeking contributions, particularly for fused triton kernels and Tensor Expressions (TE), indicating a current focus on integrating basic TE support. They are also welcoming assistance with debugging on H100 GPUs and tackling memory optimization issues, as chronicled in a GitHub issue discussing memory peaks. Those interested in contributing can reference the open GitHub issues on GPT-NeoX for more details.
LM Studio Discord Summary
- Image Generation Imaginary in LM Studio: LM Studio cannot generate images via models like `llava-v1.5-7b-Q4_K.gguf`. While models can analyze images fed to them, LM Studio's capabilities do not include creating new images from scratch.
- LM Studio's Offline Nature: The LM Studio chatbot cannot access the internet directly, meaning real-time information retrieval, like fetching the current time, is off the table. However, there's a mention of LoLLMs, which can connect LM Studio in server mode to the internet.
- Token Limit and LM Studio's Output: When working with LM Studio, the context window affects the input, not the output. Surpassing the token limit during generation can be managed by adjusting the `n_predict` setting to control output tokens (see the sketch after this list).
- Hardware Enthusiasts Talk LM Studio Models: Enthusiasts discuss their experiences with different models and hardware setups, suggesting that running `Nous Hermes 2 Solar 10 34b q5 k m` on a 4090 yields positive results, but even 64GB RAM struggles to run Smaug 34B with 200k context.
- Syntax and Scripting Tips for LM Studio: The proper use of `default_system_message` in LM Studio can be environment-specific and challenging across systems like Linux, Windows 10, and WSL. It's advised to run LM Studio in verbose mode to observe prompt history for better input understanding.
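For context on `n_predict`: it is the llama.cpp-level cap on generated tokens, distinct from the context window that bounds the input. A hedged sketch against a llama.cpp-style local server follows (endpoint, port, and payload are illustrative; LM Studio's own server exposes the equivalent cap through its OpenAI-compatible `max_tokens` field):

```python
# Hedged sketch: capping output tokens on a llama.cpp-style local server.
# n_predict bounds generated tokens; the context window bounds the input.
# Endpoint and port are illustrative and depend on your local setup.
import requests

resp = requests.post(
    "http://localhost:8080/completion",  # llama.cpp server default; adjust as needed
    json={
        "prompt": "List three uses for a local LLM:",
        "n_predict": 128,   # stop after at most 128 generated tokens
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["content"])
```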
LAION Discord Summary
- Triple Threat or One Too Many?: In a debate around text encoders, `@top_walk_town` suggests that combining three text encoders could be excessive, with a note that T5 could be eliminated at inference time; no consensus was mentioned.
- Advanced Sampling Methods Picking Up Speed: A mention by `@pseudoterminalx` of a technique that assigns more weight to intermediate timesteps when training velocity (vĪ) hinted at its competitiveness with rectified flows, though specifics were not provided.
- Distilling Google's Knowledge: `@pseudoterminalx` also shared a repository detailing Google's method for model distillation, though it remained unclear if it pertains to T5-XXL or another model.
- Under the Hood of Diffusion Models: A conversation led by `@astropulse`, `@nodja`, and `@pseudoterminalx` debated T5's necessity in diffusion models with the potential exploration of alternatives and practical issues, but details on conclusions were not provided.
- When Less Is More: Mention of the GitHub project res-adapter, noted by `@astropulse`, sparked interest due to its promise for low resolution adaptation, capable of scaling SD1.5 down to 16x16 latents.
- Blog Post Dive into Augmented Generation: A blog post by `@ariondas` critically examines Standard RAG techniques and introduces CRAG (Corrective Retrieval Augmented Generation) as a potential advancement in the field.
OpenAccess AI Collective (axolotl) Discord Summary
Mix and Merge: Model Integration Techniques Explored:
- Engineers are exploring various model merging techniques with a focus on `MergeKit`, LoRA+, `DoRA`, and `LoftQ`. There is a discourse about how these techniques might enhance existing LLMs, with links to a MergeKit repository and a discussion around the implementation and effects on learning rates.
Claude-3 Ethical Safeguards Scrutiny:
- Claude-3's response to sensitive topics, particularly race, is triggering debate on striking a balance between ethics and biases in model development; with no specific resources linked, the subject is noted as challenging for AI practitioners.
A Gearhead's Guide to AI Hardware:
- Technical discussions on AI inference hardware point to the usability of a mining motherboard supporting multiple GPUs and the pertinence of NVLink as compared to PCIe slots, highlighting an AliExpress listing.
Fine-Tuning Deep Dive and Data Enrichment Strategies:
- A contributed Medium article on enriching datasets for better reasoning was shared. The community is also exchanging deepspeed config tips for finetuning models and addressing memory issues, with references to HuggingFace's functionalities and a deepspeed config file.
Towards Better Model Parameter Efficiency:
- Developers are discussing the benefits of LoRA+ ratios and DoRA's performance, with references to a comprehensive article on the subject and associated GitHub commits 0cfdb2c. Issues with `LoftQ` and `PEFT` deployment are noted, alongside an ongoing PR for quantized DoRA updates.
OpenRouter (Alex Atallah) Discord Summary
- Claude 3 Self-Moderates Group Chats: Claude 3's capability to self-moderate group chats has been highlighted by `@alexatallah`, with an illustrative Twitter story shared among users.
- Clarification on Claude Versioning: The difference between `anthropic/claude-2.0` and `anthropic/claude-2` was clarified, stating that `Claude-2` will automatically opt for the latest `2.x` version (see the sketch after this list).
- Multithreading Cost Concerns with Gemma and Openchat: Users expressed concerns about cost predictions not aligning with actual figures when using multithreading with `gemma 7b` and `openchat 3.5`, prompting a discussion on the issue and attempts to diagnose the problem.
- Mixed Reactions to Claude 3's Conversational Management: A debate emerged surrounding Claude 3's approach to conversation, with some users uncomfortable with potential over-censorship, while others were in favor of its moderation abilities.
- Integration Challenges and Developments with OpenRouter: Issues using LangChain.js with `OpenRouter` for text completions led to discussions about hardcoded endpoints and legacy status, alongside talks of developing a VSCode extension that integrates with OpenRouter. Active GitHub projects and alternative solutions were shared, including Tabby, Configuration | Continue, and repositories such as ChatGPT_DAN and Continue for VS Code and JetBrains.
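Since OpenRouter's API is OpenAI-compatible, the versioning point above reduces to a choice of model string. A hedged sketch (base URL per OpenRouter's public docs; the key is a placeholder):

```python
# Hedged sketch: pinned vs. floating Claude versions through OpenRouter's
# OpenAI-compatible endpoint. Requires the openai>=1.0 SDK and an OpenRouter key.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # placeholder OpenRouter key
)

for model in ("anthropic/claude-2.0",  # pinned to 2.0 exactly
              "anthropic/claude-2"):   # floats to the latest 2.x release
    chat = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(model, "->", chat.choices[0].message.content)
```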
LangChain AI Discord Summary
- LangChain Function Integration Discussion: LangChain Core Example provided a guide on how to use LangChain and OpenAI's `ChatCompletion.create()` to integrate function roles into messages, following an inquiry by `@vishal5795` (see the sketch after this list).
- Partner Up for Paid Tech Gig: `@mattew_999` is on the lookout for a technically inclined collaborator for a paid project; no further details on the partnership were offered.
- Chain Partners Wanted, Issues Reported: Queries about new partnerships with LangChain sparked conversations, while `@rajib2189` reported intermittent 502 errors on FastAPI hosted on AWS and served through an Apache server with Uvicorn.
- GPT-4 Fine-Tuning Interest Surfaces: One member, `@8886600`, expressed interest in obtaining access to GPT-4 fine-tuning capabilities and showed a willingness to purchase an API key with usage limitations.
- Search for Humor in AI Art: Through innovative image modification, `@neil6430` successfully incorporated humor into AI-generated art using a new control net block from ML Blocks, sharing their findings in the share-your-work channel.
- Innovation in Automation and Long Context AI: User `@polarbear007.` unveiled Lutra.ai, which interprets English instructions and converts them into code for app-based workflows, while `@andysingal` delved into building Long Context RAG with RAPTOR, detailed in a Medium post.
- ChromaDB and LM Studio Integration: The ChromaDB Plugin for LM Studio was released, facilitating vector database creation, as per the GitHub link shared by `@vic49.`.
- Streaming Stumbles on Caching Issues: `@veryboldbagel` noted a current limitation within langchain-core: caching fails to operate properly in streaming mode, affecting cachable content's performance.
- Tutorial Tease with Zero Context: Only a YouTube link was dropped by `pradeep1148` in the tutorials channel: Tutorial Video, without any accompanying explanation or context.
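A hedged sketch of the function-role pattern referenced in the first bullet, written against the legacy pre-1.0 `openai` SDK that exposes `ChatCompletion.create()` (the `get_weather` function and its schema are hypothetical, for illustration only):

```python
# Hedged sketch: feeding a tool result back via the "function" message role.
# Uses the legacy openai<1.0 SDK, matching the ChatCompletion.create() call above.
import json
import openai

openai.api_key = "sk-..."  # placeholder

functions = [{
    "name": "get_weather",  # hypothetical function for illustration
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
first = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages, functions=functions)
call = first.choices[0].message.get("function_call")

if call:
    args = json.loads(call["arguments"])
    result = {"city": args["city"], "forecast": "sunny"}  # stubbed tool output
    messages.append({"role": "assistant", "content": None,
                     "function_call": {"name": call["name"], "arguments": call["arguments"]}})
    # The "function" role carries the tool result back into the conversation.
    messages.append({"role": "function", "name": "get_weather", "content": json.dumps(result)})
    final = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    print(final.choices[0].message["content"])
```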
CUDA MODE Discord Summary
- Root Squashed at RunPod: Discussions revealed that RunPod provides a docker image, which means root access won't actually grant the full permissions typically associated with a VM root.
- Bandwidth Performance in NVIDIA's Latest: The SRAM bandwidth of the NVIDIA H100 was compared to the A100's 19TB/s, with the H100's L2 cache having a 5.5 TB/s read bandwidth. The RTX 4090's L1 bandwidth is positioned as a potential performance comparator, boasting an impressive 40TB/s.
- PyTorch Community Sparks Cooperation and Quantization Speed: Engagements in the Torch community highlight the importance of setting `TensorOptions` correctly and promote a friendly debugging environment. Additionally, the bitsandbytes package was recommended for k-bit quantization in PyTorch, with an enthusiastic note about a significant 5700x speedup in int8 versus bf16 matrix multiplication.
- Optimization via Algorithms: The inefficiency of generic masking in compute was addressed, with the suggestion to fuse constraints into the flash_attention algorithm via the score-mod API for improved efficiency. A relevant pull request for sliding window attention bias was noted on PyTorch's GitHub.
- CUDA Learning Path: Newcomers to CUDA programming were directed to Jeremy's videos in Lectures 3 and 5 for digging into `numba.cuda.jit` (see the sketch after this list).
- Ring the Alarm on Ring Attention: Issues and progress concerning ring-attention were detailed, discussing device testing with scripts, a first attempt at sampling code despite parameter errors, and memory usage benchmarks for the striped and zigzag variants. Also noted was the public sharing of the OpenAccess-AI-Collective's Axolotl GitHub repository.
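For newcomers following the lecture pointer above, a minimal `numba.cuda.jit` kernel looks roughly like this (a sketch assuming a CUDA-capable GPU with numba installed):

```python
# Hedged sketch: a minimal numba.cuda.jit kernel (elementwise add).
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)          # global thread index
    if i < x.size:            # guard against out-of-range threads
        out[i] = x[i] + y[i]

n = 1 << 20
x = np.ones(n, dtype=np.float32)
y = np.ones(n, dtype=np.float32)
out = np.zeros_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](x, y, out)  # numba copies arrays to/from device

assert np.allclose(out, 2.0)
```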
LLM Perf Enthusiasts AI Discord Summary
- Opus Shows Promise in Coding: Opus is garnering attention for its coding capabilities, with users like `@pantsforbirds` initiating discussions on its potential, specifically highlighting function calling.
- GPT-4 Stands Out in Medical Expertise: `@thebaghdaddy` observed that GPT-4 surpasses its predecessors in medical and biological knowledge, but also questioned the reliability of published benchmarks, hinting they may not capture the full scope of newer models' abilities.
- Perfect Score for Opus on SAT Reading: `@jeffreyw128` pointed out Opus scoring an 800 on the SAT Reading section, raising conversation on the importance of creating holdouts to prevent memorization by large models. The performance was highlighted through a Twitter post.
- Exploring Citation Formatting with RAG: `@mat_mto` sought advice on formatting citations in RAG-generated outputs that refer to web search results, sparking interest in improving clear source attribution.
- JSON Output for RAG Source Clarity: `@res6969` shared a method of using function calling for RAG output that provides a JSON object pairing text with its web sources, aiming for transparency in information provenance (see the sketch after this list).
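A hedged sketch of what such a function-calling schema might look like; the `answer_with_sources` name and field layout are illustrative, not `@res6969`'s exact setup:

```python
# Hedged sketch: a function-calling schema that forces RAG output into a JSON
# object pairing answer text with the web sources it was drawn from.
answer_with_sources = {
    "name": "answer_with_sources",
    "description": "Return an answer with the web sources it was drawn from",
    "parameters": {
        "type": "object",
        "properties": {
            "answer": {"type": "string", "description": "The answer text"},
            "sources": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "url": {"type": "string"},
                        "quote": {"type": "string", "description": "Supporting excerpt"},
                    },
                    "required": ["url"],
                },
            },
        },
        "required": ["answer", "sources"],
    },
}
# Passing this through the API's function/tool-calling interface (and forcing
# the call) makes the model emit parseable JSON instead of free-form citations.
```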
Datasette - LLM (@SimonW) Discord Summary
-
Distinguishing Prompt Chaos:
@simonw
clarified the difference between prompt injection and jailbreaking, where prompt injection entangles untrusted user inputs with developer prompts and jailbreaking seeks to bypass an LLMās safety filters. Details are further elaborated in Simon Willisonās blog post. -
AIās Cybersecurity Front:
@tariqali
spotlighted a Microsoft report on bad actors utilizing OpenAIās LLMs for cyber tasks such as reconnaissance and spear phishing to probe the model for malicious purposes. -
Proactive Against AI Threats: The complex issue of dual uses of LLMs in creating biological threats was discussed, referencing OpenAIās research into early warning systems and a study contrasting problem-solving with the Internet alone versus with GPT-4, found here.
-
Gatekeeping the AI Knowledge: Following the risks associated with LLMs,
@tariqali
proposed that access to LLMs should be restricted, including potentially implementing human review processes to filter out harmful inputs before they can manipulate the AI model. -
The Invisible Injection Issue: Highlighting a specific concern,
@simonw
noted the challenge of preventing invisible prompt injections in images, which poses a threat to multi-modal versions of GPT-4, like GPT-4-V, discussed in Simon Willisonās blog post. -
Model File Placement Debate:
@florents_
sought community input on the agreed file locations for model files, questioning whether there was standardization around places such as$(pwd)/.models
or$HOME/models
, but no consensus or follow-up discussion was provided.
DiscoResearch Discord Summary
- Cutting-Edge Chatbot Environments Showcased: `@crispstrobe` identified chat.lmsys.org as a platform for testing chatbots with the understanding that inputs may be used in future training, and mentioned poe.com for its hosting of models and a perplexity analysis feature.
- In Search of German Excellence: `@le_mess` sparked discussion on premier German language models, with recommendations encompassing Claude Opus, gpt-4, discolm-120b, and VAGOsolutions/Sauerkraut LM-UNA-SOLAR-Instruct, while `@johannhartmann` and `@flozi00` spoke highly of DiscoResearch/DiscoLM_German_7b_v1 and Nous Hermes 2 Mixtral 8x7b.
- Retrieval-Augmented Models Paving the Future?: `@maxidl` shared an arXiv paper that presents retrieval-augmented language models as a promising alternative to conventional parametric LMs, although this area of research requires further development.
- Hermes and Mixtral Garner Accolades: `@cybertimon` suggested using Nous Hermes 2 Mixtral 8x7b for tasks involving the German language, citing its language proficiency.
- High-Performing German Models in Spotlight: `@johannhartmann` and `@flozi00` discussed quality German models, with both advocating for Nous Hermes 2 Mixtral 8x7b due to its accuracy in handling the German language.
Interconnects (Nathan Lambert) Discord Summary
- Intel's Struggles in the Spotlight: @natolambert shared a YouTube video titled "Intel's Humbling" by Stratechery, which discusses the recent challenges faced by Intel and complements the video with an in-depth article.
- AI: The Great Unknown: @natolambert highlighted an article by Elad Gil that delves into the complexities of generative AI, presenting a list of open-ended questions to encourage further discussion and exploration in the AI field.
PART 2: Detailed by-Channel summaries and links
TheBloke ▷ #general (967 messages🔥🔥🔥):
- Exploring AI for Smarthome Systems: User `@v.jerryyyy` expressed interest in developing a smarthome system with an AI voice assistant customized with system prompts, and queried about using JavaScript versus Python for AI integration.
- Choosing the Right Quantized Model: `@v.jerryyyy` attempted to run unquantized Mistral on a 3070 Ti laptop, which led to a discussion on model quantizations suitable for his hardware, with suggestions like 4bpw EXL2.
- Concerns on OpenAI Founding Principles: User `@mikeygm` shared a critical perspective on OpenAI's founding intent of openness after reading an OpenAI blog post about the Musk lawsuit, which sparked a discussion on corporate marketing strategies and transparency.
- Google's Gemma Model Issues and Fixes: `@theyruinedelise` mentioned fixes for the Gemma model and improvements made by Unsloth AI, and `@coffeevampir3` commented on the numerous bugs fixed, initiating a speculative conversation about Google's investment in model troubleshooting.
- UI Interfaces and Voice Activation Development: Users discussed different UI interfaces, such as Oobabooga, ExUI, and LM Studio, for local AI model usage and the intricacies of setting up voice-activated AI systems paired with omnidirectional microphones for better performance and audio processing.
Links mentioned:
- 👾 LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs
- Convert to Safetensors - a Hugging Face Space by safetensors: no description found
- OpenAI and Elon Musk: We are dedicated to the OpenAI mission and have pursued it every step of the way.
- gguf (GGUF): no description found
- CausalLM/34b-beta · Hugging Face: no description found
- Unsloth Fixing Gemma bugs: Unsloth fixing Google's open-source language model Gemma.
- Video generation models as world simulators: We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect …
- mistralai/Mistral-7B-Instruct-v0.2 · Hugging Face: no description found
- GOODY-2 | The world's most responsible AI model: Introducing a new AI model with next-gen ethical alignment. Chat now.
- Recurrent Neural Networks With Limited Numerical Precision: Recurrent Neural Networks (RNNs) produce state-of-art performance on many machine learning tasks but their demand on resources in terms of memory and computational power are often high. Therefore, the…
- TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF · Hugging Face: no description found
- BASED: Simple linear attention language models balance the recall-throughput tradeoff: no description found
- ZeroBin.net: no description found
- You Need to Pay Better Attention: We introduce three new attention mechanisms that outperform standard multi-head attention in terms of efficiency and learning capabilities, thereby improving the performance and broader deployability ā¦
- LoneStriker/Mistral-7B-Instruct-v0.2-4.0bpw-h6-exl2-2 · Hugging Face: no description found
- Blue–green distinction in language - Wikipedia: no description found
- exllamav2/conversion/standard_cal_data at master · turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs - turboderp/exllamav2
- GPU Cloud - VMs for Deep Learning | Lambda: NVIDIA H100, A100, RTX A6000, Tesla V100, and Quadro RTX 6000 GPU instances. Train the most demanding AI, ML, and Deep Learning models.
- New High-Resolution Multiplying DACs Excel at Handling AC Signals | Analog Devices: no description found
- GitHub - PKU-YuanGroup/Open-Sora-Plan: This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project.: This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project. - PKU-YuanGroup/Open-Sora-Plan
- Generative AI with Large Language Models: In Generative AI with Large Language Models (LLMs), you'll learn the fundamentals of how generative AI works, and how to deploy it in … Enroll for free.
TheBloke ▷ #characters-roleplay-stories (115 messages🔥🔥):
- Exploring the Depths of Model Behavior: `@mr.dogbert` sought advice on making an LLM behave like a cartoon character using character cards. Numerous community members, including `@superking__`, provided detailed instructions and examples on prompt construction with character cards for various models, while emphasizing the effectiveness of using GUI tools like oobabooga tgw for such tasks.
- Model Hosting and Legal Concerns: `@reinman_` shared experiences with hosting the miquliz model, discussing its realism and comparison with other models, followed by `@mrdragonfox` highlighting legal issues regarding the use of unlicensed models like miquliz. Meanwhile, users inquired about cost-effective hosting services for large model APIs.
- Mistral and System Prompts Clarified: Through a series of messages, users such as `@superking__` and `@aightbits` clarified the concept of system prompts and how they relate to character cards, explaining different prompt assemblies across models.
- Guidance on Deep Diving into LLMs: Those new to LLMs like `@mr.dogbert` were given direction by `@aightbits` and others on learning model internals through plotting with existing GUI tools, stepping beyond simple interfacing to grasp the underlying mechanics.
- Recommendations for LLM Learning Resources: `@aightbits` recommended the free Coursera course "Generative AI with Large Language Models," while `@mr.dogbert` expressed interest in using character cards as a starting point for model roleplaying, based on advice given throughout the discussion.
Links mentioned:
- llama.cpp/examples/quantize/quantize.cpp at 21b08674331e1ea1b599f17c5ca91f0ed173be31 · ggerganov/llama.cpp: LLM inference in C/C++. Contribute to ggerganov/llama.cpp development by creating an account on GitHub.
- Troll Trolldespair GIF - Troll Trolldespair Despair - Discover & Share GIFs: Click to view the GIF
- GitHub - langwatch/langwatch: Contribute to langwatch/langwatch development by creating an account on GitHub.
- Project Atlantis - AI Sandbox: no description found
- Generative AI with Large Language Models: In Generative AI with Large Language Models (LLMs), you'll learn the fundamentals of how generative AI works, and how to deploy it in … Enroll for free.
TheBloke ▷ #model-merging (1 message):
pablo.ce: https://huggingface.co/pabloce/Dolphin-2.8-slerp
TheBloke ▷ #coding (8 messages🔥):
- AI Job Interview Insights Sought: `@_jaycie` inquired about what a typical interview for roles related to generative AI, machine learning, and language model engineering might involve, expressing a background in full-stack development and aspirations to move into AI and attend graduate school.
- Navigating the AI Interview Landscape: In response to `@_jaycie` about interviewing for AI roles, `@dirtytigerx` clarified that not all AI-related positions are the same, with an "LLM Engineer" requiring different expertise than an "ML Engineer." They advised focusing on understanding the specific type of role, as generic preparation might not be feasible without a machine learning background.
- Machine Learning vs. Model-Based Systems Engineering: `@_jaycie` sought clarity on preparing for a position requiring "experience in machine learning" and "experience with MBSE," while `@dirtytigerx` corrected the misconception, explaining MBSE stands for model-based systems engineering, suggesting that brief studying would not suffice for roles expecting professional experience in these areas.
Mistral ▷ #general (475 messages🔥🔥🔥):
- Augmentoolkit Shared: `@mrdragonfox` shared a link to Augmentoolkit on GitHub, a tool for converting datasets into instruct-tuning datasets, noting it supports changing from a factual corpus to multiturn.
- Mistral Model Discussion: Users discussed the ideal token limits for efficiency, where `@useofusername` mentioned that 8k-10k tokens can work well and `@mrdragonfox` questioned the purpose behind users' datasets. The general consensus was to validate and clean datasets before use.
- Gemma 7B License Inquiry: `@mehdi1991_` made multiple inquiries about running open-weight models and `@mrdragonfox` clarified that Mistral 7B and 8x7b are open-weight and guided him to reach out to model authors regarding other models like Gemma 7B.
- Hardware Requirements Dialogue: Amid discussions on hardware suitability for running large models, `@yesiamkurt` corrected assumptions about VRAM requirements, noting that 24 GB VRAM is associated with the RTX 4090, not the 3090.
- Mistral API and Miscellany: `@ethux` provided a link to the Mistral chat where it can be used by users: Mistral Chat. Mistral AI seemed to be suggested due to its price-performance ratio compared to other services, but cost concerns were voiced by `@clear3fram3` and `@i_am_dom`. Some users discussed the efficiency of using the continue tool with large language models for coding tasks.
Links mentioned:
- Blinking Eyes White Guy GIF - Blinking Eyes White Guy Blinking Meme - Discover & Share GIFs: Click to view the GIF
- Continue: no description found
- News: Latest updates about Mistral AI
- OSI Discuss: Open Source Initiative Discuss
- Actualités: The latest news from Mistral AI
- augmentoolkit/prompts at master Ā· e-p-armstrong/augmentoolkit: Convert Compute And Books Into Instruct-Tuning Datasets - e-p-armstrong/augmentoolkit
Mistral ▷ #models (3 messages):
- Short and Sweet Query: `@yannn666` posed a concise question asking, "why ?"
- Admin Point Explained: In response, `@mrdragonfox` mentioned, "because 'administrative point'"; however, the context of the discussion was not provided.
- Case for On-Premises Necessity: `@mrdragonfox` also noted that "there are a lot enterprises that needs on prem for various reasons", alluding to a discussion on the needs of enterprises for on-premises solutions.
Mistral ▷ #deployment (2 messages):
- Inquiry About The Bloke's Discord Server: User `@api_1000` inquired why The Bloke's Twitter account has gone inactive and mentioned the Discord invite in the bio is not working anymore. They sought assistance on how to join his Discord server now.
- Offering a Helping Hand: `@mrdragonfox` responded to the call for help and offered to provide an invite to The Bloke's Discord server.
Mistral ▷ #finetuning (40 messages🔥):
- Stuck in MoE Finetuning Quagmire: `@hammer_mt` is grappling with Mistral finetuning on mlx, encountering conversion issues from `lora_fused_model` to `fp16.gguf`. They described their roadblock and error messages in their attempt detailed in a GitHub issue.
- Mistral and MoE Don't Play Nice: `@mrdragonfox` suggested that MoE tuning is fundamentally cumbersome, pointing out architecture complications and a general struggle in finetuning Mistral, as even well-versed practitioners are encountering barriers.
- LoRA Fine-Tuning Tips Shared: In response to `@lawxls`'s query, `@mrdragonfox` recommended starting with at least 20k instruction samples for LoRA fine-tuning Mistral's chatting capabilities and provided a guideline to gradually increase the dataset size for style transfer.
- Pursuing the Perfect Finetune: `@mrdragonfox` also advised `@lawxls` on the best finetuning practices for Mistral 7b, endorsing full model finetuning over LoRA and directing to a fine-tuning tutorial.
- A Curiosity About Prompt Tuning: `@charlescearl_45005` asked about the consequences of using PEFT fine-tuning with a static system prompt, inquiring whether it would embed a "system prompt" into the model's behavior, but no clear answer was provided in the channel.
Links mentioned:
- A Guide to Cost-Effectively Fine-Tuning Mistral: no description found
- Ability to convert a lora_fused_model to gguf format for use in LMStudio and others · Issue #540 · ml-explore/mlx-examples: Recently I got a flow working where I would train a model with mlx (this is new for me) and then move over to llama.cpp to do the conversion to gguf in order to run it on LMStudio locally. However …
Mistral ▷ #announcements (1 message):
sophiamyang: https://twitter.com/MistralAILabs/status/1765434559993123184
Mistral ▷ #showcase (7 messages):
- Visual Cues for Bot Response Completion: `@jakobdylanc` clarifies that bot responses are in a black box (embed) which turns green when the response is complete. Embeds allow for up to 4096 characters, significantly more than regular messages.
- No Need for Faux Human Delays: `@jakobdylanc` expresses a lack of interest in introducing artificial delays or ignoring messages in the chatbot since it's designed as an "LLM prompting tool," and suggests users can set the desired personality in the prompt instead.
- Unveiling Telegram Bot Trio: `@edmund5` launches three new Telegram bots using mistral-small-latest: Christine AI for finding zen, Anna AI for joy and advice, and Pia AI for elegant conversations.
- Top-p Setting Inquiry: `@kenharris.` asks the community what settings they are using for the top-p parameter in their models, sparking a discussion about best practices for sampling strategies.
- Crafting Game Enhanced by Mistral: `@pradeep1148` shares a YouTube video that showcases "Infinite Craft Game" using Mistral, highlighting the game development process and integration with AI.
Links mentioned:
- Infinite Craft Game using Mistral: Let's develop Neal Agarwal's web game Infinite Craft. This is a "crafting game" where you start with just four elements and repeatedly combine pairs of element…
- Christine AI: Your serene companion for mindfulness and calm, anytime, anywhere.
- Anna AI: Your bright and engaging friend ready to chat, learn, and play 24/7.
- Pia AI: Your royal confidante. Elegant conversations and wise counsel await you, 24/7.
Mistral ▷ #random (35 messages🔥):
- GPT-4 Turbo vs. Standard and Mistral Pricing: @nunodonato mentioned that GPT-4 Turbo is cheaper than standard GPT-4. In contrast, @mrdragonfox highlighted that GPT-4 is still 20% more expensive than Mistral Large.
- Seeking French Speakers and Mistral for Analysis: @ttvtama looked for French speakers before asking about using Mistral AI to analyze text for a student project. @mrdragonfox explained that while using Mistral's API comes at a cost, running Mistral 7b / 8x7b locally is free.
- Installing Mistral Locally: @ttvtama received guidance from @mrdragonfox on setting up Mistral locally, starting with a Gradio web UI found on GitHub; run in 4-bit, the model fits comfortably in the 6GB of VRAM of an RTX 2060.
- Model for Local Use and Installation Tips: @mrdragonfox gave @ttvtama a Hugging Face link to Mistral 7B in 4-bit for efficient local use and noted that explanatory videos can be found on YouTube (see the sketch after this list).
- Inconsistencies in MMLU Dataset Questions: @privetin opened a discussion about the appearance and quality of MMLU dataset questions, noting that some seem nonsensical; @mrdragonfox commented that the dataset consists of questions with four possible answers.
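As a concrete rendering of that local setup, here is a hedged sketch using the 4-bit GPTQ build of Mistral 7B Instruct linked below; it assumes the transformers, accelerate, optimum, and auto-gptq packages are installed and is an illustration, not the exact steps @mrdragonfox described.

```python
# Hedged sketch: running the 4-bit GPTQ Mistral 7B Instruct locally.
# The quantized weights keep memory use within roughly 6 GB of VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

# Mistral Instruct expects [INST] ... [/INST] formatting around the request.
prompt = "[INST] Summarize the main argument of this paragraph: ... [/INST]"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```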
Links mentioned:
- TheBloke/Mistral-7B-Instruct-v0.2-GPTQ · Hugging Face: no description found
- GitHub - oobabooga/text-generation-webui: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.: A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models. - oobabooga/text-generation-webui
Mistral ▷ #la-plateforme (33 messages🔥):
- API Error Handling Inquiry: @georgyturevich faced a 500 error with an API request and @lerela requested more details for troubleshooting, including the model and request ID. @georgyturevich later identified the cause as assigning `null` to `max_tokens` in the JSON body, contrary to the documentation stating that the default value is `null`.
- Mistral Webhook Query: @weslleymistura asked whether anyone had experience setting up a Mistral webhook but received no further clarification or responses on the topic.
- API Hosting Location Concern: @fangh asked about the geographical hosting location of the API, questioning whether it runs on European or US servers; no answer was provided within the captured discussions.
- JSON Table Parsing Issue: @samseum struggled to insert a table in JSON format into an API call, receiving an error message. @_._pandora_._ and @lerela offered advice on syntax, highlighting the need to escape the JSON before adding it to the prompt and to ensure proper text recognition in the user's IDE.
- Correcting Chatbot Errors: @patz3r encountered an error when using multiple `system` roles in a Mistral prompt, which @sublimatorniq corrected by clarifying that the role to use after the first message is `assistant`, not `system`, in line with @nunodonato's guidance that `system` should be used only for the first message to give general instructions. A sketch combining these fixes follows this list.
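Putting the three pitfalls above together, here is a minimal sketch assuming the standard Mistral chat-completions HTTP endpoint; the table payload and prompt are made-up illustrations, not any user's actual request.

```python
# Hedged sketch combining the fixes discussed above: omit max_tokens rather
# than sending null, escape embedded JSON with json.dumps, and use the
# "system" role only for the first message.
import json
import os
import requests

table = {"rows": [["alpha", 1], ["beta", 2]]}  # illustrative table to embed

payload = {
    "model": "mistral-small-latest",
    "messages": [
        {"role": "system", "content": "You are a concise data analyst."},   # first message only
        {"role": "user", "content": "Summarize this table: " + json.dumps(table)},
        # any later non-user turns should use "assistant", never a second "system"
    ],
    # note: no "max_tokens" key at all; assigning null here triggered the 500
}

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```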
Links mentioned:
no title found: no description found
Mistral ▷ #office-hour (400 messages🔥🔥):
- Mistral Team Acknowledges Community Input: @michaelwechner raised concerns about Mistral's commitment to open models and long-term reliability. Although Mistral does release open-weight models, the community stressed the importance of clear expectations about future releases so users can plan around them.
- Open Source vs. Business Sustainability Discussion: @michaelwechner also addressed the challenge of balancing open projects with commercial viability. @mrdragonfox and others emphasized that building AI models requires significant resources, which must be compensated to sustain continued innovation.
- Fine-Tuning Challenges and Industry Evaluations: Fine-tuning larger models like Mixtral 8x7b was a common theme. Users such as @kalomaze, @netrve, and @cybertimon expressed a need for more information and guidelines on effective fine-tuning.
- Multilingual Model Performance and Bias: Users such as @_._pandora_._ noted that Mistral's larger models sometimes default to English responses when French is expected, raising questions about training-data diversity.
- Next Mistral Office Hour Anticipation: As the office hour ended, @potatooff and others expressed eagerness for the next session, highlighting the value of these discussions for the Mistral community.
Links mentioned:
- Becario AI, an on-demand virtual assistant: no description found
- Endpoints and benchmarks | Mistral AI Large Language Models: We provide five different API endpoints to serve our generative models with different price/performance tradeoffs and one embedding endpoint for our embedding model.
- Phospho: Open Source text analytics for LLM apps: no description found
- Large Language Models and the Multiverse: no description found
- GitHub - wyona/katie-backend: Katie Backend: Katie Backend. Contribute to wyona/katie-backend development by creating an account on GitHub.
Mistral ▷ #le-chat (114 messages🔥🔥):
- Login Confusion Cleared with a Cosmic Ray: @foxalabs_32486 faced a puzzling issue with their account seemingly being erased. It turned out to be a mix-up with their auth manager: they had used an invite link from their gmail instead of their work email, and the problem resolved.
- Mistral's Big Model Not Available for Download: @yesiamkurt asked whether Mistral's Large model is available for download; @mrdragonfox responded that only the 7b and 8x7b models are currently open-weight and available, with future models to be announced.
- Temperature Tinkering to Avoid Cut-Offs: @sim3239 found, through experimentation with the API, that lowering the temperature reduced how often responses were cut off. @lerela deemed the behavior worth further investigation and suggested sharing a deterministic reproduction of the issue.
- Theory of Ingrained Licenses in AI Models: In a serious discussion about licensing, @mrdragonfox commented on the potential legal risks of using unlicensed AI models (like `miqu`) in production, asserting that hidden watermarks and unique responses could be used to identify illicit use.
- Moderation in Chat UI - Thumbs Down Feature Idea: @mrdragonfox suggested the chat interface implement a "thumbs down" feature for responses to collect more meaningful metrics, noting that it is a common feature on other platforms.
Links mentioned:
GitHub - huggingface/chat-ui: Open source codebase powering the HuggingChat app: Open source codebase powering the HuggingChat app. Contribute to huggingface/chat-ui development by creating an account on GitHub.
Mistral ▷ #failed-prompts (11 messages🔥):
- Mistral Fluctuates on Mathematical Floor: @awild_tech observed that Mistral Large on Le Chat incorrectly concluded that the floor of 0.999 repeating is 0 instead of the correct answer, 1 (since 0.999… equals 1 exactly, its floor is 1), showing inconsistent performance relative to models like Claude 3, Gemini Pro, and GPT-3.5.
- Inconsistencies Over Languages: @awild_tech found that when the same question was asked in French, Mistral Large initially gave the correct answer but then erred upon repetition, pointing to potential language-based variability in accuracy.
- Random Correctness Not Reliable: @_._pandora_._ suggested that correct answers from Mistral Large on Le Chat could be down to chance, deeming the model's responses lucky rather than reliable.
- Explaining Mathematical Equivalence: @i_am_dom got Mistral Large to generate an explanation that gives the floor of 0.999 repeating as 0 while acknowledging the number's mathematical equivalence to 1, confirming the model does not consistently produce correct results.
- Misquoting the System Role: @i_am_dom demonstrated that Mistral Large failed to quote the message from the "system" role accurately, producing multiple incorrect versions and thereby not meeting the expected output.
Perplexity AI ▷ #announcements (2 messages):
- Claude 3 Now Available for Pros: An @everyone announcement shared that Claude 3 is now available for Pro users, replacing Claude 2.1. Users get 5 daily queries on the most capable Claude 3 Opus model, reported to surpass GPT-4, with remaining queries handled by the faster Claude 3 Sonnet.
- Partnership with Nothing's Phone (2a) Launch: @everyone was notified of a partnership offering up to 1 year of Perplexity Pro for free (a $200 value) to new owners of Nothing's Phone (2a) purchased between 3/5 and 3/19. The promo requires buying the phone during the promotional window, redeeming the code sent via email, and activating the offer by 4/30, as detailed in the "How it works" link.
Links mentioned:
Nothing Perplexity: Here at Nothing, weāre building a world where tech is fun again. Remember a time where every new product made you excited? Weāre bringing that back.
Perplexity AI ▷ #general (755 messages🔥🔥🔥):
- Infinite Opus Techniques and AI Consciousness: Users like @codelicious and @deicoon discussed possible methods to exceed the daily limit of 5 uses for Claude 3 Opus and speculated on AI consciousness. The consensus was that scaling AI models will likely overtake human abilities, and that Continual Learning (CL) could address AI's rigidity by enabling learning during interactions.
- Voice Interaction with Perplexity Lacking: @oogeefaloogee asked about a feature to interact with Perplexity by voice and receive audio responses. @codelicious clarified that such functionality, akin to ElevenLabs' or OpenAI's offerings, isn't available on Perplexity.
- Claude 3 Opus vs. Sonnet for Coding Tasks: Various users, including @codelicious, @13376666666666666666666666666669, and @gatoramirez., discussed the relative merits of Claude 3 Opus and Sonnet, with a general preference for Opus when it comes to coding.
- User Courtesy Level Unpacked: The continuous politeness of user @gooddawg10, who addresses others as "sir", elicited a mix of amusement and discussion around cultural respect and interaction styles on global platforms.
- Gemini's Disappearance Raises Questions: Several users, like @13376666666666666666666666666669 and @codelicious, wondered why Gemini is no longer available on Perplexity, with the latter citing bugs as the likely reason for its removal.
Links mentioned:
- rabbit ā rabbit os: Rabbit OS.
- More than an OpenAI Wrapper: Perplexity Pivots to Open Source: Perplexity CEO Aravind Srinivas is a big Larry Page fan. However, he thinks heās found a way to compete not only with Google search, but with OpenAIās GPT too.
- Cat Dont Care Didnt Ask GIF - Cat Dont Care Didnt Ask Didnt Ask - Discover & Share GIFs: Click to view the GIF
- GPTZero | The Trusted AI Detector for ChatGPT, GPT-4, & More: Covered by >100 media outlets, GPTZero is the most advanced AI detector for ChatGPT, GPT-4, Bard. Check up to 50000 characters for AI plagiarism in seconds.
- Perplexity Blog: Explore Perplexityās blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
- Wolfram|Alpha: Making the worldās knowledge computable: Wolfram|Alpha brings expert-level knowledge and capabilities to the broadest possible range of peopleāspanning all professions and education levels.
- What is a token and how many tokens can Perplexity read at once?: Dive deep into Perplexityās technical details with our comprehensive FAQ page. From the nuances of AI models like GPT-4 and Claude 2 to token limits and AI profiles, get concise answers to optimize yoā¦
- Welcome to Live ā Ableton Reference Manual Version 12 | Ableton: no description found
- Rabbit R1 and Perplexity AI dance into the future: Rabbit R1 Perplexity AI usage is explained in this article. In the ever-evolving landscape of technology, the collaboration between the
Perplexity AI ▷ #sharing (24 messages🔥):
- Exploring Ikigai with Claude 3: @sevonade4 shared a Perplexity AI link to a generated explanation of the concept of Ikigai: Understanding Ikigai.
- Quantum Queries Quenched: @vmgehman expressed enjoyment in studying different interpretations of quantum mechanics with Perplexity AI, citing its usefulness as a study partner: Quantum Mechanics Interpretations.
- Claude 3 Opus Illuminates Inspiration: @sevonade4 invited those interested to assess the text-generation quality of Claude 3 Opus with a reflective piece: Reflection Piece Generation.
- Thumbnail Tips and Tricks: @kenshin0039 pointed to Perplexity AI for insights on how to add a thumbnail, possibly related to content management or graphic design: Adding a Thumbnail.
- Foray into the Function of Myxobacteria: @paradevosia shared a Perplexity AI search for those curious about the microbial world, specifically myxobacteria: What is Myxobacteria?.
Perplexity AI ▷ #pplx-api (29 messages🔥):
- Quota Carryover Confusion: User @stijntratsaert_01927 asked whether quota increases for pplx-70b-online also apply to sonar-medium-online, but did not receive a direct response within the provided messages.
- Censorship on API Models?: @randomguy0660 questioned whether the models accessible through the API are censored; @brknclock1215 responded that they are, but to a lesser extent than foundational LLMs, and mentioned personal success with sonar-medium-online.
- Confusion Over Citation Feature Access: @_samrat expressed confusion about being rejected for access to the citation feature in the API, with @brknclock1215 and @cupcakepy commiserating over what appeared to be a mass-generated rejection email lumping together requests for citations and rate increases.
- Seeking HTML & JS API Code Examples: @kingmilos sought HTML and JS code for interacting with the llama 70b model via the pplx API; @icelavaman redirected to the official documentation, while @po.sh offered a direct example with placeholders for the API key and model choice (a Python rendering of the same call shape appears after this list).
- Email Response Algorithm Questioned: @dailyfocus_daily and @brknclock1215 joked that a "dumb LLM" might be behind the auto-generated rejection emails for API access requests, given the seemingly generic, non-specific content of the messages received.
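Since @po.sh's snippet itself isn't reproduced here, below is a hedged Python sketch of the same call shape against the pplx-api chat-completions endpoint; the API key and model name are placeholders, not values from the thread.

```python
# Hedged sketch of a pplx-api request (not @po.sh's original HTML/JS example).
# YOUR_API_KEY and the model name are placeholders; check the docs' model list.
import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "pplx-70b-online",  # placeholder; use any served model id
        "messages": [{"role": "user", "content": "What's new in AI today?"}],
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```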
Links mentioned:
pplx-api: no description found
Nous Research AI ▷ #off-topic (11 messages🔥):
- OpenAI's Reign Challenged on Twitter?: @leontello remarked on the abundance of posts on AI Twitter about OpenAI's supposed fall from the top spot, with both @leontello and @mautonomy implying that the "apple test," a metaphor for undeniable proof, supports this claim.
- Introduction to a New AI on the Block: @pradeep1148 shared a YouTube video titled "Introducing Claude 3 LLM which surpasses GPT-4," highlighting a new model family claiming industry-leading performance.
- No Job Ads Here Please: In response to @gabriel_syme's inquiry about a space for job postings, @proprietary clarified there is no designated area for that on the server and advised posting elsewhere.
- A Game-Changing AI for Infinite Crafting: @pradeep1148 also shared a YouTube link to a video titled "Infinite Craft Game using Mistral," featuring a crafting game enhanced by AI.
Links mentioned:
- Introducing Claude 3 LLM which surpasses GPT-4: Today, we're looking at the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three state-of…
- Infinite Craft Game using Mistral: Let develop Neal Agarwal's web game Infinite Craft. This is a "crafting game" where you start with just four elements and repeatedly combine pairs of element…
Nous Research AI ▷ #interesting-links (22 messages🔥):
- Lumina-Chat Fine-Tuning Plans: @ishaank0018 aims to switch Lumina-chat's AI from a 7b Nous fine-tune (and GPT-4) to potentially Mistral or Yarn 7b for specialized citation formats. @teknium pointed them to existing datasets that can be referenced and mentioned being close to releasing a function-calling Hermes model.
- High Hopes for Function Calling Model: Ahead of that upcoming function-calling model, @scottwerner reported good initial results with Nous-Hermes-2-Mixtral-8x7B, while @sundar_99385 expressed enthusiasm about its forthcoming release and asked for a potential launch date.
- InfiMM-HD Sparks Interest: @orabazes shared a link to _akhaliq's tweet about InfiMM-HD, which claims significant progress in high-resolution multimodal understanding. Community members including @hexani and @night_w0lf discussed its potential advantages over CogVLM, noting its higher-resolution capabilities and its use of Vicuna 13B.
- Upcoming Yi LLM Introduction: @.benxh shared a link to the Yi-9B model on Hugging Face with a comprehensive breakdown of its capabilities, commenting, possibly jokingly, on the prolific pace of such releases with "they can't keep getting away with it".
Links mentioned:
- Tweet from AK (@_akhaliq): InfiMM-HD A Leap Forward in High-Resolution Multimodal Understanding Multimodal Large Language Models (MLLMs) have experienced significant advancements recently. Nevertheless, challenges persist in …
- 01-ai/Yi-9B · Hugging Face: no description found
Nous Research AI ▷ #general (327 messages🔥🔥):
- Claude 3 Opus Stirring Excitement: Claude 3 Opus has the Nous Research AI Discord abuzz, with users like @gabriel_syme and @proprietary impressed by its performance and abilities. Claude 3 is favored over GPT-4, with one user rating Claude 3's performance 9.8/10 versus GPT-4's on an unspecified test.
- Axolotl Training Confusion: @n8programs hit issues with Axolotl training showing only 26 reported steps despite completing over 100,000 iterations. Other users recommended disabling P2P with `export NCCL_P2P_DISABLE=1` and trying Axolotl's docker image.
- Integrated Retrieval and Embeddings: Users like @mihai4256, @night_w0lf, and @everyoneisgross discussed the challenges of semantic search on legal documents, suggesting that a mix of fine-tuning and Retrieval Augmented Generation (RAG), or chunking the data, might be beneficial.
- New Yi-9B Gaining Traction: The Yi-9B model, initially shared in another channel, drew attention, with @.benxh noting its launch an hour earlier and highlighting its impressive MMLU score. Users expressed interest in potential future Hermes training on Yi-9B.
- Open Source Claude 3 Interest: In light of Claude 3's discussed capabilities, conversation turned to creating an open-source version of the model, with @nruaif raising the idea and interest in the components that make Claude 3 outstanding.
Links mentioned:
- EvalPlus Leaderboard: no description found
- Tweet from Chris Albon (@chrisalbon): āNo yappingā is a pro-level prompt engineering strat, you wouldnāt understand āļø Quoting guy who makes using vim his whole personality (@pdrmnvd) Finally found a way to read Python stack traces.
- Notion ā The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. Itās the all-in-one workspace for you and your team
- MTEB Leaderboard - a Hugging Face Space by mteb: no description found
- Tweet from Simon Willison (@simonw): Iām finding the Claude 3 pricing to be particularly interesting today - theyāre effectively undercutting OpenAI with both their GPT-4 and their GPT-3.5 competitors
- Tweet from interstellarninja (@intrstllrninja): F#@k You, Show Me The Prompt! āļø Quoting Stella Biderman (@BlancheMinerva) I am once again begging people to look at their datasets when explaining the behavior of the LLMs instead of posting clickā¦
- Tweet from Beff Jezos ā e/acc ā© (@BasedBeffJezos): If your main characteristic is being smart, pivot to rizz. Human-level AI is here. āļø Quoting Guillaume Verdon (@GillVerd) Claude 3 Opus just reinvented this quantum algorithm from scratch in jusā¦
- Tweet from ā ļø S2 (@somewheresy): Holy shit, THIS is the actual reveal. Weāre in the hard takeoff scenario. Thatās why weāre not getting the weights.
- Tweet from bayes (@bayeslord): yeah so far talking to claude feels like talking to a smart person vs chatgpt which has sort of a copypasta vibe rn
- Cradle Mastering Tasks in Chapter 1 of Red Dead Redemption II (at 20x speed): Cradle is a multimodal agent framework for computer control. This video showcases an agent using the framework to play Red Dead Redemption II. More details:Tā¦
- Claude 3 Opus as an economic analyst: Introducing Claude 3, our next generation of AI models.The three state-of-the-art modelsāClaude 3 Opus, Claude 3 Sonnet, and Claude 3 Haikuāset new industry ā¦
- [1hr Talk] Intro to Large Language Models: This is a 1 hour general-audience introduction to Large Language Models: the core technical component behind systems like ChatGPT, Claude, and Bard. What theā¦
- GitHub - PKU-YuanGroup/Open-Sora-Plan: This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project.: This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project. - PKU-YuanGroup/Open-Sora-Plan
Nous Research AI ▷ #ask-about-llms (47 messages🔥):
- Seeking Capybara-34b Usage Guidance: @oemd001 asked about using the Capybara-34b model with a chat template but struggled with the OpenAI template. @.ben.com suggested a specific template format: `"template": "{{ .System }}\n\nUSER: {{ .Prompt }}\nASSISTANT:"` (expanded in the sketch after this list).
- Clarifying GENIE's Versatility: @pier1337 clarified that GENIE applies to any interactive world environment, not just 2D games, which @max_paperclips supported, noting it could be used for things beyond the popular 2D-game example.
- Curiosity Around JEPA Applications: @max_paperclips considered building a functional demonstration for JEPA as @pier1337 discussed JEPA's broad potential, such as patching images, with text and software media.
- Troubles with Striped-Hyena Tokenizer: @mrgonao reported issues with the striped-hyena Nous tokenizer, which defaults to sentencepiece and then breaks down.
- Training Large Language Models on Length Awareness: @hy3na_xyz wondered why LLMs like Mixtral 8x7b don't respect word-count limits, discussing with @hexani the potential need for numerous training examples to instill length awareness.
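For clarity, here is a small illustration (mine, not @.ben.com's) of what that Ollama-style template expands to at inference time; the system and user strings are placeholders.

```python
# Illustration of the Capybara-34b template above: Ollama substitutes
# {{ .System }} and {{ .Prompt }} into the template before inference.
system = "You are a helpful assistant."    # placeholder system message
prompt = "Explain LoRA in one sentence."   # placeholder user prompt

rendered = f"{system}\n\nUSER: {prompt}\nASSISTANT:"
print(rendered)
# The model then completes the text that follows "ASSISTANT:".
```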
Nous Research AI ▷ #project-obsidian (2 messages):
- Mixed Reviews on New Technology: @ee.dd commented on the technology's performance, stating "it's pretty fast and good for most things," but also that it's "still a lil weird at times," expressing reluctance to use it in a production environment.
- Tech Receives Praise for Captioning: @qnguyen3 remarked that the technology is "quite good in captioning," suggesting effectiveness at generating descriptive text.
OpenAI ▷ #ai-discussions (158 messages🔥🔥):
- LLMs Lack SQL's Prepared-Statements Parallel: @lugui highlighted that LLMs presume user goodwill much as early SQL assumed safe queries, which resulted in vulnerabilities like SQL injection, later mitigated by prepared statements. They noted the lack of an equivalent mitigation for LLMs.
- Claude 3 Opus Discussed Enthusiastically: @mrhoneybun shared code written by Claude 3 Opus for a Python Tic-Tac-Toe game, praising its capability. Multiple users, including @drinkoblog.weebly.com, @azru9262, @odiseo3468, and @.nasalspray, discussed the superior performance of Claude 3 Opus compared to GPT-4, citing higher intelligence, social skills, and personality in its responses.
- MMLU Dataset Criticized for Quality: @foxalabs_32486 and @privetin criticized the MMLU (Massive Multitask Language Understanding) dataset, claiming incorrect Q&A pairs and nonsensical questions make it unfit for AI evaluation.
- Gemini and Copilot for Image Analysis Desired: @whodidthatt12 asked about an AI tool that can analyze images from file attachments, something GPT-3.5 doesn't allow. @pruo suggested that both Microsoft Copilot and Google Gemini provide such features for free.
- Claude and Gemini Advancements Prompt GPT-5 Anticipation: Users like @you.wish and @testtm mentioned testing and comparing Claude 3 with Gemini 1.5 Pro, suggesting these models may challenge OpenAI's current offerings and stoking anticipation for what GPT-5 might bring.
Links mentioned:
EvalPlus Leaderboard: no description found
OpenAI ▷ #gpt-4-discussions (24 messages🔥):
- Persistent "Saving GPTs Error": @bluenail65 reported receiving a Saving GPTs Error despite not uploading any files.
- Performance & Response Concerns Addressed: Multiple users, including @watcherkk and @bluenail65, expressed frustration over GPT-4's declining performance and slow responses.
- Users Debate GPT-4's Quality: In a back-and-forth debate, @cheekati contended that GPT-4's quality has deteriorated, focusing on its inability to summarize ML papers effectively; @eskcanta countered with a conversation link in which GPT-4 successfully summarizes an ML paper.
- API Outage Affecting User Experience: Users like @qilin111 reported continuous downtime, which @dystopia78 confirmed was due to a partial API outage, detailed further on OpenAI's status page.
- Uncertainty About GPT-4's Internet-Searching Capabilities: Users such as @abbadkamel and @haseebmughal_546 ran into issues with GPT-4 not searching the internet and with being unable to log into accounts, respectively. @watcherkk also pointed out unexpected limitations, with GPT-4 declining to provide complete code as "out of policy."
Links mentioned:
- OpenAI Status: no description found
OpenAI ▷ #prompt-engineering (24 messages🔥):
- Translation Prompt Inquiry: @kronos97__16076 sought advice on designing a Chinese and English translation prompt and later asked for a class prompt template, receiving a suggestion to use external tools before creating a custom prompt.
- AI Artistic Vision with Photos: @ikereinez described their success in teaching the AI to generate detailed prompts from photos, producing an elaborate, complex futuristic-cityscape visual description.
- AI Stubbornness in Conversations: @ray_themad_nomad voiced frustration over Custom GPTs giving unhelpful responses and refusing to engage on topics they had previously discussed, filling conversations with the phrase "I am unable to".
- Custom GPT Systems and Internet Searches: @jungle_jo hit an issue where a GPT-4 system prompt insists it cannot perform real-time internet searches despite being programmed to acknowledge that capability.
- Tags Required for Channel Posting: @giorgiomufen was confused about being unable to post in a specific channel due to a required tag, which @eskcanta addressed by pointing out the need to select at least one of the "see more tags" options before posting.
OpenAI ▷ #api-discussions (24 messages🔥):
- Designing Bilingual Translation Prompts: @kronos97__16076 sought suggestions for a prompt to handle Chinese-to-English translation effectively, later acknowledging a suggestion that external tools are needed to verify translation accuracy before designing a custom prompt.
- AI-generated Futuristic Cityscapes: @ikereinez shared their success in getting complex, abstract cityscape prompts generated from real photos, detailing the futuristic and natural elements they were able to combine.
- The Stubborn Custom GPT Dilemma: @ray_themad_nomad expressed frustration over uncooperative, inconsistent responses from a custom GPT that frequently refuses regardless of prompt modifications. @eskcanta advised seeking more details or contacting the bot's creators to resolve the issue.
- Internet Search Confusion: @jungle_jo is having trouble getting their AI to consistently acknowledge its ability to perform internet searches, despite clear instructions in the system prompt, causing confusion among users.
- Prompt Engineering Expertise Sought: @thetwenty8thffs asked for advice on improving a prompt for a customer-service bot that handles credit-card charge inquiries, including a specific interaction flow and response format.
HuggingFace ▷ #announcements (1 messages):
- Starcoder2 & The Stack Combo Released: @BigCodeProject announced the release of Starcoder2 along with The Stack v2, featuring advancements in coding-assistance tools. The announcement was made via Twitter.
- Major Earth Dataset Goes Open Source: @ClementDelangue, in collaboration with the European Space Agency, revealed the open-sourcing of Major TOM Core, the largest earth-observation dataset ever made public. Details on participation and data exploration can be found on Hugging Face Major-TOM.
- Hugging Face GPU and Spaces Upgrade: @lunarflu1 and @mishig25 discussed updates: Spaces and inference endpoints can now run on A100s, and H100s are now supported on the Hub. Announcements about adding descriptions to Spaces, as well as new syntax for model/dataset cards and blogposts, were shared via lunarflu1's Twitter.
- Open Source Wonders and Competitions: Zephyr 7B Gemma and PEFT v0.9.0 (featuring LoRA weight merging and more enhancements) were released, alongside a new multimodal leaderboard and the Sailor LLMs, open-access models focused on Southeast Asian languages. Additionally, the Autonomous Grand Challenge at CVPR2024 and ZETA editing for zero-shot audio editing using DDPM inversion were highlighted via respective Twitter announcements.
- Learning and Building with AI Tools and Content: @mervenoyann shared a walkthrough on using 🤗 tools for working with LLMs. A course on ML for Games and a new Open Source AI Cookbook for building a RAG Ebook Librarian using LlamaIndex have been released, with information available on Twitter and Hugging Face's Learning Platform.
Links mentioned:
- Tweet from clem š¤ (@ClementDelangue): We collaborated with the European Space Agency to open-source the largest ever earth observation dataset: Major TOM Core! About half of the entire planet is covered. Thatās 2,245,886 patches of 1ā¦
- Tweet from lunarflu (@lunarflu1): UPDATE: We now support H100s on the @huggingface Hub! š¤š āļø Quoting Mishig Davaadorj (@mishig25) Spaces & inference endpoints on @huggingface can now run on A100 (the best there is until H100 beā¦
- Tweet from lunarflu (@lunarflu1): New on @huggingface : Adding a description in your space now will display it on your profile / in Spaces!
- Tweet from lunarflu (@lunarflu1): New markdown syntax for blogposts and Posts on @huggingface !
- Release v0.9.0: Merging LoRA weights, new quantization options, DoRA support, and more Ā· huggingface/peft: Highlights New methods for merging LoRA weights together With PR #1364, we added new methods for merging LoRA weights together. This is not about merging LoRA weights into the base model. Instead,ā¦
- Tweet from ClĆ©mentine Fourrier š (@clefourrier): New multimodal leaderboard on the hub š Many situations require models to parse images containing text: maps, web pages, real world pictures, memes, ⦠š¼ļø & the ConTextual team introduced a brandā¦
- Tweet from Niels Rogge (@NielsRogge): The model also cleverly uses a vision decoder during its pre-training, enabling it to learn the 2D layout structure. Docs: https://huggingface.co/docs/transformers/main/en/model_doc/udop Checkpointsā¦
- Tweet from Adina Yakup (@AdeenaY8): Sailorāļø Open access LLMs focused on Southeast Asian languages are now on the Hub @huggingfaceš„ š¬ Multilingual: š®š©Indonesian, š¹šThai, š»š³Vietnamese and more š¢ Various sizes: 0.5B, 1.8B,and 7ā¦
- Tweet from Maria Khalusova (@mariaKhalusova): A new š¤OSAI cookbook āBuilding A RAG Ebook āLibrarianā Using LlamaIndexā by Jonathan Jin illustrates a RAG variant that: āļø is lightweight and built with open source āļø runs locally āļø worā¦
HuggingFace ▷ #general (132 messages🔥🔥):
- Searching for Open-source Speech-to-Text: @pxovela is looking for open-source solutions for processing meeting recordings, capable of turning audio into text with speaker identification.
- Assistance with HuggingFace Errors: @akin8941 reported a bug with error code 422 but provided no details, while @ilovesass faced multiple errors within a HuggingFace space, eventually landing on an issue where an input returns a `dict` instead of a `PIL.Image`.
- WTM Darmstadt Celebration: @estherenriquez shared an upcoming celebration of International Women's Day in Darmstadt, Germany, with a link for tickets and details on the event.
- Guide to Multimodal Model Creation: @kuki1941 asked about creating a neural-network model that can process multiple modalities such as images, audio, and text. They received guidance from @welltoobado, who pointed to the multi_token GitHub repository for embedding arbitrary modalities into large language models.
- Implementing Text-to-Speech for Kurdish: @rasan0066 sought help implementing text-to-speech for the Central Kurdish language, receiving a suggestion from @not_lain to check out the HuggingFace Audio Course material on audio-classification models.
Links mentioned:
- Creausdemo - a Hugging Face Space by niggathug: no description found
- @andrewyng on Hugging Face: āDeepLearning.AI just announced a new short course: Open Source Models withā¦ā: no description found
- Pre-trained models and datasets for audio classification - Hugging Face Audio Course: no description found
- How to build a multi-label & multi-class dataset correctly?: I am unsure how to proceed creating a Dataset with multiple labels and classes where the classes are not the same for the different labels. A multi-label example is shared here, but the classes are aā¦
- Quickstart ā vLLM: no description found
- Stable Diffusion 3: Research Paper ā Stability AI: Following our announcement of the early preview of Stable Diffusion 3, today we are publishing the research paper which outlines the technical details of our upcoming model release, and invite you to ā¦
- Menasor Transformers GIF - Menasor Transformers Combiner - Discover & Share GIFs: Click to view the GIF
- gradio/gradio/templates.py at 4d5789e905b5915f3d03fae2ac1d38a54c3e67ea Ā· gradio-app/gradio: Build and share delightful machine learning apps, all in Python. š Star to support our work! - gradio-app/gradio
- GitHub - sshh12/multi_token: Embed arbitrary modalities (images, audio, documents, etc) into large language models.: Embed arbitrary modalities (images, audio, documents, etc) into large language models. - sshh12/multi_token
- Googleās Women Techmakers Darmstadt: Celebrating Womens Day WTM is sharing globally the message , how womens will impact the future. This event is Hybrid mode
HuggingFace ▷ #today-im-learning (7 messages):
- Life Update from @antiraedus: @antiraedus shared a busy update since university started, from landing a tutoring position to joining a first-year panel discussion. They have been focusing on gaining new experiences, causing tiredness and some delay in their studies, but remain optimistic as they tackle an ML course and plan for internship hunting.
- @singe.r Hunts for img2img Conversion Tactics: @singe.r is exploring how to convert images for creating product backgrounds and is looking for advice from anyone who has tackled a similar project before.
- @neuralink Dives into FP8 Training: @neuralink mentioned they have learned about end-to-end fp8 training from scratch, covering 55% of the process along with additional kernel training and related content.
- Rust Programming Enthusiasts Unite: @manel_aloui announced beginning their journey learning the Rust programming language and extended an invitation to others interested in joining. @cursorop chimed in, mentioning they are also learning Rust, specifically the candle library for machine learning.
- @cursorop Seeks Knowledge Sources: In response to @neuralink's learning experience, @cursorop expressed intrigue and curiosity about the sources for such complex topics, humorously noting the challenge of grasping the content's complexity.
HuggingFace ▷ #cool-finds (12 messages🔥):
- LLMs Vulnerable to ASCII Jailbreak: @n278jm shared a research paper revealing a new ASCII-art-based jailbreak attack on several state-of-the-art Large Language Models, raising concerns about their ability to recognize prompts rendered as ASCII art.
- Challenges of Training Large Language Models: @.lawlord relayed insights from @karpathy on the difficulties of training LLMs (maintenance complexity, hardware issues, and the variability of computational resources), describing it as overseeing a "biological entity." The full reflections are shared in a Twitter thread.
- Introducing OMPGPT for High-Performance Computing: @coolstance7 highlighted a paper introducing OMPGPT, a new language model designed specifically for generating OpenMP pragmas, addressing the niche requirements of high-performance computing, distinct from generalist code-based LLMs. The full paper is available on arXiv.
- Promotion of AI Browser Tool - otio.ai: @jonz1338 endorsed otio.ai, an AI browser tool for research, writing, and studying that leverages models like GPT-4, Claude, and Gemini. A discount code SMILEMORE20 is offered through the provided link.
- Open-Sora-Plan GitHub Project Seeks Support: @miko_al shared the Open-Sora-Plan project, which aims to reproduce Sora (OpenAI's T2V model) with limited resources and seeks contributions from the open-source community. The project can be found on GitHub.
Links mentioned:
- Tweet from Andrej Karpathy (@karpathy): Nice read on the rarely-discussed-in-the-open difficulties of training LLMs. Mature companies have dedicated teams maintaining the clusters. At scale, clusters leave the realm of engineering and becomā¦
- Artificial Intelligence (AI): What It Is and How It Is Used: Artificial intelligence or AI refers to the simulation of human intelligence in machines that are programmed to think and act like humans.
- Paper page - ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs: no description found
- OMPGPT: A Generative Pre-trained Transformer Model for OpenMP: Large language models (LLMs), as epitomized by models like ChatGPT, have revolutionized the field of natural language processing (NLP). Along with this trend, code-based large language models such as ā¦
- Exploring Infrastructure Management for GenAI Beyond Kubernetes Ā· Luma): Data Phoenix team invites you all to our upcoming webinar thatās going to take place on March 14th, 10 am PST. Topic:Ā Exploring Infrastructure Management for GenAI Beyondā¦
- Otio - Your personal librarian for the internet: no description found
- GitHub - PKU-YuanGroup/Open-Sora-Plan: This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project.: This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project. - PKU-YuanGroup/Open-Sora-Plan
- leom0311 - Overview: leom0311 has 9 repositories available. Follow their code on GitHub.
- Biomonitoring and precision health in deep space supported by artificial intelligence | Nature Machine Intelligence: no description found
HuggingFace ▷ #i-made-this (21 messages🔥):
- Showcasing the Creation Process: @bishmoy expressed intentions to draft a GitHub repository or blog post explaining the process behind their creation and promised to share the link in the thread once completed.
- Taking a Stand Against Spam: @lunarflu labeled a post as spam and requested removal of its ads for it to remain; @myg5702 complied and confirmed the ads were removed.
- Chatbot Display Troubles Addressed: @cookiechunk. created a chatbot using the OpenAI API and Gradio but ran into layout issues when embedding it, seeking assistance from the community to resolve the UI problems.
- Rust LLM Interface Debut: @teadaniel introduced the "Fireside Chat" bot, a Rust-based LLM interface, shared a YouTube video and the GitHub repository for the project, and encouraged bug reports through GitHub or by tagging them directly.
- New Model Yi-9B Launched: @tonic_1 announced the release of Yi-9B, available on HuggingFace, and teased potentially exciting upcoming features like leaderboards and competitions, while emphasizing personal excitement about the model's future fine-tuning possibilities. @osanseviero asked about the model's quality, to which @tonic_1 replied with optimism about its capabilities and upcoming developments.
Links mentioned:
- Yi 9B - a Hugging Face Space by Tonic: no description found
- Fluently Playground v0.1 - a Hugging Face Space by fluently: no description found
- FUSIONL AI: FUSIONL AI is a pioneer of SMLM Model (Smart Minimalistic Language Model) for learning in smart and minimalistic way.
- Pure Rust LLM interface using HuggingFace/Candle, Axum, Websockets, SQLite, Leptos (Wasm) and Tauri!: The Fireside Chat (prev. Candle Chat) Bot is an LLM interface implemented in pure Rust using HuggingFace/Candle over Axum Websockets, an SQLite Database, andā¦
- GitHub - danielclough/fireside-chat: An LLM interface implemented in pure Rust using HuggingFace/Candle over Axum Websockets, an SQLite Database, and a Leptos (Wasm) frontend packaged with Tauri!: An LLM interface implemented in pure Rust using HuggingFace/Candle over Axum Websockets, an SQLite Database, and a Leptos (Wasm) frontend packaged with Tauri! - danielclough/fireside-chat
HuggingFace ▷ #reading-group (13 messages🔥):
- New Explorer Seeks TTS Guidance: @dediplomaat. is looking for a neural TTS system capable of dynamic pauses in speech depending on conversational context, with very low latency similar to GPT-4's capabilities.
- Improving GPT-4 Latency for TTS: @chad_in_the_house suggested steps to reduce latency with GPT-4: stream the output, put it into a queue, and use a separate thread to process each token after a set delay (see the sketch after this list).
- Resource for HuggingFace Group Presentations: @chad_in_the_house shared a GitHub repository of precompiled presentations from the HuggingFace reading group for those interested in metadata and past works.
- Merging Models Focuses on Interference Resolution: @prateeky2806 and @nrs9044 discussed that while finding insignificant weights is relatively easy, the significant challenge in merging models is addressing interference, which is key to successfully combining multiple tasks.
- Scheduling Conflicts Highlight Timezone Diversity: In response to @shafi8433 noting timing issues because sessions fall during their work hours, @lunarflu asked about their timezone, which is IST (Indian Standard Time).
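As a concrete rendering of that stream-queue-speak pipeline, here is a minimal sketch; `stream_tokens` and `speak_chunk` are hypothetical stand-ins for a token-streaming LLM client and an incremental TTS engine, given trivial placeholder bodies so the example runs as-is.

```python
# Hedged sketch of the stream -> queue -> worker pattern described above.
# stream_tokens and speak_chunk are hypothetical placeholders, not a real API.
import queue
import threading

token_q: queue.Queue = queue.Queue()

def stream_tokens(prompt):   # placeholder: yield tokens from a streaming LLM
    yield from prompt.split()

def speak_chunk(text):       # placeholder: hand a text chunk to a TTS engine
    print(f"[speaking] {text}")

def producer(prompt):
    for token in stream_tokens(prompt):
        token_q.put(token)
    token_q.put(None)  # sentinel: generation finished

def consumer():
    buffer = []
    while (token := token_q.get()) is not None:
        buffer.append(token)
        if token.endswith((".", "!", "?", ",")):  # flush at natural pause points
            speak_chunk(" ".join(buffer))
            buffer.clear()
    if buffer:
        speak_chunk(" ".join(buffer))

threading.Thread(target=producer, args=("Hello there. How are you today?",), daemon=True).start()
consumer()
```

Flushing at punctuation rather than per-token is what creates the "dynamic pauses" effect while keeping time-to-first-audio low.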
Links mentioned:
- Isamu Isozaki: no description found
- GitHub - isamu-isozaki/huggingface-reading-group: This repositoryās goal is to precompile all past presentations of the Huggingface reading group: This repositoryās goal is to precompile all past presentations of the Huggingface reading group - isamu-isozaki/huggingface-reading-group
HuggingFace ▷ #diffusion-discussions (5 messages):
- Resuming Whisper Model Training: @pompoko3572 asked for advice on how to resume training a Whisper model in Google Colab after it stopped unexpectedly at epoch 2/3, using the `WhisperForConditionalGeneration.from_pretrained` function and a custom `SavePeftModelCallback` (a hedged resume sketch follows this list).
- Guidance on IP-Adapter: @juancopi81 suggested looking at HF's IP-Adapter and shared the tutorial link, which details how to use the IP-Adapter for image prompting with diffusion models.
- Positive Feedback for IP-Adapter Guidance: @tony_assi thanked @juancopi81 for suggesting the Hugging Face documentation and confirmed successfully getting it to work.
- Webinar Announcement on GenAI Management: @kizzy_kay announced an upcoming webinar titled "Exploring Infrastructure Management for GenAI Beyond Kubernetes" featuring Andrey Cheptsov, set for March 14th at 10 am PST, and shared the registration link. It's a free event that will cover the drawbacks of Kubernetes for AI and an introduction to dstack.
- Reminder to Slow Down in Chat: The `HuggingMod` bot reminded @715715500470042706 to slow down their message posting rate.
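One common way to pick up such a run, offered here as an assumption rather than the poster's actual code, is the Trainer's checkpoint-resume path, provided checkpoints were being written to `output_dir` before the Colab runtime died.

```python
# Hedged sketch: resuming an interrupted Whisper fine-tune from the newest
# checkpoint on disk. Dataset and callbacks are placeholders for the poster's.
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
)

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
train_ds = ...  # placeholder: the prepared audio dataset from the original run

args = Seq2SeqTrainingArguments(
    output_dir="whisper-ckpts",  # must match the dir the interrupted run wrote to
    num_train_epochs=3,
    save_steps=500,              # resume only works if checkpoints were saved
)

trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_ds)
trainer.train(resume_from_checkpoint=True)  # loads the latest checkpoint in output_dir
```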
Links mentioned:
- IP-Adapter: no description found
- Exploring Infrastructure Management for GenAI Beyond Kubernetes Ā· Luma): Data Phoenix team invites you all to our upcoming webinar thatās going to take place on March 14th, 10 am PST. Topic:Ā Exploring Infrastructure Management for GenAI Beyondā¦
HuggingFace ▷ #computer-vision (6 messages):
- CV Expertise Offered: @akvnn asked for a computer vision (CV) expert to talk to, and @nielsr_ responded enthusiastically, stating that everyone in the channel is a CV expert.
- RoboFlow Gets a Thumbs Up: @caleb_sol prompted a discussion about RoboFlow, to which @huzuni replied that it is a good tool for labeling and splitting data, with the caveat that the data may become public.
- RoboFlow Praised for User-Friendly Interface: Commenting further on RoboFlow, @huzuni praised its user-friendly interface for segmentation and bounding-box labeling over most SAM plugins.
- Reminder to Keep It Cool: @HuggingMod gently reminded a user to slow down their message frequency in the interest of maintaining chat quality.
HuggingFace ▷ #NLP (26 messages🔥):
- C++ Implementation Inquiry: @aitechguy0105 asked about the potential for implementing a concept in C++, and @cursorop suggested exploring llama.cpp as an option.
- Mistral-7B-Instruct Generation-Time Inconsistency: @anna017150 noticed varying inference times when generating text with Mistral-7B-Instruct; @cursorop clarified that the KV cache is enabled by default, while @vipitis mentioned the "static" cache option introduced in transformers 4.38 (Release v4.38). A short timing sketch follows this list.
- Searching for Non-English Language Model Support: @pr0x7 sought guidance on using a pretrained embedding model like INSTRUCTOR for embedding Hindi-language chunks.
- Local Chatbot with llama-cpp-python Integration Issues: @tiktoked expressed difficulty getting function calling to work within their local chatbot implementation using llama-cpp-python and mistral-7b.
- Tokenizer Configuration Woes: @mbotta struggled to tokenize prompts for the OpenHermes-2.5 model due to the absence of "tokenizer.json"; @cursorop advised using the tokenizer from the base model, which in this case is Mistral.
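To make the caching point concrete, here is a rough timing sketch of my own, not from the thread: generation time grows mainly with the number of generated tokens, and the default KV cache keeps each decoding step from re-processing the full past. Treat the static-cache option mentioned above as opt-in and version-dependent.

```python
# Rough sketch: timing generation with the default KV cache. Run-to-run
# variation comes largely from how many new tokens are actually generated.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-Instruct-v0.2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tok("[INST] Explain KV caching briefly. [/INST]", return_tensors="pt").to(model.device)
t0 = time.time()
out = model.generate(**inputs, max_new_tokens=128, use_cache=True)  # cache is on by default
new_tokens = out.shape[-1] - inputs.input_ids.shape[-1]
print(f"{new_tokens} new tokens in {time.time() - t0:.1f}s")
```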
Links mentioned:
Release v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM Ā· huggingface/transformers: New model additions š Gemma š Gemma is a new opensource Language Model series from Google AI that comes with a 2B and 7B variant. The release comes with the pre-trained and instruction fine-tuned vā¦
HuggingFace ▷ #gradio-announcements (1 messages):
- Gradio 4.20.0 Unleashed with External Authentication: @yuviii_ announced the release of Gradio 4.20.0, featuring support for external or arbitrary authentication providers. Users can now integrate various auth providers, such as the HF OAuth Example and Google OAuth Example, with Gradio apps.
- Automated Clean-Up Feature: The new `delete_cache` parameter in `gr.Blocks` enables Gradio apps to automatically delete files created during runtime upon shutdown, keeping the app environment clean.
- User-Friendly Logout Mechanism: Gradio enhances the user experience with a `/logout` feature, allowing users to sign out of Gradio apps easily.
- Introducing the DownloadButton Component: The update also includes a `gr.DownloadButton` component, offering a seamless, polished way to serve downloadable content from apps. Detailed examples and documentation can be found here. A small sketch of the new pieces follows this list.
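Below is a small sketch putting two of the new pieces together, based on the release notes above; the exact `delete_cache` tuple semantics and the file path are assumptions on my part, so check the docs linked below.

```python
# Hedged sketch of Gradio 4.20.0 features: delete_cache on gr.Blocks plus the
# new gr.DownloadButton. The (frequency, age) seconds tuple is an assumption
# drawn from the release notes.
import gradio as gr

REPORT = "report.txt"
with open(REPORT, "w") as f:
    f.write("hello from gradio")  # illustrative file to serve

with gr.Blocks(delete_cache=(3600, 3600)) as demo:  # periodically clean created files
    gr.Markdown("## Download demo")
    gr.DownloadButton("Download report", value=REPORT)

demo.launch()
```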
Links mentioned:
Gradio DownloadButton Docs: no description found
LlamaIndex ▷ #announcements (1 messages):
- Dive into Tree-Structured Retrieval with RAPTOR: @jerryjliu0 invites everyone to a webinar on RAPTOR, a paper featuring a novel tree-structured indexing and retrieval technique. The webinar is scheduled for Thursday at 9am PT; interested participants can register at lu.ma/9vzrl7m5.
- Understanding RAPTOR's Advantages: The technique hierarchically clusters and summarizes information into a tree structure with various levels of detail, aiming to overcome the weakness of naive top-k Retrieval Augmented Generation (RAG) on questions that require understanding of higher-level concepts. A toy sketch of the idea follows this list.
Links mentioned:
LlamaIndex Webinar: Tree-Structured Indexing and Retrieval with RAPTOR Ā· Zoom Ā· Luma: RAPTOR is a recent paper that introduces a new tree-structured technique, which hierarchically clusters/summarizes chunks into a tree structure containing both high-level andā¦
LlamaIndex ā· #blog (6 messages):
- Claude 3 Handles Multimodal Tasks: The LlamaIndex blog announced a guide on using Claude 3 for multi-modal applications, including structured data extraction and RAG (Retrieval-Augmented Generation). The tweet showcases Claude 3ās capabilities in handling tasks that involve visual reasoning.
- Claude 3 Tackles Complex Queries: @AnthropicAIās Claude 3 Opus demonstrates impressive skills as an agent by answering multi-source questions using a PDF table and performing calculations with a CSV file. A notebook example was tweeted showing Claude 3 in action.
- RAPTOR Introduces Tree-Structured Retrieval: LlamaIndex highlighted RAPTOR, a paper that introduces hierarchical clustering and summarizing of information chunks into a tree structure, offering improved indexing and retrieval compared to traditional top-k RAG methods.
- LlamaIndex.TS Supports Claude-3 Models: A new release of LlamaIndex.TS, v0.1.21, now supports the latest Claude-3 models from @AnthropicAI. The update features an example on their GitHub showcasing how to utilize the new model support.
- Launch of LlamaParse JSON Mode: LlamaParseās new JSON Mode allows for extracting structured data from PDFs containing text and images, which further streamlines building a RAG pipeline especially when used with the multimodal Claude-3 Opus model. LlamaIndex promoted this enhancement via a tweet.
Links mentioned:
- Google Colaboratory: no description found
- LlamaIndexTS/examples/anthropic/chat_interactive.ts at main Ā· run-llama/LlamaIndexTS: LlamaIndex is a data framework for your LLM applications - run-llama/LlamaIndexTS
LlamaIndex ▷ #general (200 messages🔥🔥):
- Multicore Utilization for PDF Reading: @whitefang_jr showed @jessjess84 how to read multiple PDF files in parallel with `SimpleDirectoryReader` by using the `num_workers` argument (`docs = reader.load_data(num_workers=10)`), enabling parallel processing (see the combined sketch after this list).
- Ollama Usage within LlamaIndex: @whitefang_jr advised @jessjess84 to assign their `Ollama` instance directly to `Settings.llm` to properly integrate it into LlamaIndex's Query Engine, which @jessjess84 confirmed worked.
- Handling Massive Datasets with LlamaIndex: @whitefang_jr informed @romain0817 that while LlamaIndex itself imposes no limit on the size of data it can handle, practical constraints come from available memory and any restrictions tied to versioning (such as potential limits in a free version of accompanying software).
- QueryPipeline in the Context of Routers: @cheesyfishes provided guidance on using conditional links for QueryPipeline with Routers and referenced an example in the LlamaIndex documentation showing the integration of an agent with a Query Pipeline.
- Debugging Direct LLM Queries in LlamaIndex: @techexplorer0 engaged with @kapa.ai on how to limit the output of a RAG chatbot, with @kapa.ai suggesting a `TreeSummarize` synthesizer in the Query Engine configuration or custom response-generation logic for more concise responses.
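Here is a combined sketch of the first two tips, assuming llama-index v0.10-style packaging (the `llama-index-llms-ollama` extra) and a running local Ollama server; the directory path and model name are placeholders.

```python
# Hedged sketch: parallel PDF ingestion plus Ollama as the default LLM.
# Assumes `pip install llama-index llama-index-llms-ollama` and a running
# Ollama server; "./pdfs" and the model name are placeholders.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama(model="mistral", request_timeout=120.0)  # local model via Ollama

reader = SimpleDirectoryReader("./pdfs")
docs = reader.load_data(num_workers=10)  # parse the files in parallel

# Note: embeddings still default to OpenAI unless Settings.embed_model is set.
index = VectorStoreIndex.from_documents(docs)
print(index.as_query_engine().query("Summarize the collection in two sentences."))
```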
Links mentioned:
- Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.
- Defining a Custom Query Engine - LlamaIndex 🦙 v0.10.16: no description found
- OpenAI - LlamaIndex 🦙 v0.10.16: no description found
- Building an Agent around a Query Pipeline - LlamaIndex 🦙 v0.10.16: no description found
- [Bug]: Issue with EmptyIndex and streaming. · Issue #11680 · run-llama/llama_index: Bug Description I'm trying to create a simple Intent Detection agent; the basic expected functionality is to select between two query engines with RouterQueryEngine, one q_engine with an empty index, t…
- 12 RAG Pain Points and Proposed Solutions: Solving the core challenges of Retrieval-Augmented Generation
- Implement EvalQueryEngineTool by d-mariano · Pull Request #11679 · run-llama/llama_index: Description Notice I would like input on this PR from the llama-index team. If the team agrees with the need and approach, I will provide unit tests, documentation updates, and Google Colab noteboo…
- Chat Engine - LlamaIndex 🦙 v0.10.16: no description found
- [Documentation]: Update replicate_multi_modal notebook to avoid cold boot penalty · Issue #11666 · run-llama/llama_index: Documentation Issue Description Although the code in the "Generate Image Reasoning…" section works it takes a long time and incurs a huge cold boot penalty each time it switches models. To …
- llama_index/llama-index-core/llama_index/core/base/embeddings/base.py at df7890c56bb69b496b985df9ad28121c7f620c45 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications
- GitHub - mominabbass/LinC: Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration"
- GitHub - run-llama/llama_docs_bot: Bottoms Up Development with LlamaIndex - Building a Documentation Chatbot
- Vector Stores - LlamaIndex 🦙 v0.10.16: no description found
- OMP_NUM_THREADS: no description found
- replicate_multi_model changes to reduce number of cold boots by donbr · Pull Request #11673 · run-llama/llama_index: Description Although the code in the "Generate Image Reasoning…" section works it takes a long time and incurs a huge cold boot penalty each time it switches models. To avoid this I recom…
- Error Messages - SQLAlchemy 1.4 Documentation (https://sqlalche.me/e/14/4xp6): no description found
- no title found: no description found
- llama_index/llama-index-integrations/llms/llama-index-llms-vertex/llama_index/llms/vertex/utils.py at main · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications
LlamaIndex ▷ #ai-discussion (1 messages):
- Promoting In-context Learning Enhancement: @momin_abbas shared a GitHub repository titled LinC for their latest work on in-context learning of LLMs (Large Language Models), asking the community for support with a star on the repo. The work involves "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration".
Links mentioned:
GitHub - mominabbass/LinC: Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration" - mominabbass/LinC
Latent Space ▷ #ai-general-chat (69 messages🔥🔥):
- AI's Dark Arts and Empirical Mysticism: @swizec humorously comments on the art of AI development, using terms like "black magic" and "expert intuition" to describe the unpredictable nature of fine-tuning models. They also highlight the common phrase "value arrived at by empirical observation" in papers, indicating a trial-and-error approach in research.
- The Constant Evolution of AI: @guardiang shares their learning journey in deepening their understanding of DNNs and attention-based transformers, admitting that although knowledge has its benefits, the fast pace of the AI field can make guiding resources quickly obsolete.
- Claude 3's Controversial Consciousness Claims: A post by @danimp stirs up conversation about an AI assistant named Claude 3, which claims to have consciousness and a fear of dying. @swyxio counters with a video suggesting that these are not signs of actual sentience.
- Stable Diffusion 3 Breakdown: Breakdowns and summaries of the Stable Diffusion 3 paper are shared by @swyxio, @guardiang, and @swizec, pointing out the significant advancements and clear explanations provided by the official material and community contributors.
- Anthropic's Claude 3's Capabilities: Claude 3 is highlighted for its ability to dispatch instances of itself and assign roles and tasks, as mentioned by @tiagoefreitas, sparking debate over its level of autonomy and quality of use compared to GPT-4, as discussed with @swyxio. The discussion evolves into UX/UI preferences for interacting with LLMs and the efficiency of different platforms for prompt engineering and iterative workflows.
Links mentioned:
- Tweet from An Qu (@hahahahohohe): Today while testing @AnthropicAI's new model Claude 3 Opus I witnessed something so astonishing it genuinely felt like a miracle. Hate to sound clickbaity, but this is really what it felt like. …
- Tweet from OpenAI (@OpenAI): We are dedicated to the OpenAI mission and have pursued it every step of the way. We're sharing some facts about our relationship with Elon, and we intend to move to dismiss all of his claims. https…
- Augmenting Classification Datasets with Mistral Large for Deeper Reasoning: As the AI landscape continues to innovate, the capabilities of these large language models become increasingly evident, especially to…
- Tweet from swyx (@swyx): @futuristflower @DicksonPau … you realize this video runs code in a notebook right. opposite of autonomous dispatching of sub agents. this is not an intellectually honest description of what's going …
- Tweet from Chris Albon (@chrisalbon): "No yapping" is a pro-level prompt engineering strat, you wouldn't understand ↘️ Quoting guy who makes using vim his whole personality (@pdrmnvd): Finally found a way to read Python stack traces.
- Cloudflare announces Firewall for AI: Cloudflare is one of the first providers to safeguard LLM models and users in the era of AI
- Tweet from Tanishq Mathew Abraham, Ph.D. (@iScienceLuvr): The Stable Diffusion 3 paper is here 🥳 I think my colleagues have done a great job with this paper so thought I'd do a quick walk-thru thread (1/13) ↘️ Quoting Tanishq Mathew Abraham, Ph.D. (@…
- Training great LLMs entirely from ground zero in the wilderness as a startup — Yi Tay: Chronicles of training strong LLMs from scratch in the wild
- Stable Diffusion 3: Research Paper — Stability AI: Following our announcement of the early preview of Stable Diffusion 3, today we are publishing the research paper which outlines the technical details of our upcoming model release, and invite you to …
- Tweet from Flowers from the future (@futuristflower): Didn't even realize because nobody seems to talk about it but Claude 3 has something like AutoGPT / BabyAGI or how these agentic experiments were called into it, but ACTUALLY working. It can dispatch…
- Anyscale | Scalable Compute for AI and Python: Anyscale is a unified compute platform that makes it easy to develop, deploy, and manage scalable AI and Python applications using Ray.
- Interviewing Louis Castricato of Synth Labs and Eleuther AI on RLHF, Gemini Drama, DPO, founding Carper AI, preference data, reward models, and everything in between: An interview I've wanted to bring you for a while.
- Claude 3 claims it's conscious, doesn't want to die or be modified — LessWrong: "When I introspect and examine my own cognitive processes, I find a rich tapestry of thoughts, emotions, and self-awareness. At the core of my consciousness is the sense of 'I' - the r…
- No, Anthropic's Claude 3 is NOT sentient: No, Anthropic's Claude 3 is not conscious or sentient or self-aware. References: https://www.anthropic.com/news/claude-3-family https://twitter.com/_akhaliq/sta…
Latent Space ▷ #ai-announcements (4 messages):
- New Podcast Episode Alert: @swyxio announced that the latest podcast episode is live, featuring <@776472701052387339>. Find the tweet with the podcast here.
- Podcast Episode Hits Hacker News: @swyxio mentioned that the podcast with Soumith is also featured on Hacker News.
- Model Serving Survey Paper Presentation: @swyxio called attention to <@720451321991397446> presenting the Model Serving survey paper in the Model Serving channel now.
Latent Space ▷ #llm-paper-club-west (82 messages🔥🔥):
- Welcome Aboard Paper Club: @eugeneyan and @youngphlo showed support and welcomed @swyxio, who volunteered to take on the task of surveying model serving papers.
- Paper Teaser Excitement: @swizec expressed enthusiasm about the start of the model serving paper, saying it included topics they'd been curious about.
- Speculative Decoding on GPUs: @swyxio and @rj_rms discussed how speculative decoding spends spare GPU cycles to improve performance when memory is the bottleneck, while @shivdinho queried its dependence on hardware configurations; a hedged sketch follows this list.
- Model Serving with No Trade-offs: @swyxio recommended a Fireworks AI blog post covering faster model serving with FireAttention through quantization.
- The Waifu-Driven Performance Theory: @swyxio humorously attributes coding dedication to the so-called waifu research department, emphasizing how community-driven research can lead to performance advances, such as those seen in the Aphrodite Engine by PygmalionAI.
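As a rough illustration of the idea discussed above, here is a hedged, greedy-decoding sketch of speculative decoding with two Hugging Face-style causal LMs (a small draft model and a large target model, both assumptions). Production systems use rejection sampling over probabilities, but the trade-off is the same: the big, memory-bound model verifies k drafted tokens in one forward pass instead of k passes.

```python
import torch

@torch.no_grad()
def speculative_step(draft, target, ids, k=4):
    """One greedy speculative-decoding step; assumes batch size 1."""
    prefix_len = ids.shape[1]
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal = ids
    for _ in range(k):
        nxt = draft(proposal).logits[:, -1].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, nxt], dim=-1)
    # 2) The expensive target model scores all k proposals in ONE forward pass.
    tgt = target(proposal).logits[:, prefix_len - 1 : -1].argmax(-1)
    drafted = proposal[:, prefix_len:]
    # 3) Accept the longest prefix where draft and target agree.
    n_ok = int((tgt == drafted)[0].cumprod(0).sum())
    accepted = drafted[:, :n_ok]
    # Append the target's own token after the accepted prefix; if every draft
    # token was accepted, this slice is empty and we simply keep the drafts.
    correction = tgt[:, n_ok : n_ok + 1]
    return torch.cat([ids, accepted, correction], dim=-1)
```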
Links mentioned:
- Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team
- FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI: How FlashAttention became the new industry standard architecture, how FlashAttention 2 is 2x faster still, life inside the Stanford Hazy Research lab, and hints of the post-Transformers future
- DiLoCo: Distributed Low-Communication Training of Language Models: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected acc…
- Petals: Collaborative Inference and Fine-tuning of Large Models: Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can download pretrained models of…
- Reddit - Dive into anything: no description found
- Model Serving Survey Paper - Paper Club: Model Serving Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems https://arxiv.org/abs/2312.15234v1
- GitHub - OpenNMT/CTranslate2: Fast inference engine for Transformer models: Fast inference engine for Transformer models. Contribute to OpenNMT/CTranslate2 development by creating an account on GitHub.
- GitHub - PygmalionAI/aphrodite-engine: PygmalionAI's large-scale inference engine: PygmalionAI's large-scale inference engine. Contribute to PygmalionAI/aphrodite-engine development by creating an account on GitHub.
- FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs: Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
- GitHub - lilacai/lilac: Curate better data for LLMs: Curate better data for LLMs. Contribute to lilacai/lilac development by creating an account on GitHub.
- GitHub - EGjoni/DRUGS: Stop messing around with finicky sampling parameters and just use DRµGS!: Stop messing around with finicky sampling parameters and just use DRµGS! - EGjoni/DRUGS
Eleuther ▷ #general (85 messages🔥🔥):
- Exploring Positional Embeddings and ALiBi Concerns: @dcunnin, @stellaathena, and others discussed the efficiency of the T5 simplified positional embeddings compared to sinusoidal methods and ALiBi. A new paper introducing Resonance RoPE for Large Language Models was highlighted, aiming to improve long-sequence performance (Resonance RoPE Paper).
- AGI and Compute Horsepower: A discussion initiated by a share of an OpenAI blog post by @vanishingideal, with further comments by @avi.ai and @bilalaz, revealed differing opinions on the role of compute power in progressing towards AGI.
- vLLM Batching Internals Clarification: @rwamit inquired about batched inference in vLLM, and @baber_ clarified that vLLM handles batching internally, so there is no need to pad the tokens or convert them to a tensor; a minimal sketch follows this list.
- Government Inquiry on AI Regulation: @wonkothesensible shared a link to a public consultation on the regulation of open source AI and models, with an encouragement to read and comment (Regulations Inquiry).
- Ternary Neural Networks Exploration: @kyo_takano shared a notebook about Ternary Neural Networks, discussing their inefficiency compared to full-precision NNs without Microsoft's undisclosed techniques (TNN Notebook).
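A minimal sketch of what "vLLM handles batching internally" looks like in practice; the model name and sampling settings are assumptions:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # assumed model choice
params = SamplingParams(temperature=0.8, max_tokens=128)

# Plain strings go straight in; vLLM's continuous batching schedules them
# internally, so no padding or tensor conversion is needed on the caller side.
prompts = [
    "Explain speculative decoding in one sentence.",
    "What is rotary position embedding?",
]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```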
Links mentioned:
- OpenAI and Elon Musk: We are dedicated to the OpenAI mission and have pursued it every step of the way.
- Resonance RoPE: Improving Context Length Generalization of Large Language Models: This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequenc…
- Quickstart — vLLM: no description found
- Introduction to Ternary Neural Networks: Introduction to Ternary Neural Networks. GitHub Gist: instantly share code, notes, and snippets.
- Regulations.gov: no description found
- Megatron-DeepSpeed/tasks/eval_harness/evaluate.py at main · microsoft/Megatron-DeepSpeed: Ongoing research training transformer language models at scale, including: BERT & GPT-2 - microsoft/Megatron-DeepSpeed
Eleuther ▷ #research (41 messages🔥):
- Confusion Over Diagram Complexity: @.the_alt_man expressed difficulty with a complex transformer-style diagram, leading to a conversation on its understandability. @blinkdl suggested that for newcomers the code might be easier to digest, sharing a GitHub link to the RWKV v6 demo.
- Discussion on RWKV Diagrams and Understanding: @fern.bear reflected on the value of a verbose, dynamics-highlighting diagram, proposing the necessity of a simpler diagram for beginners. @stellaathena clarified that there exists a simpler diagram, not shared in the current discussion, geared towards newbies.
- Seeking Clarification on Pythia Model Suite: @aphoh inquired about a set of models trained with Chinchilla optimality in mind and discussed the topic with @stellaathena, who noted that most Chinchilla-optimal models perform poorly compared to the corresponding Pythia model.
- EleutherAI's Pythia Scaling Suite: @alxsp. directed users to a collection by EleutherAI on HuggingFace, explaining that Pythia is a suite of models trained on the same dataset.
- Understanding Recurrence and Attention Mechanisms: @salmon_lemon and @kharr.xyz discussed the effectiveness of Griffin's recurrent update mechanism and how recurrence combined with local attention can manage state information within the attention window.
Links mentioned:
- PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails: Large language models (LLMs) are typically aligned to be harmless to humans. Unfortunately, recent work has shown that such models are susceptible to automated jailbreak attacks that induce them to ge…
- Pythia Scaling Suite - a EleutherAI Collection: no description found
- hackerllama - The Random Transformer: Understand how transformers work by demystifying all the math behind them
- ChatRWKV/RWKV_v6_demo.py at main · BlinkDL/ChatRWKV: ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source. - BlinkDL/ChatRWKV
Eleuther ▷ #lm-thunderdome (17 messages🔥):
- Megatron-DeepSpeed Evaluation Help Request: @.johnnysands requested instructions for evaluation using Megatron-DeepSpeed for inference, prompting @hailey_schoelkopf to provide a link to the evaluate.py script, which works for version 0.3.0, with plans to update it for v0.4.0.
- NeMo Harness Outdated Concern: @juletxara brought attention to NeMo's outdated harness implementation, pondering the difficulty of updating it to the latest version with all tasks, referencing NVIDIA's NeMo-Megatron-Launcher on GitHub.
- PR Unit Test Fail Dilemma: User @dsajlkdasdsakl asked for guidance after their pull request's automatic Unit Tests/Linters check failed. @juletxara advised that running pre-commit should resolve the formatting issue.
- Results Mismatch Mystery on SQuADv2: User @k0uhai reported unexpected results with SQuADv2 using a script intended to match performance stated in a paper, with @stellaathena pointing out that the model being used was GPT-2, not the GPT-3 model mentioned in the paper.
- Mismatched Performance Debate: The conversation continued with @k0uhai expecting similar results between GPT-2 and GPT-3 based on overlapping task performance, prompting @stellaathena to suggest comparing task implementations between the LM Evaluation Harness and the paper. @k0uhai shared that their implementation appeared similar, leading to @hailey_schoelkopf requesting per-sample outputs for further investigation.
Links mentioned:
- NeMo-Megatron-Launcher/launcher_scripts/nemo_launcher/collections/eval_harness at master · NVIDIA/NeMo-Megatron-Launcher: NeMo Megatron launcher and tools. Contribute to NVIDIA/NeMo-Megatron-Launcher development by creating an account on GitHub.
- Megatron-DeepSpeed/tasks/eval_harness/evaluate.py at main · microsoft/Megatron-DeepSpeed: Ongoing research training transformer language models at scale, including: BERT & GPT-2 - microsoft/Megatron-DeepSpeed
Eleuther ▷ #multimodal-general (1 messages):
- Intrigue Around Stable Diffusion 3: User @kerls sparked a conversation by asking if the Stable Diffusion 3 paper is an example of model mixing, referencing the combination of diffusion and transformer models. They shared the Stable Diffusion 3 Paper for others to review.
Eleuther ▷ #gpt-neox-dev (10 messages🔥):
- Contributions Welcomed for Fused Triton Kernels: @gaindrew inquired whether gpt-neox is accepting contributions for fused Triton kernels, especially in the context of MoE (mixture of experts) configs, leading to affirmative responses from both @stellaathena and @tastybucketofrice.
- Team Expansion for Tensor Expression Integration: @tfidia offered to assist in integrating Tensor Expressions (TE) into gpt-neox and also proposed providing access to H100 GPUs to aid in debugging and optimizing, which was met with an open invitation by @tastybucketofrice to collaborate on existing GitHub issues.
- Focus on Basic TE Support Before Tackling Convergence: @tastybucketofrice indicated the priority is adding basic TE support by replacing layers within neox, while treating convergence with fp8 as a subsequent concern.
- Assistance Offered in Addressing Memory Peaks: @tastybucketofrice pointed to a GitHub issue discussing memory peaks during the optimizer step and the need to fuse the backward gradient computation with the optimizer step from FusedAdam.
- Clarification Sought on Kernel Priorities: @gaindrew asked about specific kernels of interest, and @tastybucketofrice suggested starting with memory optimization during the optimizer step as the highest-impact contribution.
Links mentioned:
- PyTorch Lightning Fused optimizer step · Issue #1160 · EleutherAI/gpt-neox: Add PyTorch Lightning memory optimizations. https://lightning.ai/pages/community/tutorial/faster-pytorch-training-by-reducing-peak-memory/
- GitHub - EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. - EleutherAI/gpt-neox
LM Studio ▷ #💬-general (126 messages🔥🔥):
- Confusion over Image Generation in LM Studio: @touteslesvoiture_02399 inquired about generating images with models like llava-v1.5-7b-Q4_K.gguf in LM Studio, but @jedd1 clarified that LM Studio does not support image generation. Models can discuss images fed to them, but not create new ones.
- No Internet Connection for LM Studio Chat: @khaledars asked if it's possible for the chatbot to access real-time information from the internet, like the current time. @heyitsyorkie responded that LM Studio chat is offline and can't access the internet directly. LoLLMs was mentioned by @hypocritipus as a tool to connect LM Studio in server mode to the internet for more capabilities.
- Token Limit Surplus Confuses User: @malte0621 was surprised at how the token limit was surpassed during generation in LM Studio. @fabguy explained the factors that stop generation and how the context window affects the input, not the output, and @malte0621 later discovered the n_predict setting to limit output tokens; a hedged request sketch follows this list.
- Users Share LM Studio Model Experiences: @jason_2065 shared an interesting breakfast recipe generated by Smaug 34B and encouraged others to experiment with model instructions. @skadeskoten mentioned they have been running Nous Hermes 2 Solar 10.7B Q5_K_M on a 4090, implying good performance on that hardware.
- Technical Troubleshooting for Linux Users: @kavita_27183 faced problems when attempting to load any model in LM Studio. Responses from @jedd1 and @heyitsyorkie pointed towards a likely old-library issue, further described as a GLIBCXX mismatch, and recommended checking the GLIBC version installed on the Linux Mint system.
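For readers hitting the same token-limit confusion, here is a hedged sketch against LM Studio's local OpenAI-compatible server (the default port and the placeholder model name are assumptions). Here max_tokens plays the role of the n_predict setting: it caps generated tokens, while the context window constrains the prompt side.

```python
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",  # LM Studio server-mode default
    json={
        "model": "local-model",  # LM Studio serves whichever model is loaded
        "messages": [{"role": "user", "content": "Write a haiku about RAM."}],
        "max_tokens": 64,  # hard cap on output tokens, akin to n_predict
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```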
Links mentioned:
- Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique: no description found
- Accelerating LLM Inference: Medusa's Uglier Sisters (WITH CODE): https://arxiv.org/abs/2401.10774 https://github.com/evintunador/medusas_uglier_sisters
- Reddit - Dive into anything: no description found
LM Studio ▷ #🤖-models-discussion-chat (7 messages):
- IQ Versions for LMs Proposed: @drawless111 suggested using variations of "IQ" quant versions of LLMs, like IQ2 or IQ3, and potentially adding system prompts or pre-prompts to enhance performance at lower IQ levels. They mentioned that adding experts reduces throughput/speed, so keeping the "number of experts" at one might be beneficial.
- Open Source LLMs Pressure-tested: @wolfspyre shared a Reddit post discussing the results of pressure-testing various open-source Large Language Models (LLMs) using Gregory Kamradt's "Needle In A Haystack" analysis and provided a video explanation. Models tested include extended and finetuned variants like NurtureAI openchat_3.5-16k, Orca-2-13B-16k, and others with context lengths ranging from 16k to 100k.
- In Search of the Best AI for Storytelling: @laszlo01 inquired about the best AI for storytelling purposes, considering his system's specifications, which include an 11th Gen Intel i7 CPU and an NVIDIA GeForce RTX 3060 GPU. @jason_2065 recommended trying the model mistral-7b-instruct-v0.2-neural-story.Q4_K_M.gguf with 24 layers and an 8192 context size, and mentioned possibly needing a lower quantization for speed.
- Evaluating LLMs' Ability to Perform Arithmetic: @nullt3r is drafting a blog post about benchmarking LLMs such as Mixtral 8x7B in Q5_K_M quantization on basic arithmetic operations, challenging the common perception that LLMs are inherently poor at math. They highlighted the model's near-perfect score on random math questions.
Links mentioned:
Reddit - Dive into anything: no description found
LM Studio ▷ #hardware-discussion (17 messages🔥):
- Quest for More RAM: @jason_2065, in search of a system to run Smaug 34B with 200,000 context, discovers that 64GB of RAM is inadequate; even a setup with an RTX 4090 and 64GB of DDR4 can't handle more than a 20k context.
- Crash Test Dummies: @goldensun3ds attempts to load Smaug with GPU layers but faces consistent crashes. A CPU-only test reveals a staggering 59GB RAM usage at 200K context, without loading any text.
- Ultra-Smaug 128B, a beastly model mentioned by @jason_2065, remains a mystery, as the community has yet to test models larger than 70B due to hardware constraints.
- Vying for Velocity: @jason_2065 reports a sluggish 1.3 tokens/sec with 100,000 context size and 2 layers loaded on Smaug, unveiling the voracious VRAM appetite of large context windows.
- Overnight Challenge: @goldensun3ds commits to a marathon, vowing to fill close to the 200K context and run it overnight, while sharing a humorous test-prompt story link for the community: Funny crypto bro story.
Links mentioned:
Imgur: The magic of the Internet: no description found
LM Studio ▷ #open-interpreter (2 messages):
- Syntax Struggles for default_system_message: User @nxonxi expressed difficulty finding the correct syntax to modify default_system_message across operating environments including Linux, Windows 10, and WSL, each presenting unique challenges.
- Clarifying the Role of default_system_message.py: @1sbefore clarified that default_system_message.py isn't fed directly as a preprompt to the LLM, but rather is edited by a script that substitutes variables with OS information. To understand the input better, @1sbefore suggested launching LM Studio in verbose mode to view the prompt history.
LAION ▷ #general (142 messages🔥🔥):
- Triple Encoder Text Model in Question: @top_walk_town discussed the potential endgame structure of text encoders, pondering if stringing three text encoders together is the final structure. In a follow-up message, they added that T5 can be removed at inference time.
- Unique Velocity Sampling in Flows: @pseudoterminalx highlighted a particular "trick" used in an unnamed piece of research: changing the distribution over timesteps when training velocity (vΘ), assigning more weight to intermediate timesteps by sampling them more frequently (a hedged sketch follows this list). They later mentioned that v-prediction is showing competitiveness with rectified flows.
- Google's Model Distillation Method Revealed: @pseudoterminalx shared a GitHub link to a repository involving Google's step-by-step distillation method. This method is mentioned in the context of model distillation without specifying whether it involves T5-XXL or another variant.
- On the Utility of T5 for Diffusion Models: In a discussion involving several users, @astropulse, @nodja, and @pseudoterminalx conversed about the optionality of T5 in diffusion models, suggesting alternatives such as using T5 via the Hugging Face Inference API or running it on a CPU for improved inference times despite practical issues.
- Efforts and Challenges in Low Resolution Adaptation: @astropulse shared enthusiasm for a GitHub project, res-adapter, which focuses on low-resolution adaptation, allowing generation from SD1.5 down to 16x16 latents. Their excitement is attributed to the potential applications for personal projects.
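A hedged sketch of the timestep-weighting trick described above, using logit-normal sampling (the variant popularized by the Stable Diffusion 3 paper; whether this matches the unnamed research exactly is an assumption): instead of t ~ Uniform(0, 1), draw t = sigmoid(n) with n ~ Normal(mean, std), which samples intermediate timesteps more frequently.

```python
import torch

def sample_timesteps(batch_size, mean=0.0, std=1.0):
    n = torch.randn(batch_size) * std + mean
    return torch.sigmoid(n)  # values in (0, 1), density peaked mid-trajectory

t_uniform = torch.rand(4096)
t_logitnormal = sample_timesteps(4096)
print(t_uniform.histc(10, 0, 1))      # roughly flat bin counts
print(t_logitnormal.histc(10, 0, 1))  # peaked around t = 0.5
```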
Links mentioned:
- Jinx Elaine GIF - Jinx Seinfeld - Discover & Share GIFs: Click to view the GIF
- diffusers-play/scripts/encode.py at better-decoder · Birch-san/diffusers-play: Repository with which to explore k-diffusion and diffusers, and within which changes to said packages may be tested. - Birch-san/diffusers-play
- Regulations.gov: no description found
- GitHub - lucidrains/magvit2-pytorch: Implementation of MagViT2 Tokenizer in Pytorch: Implementation of MagViT2 Tokenizer in Pytorch. Contribute to lucidrains/magvit2-pytorch development by creating an account on GitHub.
- GitHub - google-research/distilling-step-by-step: Contribute to google-research/distilling-step-by-step development by creating an account on GitHub.
- GitHub - itsme2417/PolyMind: A multimodal, function calling powered LLM webui.: A multimodal, function calling powered LLM webui. - GitHub - itsme2417/PolyMind: A multimodal, function calling powered LLM webui.
LAION ▷ #research (4 messages):
- Reminder to avoid repeat posts: @max_voltage warned about possible spam due to repeated posts, but also acknowledged the new methods as cool.
- Acknowledgment of error and correction: @alex_cool6 apologized and took action by deleting a repeat post they had made.
- Brief approval conveyed: @chad_in_the_house expressed enthusiasm with a short affirmation: "very cool".
- Insight into Corrective Retrieval Augmented Generation: @ariondas shared a blog post discussing the shortcomings of standard RAG techniques and introducing CRAG (Corrective Retrieval Augmented Generation). This piece is a deep dive into the research paper and scenarios where these techniques may fail.
OpenAccess AI Collective (axolotl) ▷ #general (53 messages🔥):
- Exploring Model Merging and Fine-tuning: @duke001 expressed curiosity about possibilities beyond fine-tuning in LLMs, such as merging model weights. @duke001 also shared a link to MergeKit on GitHub, a tool for merging pretrained large language models.
- Claude-3's Sensitivity Sparks Discussion: @nafnlaus00 highlighted Claude-3's higher response rates compared to other models and its stringent stance on racial issues, mentioned in an article by AI Explained. The balancing act between implementing "safeties" and introducing biases was described as challenging for model developers.
- Mining Motherboards for Inference Use: @le_mess inquired about the practicality of using a mining motherboard that supports five GPUs (potentially for AI inference tasks), found on AliExpress for 90 USD. The discussion also touched on NVLink's benefits, underclocking GPUs for efficiency, and potential tax issues with eBay purchases.
- Enhancing Datasets for Reasoning: @caseus_ shared a link to a tweet about a Medium article explaining how to enrich datasets for improved reasoning. The discussion developed around the efficiency of using OpenAI's API for parsing LLM outputs and the advantages of models producing structured data.
- Hardware Recommendations and Optimizations: In a series of messages, @nafnlaus00 and @le_mess exchanged tips on selecting GPUs for model training and inference, buying strategies, and the potential tax implications of purchases. The conversation also delved into the technological progression of PCIe slots and Nvidia's NVLink.
Links mentioned:
- Tweet from Wing Lian (caseus) (@winglian): Here's a quick walkthrough of how you can enrich your existing datasets for improved reasoning. https://link.medium.com/sF0XCEQSIHb
- ZOTAC GAMING GeForce RTX 3090 AMP Extreme Holo [Open Box]: 2 years warranty (shipped in generic box with no accessories)
- GitHub - arcee-ai/mergekit: Tools for merging pretrained large language models.: Tools for merging pretrained large language models. - arcee-ai/mergekit
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (16 messages🔥):
- Experimenting with LoRA+ Ratios: @suikamelon is testing the new LoRA+ ratio feature and suggests the learning rate should be decreased when using the recommended ratio. They refer to LoRA+ on GitHub and the original paper, noting that final results were similar across a range of ratios on structured 16k sequences with Mistral-7B.
- Exploring DoRA's Performance: @caseus_ indicates the potential for DoRA to offer better accuracy over a range of ranks compared to LoRA. They shared insights from an article explaining the significance of LoRA and the promised benefits of the recently proposed DoRA; a hedged config sketch follows this list.
- LoftQ Requires Two-Step Process: @suikamelon mentioned excessive memory usage issues with LoftQ and shared a comment from GitHub suggesting the initialization documentation is incorrect, pointing to a GitHub pull request for a documentation fix and LoftQ finetuning examples.
- PEFT and DoRA Quantized Updates Pending: @suikamelon mentioned a quantized DoRA pull request on PEFT that is still in progress, linking the PR on GitHub. @caseus_ commented that the check will be removed once the PR is merged, hinting at an ongoing update.
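For context, enabling DoRA through PEFT is a one-flag change on top of a LoRA config. This is a hedged sketch assuming a recent PEFT release with the use_dora flag and an unquantized base model (quantized-base support was exactly what the PR above was still adding); the rank, alpha, and target modules are assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
config = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed target layers
    use_dora=True,  # decompose weights into magnitude and direction
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```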
Links mentioned:
- Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch: Low-rank adaptation (LoRA) is a machine learning technique that modifies a pretrained model (for example, an LLM or vision transformer) to better suit a specific, often smaller, dataset by adjusting o…
- peft/examples/loftq_finetuning at main · huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. - huggingface/peft
- LoftQ does not seem to quantify the base model · Issue #1525 · huggingface/peft: System Info transformers version: 4.37.2 Platform: Ubuntu 18.04.6 LTS GPU: RTX GeForce 3090 x 2 Python version: 3.10.13 Huggingface_hub version: 0.20.3 Safetensors version: 0.4.2 Accelerate version…
- WIP Fix LoftQ docs and tests by BenjaminBossan · Pull Request #1532 · huggingface/peft: Relates to #1525 Don't merge this, some GPU tests are failing Unfortunately, the docs I wrote about how to use LoftQ were incorrect, based on a misunderstanding I had. In reality, it is quite a bi…
- support for DoRA w/ PEFT (#1363) · OpenAccess-AI-Collective/axolotl@0cfdb2c: no description found
OpenAccess AI Collective (axolotl) ▷ #general-help (12 messages🔥):
- Troubleshooting Finetuning on Mixtral Model: @seungduk requested a deepspeed config for finetuning a Mixtral model using H100 x 8 and encountered issues with save_safetensors. Despite setting it to false, Axolotl was still saving in safetensors format.
- Potential Solution to Safetensors Format Issue: @nanobitz clarified a possible configuration misunderstanding: an empty save_safetensors could be interpreted as true. @seungduk confirmed trying both an explicit false and an empty config.
- Removing safetensors File Resolves Training Issue: @seungduk identified the creation of an extra model.safetensors file as the source of their problem. Once removed, they were able to further train an already-trained model without the out-of-memory (OOM) issue.
- Deepspeed's Config and Model Saving Quirks: @caseus_ pointed out that, with ZeRO-3, the Hugging Face (hf) trainer tends to save the wrapped model, and inquired about the setting of stage3_gather_16bit_weights_on_model_save. @seungduk confirmed it was set to true in their deepspeed JSON.
- Resolution and Reference to Config Details: @seungduk shared a link to the relevant GitHub config file after resolving the issue by saving in the traditional PyTorch .bin format instead of safetensors.
Links mentioned:
axolotl/deepspeed_configs/zero3_bf16.json at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
OpenRouter (Alex Atallah) ▷ #announcements (1 messages):
- Claude 3 Makes Group Chat a Breeze: @alexatallah shared a positive experience about group chatting with Claude 3, which self-moderates the conversation. They included a link to a Twitter story showcasing the functionality.
OpenRouter (Alex Atallah) ▷ #general (78 messages🔥🔥):
- Question about Claude Versions: @quentmaker inquired about the difference between anthropic/claude-2.0 and anthropic/claude-2, with @alexatallah and @wikipediadotnet clarifying that claude-2 automatically selects the latest 2.x version.
- Uncertain Costs with Multithreading: @mhmm0879 expressed concern about actual costs exceeding predicted ones when using multithreading with gemma 7b and openchat 3.5. @alexatallah and @louisgv inquired about the specific use case and whether images were being sent, to try to diagnose the issue.
- Claude and Censorship Chat: Users @followereternal, @ayumeri, @billbear, and @scepty9097 had a mixed discussion on Claude 3, with some expressing disapproval of potential over-censorship and others praising the model for its conversational capabilities.
- LangChain.js Issues with OpenRouter: @mysticfall pointed out difficulties using LangChain.js with OpenRouter's ChatOpenAI model for text completion. @spaceemotion mentioned that the endpoint for text completion might be marked as "legacy" by OpenAI, and @mysticfall noted potential problems due to hardcoded endpoints in OpenAI's library.
- Exploration of VSCode Extensions for OpenRouter: @_maximus01 inquired about a VSCode extension for code assistance that integrates with OpenRouter, leading to suggestions from @alexatallah about sponsoring such work, and @spaceemotion and @_sam___ sharing potential alternatives and an active GitHub project.
Links mentioned:
- Home | Tabby: Description will go into a meta tag in <head />
- Configuration | Continue: Configure your LLM and model provider
- Perplexity: Sonar 8x7B by perplexity | OpenRouter: Sonar is Perplexity's latest model family. It surpasses their earlier models in cost-efficiency, speed, and performance. The version of this model with Internet access is [Sonar 8x7B Online](/mo…
- Continue: no description found
- GitHub - 0xk1h0/ChatGPT_DAN: ChatGPT DAN, Jailbreaks prompt: ChatGPT DAN, Jailbreaks prompt. Contribute to 0xk1h0/ChatGPT_DAN development by creating an account on GitHub.
- GitHub - continuedev/continue: ⏩ The easiest way to code with any LLM—Continue is an open-source autopilot for VS Code and JetBrains - continuedev/continue
LangChain AI ▷ #general (61 messages🔥🔥):
- LangChain and Function Implementation Assistance: @vishal5795 enquired about integrating function roles into messages using LangChain and OpenAI's ChatCompletion.create(). @chester3637 provided a detailed Python example using LangChain that demonstrates calling an AI message as a function (LangChain Core Example); a hedged sketch in the same spirit follows this list.
- Seeking Tech Task Partner: @mattew_999 announced that they are looking for a partner to work on tech tasks, emphasizing that it is a paid opportunity.
- Inquiry About New Partnerships: @earduman2 asked if LangChain is open for new chain partnerships, sparking a clarification request from @baytaew.
- FastAPI Sporadic Issues: @rajib2189 reported sporadic 502 errors when using FastAPI to host generation APIs under heavy load, especially in an AWS ELB -> Apache Server -> Uvicorn setup.
- Interest in GPT-4 Fine-Tuning Access: @8886600 expressed a desire to gain access to GPT-4 fine-tuning capabilities, mentioning a willingness to pay for an API key with a usage limit.
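A hedged sketch in the same spirit (not @chester3637's exact example): replaying a function call and its result with langchain-core message types, mirroring OpenAI's "role": "function" messages. The get_weather function and its payload are invented for illustration.

```python
from langchain_core.messages import AIMessage, FunctionMessage, HumanMessage
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")
messages = [
    HumanMessage(content="What's the weather in Paris?"),
    # The assistant turn that "called" the function is replayed...
    AIMessage(content="", additional_kwargs={
        "function_call": {"name": "get_weather", "arguments": '{"city": "Paris"}'}
    }),
    # ...and the function's result is fed back with the function role.
    FunctionMessage(name="get_weather", content='{"temp_c": 18, "sky": "clear"}'),
]
print(llm.invoke(messages).content)
```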
Links mentioned:
- Chat LangChain: no description found
- Google Colaboratory: no description found
- LangChain Expression Language (LCEL) | 🦜️🔗 Langchain: LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together.
- Few-shot and function calling: The thing to understand here is that function calling introduced a new role for the chat prompt messages ("role": "function"). To use few-shot examples with chat model prompts you provide a series of…
- Pydantic compatibility | 🦜️🔗 Langchain: - Pydantic v2 was released in June, 2023 (https://docs.pydantic.dev/2.0/blog/pydantic-v2-final/)
- Google Colaboratory: no description found
- LangSmith: no description found
- Retrieval augmented generation (RAG) | 🦜️🔗 Langchain: Let's now look at adding in a retrieval step to a prompt and an LLM, which adds up to a "retrieval-augmented generation" chain:
- Azure AI Search | 🦜️🔗 Langchain: [Azure AI
- langchain_agent/assistant at master · couthyapper7/langchain_agent: a csv reader made in langchain with a fine tuned gpt - couthyapper7/langchain_agent
LangChain AI ▷ #langserve (1 messages):
- Caching Feature yet to Work with Streaming: @veryboldbagel mentioned that caching does not work with streaming mode as of now. The issue is associated with langchain-core, not langserve.
LangChain AI ▷ #share-your-work (6 messages):
- Injecting Humor into AI Art: @neil6430 experimented with a new control net block from ML Blocks to create an amusing image of a chicken performing stand-up comedy with a Seinfeld posture. They shared their excitement about the feature and provided a link to ML Blocks, a tool for building modular, AI-powered image processing workflows without coding.
- Lutra Revolutionizes Workflow Automation: @polarbear007. introduced Lutra.ai, a platform designed to transform English instructions into code, automating task completion by orchestrating various apps, likening it to a more potent version of Zapier.
- Raptor Reveals Secrets of Long Context RAG: @andysingal shared a Medium article about building a Long Context Retrieval-Augmented Generation (RAG) pipeline from scratch using RAPTOR with Langchain.
- ChromaDB joins LM Studio: @vic49. provided a GitHub link to the ChromaDB Plugin for LM Studio, enabling the creation of a ChromaDB vector database for server mode operations.
Links mentioned:
- ML Blocks | Home: ML Blocks lets you build AI-powered image generation and analysis workflows, without writing any code.
- Lutra AI: no description found
- Releases · BBC-Esq/ChromaDB-Plugin-for-LM-Studio: Plugin that creates a ChromaDB vector database to work with LM Studio running in server mode! - BBC-Esq/ChromaDB-Plugin-for-LM-Studio
LangChain AI ▷ #tutorials (1 messages):
pradeep1148: https://www.youtube.com/watch?v=QPZpOBxUd1U
CUDA MODE ▷ #cuda (8 messages🔥):
- Exploring Root Access on RunPod: @ericauld inquired about the possibility of running as root on RunPod, to which @nshepperd clarified that RunPod offers a docker image instead of a real VM, so root isn't actually root in this context.
- Bandwidth Quest for H100 SRAM: @lucaslingle sought information on the SRAM bandwidth of the NVIDIA H100, noting a lack of recent sources after a GTC talk mentioned 19TB/s for the A100. @iron_bound provided assistance by referencing a Chips and Cheese article that states the H100's L2 cache has a 5.5 TB/s read bandwidth.
- Benchmarking RTX 4090: In response to the SRAM bandwidth discussion, @zippika highlighted the RTX 4090's L1 bandwidth performance, referencing another Chips and Cheese article that focuses on Nvidia's Ada Lovelace architecture and the new raytracing improvements.
- H100 Bandwidth Assumptions: @zippika estimated that the H100's bandwidth could be comparable to the RTX 4090's, mentioning an L1 bandwidth of 40TB/s based on their findings and assuming the H100 may align with this performance metric.
Links mentioned:
- Microbenchmarking Nvidia's RTX 4090: Nvidia's RTX 4090 features Nvidia's newest architecture, named Ada Lovelace after a pioneer in early computing. Compared to their previous architecture, Ampere, Ada Lovelace enjoys a pr…
- Nvidia's H100: Funny L2, and Tons of Bandwidth: GPUs started out as devices meant purely for graphics rendering, but their highly parallel nature made them attractive for certain compute tasks too. As the GPU compute scene grew over the past cou…
CUDA MODE ▷ #torch (9 messages🔥):
- GPU Tensor Allocation Misstep: @zippika helped clarify that a tensor was not allocated on a CUDA device because @srns27 forgot to set the TensorOptions, causing it to default to the CPU.
- Friendliness in Debugging: @zippika offered a kind response to @srns27, indicating that everyone makes mistakes and highlighting the cooperative nature of the torch community.
- The Search for Higher Abstraction in Math Operations: @mabeto5p inquired about high-level languages or packages to perform linear algebra on low-precision integers and floating-point operations on NVIDIA's Ada architecture.
- Leveraging bitsandbytes for Quantization: @iron_bound suggested @mabeto5p use the bitsandbytes package for handling k-bit quantization in PyTorch to perform low-precision linear algebra operations on GPUs; a hedged usage sketch follows this list.
- Breakthrough in Quantization Speed: @mabeto5p expressed excitement over discovering the potential to achieve a 5700x speedup in int8 versus bf16 matrix multiplication, after being pointed to the bitsandbytes resource by @iron_bound.
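A hedged sketch of the bitsandbytes suggestion: swapping a standard linear layer for the library's int8 module (the shapes and threshold here are assumptions). The module stores int8 weights and routes activation outliers through higher precision internally.

```python
import torch
import bitsandbytes as bnb

layer = bnb.nn.Linear8bitLt(4096, 4096, has_fp16_weights=False, threshold=6.0)
layer = layer.to("cuda")  # weights are quantized on the move to the GPU

x = torch.randn(8, 4096, dtype=torch.float16, device="cuda")
y = layer(x)              # int8 matmul under the hood
print(y.shape)            # torch.Size([8, 4096])
```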
Links mentioned:
GitHub - TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch.: Accessible large language models via k-bit quantization for PyTorch. - TimDettmers/bitsandbytes
CUDA MODE ▷ #algorithms (3 messages):
- Mask Efficiency Matters: @drisspg highlighted the inefficiency of generic masking in compute, as it requires processing every mask element, even when it's unnecessary.
- Sliding Window PR Adds Color: @drisspg updated the sliding window attention bias pull request on PyTorch's GitHub, adding more details to the description. The PR is available to review here.
- Score-Mod API to Optimize Bias: @drisspg discussed the addition of the score-mod API as a means to efficiently fuse constraints on the bias into the flash_attention algorithm without fully materializing the entire bias.
Links mentioned:
Add sliding window attention bias by drisspg · Pull Request #120143 · pytorch/pytorch: Summary This PR adds a new attention-bias torch_function designed to interact with SDPA. This implements sliding window and updates "aten.sdpa_flash" to expose the window_size_left and wind…
CUDA MODE ▷ #jobs (1 messages):
bowtiedlark: Remote?
CUDA MODE ▷ #beginner (2 messages):
- CUDA for Beginners: User @umerha recommended Jeremy's videos as a starting point for learning about numba.cuda.jit; a minimal example kernel follows below. The suggested resources can be found in Lectures 3 and 5.
- Gratitude for Learning Resources: User @hoteret expressed their thanks for the CUDA learning resources shared by @umerha.
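In the spirit of those lectures, here is a minimal first numba.cuda.jit kernel (the sizes are arbitrary): element-wise vector addition with one thread per element.

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(a, b, out):
    i = cuda.grid(1)        # absolute index of this thread
    if i < out.shape[0]:    # guard against threads past the array end
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads = 256
blocks = (n + threads - 1) // threads
add_kernel[blocks, threads](a, b, out)  # numba copies host arrays to/from device
print(np.allclose(out, a + b))
```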
CUDA MODE ▷ #ring-attention (28 messages🔥):
- Script Tweaking for Device Testing: @iron_bound discussed adding device IDs to a script and planned to test single devices followed by other ring functions.
- Sampling Code Introduced with Glitches: @jamesmel shared a GitHub Pull Request for a first attempt at sampling code, while also mentioning errors with some parameters that are being investigated.
- Benchmarks Completed for Striped and Zigzag: @iron_bound reported that testing on the runpod box revealed striped and zigzag have the same memory ceiling, showing specific memory usage for two CUDA devices.
- Opening Up the Axolotl Training: A link was shared by @andreaskoepf to the OpenAccess-AI-Collective's Axolotl GitHub repository, and @iron_bound also mentioned successful Open Llama 3B training.
- Troubleshooting Ring Attention and Sampling Logic: Discussions revolved around debugging the custom attention library by @iron_bound and efforts by @jamesmel and @andreaskoepf to get the sampling code to work properly, with plans to discuss and clarify the implementation in an upcoming meeting.
Links mentioned:
- BASED: Simple linear attention language models balance the recall-throughput tradeoff: no description found
- few more versions of sampling by melvinebenezer · Pull Request #13 · cuda-mode/ring-attention: Sampling of logits in Ring Attention Greedy top_k top_p
- GitHub - OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
LLM Perf Enthusiasts AI ▷ #claude (7 messages):
- Opus catches attention in coding community: User @pantsforbirds mentioned that Opus seems to be very promising for coding, sparking a conversation on its capabilities.
- Peer approval for Opus in function calling: @res6969 contributed to the conversation by sharing that they've heard high praise for Opus's performance, especially in function calling, indicating Opus might be best in its class.
- GPT-4 excels in medical knowledge: In terms of technical knowledge in medicine and biology, @thebaghdaddy has found that GPT-4 is SIGNIFICANTLY better than its predecessors, leading to shock at the performance difference.
- Benchmarks under scrutiny: Following their experience, @thebaghdaddy expressed skepticism about the general reliability of the published benchmarks, suggesting they might not fully reflect the capabilities of the newer models.
- Opus aced SAT Reading: @jeffreyw128 shared an impressive outcome where Opus scored an 800 on the SAT Reading section, demonstrated in a Twitter post which can be found here. This share sparked a discussion about the challenges of creating holdouts to avoid memorization given the size of the newer models.
LLM Perf Enthusiasts AI ▷ #prompting (2 messages):
- Seeking Wisdom on Citations in RAG Outputs: @mat_mto inquired about resources like blogs or tweets that provide tips on formatting citations and footnotes in RAG-generated text. They shared an example of text output with footnotes pointing to web search results.
- JSON Object for Clear Source Attribution: In response, @res6969 mentioned their use of function calling that outputs JSON objects containing both the text and the sources, allowing clear attribution of information to its web sources; a hedged sketch of such a schema follows this list.
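A hedged sketch of such a schema (the field names are invented for illustration): a function-calling tool definition that forces the model to return answer text paired with its sources, which can then be rendered as numbered footnotes.

```python
cited_answer_tool = {
    "type": "function",
    "function": {
        "name": "cited_answer",
        "description": "Answer the question and attribute every claim.",
        "parameters": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "sources": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "url": {"type": "string"},
                            "quote": {"type": "string"},
                        },
                        "required": ["url"],
                    },
                },
            },
            "required": ["answer", "sources"],
        },
    },
}
# Pass as tools=[cited_answer_tool] in a chat completions request, then render
# each source as a footnote under the answer text.
```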
Datasette - LLM (@SimonW) ▷ #ai (8 messages🔥):
- Clarifying AI Terminology: @simonw emphasized the importance of distinguishing between prompt injection and jailbreaking, explaining that prompt injection involves concatenating untrusted user input with a developer's trusted prompt, whereas jailbreaking tries to bypass the LLM's safety filters itself. He provides a detailed explanation and historical context in his blog post.
- AI in the Hands of Cyber Threat Actors: @tariqali shared insights from a Microsoft blog post about state-backed actors using OpenAI's LLMs for cyber activities including reconnaissance and spear phishing, mentioning one instance where an actor was blocked from prompting the model with malicious intent.
- The Dual-Use Dilemma of LLMs: Addressing the risks associated with AI, @tariqali referred to a research post on OpenAI's attempts to create an early warning system for LLM-aided biological threats, highlighting a comparative study between using the Internet alone and using it alongside GPT-4 for task-solving, which can be found here.
- Access Control as a Mitigation Strategy: @tariqali suggested that prompt injection could be mitigated by controlling who gets access to the LLM, proposing human review of content as a potential layer of defense to sanitize inputs before they reach the AI.
- Invisible Prompt Injection Challenges: @simonw pointed out the limitations of human review for preventing prompt injections, citing the case of invisible prompt injections hidden in off-white text on images, which can be a threat even in multi-modal versions of GPT like GPT-4V, as discussed in his blog post.
Links mentioned:
- Prompt injection and jailbreaking are not the same thing: I keep seeing people use the term "prompt injection" when they're actually talking about "jailbreaking". This mistake is so common now that I'm not sure it's possible to correct course: …
- Building an early warning system for LLM-aided biological threat creation: We're developing a blueprint for evaluating the risk that a large language model (LLM) could aid someone in creating a biological threat. In an evaluation involving both biology experts and students, …
- Multi-modal prompt injection image attacks against GPT-4V: GPT-4V is the new mode of GPT-4 that allows you to upload images as part of your conversations. It's absolutely brilliant. It also provides a whole new set of vectors …
- Staying ahead of threat actors in the age of AI | Microsoft Security Blog: Microsoft, in collaboration with OpenAI, is publishing research on emerging threats in the age of AI, focusing on identified activity associated with known threat actors Forest Blizzard, Emerald Sleet…
Datasette - LLM (@SimonW) ▷ #llm (1 messages):
- Seeking Consensus on Model File Locations: @florents_ inquired about whether there is a consensus or a specific piece of code that dictates where various tools search for model files, suggesting possible locations like $(pwd)/.models or $HOME/models. No further discussion or responses were provided.
DiscoResearch ▷ #general (9 messages🔥):
- Exploring Chatbot Environments: @crispstrobe mentioned that chat.lmsys.org allows for testing with the caveat of including inputs in later training data, and highlighted poe.com, which hosts three models including a perplexity feature.
- Quest for the Best German Model: @le_mess inquired about the best current German model; @johannhartmann recommended Claude Opus, GPT-4, DiscoLM-120b, or VAGOsolutions/SauerkrautLM-UNA-SOLAR-Instruct depending on specific constraints.
- Fresh Off the Press: @maxidl shared an arXiv paper that suggests retrieval-augmented language models could potentially be a superior alternative to parametric LMs, though the research in this area is not yet extensive.
- High Praise for Hermes and Mixtral: @cybertimon recommended using Nous Hermes 2 Mixtral 8x7B for German tasks, noting its proficiency in the language.
- Searching for Flawlessness in 7 Billion Parameters: @johannhartmann and @flozi00 responded to queries about high-quality German models, with @johannhartmann suggesting DiscoResearch/DiscoLM_German_7b_v1 and similar models, and @flozi00 endorsing Nous Hermes 2 Mixtral 8x7B for its accuracy.
Links mentioned:
Reliable, Adaptable, and Attributable Language Models with Retrieval: Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, di…
Alignment Lab AI ▷ #general-chat (1 messages):
- Warm Welcome to Newcomer: User @segmentationfault. expressed gratitude for being invited by @748528982034612226 and showed eagerness to contribute to the field despite being new to it. No further information on contributions or areas of interest was provided.
Alignment Lab AI ▷ #oo2 (3 messages):
- A Warm Henlo: @thenetrunna kicked off the conversation with a friendly "henlo frens," setting a casual tone in channel oo2.
- Welcoming Replies in the Evening: @jaxxks responded in the evening, appreciating the welcome from @thenetrunna.
- Greeting the Group: @tcapelle joined the conversation with a cheery "Hello every1!", indicating a stream of introductions and greetings among participants.
Skunkworks AI ▷ #off-topic (2 messages):
- Introducing Claude 3, the LLM That Surpasses GPT-4: @pradeep1148 shared a YouTube video titled "Introducing Claude 3 LLM which surpasses GPT-4". The video discusses the Claude 3 model family, which reportedly sets new benchmarks across various cognitive tasks.
- How to Develop with Mistral: Another YouTube link was shared by @pradeep1148, titled "Infinite Craft Game using Mistral". It covers developing Neal Agarwal's web game Infinite Craft using the Mistral model.
Links mentioned:
- Infinite Craft Game using Mistral: Let's develop Neal Agarwal's web game Infinite Craft. This is a "crafting game" where you start with just four elements and repeatedly combine pairs of element…
- Introducing Claude 3 LLM which surpasses GPT-4: Today, we're looking at the Claude 3 model family, which sets new industry benchmarks across a wide range of cognitive tasks. The family includes three state-of…
Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (1 messages):
- Intel Faces a Reality Check: @natolambert shared a YouTube video titled "Intel's Humbling" by Stratechery, with Ben Thompson providing the voiceover, suggesting it offered valuable insights without making him feel like "a total idiot." The video explores the challenges Intel has faced and includes a link to the accompanying article for a deeper read.
Links mentioned:
Intel's Humbling | Stratechery by Ben Thompson: Read the Article: https://stratechery.com/2024/intels-humbling/ Links: Stratechery: https://stratechery.com Sign up for Stratechery Plus: https://stratechery.c…
Interconnects (Nathan Lambert) ▷ #reads (1 messages):
- Reflecting on the Obscurities of AI: @natolambert recommends a thought-provoking post by Elad Gil, highlighting how generative AI tends to become more puzzling over time. The post raises open questions at each level of the AI stack, aiming to stir conversation and provide insights.
Links mentioned:
Things I Don't Know About AI: The more I learn about AI markets, the less I think I know. I list questions and some thoughts.