Frozen AI News archive

OpenAI's Instruction Hierarchy for the LLM OS

**OpenAI** published a paper introducing the concept of privilege levels for LLMs to address prompt injection vulnerabilities, improving defenses by 20-30%. **Microsoft** released the lightweight **Phi-3-mini** model with 4K and 128K context lengths. **Apple** open-sourced the **OpenELM** language model family with an open training and inference framework. An instruction accuracy benchmark compared 12 models, with **Claude 3 Opus**, **GPT-4 Turbo**, and **Llama 3 70B** performing best. The **Rho-1** method enables training state-of-the-art models using only 3% of tokens, boosting models like **Mistral**. **Wendy's** deployed AI-powered drive-thru ordering, and a study found **Gen Z** workers prefer generative AI for career advice. Tutorials on deploying **Llama 3** models on AWS EC2 highlight hardware requirements and inference server use.

Canonical issue URL

In general, every modern operating system has the concept of "protection rings", offering different levels of privilege on an as-needed basis:

image.png

Until ChatGPT, models trained as "spicy autocomplete" were always liable to prompt injections:

image.png

so the solution is of course privilege levels for LLMs. OpenAI published a paper on how they think about it for the first time:

image.png

This is presented as an alignment problem - each level can be aligned or misaligned, and the reactions to misalignment can either be ignore and proceed or refuse (if no way to proceed). The authors synthesize data to generate decompositions of complex request, placed at different levels, varied for alignment and injection attack type, applied on various domains.

The result is a general system design for modeling all prompt injections, and if we can generate data for it, we can model it:

image.png

With this they can nearly solve prompt leaking and improve defenses by 20-30 percentage points.

As a fun bonus, the authors find that just adding the instruction hierarchy in the system prompt LOWERS performance for baseline LLMs but generally improves Hierarchy-trained LLMs.

image.png


Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots to improve!

AI Models and Benchmarks

AI Applications and Use Cases

AI Research and Techniques


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Models and Architectures

AI Companies and Funding

AI Research and Techniques


AI Discord Recap

A summary of Summaries of Summaries

1. New AI Model Releases and Benchmarking

2. Efficient Inference and Quantization Techniques

3. RAG Systems, Multi-Modal Models, and Diffusion Advancements

4. Prompt Engineering and LLM Control Techniques


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Perplexity AI Discord

Perplexity Rolls Out New Pro Service: Perplexity has launched Perplexity Enterprise Pro, touting enhanced data privacy, SOC2 compliance, and single sign-on capabilities, with companies like Stripe, Zoom, and Databricks reportedly saving 5000 hours a month. Engineers looking for corporate solutions can find more details and pricing at $40/month or $400/year per seat.

Funding Fuels Perplexity's Ambitions: Perplexity AI has closed a significant funding round, securing $62.7M and attaining a valuation of $1.04B, with notable investors including Daniel Gross and Jeff Bezos. The funds are slated for growth acceleration and expanding distribution through mobile carriers and enterprise partnerships.

AI Model Conundrums and Frustrations: Lively discussions evaluated AI models like Claude 3 Opus, GPT 4, and Llama 3 70B, with users pointing out their various strengths and weaknesses, while voicing exasperation about the message limit in Opus. Further, the community tested various AI-powered web search services, such as you.com and chohere, noting performance variances.

API Developments and Disappointments: On the API front, requests abound for an API akin to GPT that can scour the web and stay current, leading users to explore Perplexity's sonar online models and sign up for citations access. The conversation included a clarification that image uploads are not supported by the API now or in the foreseeable future, with llama-3-70b instruct and mixtral-8x22b-instruct suggested for coding tasks.

Perplexity's Visibility and Valuation Soars: The enterprise's valuation has surged to potentially $3 billion as they seek additional funding after a leap from $121 million to $1 billion. Srinivas, CEO, shared this jump on Twitter and discussed Perplexity AI's position in the AI technology race against competitors like Google in a CNBC interview. Meanwhile, users explore capabilities and report visibility issues with Perplexity AI searches, as seen with search results and less clear visibility issues.


Nous Research AI Discord


LM Studio Discord


OpenAI Discord


CUDA MODE Discord

Lightning Strikes on CUDA Verification: Lightning AI users have faced a complex verification process, leading to recommendations to contact support or tweet for expedited service. Lightning AI staff responded by emphasizing the importance of meticulous checks, partly to prevent misuse by cryptocurrency miners.

Sync or Swim in CUDA Development: Developers shared knowledge on CUDA synchronization, cautioning against using __syncthreads post thread exit and noting Volta's enforcement of __syncthreads across active threads. A link to a specific GitHub code snippet was shared for further inspection.

Coalescing CUDA Knowledge: The CUDA community engaged in discussions about function calls affecting memory coalescing, the role of .cuh files, and optimization strategies, with an emphasis on profiling using tools like NVIDIA Nsight Compute. For practical query, resources were pointed to the COLMAP MVS CUDA project.

PyTorch Persists on GPU: PyTorch operations were affirmed to stay entirely on the GPU, highlighting the seamless and asynchronous nature of operations like conv2d, relu, and batchnorm, and negating the need for CPU exchanges unless synchronization-dependent operations are invoked.

Tensor Core Evolves, GPU Debates Heat Up: Conversations about Tensor Cores revealed performance doubling from the 3000 to 4000 series. Cost versus speed was debated with the 4070 Ti Super being a focal point for its balance of cost and next-gen capabilities, despite a more complex setup than its older counterparts.

CUDA Learning in an Educational Spotlight: A Google Docs link was provided for a chapter discussion, and Kernel code optimizations with scarce documentation like flash decoding became potential topics for a guest speaker like @tri_dao.

CUDA's Teaching Potential Mentioned: The community underlined the educational promise of CUDA kernel implementations, alluding to their inclusion in university curricula, and pointing towards a didactic exploration of parallel programming. Suggestions included leveraging llm.c as course material.

A Smooth Tune for Learning CUDA: "Lecture 15: CUTLASS" was released on YouTube, featuring new intro music with classic gaming vibes, available at this Spotify link.

Mixed Precision Gains Momentum: Microsoft's BitBLAS library caught attention for its potential in facilitating quantized LLM deployment, with TVM as a backend consideration for on-device inference and mixed-precision operations like the triton i4 / fp16 fused gemm.

Precision and Speed Debate in LLM: FP8 performance measurements of 29.5ms compared to BF16's 43ms sparked discussions on the potential and limitations of precision reduction. The importance of deterministic losses across batch sizes was noted, with loss inconsistencies prompting investigations into CUBLAS_PEDANTIC_MATH and intermediate activation data.


Eleuther Discord

Boosting Image Model Open Source Efforts: The launch of ImgSys, an open source generative image model arena, was announced with detailed preference data available on Hugging Face. Additionally, the Open CoT Leaderboard, focusing on chain-of-thought (CoT) prompting for large language models (LLMs), has been released, showing accuracy improvements through enhanced reasoning models, although the GSM8K dataset's limitation to single-answer questions was noted as a drawback.

Innovations in AI Scaling and Decoding: Research presented methods for tuning LLMs to behavioral principles without labels or demos, specifically an algorithm named SAMI, and NVIDIA's Align Your Steps to quicken DMs' sampling speeds Align Your Steps research. Facebook detailed a 1.5 trillion parameter recommender system with a 12.4% performance boost Facebook's recommender system paper. Exploring copyright issues, an economic approach using game theory was proposed for generative AI. Concern grew over privacy vulnerabilities in AI models, highlighted by insights into extracting training data.

Considerations on AI Scaling Laws: An energetic discussion on AI scaling law models emphasized the fitting approach and whether residuals around zero suggested superior fits, as well as the implications of omitting data during conversions for analysis Math Stack Exchange discussion on least squares. Advocacy appeared for omitting smaller models from the analysis due to their skewing influence on the results and a critique identified potential issues with a Chinchilla paper's confidence interval interpretation.

Tokenization Turns Perplexing: Tokenization practices caused debate, highlighting inconsistencies between tokenizer versions and changes in space token splitting. A frustration was expressed about the lack of communication on breaking changes from the developers of tokenizers.

Combining Token Insights with Model Development: GPT-NeoX developers tackled integrating RWKV and updating the model with JIT compilation, fp16 support, pipeline parallelism, and model compositional requirements GPT-NeoX Issue #1167, PR #1198. They sought to ensure AMD compatibility for wider hardware support and deliberated model training consistency amidst tokenizer version changes.


Stability.ai (Stable Diffusion) Discord

Portraits Pop in Photorealism: Juggernaut X and EpicrealismXL stand out for generating photo-realistic portraits in Forge UI, though RealVis V4.0 is gaining traction for delivering high-quality results with simpler prompts. The steep learning curve for Juggernaut has been noted as a point of frustration among users.

Forge UI Slays the Memory Monster: A lively debate centers on the trade-offs between Forge UI's memory efficiency and A1111's performance, with a nod to Forge UI's suitability for systems with less VRAM. Despite preferences for A1111 from some users, concerns about potential memory leaks in Forge UI persist.

Mix and Match to Master Models: Users are exploring advanced methods to refine model outputs by combining models using Lora training or dream booth training. This approach is particularly useful for honing in on specific styles or objects while enhancing precision, with techniques like inpaint, bringing additional improvements to facial details.

Stable Diffusion 3 Anticipation and Access: The community buzzes with anticipation for the upcoming Stable Diffusion 3.0, discussing limited API access and speculating on potential costs for full utilization. Current access to SD3 appears constrained to an API with limited free credits, fostering discussions regarding future licensing and use.

Resolution to the Rescue: To combat issues with blurry Stable Diffusion outputs, higher resolution creation and SDXL models in Forge are proposed as solutions. The community is dissecting the potentials of fine-tuning, with tools like Kohya_SS to help guide those looking to push the boundaries of image clarity and detail.


HuggingFace Discord


LAION Discord

MagVit2's Update Quandary: Engineers raise questions about the magvit2-pytorch repository; skepticism exists regarding its ability to match scores from the original paper since its last update was three months ago.

Creative AIs Going Mainstream?: Adobe reveals Adobe Firefly Image 3 Foundation Model, claiming to take a significant leap in creative AI by providing enhanced quality and control, now experimentally accessible in Photoshop.

Resolution Revolution or Simple Solution?: HiDiffusion promises enhanced resolution and speed for diffusion models with minimal code alteration, sparking discussions about its applicability; yet some expressed doubt on improvements with a "single line of code".

Apple's Visual Recognition Venture: A member shared insight into Apple's CoreNet, a model seemingly focused on CLIP-level visual recognition, discussed without further elaboration or a direct link.

MoE Gets an Intelligent Overhaul: The new Multi-Head Mixture-of-Experts (MH-MoE) enhances Sparse MoE (SMoE) models by improving expert activation, offering a more nuanced analytical understanding of semantics, as detailed in a recent research paper.


OpenRouter (Alex Atallah) Discord


Modular (Mojo 🔥) Discord

Benchmarks and Brains Debate on Conscious AI: Skepticism was noted surrounding AI achieving artificial consciousness, with discussions focusing on the need for advancements in quantum or tertiary computing versus software innovations alone. References were made to quantum computing's perceived shortcomings for AI development due to its indeterminate nature, and the seldom-mentioned tertiary computing with a link to Setun, an early ternary computer.

Random Number Generation Gets Optimized: Deep dives into the performance of the random.random_float64 function revealed it to be suboptimal, prompting community action via a bug report on ModularML Mojo GitHub. Recommendations for future RNGs were to include both high-performance and cryptographically secure options.

Pointers and Parameters Take Center Stage: Mojo community contributors shared insights and code examples using pointers and traits, discussing issues like segfaults with UnsafePointer and implementation differences between nightly and stable Mojo versions. A generic quicksort algorithm for Mojo was shared, highlighting how pointers and type constraints work in practice.

Challenges in Profiling and Heap Allocation: In Modular's #[community-projects], techniques for tracking heap allocations using xcrun, and profiling challenges were shared, indicating the practical struggles AI engineers face in optimization. A new community project, MoCodes, which is a computing-intensive Error Correction (De)Coding framework developed in Mojo, was introduced and is accessible at MoCodes on GitHub.

Clandestine Operations with Strings and Compilers: Concerns were raised in #[nightly] about treating an empty string as valid and differentiating String() from String("") due to C interoperability issues. A bug report for printing empty strings causing future prints to be corrupted was mentioned, alongside discussions over null-terminated string problems and their impact on Mojo's compiler and standard library, with a specific stdlib update referenced at ModularML Mojo pull request.

Mojo Hits a Milestone at PyConDE: Mojo, described as "Python's faster cousin," was featured at PyConDE, marking its first year with a talk by Jamie Coombes. Community sentiment was explored, noting skepticism from some quarters, such as the Rust community, about Mojo's potential, with the talk accessible here.


OpenAccess AI Collective (axolotl) Discord

Llama-3's Learning Curve: Observations within the axolotl-dev channel flagged an increased learning rate as the culprit for gradual loss divergence in the llama3 BOS fix branch. To ameliorate out-of-memory concerns on the yi-200k models due to sample packing inefficiencies, shifting to paged Adamw 8bit optimizer was recommended.

Medical AI Makes Strides: Internist.ai 7b, a model specializing in the medical field, now boasts a performance surpassing GPT-3.5 after being blindly evaluated by 10 medical doctors, signaling an industry shift towards more curated datasets and expert-involved training methods. Access the model at internistai/base-7b-v0.2.

Phi-3 Mini's GPU Gluttony: The Phi-3 model updates stirred conversation in the general channel, revealing its hefty demand for 512 H100-80G GPUs for adequate training—a stark contrast to initial expectations of modest resource needs.

Optimization Overdose: AI aficionados in the community-showcase channel celebrated the release of OpenELM by Apple, and the buzz around Snowflake's 408B Dense + Hybrid MoE model. On a related note, tech enthusiasts were also amped about the new features released with PyTorch 2.3.

Toolkit Tussle – Unsloth vs. Axolotl: In the rlhf channel, members pondered over the suitable toolkit between Unsloth and Axolotl, considering Sequential Fine-Tuning (SFT) and Decision Process Outsourcing (DPO) applications to select the most effective library for their work.


LlamaIndex Discord


Interconnects (Nathan Lambert) Discord


OpenInterpreter Discord

TTS Innovations and Pi Prowess: Engineers discussed RealtimeTTS, a GitHub project for live text-to-speech, as a more affordable solution than offerings like ElevenLabs. A guide for starting with Raspberry Pi 5 8GB running Ubuntu was highlighted alongside shared expertise on utilizing Open Interpreter with the hardware, detailed in a GitHub repo.

OpenInterpreter Explores the Clouds: There was an expressed interest in deploying OpenInterpreter O1 on cloud platforms, with mentions of brev.dev compatibility and inquiries into Scaleway. Local voice control advancements were noted with Home Assistant's new voice remote, suggesting implications for hardware compatibility.

Approaching AI-Hardware Frontier: Members shared progress on manufacturing the 01 Light device, including an announcement for an event on April 30th to discuss details and roadmaps. Conversations also included utilizing AI on external devices such as the "AI Pin project" and an example showcased in a Twitter post by Jordan Singer.

Accelerating AI Inferencing: The potential use of OpenVINO Toolkit for optimizing AI inference in stable diffusion implementations was discussed. The cross-platform ONNX Runtime was referenced for its role in accelerating ML models across various frameworks, while MLflow, an open-source MLOps platform, was singled out for its ability to streamline ML and generative AI workflows.

Product-Focused Updates and Assistance: Updates were shared regarding executing Open Interpreter code, where users were instructed to use the --no-llm_supports_functions flag and to check for software updates to fix local model issues. An outreach for help with the Open Empathic project was also noted, emphasizing the need to expand the project's categories.


Latent Space Discord

Hydra Slithers into Config Management: AI engineers are actively adopting Hydra and OmegaConf for better configuration management in machine learning projects, citing Hydra's machine learning-friendly features.

Perplexity Attracts Major Funding: Perplexity has secured a significant funding round of $62.7M, achieving a $1.04B valuation with investors like NVIDIA and Jeff Bezos onboard, hinting at a strong future for AI-driven search solutions. Perplexity Investment News

AI Engineering Manual Released: Chip Huyen's new book, AI Engineering, is making waves by highlighting the significance of building applications with foundation models and prioritizing AI engineering techniques. Exploring AI Engineering

Decentralized AI Development Gains Momentum: Prime Intellect has announced an innovative infrastructure to promote decentralized AI development and collaborative global model training, along with a $5.5M funding round. Prime Intellect's Approach

Join the Visionary Course: HuggingFace unveils a new community-driven course on computer vision, inviting participants across the spectrum, from beginners to experts seeking to stay abreast of the field's progress. Computer Vision Course Invitation

Discussing TimeGPT's Innovations: The US paper club is organizing a session on TimeGPT, addressing time series analysis, with the paper's authors and a special guest, offering a unique opportunity for in-depth learning. Register for TimeGPT Event


tinygrad (George Hotz) Discord


DiscoResearch Discord

Mixtral on Top: The Mixtral-8x7B-Instruct-v0.1 outshone Llama3 70b instruct in a RAG evaluation according to German metrics; a suggestion to add loglikelihood_acc_norm_nospace as a metric was made to address format discrepancies, and after template adjustments, DiscoLM German 7b saw varied results. Evaluation results and the evaluation template are available for closer examination.

Haystack's Dynamic Querying: Haystack LLM framework has been enhanced to index tools as OpenAPI specs, retrieve the top_k service based on user intent, and dynamically invoke the right tool; exemplified in a hands-on notebook.

Batch Inference Conundrums: One member mulled over how to send a batch of prompts through a local mixtral setup with 2 A100s, with TGI and vLLM as potential solutions; others preferred litellm.batch_completion for its efficiency. For scalable inference, llm-swarm was mentioned, although its necessity for dual GPU setups remains debatable.

DiscoLM Details Deliberated: A dive into DiscoLM's use of dual EOS tokens was made, addressing multiturn conversation management, whereas ninyago simplified DiscoLM_German coding issues by dropping the attention mask and utilizing model.generate. To enhance output length, switching to max_new_tokens was recommended over max_tokens, and despite imminent model improvements, community contributions to DiscoLM quantizations were welcomed.

Grammar Choices Grappled: The community discussed the impact of using the informal "du" versus formal "Sie" when prompting DiscoLM models in German, highlighting cultural nuances that could affect language model interactions.


LangChain AI Discord

Boost Your RAG Chatbot: Enhancements for a RAG chatbot were hot topics, as users explored adding web search result displays to augment database knowledge. Strategies to create a quick chat interface tapping into vector databases were also discussed, with tools like Vercel AI SDK and Chroma mentioned as potential accelerators.

Navigate JSON Like a Pro: Users sought ways to define metadata_field_info in a nested JSON structure for Milvus vector database use, indicative of the community's deep dive into efficient data structuring and retrieval.

Learn Langchain Chain Types With New Series: A new Langchain video series debuted, detailing the different chain types such as API Chain and RAG Chain to assist users in creating more nuanced reasoning applications. The educational content, available on YouTube, is aimed at expanding the toolset of AI engineers.

Pioneering Unification in RAG Frameworks: A member's discussion on adapting and refining RAG frameworks through Langchain's LangGraph emphasized topics like adaptive routing and self-correction. The innovative approach was detailed in a shared Medium post.

RAG Evaluation Unpacked: The RAGAS Platform spotlighted an article evaluating RAGs, inviting feedback and brainstorming on product development. The community is encouraged to provide insights and participate in the discussion through the links to the community page and the article.


Datasette - LLM (@SimonW) Discord


Cohere Discord

Whitelist Woes and CLI Tips for Cohere: A user sought information on the IP range for Cohere API and was offered a temporary solution with a specific IP: 34.96.76.122. The dig command was recommended for updates, mapping a need for clear whitelisting documentation in professional settings.

AI Career Sage Advice: Within the guild, there was agreement that substantial technical skills and the ability to articulate them trump networking in AI career progression. This highlights the community's consensus on the value of deep know-how over mere connections.

Level Up Your LLM Game: Somebody was curious about advancing their skills in machine learning and LLMs, with the group's advice emphasizing problem-solving and seeking real-world inspiration. This underscores the engineering mindset of tackling pragmatic concerns or being motivated by genuine curiosity.

Cohere Goes Commando with Open Source Toolkit: Cohere's Coral app has been made open-source, spurring developers to add custom data sources and deploy applications to the cloud. The Cohere Toolkit is now available, fueling the community to innovate with Cohere models across various cloud platforms.

Cohere, Command-r-ations, and Virtual Guides: There's buzz around using Cohere Command-r with RAG in BotPress due to perceived advantages over ChatGPT 3.5, and an AI Agent concept for Dubai Investment and Tourism was shared, that can converse with Google Maps and www.visitdubai.com. This reflects the growing interest in fine-tuning LLM applications to specific tasks and regional services.


Skunkworks AI Discord


Mozilla AI Discord

Relevant Links:


AI21 Labs (Jamba) Discord

Jamba's Resource Appetite Exposed: A user inquired about Jamba's compatibility with LM Studio, highlighting the interest due to its memory capacity rivaling Claude, yet another user voiced the challenge of running Jamba on systems with less than 200GB of RAM and a robust GPU, like the NVIDIA 4090.

Cooperation Call to Tackle Jamba’s Demands: Difficulty in provisioning adequate Google Cloud instances for Jamba surfaced, prompting a call for collaboration to address these resource allocation issues.

Flag on Inappropriate Content: The group was alerted about posts potentially breaching Discord's community guidelines, which included promotions of Onlyfans leaks and other age-restricted material.


LLM Perf Enthusiasts AI Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (929 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (47 messages🔥):

Link mentioned: Answer.AI - Efficient finetuning of Llama 3 with FSDP QDoRA: We’re releasing FSDP QDoRA, a scalable and memory-efficient method to close the gap between parameter efficient finetuning and full finetuning.


Unsloth AI (Daniel Han) ▷ #help (192 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (13 messages🔥):


Unsloth AI (Daniel Han) ▷ #suggestions (63 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #announcements (2 messages):


Perplexity AI ▷ #general (802 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (10 messages🔥):

Links mentioned:


Perplexity AI ▷ #pplx-api (9 messages🔥):


Nous Research AI ▷ #off-topic (10 messages🔥):

Link mentioned: Tweet from Sawyer Merritt (@SawyerMerritt): NEWS: Apple cuts Vision Pro shipments by 50%, now ‘reviewing and adjusting’ headset strategy. "There may be no new Vision Pro model in 2025" https://9to5mac.com/2024/04/23/kuo-vision-pro-ship...


Nous Research AI ▷ #interesting-links (17 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (358 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (26 messages🔥):

Link mentioned: Continual Learning for Large Language Models: A Survey: Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale. However, updates are necessary to endow LLMs with new skills and kee...


Nous Research AI ▷ #bittensor-finetune-subnet (1 messages):

paradox_13: What are the miner rates?


Nous Research AI ▷ #rag-dataset (100 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #world-sim (101 messages🔥🔥):

Links mentioned:


LM Studio ▷ #💬-general (235 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (175 messages🔥🔥):

Links mentioned:


LM Studio ▷ #announcements (1 messages):


LM Studio ▷ #🧠-feedback (9 messages🔥):


LM Studio ▷ #📝-prompts-discussion-chat (11 messages🔥):

Link mentioned: bartowski/Llama-3-Smaug-8B-GGUF · Hugging Face: no description found


LM Studio ▷ #🎛-hardware-discussion (132 messages🔥🔥):

Link mentioned: 👾 LM Studio - Discover and run local LLMs: Find, download, and experiment with local LLMs


LM Studio ▷ #langchain (1 messages):

vic49.: Yeah, dm me if you want to know how.


LM Studio ▷ #amd-rocm-tech-preview (19 messages🔥):


OpenAI ▷ #ai-discussions (338 messages🔥🔥):

Link mentioned: Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs | Lex Fridman Podcast #426: Edward Gibson is a psycholinguistics professor at MIT and heads the MIT Language Lab. Please support this podcast by checking out our sponsors:- Yahoo Financ...


OpenAI ▷ #gpt-4-discussions (21 messages🔥):


OpenAI ▷ #prompt-engineering (34 messages🔥):


OpenAI ▷ #api-discussions (34 messages🔥):


CUDA MODE ▷ #general (16 messages🔥):

Link mentioned: cuda-matmult/main.cu at main · tspeterkim/cuda-matmult: Contribute to tspeterkim/cuda-matmult development by creating an account on GitHub.


CUDA MODE ▷ #cuda (10 messages🔥):

Links mentioned:


CUDA MODE ▷ #torch (3 messages):


CUDA MODE ▷ #beginner (5 messages):


CUDA MODE ▷ #pmpp-book (4 messages):


CUDA MODE ▷ #youtube-recordings (3 messages):

Links mentioned:


CUDA MODE ▷ #hqq (6 messages):

Link mentioned: GitHub - microsoft/BitBLAS: BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.: BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. - microsoft/BitBLAS


CUDA MODE ▷ #llmdotc (331 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #massively-parallel-crew (4 messages):


Eleuther ▷ #general (6 messages):

Links mentioned:


Eleuther ▷ #research (189 messages🔥🔥):

Links mentioned:


Eleuther ▷ #scaling-laws (56 messages🔥🔥):

Link mentioned: Proving Convergence of Least Squares Regression with i.i.d. Gaussian Noise: I have a basic question that I can't seem to find an answer for -- perhaps I'm not wording it correctly. Suppose that we have an $n$-by-$d$ matrix, $X$ that represents input features, and we...


Eleuther ▷ #interpretability-general (4 messages):

Links mentioned:


Eleuther ▷ #lm-thunderdome (12 messages🔥):


Eleuther ▷ #gpt-neox-dev (50 messages🔥):

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (311 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #announcements (1 messages):


HuggingFace ▷ #general (211 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (5 messages):

Link mentioned: Build an Agent with Long-Term, Personalized Memory: This video explores how to store conversational memory similar to ChatGPT's new long-term memory feature.We'll use LangGraph to build a simple memory-managin...


HuggingFace ▷ #cool-finds (17 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (11 messages🔥):

Links mentioned:


HuggingFace ▷ #computer-vision (4 messages):


HuggingFace ▷ #NLP (1 messages):


HuggingFace ▷ #diffusion-discussions (11 messages🔥):

Link mentioned: Discord | Your Place to Talk and Hang Out: Discord is the easiest way to talk over voice, video, and text. Talk, chat, hang out, and stay close with your friends and communities.


LAION ▷ #general (221 messages🔥🔥):

Links mentioned:


LAION ▷ #research (19 messages🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

Link mentioned: DeepGaze: no description found


OpenRouter (Alex Atallah) ▷ #general (203 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (2 messages):


Modular (Mojo 🔥) ▷ #ai (2 messages):

Link mentioned: Setun - Wikipedia: no description found


Modular (Mojo 🔥) ▷ #🔥mojo (132 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-projects (9 messages🔥):


Modular (Mojo 🔥) ▷ #community-blogs-vids (7 messages):

Link mentioned: Tweet from Mojo 🔥 - Is it Python's faster cousin or just hype? PyConDE & PyData Berlin 2024: On 2023-05-02, the tech sphere buzzed with the release of Mojo 🔥, a new programming language developed by Chris Lattner, renowned for his work on Clang, LLVM, and Swift. Billed as "Python's...


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (8 messages🔥):

Link mentioned: [BUG] random.random_float64 is extremely slow · Issue #2388 · modularml/mojo: Bug description Generating one random number at a time in a for loop is extremely slow, almost 2 orders of magnitude slower than a numba-jitted equivalent. Context: I tried to use a simple Monte Ca...


Modular (Mojo 🔥) ▷ #📰︱newsletter (1 messages):

Zapier: Modverse Weekly - Issue 31 https://www.modular.com/newsletters/modverse-weekly-31


Modular (Mojo 🔥) ▷ #🏎engine (3 messages):


Modular (Mojo 🔥) ▷ #nightly (25 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (147 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (10 messages🔥):


OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages):


OpenAccess AI Collective (axolotl) ▷ #datasets (1 messages):

aillian7: Is there a format for ORPO that i can use for a conversational use case?


OpenAccess AI Collective (axolotl) ▷ #rlhf (1 messages):


OpenAccess AI Collective (axolotl) ▷ #community-showcase (9 messages🔥):

Link mentioned: internistai/base-7b-v0.2 · Hugging Face: no description found


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (11 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (5 messages):

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.


LlamaIndex ▷ #blog (4 messages):

Link mentioned: Google Colaboratory: no description found


LlamaIndex ▷ #general (140 messages🔥🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (39 messages🔥):

Link mentioned: AI CEO says people's obsession with reaching artificial general intelligence is 'about creating God': Arthur Mensch doesn't feel concerned about AI surpassing human intelligence, but he does worry about American tech giants dominating the field.


Interconnects (Nathan Lambert) ▷ #news (21 messages🔥):

Link mentioned: Tweet from Susan Zhang (@suchenzang): oh no not this again


Interconnects (Nathan Lambert) ▷ #ml-questions (22 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (8 messages🔥):


Interconnects (Nathan Lambert) ▷ #memes (8 messages🔥):


Interconnects (Nathan Lambert) ▷ #posts (10 messages🔥):


OpenInterpreter ▷ #general (69 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (11 messages🔥):

Links mentioned:


OpenInterpreter ▷ #ai-content (3 messages):

Links mentioned:


Latent Space ▷ #ai-general-chat (64 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (1 messages):

Link mentioned: LLM Paper Club (Survey Day) · Zoom · Luma: The TimeGPT authors have bumped to next week so today we're gonna go thru a few of the old papers on slido! Also submit and vote for our next paper:…


tinygrad (George Hotz) ▷ #general (10 messages🔥):

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (28 messages🔥):

Links mentioned:


DiscoResearch ▷ #mixtral_implementation (5 messages):

Links mentioned:


DiscoResearch ▷ #general (9 messages🔥):

Links mentioned:


DiscoResearch ▷ #discolm_german (19 messages🔥):

Link mentioned: Pipelines: no description found


LangChain AI ▷ #general (25 messages🔥):

Links mentioned:


LangChain AI ▷ #share-your-work (3 messages):

Links mentioned:


Datasette - LLM (@SimonW) ▷ #ai (22 messages🔥):

Link mentioned: microsoft/Phi-3-mini-4k-instruct-gguf: Microsoft's Phi-3 LLM is out and it's really impressive. This 4,000 token context GGUF model is just a 2.2GB (for the Q4 version) and ran on my Mac using the …


Datasette - LLM (@SimonW) ▷ #llm (5 messages):

Links mentioned:


Cohere ▷ #general (21 messages🔥):

Links mentioned:


Cohere ▷ #project-sharing (5 messages):


Skunkworks AI ▷ #general (5 messages):

Link mentioned: nisten/llama3-8b-instruct-32k-gguf · Hugging Face: no description found


Skunkworks AI ▷ #datasets (6 messages):

Link mentioned: Answer.AI - Efficient finetuning of Llama 3 with FSDP QDoRA: We’re releasing FSDP QDoRA, a scalable and memory-efficient method to close the gap between parameter efficient finetuning and full finetuning.


Skunkworks AI ▷ #finetuning (1 messages):


Mozilla AI ▷ #llamafile (10 messages🔥):

Links mentioned:


AI21 Labs (Jamba) ▷ #general-chat (4 messages):

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


LLM Perf Enthusiasts AI ▷ #general (1 messages):

jeffreyw128: https://twitter.com/wangzjeff/status/1783215017586012566


LLM Perf Enthusiasts AI ▷ #gpt4 (1 messages):

Links mentioned:


LLM Perf Enthusiasts AI ▷ #openai (1 messages):