Frozen AI News archive

Ring Attention for >1M Context

**Google Gemini Pro** has sparked renewed interest in long-context capabilities. The CUDA MODE Discord is actively implementing the **RingAttention** paper by Liu, Zaharia, and Abbeel, including extensions from the World Model RingAttention paper, with PyTorch and CUDA implementations available. TheBloke Discord discussed a range of topics: **LLM guessing-game evaluation**, chatbot UX comparisons between **Nvidia's Chat with RTX** and **Polymind**, challenges in integrating **retrieval-augmented generation (RAG)**, VRAM optimization, fine-tuning for character roleplay using **Direct Preference Optimization (DPO)**, and model choices like **deepseek-coder-6.7B-instruct**. There was also discussion of ML workflows on Mac Studio, with a preference for **llama.cpp** over **ollama**, and of scaling inference cost-effectively using GPUs like the **4090** on Runpod. LM Studio users must update manually to version **0.2.16**, which adds support for **Gemma models** and includes bug fixes, especially for macOS. The Gemma 7B model has had performance issues, while Gemma 2B has received positive feedback.


UPDATE FOR YESTERDAY: sorry for the blank email - someone posted a naughty link in the LangChain Discord that caused the Buttondown rendering process to error out. We've fixed it, so you can see yesterday's Google Gemini recap here.

Gemini Pro has woken everyone up to the benefits of long context. The CUDA MODE Discord has started a project to implement the RingAttention paper (by Liu, Zaharia, and Abbeel, extended by the World Model RingAttention paper).


The paper, of course, came with a PyTorch implementation, and lucidrains also has his own take. You can see the CUDA implementation here: https://github.com/cuda-mode/ring-attention
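
For intuition, here is a minimal, single-process sketch of the core idea: the sequence is sharded into blocks, each simulated "device" keeps its own query block, and key/value blocks rotate around the ring while every device folds each incoming block into its running output with an online softmax. This is an illustrative toy under simplifying assumptions (no causal mask, single process, even sequence split), not the cuda-mode implementation.

```python
# Toy ring attention: KV blocks rotate around a simulated ring of devices while
# each device accumulates its attention output with a streaming (online) softmax.
import torch
import torch.nn.functional as F

def ring_attention(q, k, v, num_devices):
    d = q.shape[-1]
    q_blocks = q.chunk(num_devices)        # each device keeps its query shard
    kv_blocks = list(zip(k.chunk(num_devices), v.chunk(num_devices)))

    # Per-device running accumulators for the online softmax.
    outs  = [torch.zeros_like(qb) for qb in q_blocks]                          # unnormalized weighted sums
    denom = [torch.zeros(qb.shape[0], 1) for qb in q_blocks]                   # softmax denominators
    m     = [torch.full((qb.shape[0], 1), float("-inf")) for qb in q_blocks]   # running row maxima

    for step in range(num_devices):
        for dev, qb in enumerate(q_blocks):
            kb, vb = kv_blocks[(dev + step) % num_devices]   # KV block currently "resident" here
            scores = qb @ kb.T / d ** 0.5
            new_m = torch.maximum(m[dev], scores.max(dim=-1, keepdim=True).values)
            scale = torch.exp(m[dev] - new_m)                # rescale previous accumulators
            p = torch.exp(scores - new_m)
            outs[dev] = outs[dev] * scale + p @ vb
            denom[dev] = denom[dev] * scale + p.sum(dim=-1, keepdim=True)
            m[dev] = new_m
        # On real hardware each device would now pass its KV block to its ring
        # neighbor, overlapping the transfer with the block compute above.

    return torch.cat([o / z for o, z in zip(outs, denom)])

# The block-streamed result matches dense softmax attention.
q, k, v = (torch.randn(16, 8) for _ in range(3))
reference = F.softmax(q @ k.T / 8 ** 0.5, dim=-1) @ v
assert torch.allclose(ring_attention(q, k, v, num_devices=4), reference, atol=1e-5)
```

Checking against a dense attention reference, as the assert does, is a cheap way to validate the block-streaming math before worrying about real device-to-device communication.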


Table of Contents

[TOC]

PART 1: High-level Discord summaries

TheBloke Discord Summary

LLM Guessing Game Evaluation: Experiments with language models demonstrated their ability to follow instructions, specifically in interactive guessing games where accurate number selection and user engagement are key.

UX Battleground: Chatbots: A heated debate around chatbot interfaces juxtaposed Nvidia's cumbersome Chat with RTX against the nimble Polymind, underscoring the importance of user-friendly configuration.

RAG's Rigorous Implementation Road: Retrieval and generation feature integration sparked discussion, with attention on the complexity of incorporating such features cleanly and effectively into projects.

Discord Bots CSS Woes: Frustration was aired over CSS challenges when customizing Discord bots, highlighting the struggle for seamless integration between UI design and bot functionality.

VRAM: The Unseen Compute Currency: With a keen focus on resource optimization, discussion centered on matching VRAM capacity to model demands, emphasizing the balance between performance and computational overhead.

Character Roleplay Fine-tuning Finesse: Users like @superking__ and @netrve shared insights into the art of fine-tuning AI for character roleplay, with strategies revolving around comprehensive base knowledge and targeted training through Direct Preference Optimization (DPO).
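
For readers unfamiliar with DPO, a minimal sketch of its objective is below, assuming per-response log-probabilities have already been summed over tokens for both the policy and a frozen reference model; this is a generic illustration of the loss, not the specific recipe those users followed.

```python
# Minimal Direct Preference Optimization (DPO) loss on summed response log-probs.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Push the policy to prefer the chosen response over the rejected one,
    measured relative to a frozen reference model, with strength beta."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy example with made-up summed log-probs for one preference pair.
loss = dpo_loss(torch.tensor(-12.0), torch.tensor(-15.0),
                torch.tensor(-13.0), torch.tensor(-15.5))
```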

AI Story and Role-Play Enthusiasm: The release of new models targeted at story-writing and role-playing, trained on human-generated content for improved, steerable interactions in ChatML, has sparked keen interest for real-world testing.

Code Classification Conundrum: A quest for the ideal LLM to classify code relevance within a RAG pipeline led to consideration of deepseek-coder-6.7B-instruct, with community members seeking further guidance.

Mistral Model Download Drought: An unelaborated request to download Mistral locally surfaced, but with too little information for the community to offer constructive support.

Workflow Woes on Mac Studio: ML workflow struggles on Mac Studio were aired, including a potential switch from ollama to llama.cpp, with users praising llama.cpp's simplicity and questioning the industry's push towards ollama.

VSCode Dethroned by Zed: Users like @dirtytigerx promote Zed as superior to Visual Studio Code, highlighting its minimal design and speed. Pulsar, an open-source continuation of the Atom text editor, is also viewed with interest.

Scaling Inference with Tactical GPU Deployment: Cost-effective approaches to scaling inference servers were discussed, suggesting initial prototyping with affordable GPUs like the 4090 on Runpod before full-scale deployment, while staying mindful of the dependability of cloud providers' service agreements.


LM Studio Discord Summary


Nous Research AI Discord Summary

Scaling LLMs to New Heights: @gabriel_syme highlighted a repository focused on data engineering for scaling language models to 128K context, a significant advancement in the field. The VRAM requirements for such models at 7B scale exceed 600GB, a substantial resource demand, as noted by @teknium.
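
As a rough sanity check on why long context is so memory-hungry, here is a back-of-envelope KV-cache calculation for a hypothetical Llama-style 7B model (assumed dimensions: 32 layers, hidden size 4096, fp16); training-time activations and optimizer state would sit on top of this, which is presumably where the multi-hundred-GB figure comes from.

```python
# KV-cache arithmetic for an assumed 7B configuration (32 layers, hidden 4096, fp16).
layers, hidden, bytes_per_val = 32, 4096, 2
seq_len = 128 * 1024

kv_bytes_per_token = 2 * layers * hidden * bytes_per_val   # keys + values across all layers
kv_cache_gib = kv_bytes_per_token * seq_len / 1024**3
print(f"{kv_bytes_per_token / 1024:.0f} KiB per token -> {kv_cache_gib:.0f} GiB KV cache at 128K tokens")
# ~512 KiB/token -> ~64 GiB of KV cache alone for a single 128K-token sequence.
```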

Google Enters the LLM Arena: Google introduced Gemma, a series of lightweight, open-source models, with enthusiastic coverage from @sundarpichai and mixed community feedback comparing Gemma with existing models like Mistral and LLaMA. Users @big_ol_tender and @mihai4256 engaged in various discussions, from the impact of instruction placement to VM performance across different services.

Open Source Development and Support: @pradeep1148 shared a video suggesting self-reflection could improve RAG models, and @blackblize sought guidance on using AI for artistic image generation with microscope photos. Meanwhile, @afterhoursbilly and @_3sphere critiqued AI-generated imagery of Minecraft's inventory UI.

Emerging AI Infrastructure Discussions: Conversations about Nous-Hermes-2-Mistral-7B-DPO-GGUF included questions about how it compares to other models, and @iamcoming5084 reported out-of-memory errors with Mixtral 8x7b models. Strategies for hosting large models like Mixtral 8x7b were also examined, with users debating different tools and pointing out errors in inference code (corrected inference code for Nous-Hermes-2-Mistral-7B-DPO).

Collaborative Project Challenges: In #project-obsidian, @qnguyen3 notified of project delays due to personal circumstances and suggested direct messaging for coordination on the project front.


Eleuther Discord Summary


LAION Discord Summary


Mistral Discord Summary


OpenAI Discord Summary


HuggingFace Discord Summary


Latent Space Discord Summary


LlamaIndex Discord Summary


OpenAccess AI Collective (axolotl) Discord Summary


CUDA MODE Discord Summary


Perplexity AI Discord Summary

Gemini Unveiled: @brknclock1215 helps dispel confusion around Google’s Gemini model family, sharing resources like a two-month free trial for Gemini Advanced (Ultra 1.0) and a private preview for Gemini Pro 1.5, and directing users to a blog post detailing the differences.

Bot Whisperers Wanted: There's a jesting interest in the Perplexity AI bot, with users discussing its offline status and how to use it. For those perplexed about Perplexity's Pro version and billing, users shared a link to the FAQ for clarity.

API Conundrums and Codes: Contributors report discrepancies between Perplexity's API and website content and are seeking improved accuracy. Guidance suggests using simpler queries, while an ongoing issue with gibberish responses from the pplx-70b-online model is acknowledged and expected to be resolved. Users also asked about integrating Google's Gemma with Perplexity's API.
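
For context, a request to the pplx-70b-online model discussed above might look like the sketch below, assuming Perplexity's OpenAI-compatible chat-completions endpoint; the endpoint URL and payload fields are assumptions to verify against the current pplx-api reference.

```python
# Hedged sketch of a pplx-api call (endpoint and fields assumed, not confirmed here).
import os
import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
    json={
        "model": "pplx-70b-online",
        "messages": [
            # Keeping the query short and specific, per the guidance above.
            {"role": "user", "content": "Summarize today's top AI research news."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```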

Cryptocurrency and Health Searches in the Spotlight: Curious minds ran Perplexity AI searches on topics ranging from cryptocurrency trading jargon to natural oral health remedies, highlighting a community engaged with diverse subjects.

Financial Instruments Query: A quest for understanding led to a search query on financial instruments, pointing to a trend where technical specificity matters in finance-related discussions.


LangChain AI Discord Summary


DiscoResearch Discord Summary


Skunkworks AI Discord Summary


Datasette - LLM (@SimonW) Discord Summary


Alignment Lab AI Discord Summary


LLM Perf Enthusiasts AI Discord Summary


AI Engineer Foundation Discord Summary


PART 2: Detailed by-Channel summaries and links

TheBloke ▷ #general (1132 messages🔥🔥🔥):

Links mentioned:


TheBloke ▷ #characters-roleplay-stories (299 messages🔥🔥):

Links mentioned:


TheBloke ▷ #training-and-fine-tuning (3 messages):


TheBloke ▷ #coding (163 messages🔥🔥):

Links mentioned:


LM Studio ▷ #💬-general (598 messages🔥🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (149 messages🔥🔥):

Links mentioned:


LM Studio ▷ #announcements (4 messages):

Links mentioned:


LM Studio ▷ #🧠-feedback (30 messages🔥):

Links mentioned:


LM Studio ▷ #🎛-hardware-discussion (130 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (266 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ctx-length-research (97 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #off-topic (16 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (38 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (419 messages🔥🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (9 messages🔥):

Links mentioned:


Nous Research AI ▷ #collective-cognition (3 messages):


Nous Research AI ▷ #project-obsidian (3 messages):


Eleuther ▷ #general (101 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (305 messages🔥🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (43 messages🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (64 messages🔥🔥):

Links mentioned:


Eleuther ▷ #multimodal-general (6 messages):


Eleuther ▷ #gpt-neox-dev (1 messages):

Links mentioned:

Analysing The Impact of Sequence Composition on Language Model Pre-Training: Most language model pre-training frameworks concatenate multiple documents into fixed-length sequences and use causal masking to compute the likelihood of each token given its context; this strategy i...


LAION ▷ #general (346 messages🔥🔥):

Links mentioned:


LAION ▷ #research (65 messages🔥🔥):

Links mentioned:


LAION ▷ #paper-discussion (1 messages):

said2000: https://arxiv.org/abs/2402.05608


Mistral ▷ #general (296 messages🔥🔥):

Links mentioned:


Mistral ▷ #models (20 messages🔥):

Links mentioned:

Chat with Open Large Language Models


Mistral ▷ #deployment (54 messages🔥):

Links mentioned:

vLLM | Mistral AI Large Language Models: vLLM can be deployed using a docker image we provide, or directly from the python package.


Mistral ▷ #finetuning (7 messages):


Mistral ▷ #showcase (13 messages🔥):

Links mentioned:


Mistral ▷ #la-plateforme (12 messages🔥):

Links mentioned:



OpenAI ▷ #ai-discussions (57 messages🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (51 messages🔥):


OpenAI ▷ #prompt-engineering (91 messages🔥🔥):


OpenAI ▷ #api-discussions (91 messages🔥🔥):


HuggingFace ▷ #general (186 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (7 messages):

Links mentioned:

nanotron/examples/doremi at main · huggingface/nanotron: Minimalistic large language model 3D-parallelism training - huggingface/nanotron


HuggingFace ▷ #cool-finds (8 messages🔥):


HuggingFace ▷ #i-made-this (22 messages🔥):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (10 messages🔥):


HuggingFace ▷ #computer-vision (1 messages):


HuggingFace ▷ #NLP (36 messages🔥):


HuggingFace ▷ #diffusion-discussions (10 messages🔥):


Latent Space ▷ #ai-general-chat (78 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (3 messages):

Links mentioned:

Latent Space (Paper Club & Other Events) · Luma: View and subscribe to events from Latent Space (Paper Club & Other Events) on Luma. Latent.Space events. PLEASE CLICK THE RSS LOGO JUST ABOVE THE CALENDAR ON THE RIGHT TO ADD TO YOUR CAL. "Ad...


Latent Space ▷ #llm-paper-club-west (173 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #blog (3 messages):


LlamaIndex ▷ #general (246 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (3 messages):


OpenAccess AI Collective (axolotl) ▷ #general (149 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (26 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (51 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #community-showcase (1 messages):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #runpod-help (6 messages):

Links mentioned:

Docker


CUDA MODE ▷ #general (2 messages):

Links mentioned:


CUDA MODE ▷ #triton (3 messages):


CUDA MODE ▷ #cuda (18 messages🔥):

Links mentioned:

GitHub - aredden/torch-bnb-fp4


CUDA MODE ▷ #torch (5 messages):


CUDA MODE ▷ #suggestions (1 messages):

Links mentioned:


CUDA MODE ▷ #jobs (1 messages):

Links mentioned:

Apply now: Senior Machine Learning Engineer (m/f/d) | Munich: The job of your dreams in Munich: Senior Machine Learning Engineer (m/f/d). Join the SIXT team! We are looking forward to your application!


CUDA MODE ▷ #beginner (12 messages🔥):

Links mentioned:


CUDA MODE ▷ #youtube-recordings (1 messages):


CUDA MODE ▷ #jax (11 messages🔥):

Links mentioned:


CUDA MODE ▷ #ring-attention (39 messages🔥):

Links mentioned:


Perplexity AI ▷ #general (58 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (3 messages):


Perplexity AI ▷ #pplx-api (20 messages🔥):

Links mentioned:


LangChain AI ▷ #general (38 messages🔥):

Links mentioned:


LangChain AI ▷ #langserve (1 messages):


LangChain AI ▷ #share-your-work (3 messages):


LangChain AI ▷ #tutorials (1 messages):

pradeep1148: https://www.youtube.com/watch?v=Eb7QF1nDWGU


DiscoResearch ▷ #general (29 messages🔥):

Links mentioned:


DiscoResearch ▷ #benchmark_dev (1 messages):

Links mentioned:

HuggingFaceH4/open_llm_leaderboard · MMLU blog post discussion


Skunkworks AI ▷ #general (1 messages):


Skunkworks AI ▷ #off-topic (4 messages):

Links mentioned:


Skunkworks AI ▷ #papers (1 messages):

nagaraj_arvind: I mentioned KTO at the end. But did not get into the details.


Datasette - LLM (@SimonW) ▷ #ai (2 messages):

Links mentioned:


Datasette - LLM (@SimonW) ▷ #llm (4 messages):


Alignment Lab AI ▷ #general-chat (1 messages):

scopexbt: Hey all, i cant find anything about token, do we have one?


Alignment Lab AI ▷ #oo (2 messages):


LLM Perf Enthusiasts AI ▷ #general (1 messages):

res6969: Stay away from salesforce, itll be the biggest mistake you make as a company


LLM Perf Enthusiasts AI ▷ #opensource (1 messages):

potrock: https://blog.google/technology/developers/gemma-open-models/


LLM Perf Enthusiasts AI ▷ #embeddings (1 messages):


AI Engineer Foundation ▷ #events (3 messages):

Links mentioned: