Frozen AI News archive

Mixture of Depths: Dynamically allocating compute in transformer-based language models

**DeepMind** introduces the Mixture-of-Depths (MoD) technique, which dynamically allocates FLOPs across transformer layers to optimize compute usage, achieving forward passes **upwards of 50% faster** with no impact at training time. MoD selectively processes tokens using top-k routing, improving efficiency and potentially enabling much faster ultra-long-context handling. The method can be combined with Mixture-of-Experts (MoE) for decoupled routing of queries, keys, and values. Reddit discussions highlight concerns about **LLM hype** overshadowing other AI tech, improvements in transformer efficiency, a new Think-and-Execute framework boosting algorithmic reasoning by **10-20%**, and Visual Autoregressive modeling (VAR) surpassing diffusion models in image quality and speed. The on-device model Octopus v2 outperforms GPT-4 in function-calling accuracy and latency.


Top news of the day is DeepMind's MoD paper, which describes a technique that, given a compute budget, dynamically allocates FLOPs to specific tokens and layers rather than spreading them uniformly. The motivation is well written:

Not all problems require the same amount of time or effort to solve. Analogously, in language modeling not all tokens and sequences require the same time or effort to accurately make a prediction. And yet, transformer models expend the same amount of compute per token in a forward pass. Ideally, transformers would use smaller total compute budgets by not spending compute unnecessarily.

The method uses top-k routing to select which tokens each block processes, keeping the compute budget fixed. You can think of it as a "depth" sparsity counterpart to the way MoEs scale model "width".


We leverage an approach akin to Mixture of Experts (MoE) transformers, in which dynamic token-level routing decisions are made across the network depth. Departing from MoE, we choose to either apply a computation to a token (as would be the case for a standard transformer), or pass it through a residual connection (remaining unchanged and saving compute). Also in contrast to MoE, we apply this routing to both forward MLPs and multi-head attention. Since this therefore also impacts the keys and queries we process, the routing makes decisions not only about which tokens to update, but also which tokens are made available to attend to. We refer to this strategy as Mixture-of-Depths (MoD) to emphasize how individual tokens pass through different numbers of layers, or blocks, through the depth of the transformer.

Per Piotr, the authors found that routing ⅛ of the tokens through every second layer worked best. They also observe that the attention cost for those layers decreases quadratically, so this could be an interesting way of making ultra-long context lengths much faster. There's no impact at training time, but forward passes can be "upwards of 50% faster".
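
To make the routing concrete, here is a minimal PyTorch sketch of a top-k MoD wrapper around a shape-preserving transformer block. The names (`MoDBlock`, `capacity`, the sigmoid gate) are illustrative assumptions rather than the paper's reference implementation, and the sketch ignores the causal-sampling caveat the paper addresses with an auxiliary router at inference time:

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Wrap a standard transformer block so that only the top-k scoring tokens
    are processed; the remaining tokens skip the block via the residual path."""

    def __init__(self, block: nn.Module, d_model: int, capacity: float = 0.125):
        super().__init__()
        self.block = block                   # attention + MLP sub-block, shape-preserving
        self.router = nn.Linear(d_model, 1)  # scalar routing weight per token
        self.capacity = capacity             # fraction of tokens to process (paper: 1/8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        k = max(1, int(s * self.capacity))

        scores = self.router(x).squeeze(-1)        # (b, s)
        top = scores.topk(k, dim=-1).indices       # (b, k) positions that get compute
        idx = top.unsqueeze(-1).expand(b, k, d)    # shared gather/scatter index

        selected = torch.gather(x, 1, idx)         # (b, k, d)
        # Weight the block output by the router score so the routing decision
        # stays on the gradient path, as the paper's learned router does.
        gate = torch.gather(scores, 1, top).sigmoid().unsqueeze(-1)
        updated = selected + gate * self.block(selected)

        out = x.clone()                            # unselected tokens pass through unchanged
        out.scatter_(1, idx, updated)
        return out
```

Because only k ≈ s/8 tokens enter the block, the attention inside it runs over a sequence roughly 8x shorter, which is where the quadratic savings mentioned above come from.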

The authors also demonstrate how MoD can be combined with MoE (e.g. by having a no-op expert) to decouple the routing for queries, keys, and values.
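
A rough sketch of the "no-op expert" idea, with hypothetical names and hard top-1 routing for brevity (the paper's integrated MoDE variant is the actual reference):

```python
import torch
import torch.nn as nn

class MoEWithNoOp(nn.Module):
    """MoE layer with one extra 'expert' that is the identity function:
    tokens routed to it skip computation and keep their residual value."""

    def __init__(self, experts: list[nn.Module], d_model: int):
        super().__init__()
        self.experts = nn.ModuleList(experts)                # real MLP experts
        self.router = nn.Linear(d_model, len(experts) + 1)   # extra logit for the no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        probs = self.router(x).softmax(dim=-1)   # (batch, seq, n_experts + 1)
        choice = probs.argmax(dim=-1)            # hard top-1 routing per token

        out = x.clone()                          # choice == 0 means no-op: token unchanged
        for i, expert in enumerate(self.experts, start=1):
            mask = choice == i
            if mask.any():
                out[mask] = x[mask] + probs[..., i][mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Routing a token to the identity "expert" leaves it on the residual path, which is how MoD-style compute skipping can be expressed inside an ordinary MoE router.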



Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence. Comment crawling still not implemented but coming soon.

AI Research and Development

AI Products and Services

AI Hardware and Performance

AI Twitter Recap

All recaps are done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Models and Architectures

Techniques and Frameworks

Datasets

Compute Infrastructure

Discussions and Perspectives

Memes and Humor


AI Discord Recap

A summary of Summaries of Summaries

1. Cutting-Edge LLM Advancements and Releases

2. Parameter-Efficient LLM Fine-Tuning Techniques

3. Architectural Innovations for Efficient Transformers

4. Open-Source AI Frameworks and Community Efforts


PART 1: High level Discord summaries

Perplexity AI Discord


Stability.ai (Stable Diffusion) Discord

Maximizing Image Fidelity: Technical suggestions to circumvent issues with generating 2k resolution realistic images emphasized lower resolution generation followed by upscaling, minimizing steps, and engaging "hiresfix". Trade-offs between quality and distortions during upscaling framed the dialogue.

SD3 Release Leaves Crowd Restless: While some guild members are eagerly awaiting Stable Diffusion 3 (SD3), others sense a delay, which has led to mixed feelings ranging from anticipation to skepticism and comparisons with other models like Ideogram and DALLE 3.

AI Meets Art: Creative discussions unfolded around using AI for artistic endeavors, highlighting Daz AI in image generation, and the intricacies of finessing models for art-specific outputs, such as generating clothing designs in Stable Diffusion.

VRAM to the Rescue: Technical discourse delved into model resource demands, particularly operating models across various VRAM allotments and the anticipation of SD3's performance on standard consumer GPUs.

Demystifying Stable Diffusion Know-how: Users shared insights and sought advice on optimizing Stable Diffusion model versions and interfaces, covering best practices for image finetuning and effective model checkpoint management.


OpenAI Discord

Fine-Tuning API Gets a Makeover: OpenAI has rolled out updates to the fine-tuning API, aiming to give developers more control over model customization. The enhancements include new dashboards and metrics, and expand the custom models program, as detailed in OpenAI's blog post and an accompanying YouTube tutorial.

AI Discussions Heat Up: Across channels, there is debate around concepts such as AI cognition and ASCII art generation, probing AI's potential in 3D printing, and balancing excitement for releases with security measures. Additionally, implementation queries on using AI for document analysis and fine-tuning for data enhancement were highlighted, alongside an observation of inconsistent behavior when setting the assistant's temperature to 0.0.

Prompt Engineering Tactics Unveiled: Members are sharing strategies to make GPT-3 produce longer outputs and to constrain responses to specific documentation. Tips range from starting a new chat with "continue" to stern instructions that make the AI confirm the existence of answers within provided materials.

Assertive Prompting May Boost GPT Accuracy: To ensure that GPT's outputs are based strictly on supplied content, the advice is to give clear and assertive prompts. Whether discussing the nature of consciousness to mimic human responses or reinforcing documentation-specific replies, the community explores the semblance of an AI's understanding.
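
One way to express that advice in code is a document-grounded system prompt; the sketch below uses the OpenAI Python SDK, with the model name, document text, and exact wording as placeholders rather than anything prescribed in the discussion:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

document = "...paste the reference documentation here..."

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    temperature=0,
    messages=[
        {
            "role": "system",
            "content": (
                "Answer ONLY from the documentation below. Before answering, "
                "confirm the answer exists in the documentation; if it does not, "
                "reply exactly: 'Not covered in the documentation.'\n\n"
                f"DOCUMENTATION:\n{document}"
            ),
        },
        {"role": "user", "content": "How do I rotate my API key?"},
    ],
)
print(response.choices[0].message.content)
```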

Clarity on GPT-4 Usage Costs: Discussions clarify that incorporating GPT models into apps requires a subscription plan, such as the Plus plan, as all models now operate under GPT-4. Users seeking enhanced functionality with GPT models must consider this when developing AI-powered applications.


LM Studio Discord


Nous Research AI Discord

Bold New Leap for LoRA: A proposal has been made to apply Low-Rank Adaptation (LoRA) to Mistral 7B, aiming to augment its capabilities. Plans are afoot to integrate a taxonomy-driven approach for sentence categorization.

State-of-the-Art Archival and Web Crawling Practices: Discussions highlighted the thin line between archival groups and data hoarding, with a nod toward Common Crawl for web crawling excluding Twitter. The promotion of Aurora-M, a 15.5B parameter open-source, multilingual LLM with over 2 trillion training tokens was noted, in addition to tools for structuring LLM outputs like Instructor.

LLM Landscape Expanded: Announcements included a 104B LLM, C4AI Command R+, with RAG functionality and support for multiple languages available on Hugging Face. The community also discussed GPT-4 fine-tuning pricing and welcomed updates on an AI development teased by @rohanpaul_ai, while highlighting the LLaMA-2-7B model's 700K token context length training and the uncertainty regarding fp8 usability on Nvidia's 4090 GPUs.

Datasets and Tools Forge Ahead: An introduction to Augmentoolkit, which converts compute and books into instruction-tuning datasets, was discussed. Excitement surrounded Severian/Internal-Knowledge-Map with its novel approach to LM understanding, and the neurallambda project's aim to enable reasoning in AI with lambda calculus.

Dynamic Function Calling: An example of function calling with Hermes is to be demonstrated in a repository, alongside serious debugging efforts for its functioning with Vercel AI SDK RSC. The Hermes-Function-Calling repository faced critique, resulting in adherence to the Google Python Style Guide. Previewed was the Eurus-7B-KTO model, garnering interest for its use in the SOLAR framework.

Dependency Dilemmas and Dataset Stratagems: An emerging dependency issue was acknowledged without further context. The RAG dataset channel elucidated plans for pinning summaries, exploring adaptive RAG techniques, and the utilization of diverse data sources for RAG, along with discussions of Interface updates from Command R+ and Claude Opus.

World Building Steams Ahead with WorldSim: Talk circulated regarding the WorldSim Versions & Command Sets and the Command Index, covering user-experience details like custom emoji suggestions. Also brewing were thoughts on new channels for philosophy cross-pollinated with AI and a TRS-80 telepresence experience reflecting on Zipf's law. Anticipation buzzed for a WorldSim update with enhanced UX, hoping to address self-steering issues.


Unsloth AI (Daniel Han) Discord

GPU Memory Gains: The GaLore update promises to enhance GPU memory efficiency with fused kernels, sparking discussions on integrating it with Unsloth AI for superior performance.

Model Packing Misfits: Caution is advised against using the packing parameter on Gemma models due to compatibility issues, although packing can speed up training by concatenating tokenized sequences.

Optimization Opportunities: There's ongoing exploration into combining Unsloth with GaLore for memory and speed optimizations, despite GaLore's default performance lag behind Lora.

Anticipating Unsloth's New Features: Unsloth AI plans to release a "GPU poor" feature by April 22 and an "Automatic optimizer" in early May. Unsloth Pro, available since November 2023, is being examined for distribution improvements.

Dataset Diversity in Synthetic Generation: Dataset format is considered inconsequential to how synthetic data generation affects performance, leaving the choice of format for fine-tuning LLMs to personal preference.

Eagerly Awaiting Kaggle’s Reset: Kaggle enthusiasts await the new season, leveraging additional sleep hours due to Daylight Saving Time adjustments, while seeking AI news sources and discussing pretraining datasets potentially including libgen or scihub.

Unsloth Enables Streamlined Inference: Community feedback praises Unsloth’s ease of use for inference processes, with additional resources like batch inference guidelines being shared.

Finetuning Workshops Tackled: Users brainstorm on how to deliver effective finetuning workshops with hands-on experiences, involving innovations such as preparing models beforehand or employing LoRaX as a web UI for model interaction.

Version Control for Stability: Concerns about the impact of Unsloth updates on model reproducibility prompted a consensus on the necessity for strict versioning, to ensure numerical consistency and reversibility.

Parameter Efficiency in Fine-Tuning: A new fine-tuning technique called ReFT is showcased for being highly parameter-efficient, described in detail within a GitHub repo and an accompanying paper.


Eleuther Discord

Wiki Wisdom Now Publicly Accessible: Members tackled the challenges of accessing Wikitext-2 and Wikitext-103 datasets, sharing links from Stephen Merity's page and Hugging Face, with concerns over the ease of use of raw data formats.

GateLoop Replication Spark Debate: Skepticism regarding the GateLoop architecture's perplexity scores met clarifying information with released code, igniting discussions on experiment replication and the performance of various attention mechanisms.

Modular LLMs at the Forefront: Intense discussions focused on Mixture of Experts (MoE) architectures, spanning interpretability, hierarchical vs. flat structures, and efficiency strategies in Large Language Models (LLMs), referencing multiple papers and a Master's thesis tease suggesting an upcoming breakthrough in MoE Floating Point Operations (FLOPs).

Interpretability Implementations Interchange: Queries about the availability of an opensource implementation of AtP* led to the sharing of the GitHub repo for AtP*, while David Bau sought community support on GitHub for nnsight to fulfill NSF reviewer requirements.

From Troubleshooting to Trials in the Thunderdome: Discussions in #lm-thunderdome dove into troubleshooting, from syntax quirks with top_p=1 to confusion over model argument compatibility and efficiency gains from batch_size=auto, advising fresh installations or the use of Google Colab for certain issues.

Gemini Garners Cloud Support: A brief message highlighted Gemini's support implementation by AWS, with a mention of support from Azure as well.


Modular (Mojo 🔥) Discord

Boosting Mojo's Debugging Capabilities: Engineers queried about debugging support for editors like neovim, incorporating the Language Server Protocol (LSP) for enhanced problem-solving.

Dynamic Discussions on Variant Types: The use of the Variant type was endorsed over the isinstance function in Mojo, highlighting its dynamic data storage abilities and type checks using the isa and get/take methods, as shown in the Mojo documentation.

Basalt Lights Up ML Framework Torch: The newly minted Machine Learning framework Basalt is making headlines, differentiated as "Deep Learning" and comparable to PyTorch, with its foundational version v.0.1.0 on GitHub and related Medium article.

Counting Bytes, Not Just Buckets: A discourse on bucket sizing for value storage highlighted that each bucket holds UInt32 values, a mere 4 bytes each. This attention to memory efficiency is critical for handling up to 2^32 - 1 values.

Evolving Interop with Python: Progress in interfacing Python with Mojo was revealed, focusing on the use of PyMethodDef and PyCFunction_New, with stable reference counting and no issues to date. The current developments can be viewed on rd4com's GitHub branch.


OpenAccess AI Collective (axolotl) Discord


LlamaIndex Discord


OpenRouter (Alex Atallah) Discord

Claude Gets Tangled in Safety Nets: Users report higher decline rates when utilizing Claude with OpenRouter API compared to Anthropic's API, suspecting OpenRouter might have added extra "safety" layers that interfere with performance.

Restoring Midnight Rose: Midnight Rose experienced downtime but was brought back online after restarting the cluster. The incident has sparked talks among users for switching to a more resilient provider or technology stack.

A Symphony of Modals: Following a shift to multimodal functionality, the Claude 3 model now accepts image inputs, necessitating code updates by developers. More details are announced here.

Command R+ Sparks Code-Conducting Excitement: Command R+, a 104B parameter model from Cohere, noted for its strong coding and multilingual capabilities, has excited users about its incorporation in OpenRouter, and comprehensive benchmarks can be found here.

Troubleshooting the Mixtral Puzzle: Mixtral-8x7B-Instruct had trouble following a JSON schema; the issue was resolved by OpenRouter rather than by the providers, and users remain eager for further fixes and updates to streamline use with JSON modes.


HuggingFace Discord

A New Contender in Image Generation: A Visual AutoRegressive (VAR) model is proposed that promises to outshine diffusion transformers in image generation, reporting a drop in Fréchet inception distance (FID) from 18.65 to 1.80 and an increase in inception score (IS) from 80.4 to 356.4.

Rethinking Batch Sizes for Better Minima: Engineers are debating whether smaller batch sizes, even though they slow down training, could achieve better results by not skipping over optimal local minima, in contrast to larger batch sizes that might expedite training but perform suboptimally.

Update Your Datasets like Git: AI practitioners are reminded that updates to datasets and models on Hugging Face require the same git-like discipline—an update locally followed by a commit and push—to reflect changes on the platform.
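
As a minimal illustration of that workflow with the `datasets` library (the repo id and `text` column are placeholders; it assumes you are already authenticated to the Hub, e.g. via `huggingface-cli login` or an `HF_TOKEN`):

```python
from datasets import load_dataset

ds = load_dataset("your-username/your-dataset", split="train")   # pull the current version
ds = ds.map(lambda row: {"text": row["text"].strip()})           # edit locally
ds.push_to_hub("your-username/your-dataset")                     # commits and pushes a new revision
```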

Bridging AI and Music with Open Source: A breakthrough was shared in the form of a musiclang2musicgen pipeline demonstrated through a YouTube video, promoting the viability of open-source solutions in audio generation.

Stanford's Treasure Trove for NLP Newbies: For those starting in NLP and deciding between Transformer architectures and traditional models like LSTM, the recommendation is to utilize the Stanford CS224N course, available through a YouTube playlist, as a first-rate resource.

Tuning and Deploying LLMs: Questions arose concerning Ollama model deployment, especially regarding memory requirements for the phi variant, along with inquiries on whether local deployment or API-based solutions like OpenAI's are more suitable for particular use cases.


tinygrad (George Hotz) Discord

Tinygrad's NPU Buzz and Intel GPU Gossip: Discussion in the guild mentioned that while tinygrad lacks dedicated NPU support on new laptops, it provides an optimization checklist for comparing performance with onnxruntime. Guild members also dissected the Linux kernel 6.8's capability to drive Intel hardware, especially post-Ubuntu 24.04 LTS release, eyeing advancement in Intel's GPUs and NPUs' kernel drivers.

Scalability Dialogue and Power Efficiency Talks: Dialogues touched on tinygrad's future scalability, with George Hotz indicating the potential for significant scaling using a 200 GbE full 16x interconnect slot and teased multimachine support. There was also a comparison of NPUs and GPUs in terms of power efficiency, highlighting NPUs' ability to match GPU performance with considerably less power consumption.

Prospects and Perils in Kernel Development: Among AI engineers, there was recognition of the obstacles presented by AVX-512 and interest in Intel making improvements based on a discussion thread on Real World Technologies. Conversations also covered AMD's open-source intentions with a side of skepticism towards the actual impact, and looked forward to how the AMD Phoronix update will affect the scene.

Learning Through Tinygrad's JIT: A post cleared confusion regarding JIT cache collection, and a community member contributed study notes to aid in performance profiling with DEBUG=2 for tinygrad. There's a collective effort to refine a community-provided TinyJit tutorial, as the author welcomed corrections, signaling the community's commitment to mutual learning and documentation accuracy.

Community Collaboration Encouraged: The conversations conveyed a strong sentiment for peer collaboration, urging knowledgeable members to submit pull requests to correct inaccuracies in TinyJit documentation, thus promoting a help-forward approach among the guild participants.


Interconnects (Nathan Lambert) Discord


LangChain AI Discord


LAION Discord

AI Skirmishes with Stress and Time: The community is discussing AIDE's achievements in Kaggle competitions, questioning if it's comparable to the human contestant experience that involves factors like stress and time constraints. No consensus was reached, but the debate highlights the growing capabilities of AI in competitive data science.

Back to Basics with Apple and PyTorch: The technical crowd is expressing frustration over Apple’s MPS with some recommending trying the PyTorch nightly branch for potential fixes. Additionally, the benefits of PyTorch on macOS, specifically the aot_eager backend, were shown with a case of the backend reducing image generation time significantly when leveraging Apple's CoreML.

A Glimpse into Audio AI: There's curiosity about capabilities such as DALL·E's image edit history and the desire to implement a similar feature within SDXL. Moreover, questions arose about voice-specific technologies for parsing podcast audio beyond conventional speaker diarization.

Revival of Access and Information: Discussions revealed concerns over Reddit's API access being cut and its effects on developers and the blind community, as well as the reopening of the subreddit /r/StableDiffusion and its implications for the community.

Computational Smarts in Transformers: The buzz is about Google's token compression method, which aims to shrink model size and computational load, and a paper discussing a dynamic FLOPs allocation strategy in transformer models, employing a top-k routing algorithm that balances computational resources and performance. This method is described in the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models".


Latent Space Discord

Dynamic Allocation Divides the Crowd: DeepMind's approach to dynamic compute in transformers, dubbed Mixture-of-Depths, garners mixed reactions; some praise its compute reductions while others doubt its novelty and practicality.

Claude Masters Tools: Anthropic's Claude exhibits impressive tool use, stirring discussions about the practical applications and scalability of such capabilities within AI systems.

Paper Club Prepares to Convene: The San Diego AI community announces a paper club session, encouraging participants to select and dive into AI-related articles, with a simple sign-up process available to those eager to join.

ReFT Redefines Fine-Tuning: Stanford introduces ReFT (Representation Finetuning), touting it as a more parameter-efficient fine-tuning method, which has the AI field weighing its pros and cons against existing techniques.

Keras vs. PyTorch: A Heated Benchmark Battle: François Chollet highlights a benchmark where Keras outperforms PyTorch, sparking debates over benchmarks' fairness and the importance of out-of-the-box speed versus optimized performance.

Enroll in AI Education: Latent Space University announces its first online course with a focus on coding custom ChatGPT solutions, inviting AI engineers to enroll and emphasizing the session's applicability for those looking to deepen their knowledge in AI product engineering.


OpenInterpreter Discord

OpenInterpreter Talks the Talk: An innovative wrapper for voice interactions with OpenInterpreter has been developed, though it falls short of 01's voice capabilities. The community is working through setup and compatibility challenges, with Windows users struggling and CTRL + C not exiting the terminal as expected.

Compare and Contrast with OpenAI: A mysterious Compare endpoint has surfaced in the OpenAI API's playground, yet without formal documentation; it facilitates direct comparisons between models and parameters.

Python Predicaments and Ubuntu Upset: OpenInterpreter's 01OS is wrestling with Python 3.11+ incompatibility issues, suggesting a step back to Python 3.10 or lower for stability. Meanwhile, users on Ubuntu 21 and above find no support for OpenInterpreter due to Wayland incompatibility, since X11 remains a requirement, as noted in Issue #219.

Listening In, No Response: Users have reported troubling anomalies with 01's audio connection, where sound is recorded but not transferred for processing, indicating potential new client-side bugs.

Conda Conundrum: To handle troublesome TTS package installations, the recommendation is to create a Conda environment using Python 3.10 or lower, followed by a repository re-clone and a clean installation to bypass conflicts.


CUDA MODE Discord

BitMat Breakthrough in LLM: The BitMat implementation was brought into the spotlight, reflecting advances in the "Era of 1-bit LLMs" via an efficient method hosted on GitHub at astramind-ai/BitMat.

QuaRot Quashes Quantization Quibbles: A newly introduced quantization scheme called QuaRot promises effective end-to-end 4-bit quantization of Large Language Models, with the notable achievement of a quantized LLaMa2-70B model maintaining 99% of its zero-shot performance.

CUDA Kernel Tutorial Gets Thumbs Up: A revered Udacity course on "Intro to Parallel Programming" was resurfaced for its enduring relevance on parallel algorithms and performance tuning, applicable even a decade after its introduction.

HQQ-GPT-Fast Fusion: There was a fiery conversation in the #hqq channel regarding integrating and benchmarking HQQ with gpt-fast, focusing on leveraging Llama2-7B models and experimenting with 3/4-bit quantization strategies for optimizing LLMs.

Enhanced Visualization Aims for Clarity: Triton-viz discussions aimed at better illustrating data flows in visualizations with amendments like directional arrows, value display on interactive elements, and possible shifts to JavaScript frameworks such as Three.js for superior interactivity.


Datasette - LLM (@SimonW) Discord


DiscoResearch Discord

Judge A Book By Its Creativity: The new EQBench Creative Writing and Judgemark leaderboards have sparked interest with their unique assessments of LLMs' creative output and judgement capabilities. Notably, the Creative Writing leaderboard leverages 36 narrowly defined criteria for better model discrimination, and a 0-10 quality scale has been recommended for nuanced quality assessments.

COMET's New Scripts Land on GitHub: Two scripts for evaluating translations without references, comet_eval.ipynb & overall_scores.py, are now available in the llm_translation GitHub repository, signaling a step forward in transparency and standardized LLM performance measurement.

Cohere's Demo Outshines the Rest: A new demo by CohereForAI on Hugging Face's platform has showcased a significant leap in AI models' grounding capabilities, inviting discussions on its potential to shape future model developments.

Old School Translations Get Schooled: The Hugging Face model, command-r, seemingly makes traditional methods of LLM Middle High German translation training obsolete with its translation prowess and is suggested to revolutionize linguistic database integrations during inference.

Pondering the Future of Model Licensing: The potential open-sourcing of CohereForAI's model license is a hot topic, with comparative discussions involving GPT-4 and Nous Hermes 2 Mixtral underscoring the expected community growth and innovation that could mirror the Mistral model's impact.


Mozilla AI Discord


Skunkworks AI Discord


LLM Perf Enthusiasts AI Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1314 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (11 messages🔥):


Perplexity AI ▷ #pplx-api (18 messages🔥):


Stability.ai (Stable Diffusion) ▷ #general-chat (600 messages🔥🔥🔥):

Links mentioned:


OpenAI ▷ #annnouncements (1 messages):

Link mentioned: Introducing improvements to the fine-tuning API and expanding our custom models program: We’re adding new features to help developers have more control over fine-tuning and announcing new ways to build custom models with OpenAI.


OpenAI ▷ #ai-discussions (539 messages🔥🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (11 messages🔥):


OpenAI ▷ #prompt-engineering (15 messages🔥):


OpenAI ▷ #api-discussions (15 messages🔥):


LM Studio ▷ #💬-general (198 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (85 messages🔥🔥):

Links mentioned:


LM Studio ▷ #announcements (1 messages):

Link mentioned: Tweet from LM Studio (@LMStudioAI): If you've been around these parts for long enough, you might be missing @TheBlokeAI as much as we do 🥲. Us & @bartowski1182 decided to try to help fill the void. We're excited to share the n...


LM Studio ▷ #🧠-feedback (8 messages🔥):


LM Studio ▷ #📝-prompts-discussion-chat (2 messages):


LM Studio ▷ #🎛-hardware-discussion (21 messages🔥):


LM Studio ▷ #🧪-beta-releases-chat (54 messages🔥):

Links mentioned:


LM Studio ▷ #autogen (10 messages🔥):


LM Studio ▷ #langchain (1 messages):


LM Studio ▷ #amd-rocm-tech-preview (27 messages🔥):

Link mentioned: Reddit - Dive into anything: no description found


LM Studio ▷ #crew-ai (22 messages🔥):


Nous Research AI ▷ #ctx-length-research (2 messages):


Nous Research AI ▷ #off-topic (10 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (10 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (182 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (48 messages🔥):

Links mentioned:


Nous Research AI ▷ #bittensor-finetune-subnet (2 messages):


Nous Research AI ▷ #rag-dataset (31 messages🔥):

Links mentioned:


Nous Research AI ▷ #world-sim (108 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (189 messages🔥🔥):

Links mentioned:

"AutoQuant is the evolution of my previous AutoGGUF notebook…": no description found
Google Colaboratory: no description found
unsloth (Unsloth): no description found
GitHub - myshell-ai/JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars: Reaching LLaMA2 Performance with 0.1M Dollars. Contribute to myshell-ai/JetMoE development by creating an account on GitHub.
ASCII art elicits harmful responses from 5 major AI chatbots: LLMs are trained to block harmful responses. Old-school images can override those rules.
GitHub - OpenNLPLab/LASP: Linear Attention Sequence Parallelism (LASP): Linear Attention Sequence Parallelism (LASP). Contribute to OpenNLPLab/LASP development by creating an account on GitHub.
GaLore and fused kernel prototypes by jeromeku · Pull Request #95 · pytorch-labs/ao: Prototype Kernels and Utils Currently: GaLore Initial implementation of fused kernels for GaLore memory efficient training. TODO: triton Composable triton kernels for quantized training and ...


Unsloth AI (Daniel Han) ▷ #random (21 messages🔥):


Unsloth AI (Daniel Han) ▷ #help (137 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (35 messages🔥):

Links mentioned:


Eleuther ▷ #general (67 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (207 messages🔥🔥):

Links mentioned:


Eleuther ▷ #scaling-laws (3 messages):


Eleuther ▷ #interpretability-general (6 messages):

Links mentioned:


Eleuther ▷ #lm-thunderdome (39 messages🔥):

Links mentioned:


Eleuther ▷ #gpt-neox-dev (1 messages):


Modular (Mojo 🔥) ▷ #general (18 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (5 messages):


Modular (Mojo 🔥) ▷ #🔥mojo (236 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-projects (5 messages):

Links mentioned:


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (1 messages):


Modular (Mojo 🔥) ▷ #nightly (10 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (23 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (13 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (12 messages🔥):

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


OpenAccess AI Collective (axolotl) ▷ #datasets (6 messages):

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


OpenAccess AI Collective (axolotl) ▷ #announcements (1 messages):


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (140 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (21 messages🔥):


Links mentioned:


LlamaIndex ▷ #announcements (1 messages):

jerryjliu0: webinar is in 15 mins! ^^


LlamaIndex ▷ #blog (4 messages):


LlamaIndex ▷ #general (160 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (4 messages):


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (155 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #general (74 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (3 messages):

Link mentioned: Tweet from Siddish (@siddish_): stream with out reasoning -> dumb response 🥴 stream till reasoning -> slow response 😴 a small LLM hack: reason most likely scenarios proactively while user is taking their time


HuggingFace ▷ #cool-finds (8 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (17 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (5 messages):


HuggingFace ▷ #computer-vision (11 messages🔥):


HuggingFace ▷ #NLP (13 messages🔥):


HuggingFace ▷ #diffusion-discussions (5 messages):

Link mentioned: Stanford CS224N: Natural Language Processing with Deep Learning | 2023: Natural language processing (NLP) is a crucial part of artificial intelligence (AI), modeling how people share information. In recent years, deep learning ap...


tinygrad (George Hotz) ▷ #general (87 messages🔥🔥):

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (8 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (41 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (3 messages):


Interconnects (Nathan Lambert) ▷ #random (41 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #nlp (8 messages🔥):

Link mentioned: Mixture-of-Depths: Dynamically allocating compute in transformer-based language models: Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific ...


Interconnects (Nathan Lambert) ▷ #sp2024-history-of-open-alignment (1 messages):

natolambert: the mascot for this talk lol


LangChain AI ▷ #general (85 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #langchain-templates (3 messages):

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


LangChain AI ▷ #share-your-work (2 messages):

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

Links mentioned:


LAION ▷ #general (66 messages🔥🔥):

Links mentioned:


LAION ▷ #research (3 messages):

Link mentioned: Mixture-of-Depths: Dynamically allocating compute in transformer-based language models: Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific ...


Latent Space ▷ #ai-general-chat (61 messages🔥🔥):

Links mentioned:

Representation Engineering Mistral-7B an Acid Trip: no description found
Understanding and managing the impact of machine learning models on the web | Hacker News: no description found
Tweet from Aran Komatsuzaki (@arankomatsuzaki): ReFT: Representation Finetuning for Language Models 10x-50x more parameter-efficient than prior state-of-the-art parameter-efficient fine-tuning methods repo: https://github.com/stanfordnlp/pyreft a...
Tweet from cohere (@cohere): Today, we’re introducing Command R+: a state-of-the-art RAG-optimized LLM designed to tackle enterprise-grade workloads and speak the languages of global business. Our R-series model family is now av...
Tweet from Ben (e/sqlite) (@andersonbcdefg): amazing. "you like MoE? what if we made one of the experts the identity function." kaboom, 50% FLOPs saved 🤦‍♂️ ↘️ Quoting Aran Komatsuzaki (@arankomatsuzaki) Google presents Mixture-of-De...
Tweet from Sherjil Ozair (@sherjilozair): How did this get published? 🤔 ↘️ Quoting AK (@_akhaliq) Google presents Mixture-of-Depths Dynamically allocating compute in transformer-based language models Transformer-based language models sp...
SDxPaperClub · Luma: The SDx Paper Club. The paper to be presented is [TBD] by [TBD] Twitter | Discord | LinkedIn
Command R+: no description found
Login | Cohere: Cohere provides access to advanced Large Language Models and NLP tools through one easy-to-use API. Get started for free.
GitHub - myshell-ai/JetMoE: Reaching LLaMA2 Performance with 0.1M Dollars: Reaching LLaMA2 Performance with 0.1M Dollars. Contribute to myshell-ai/JetMoE development by creating an account on GitHub.
Representation Engineering and Control Vectors - Neuroscience for LLMs: tl;dr A recent paper studied large language model’s (LLM) reactions to stimuli in a manner similar to neuroscience, revealing an enticing tool for controlling and understanding LLMs. I write her...
[AINews] Cohere Command R+, Anthropic Claude Tool Use, OpenAI Finetuning: AI News for 4/3/2024-4/4/2024. We checked 5 subreddits and 364 Twitters and 26 Discords (385 channels, and 5656 messages) for you. Estimated reading time...
Latent Space (Paper Club & Other Events) · Events Calendar: View and subscribe to events from Latent Space (Paper Club & Other Events) on Luma. Latent.Space events. PLEASE CLICK THE RSS LOGO JUST ABOVE THE CALENDAR ON THE RIGHT TO ADD TO YOUR CAL. "Ad...
GitHub - Paitesanshi/LLM-Agent-Survey: Contribute to Paitesanshi/LLM-Agent-Survey development by creating an account on GitHub.
Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team


Latent Space ▷ #ai-announcements (4 messages):

Link mentioned: Code a custom ChatGPT: This is the foundation of AI products. If you want to be an AI engineer these are MUST KNOW topics and API's. Everything from ChatGPT to robust AI powered summarization and classification use th...


OpenInterpreter ▷ #general (29 messages🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (26 messages🔥):

Links mentioned:


CUDA MODE ▷ #triton (3 messages):

Links mentioned:


CUDA MODE ▷ #torch (1 messages):

marksaroufim: https://twitter.com/soumithchintala/status/1776311683385880983


CUDA MODE ▷ #algorithms (1 messages):

Link mentioned: QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs: We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way t...


CUDA MODE ▷ #suggestions (1 messages):

Link mentioned: Intro to the Class - Intro to Parallel Programming: This video is part of an online course, Intro to Parallel Programming. Check out the course here: https://www.udacity.com/course/cs344.


CUDA MODE ▷ #beginner (2 messages):

Link mentioned: Google Colaboratory: no description found


CUDA MODE ▷ #jax (1 messages):


CUDA MODE ▷ #ring-attention (1 messages):

Link mentioned: GitHub - OpenNLPLab/LASP: Linear Attention Sequence Parallelism (LASP): Linear Attention Sequence Parallelism (LASP). Contribute to OpenNLPLab/LASP development by creating an account on GitHub.


CUDA MODE ▷ #hqq (27 messages🔥):

Links mentioned:


CUDA MODE ▷ #triton-viz (17 messages🔥):


Datasette - LLM (@SimonW) ▷ #ai (34 messages🔥):

Links mentioned:


DiscoResearch ▷ #benchmark_dev (10 messages🔥):

Links mentioned:


DiscoResearch ▷ #discolm_german (7 messages):

Links mentioned:


Mozilla AI ▷ #announcements (1 messages):

Link mentioned: Solo - Free AI Website Creator: Solo uses AI to instantly create a beautiful website for your business


Mozilla AI ▷ #llamafile (11 messages🔥):

Links mentioned:


Skunkworks AI ▷ #general (1 messages):

Link mentioned: Mixture-of-Depths: Dynamically allocating compute in transformer-based language models: Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific ...


Skunkworks AI ▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=KxOqjKq2VyY


Skunkworks AI ▷ #papers (1 messages):

carterl: https://arxiv.org/abs/2404.02684


LLM Perf Enthusiasts AI ▷ #claude (2 messages):