Frozen AI News archive

Multi-modal, Multi-Aspect, Multi-Form-Factor AI

Between April 12-15, **Reka** launched **Reka Core**, a new GPT-4-class multimodal foundation model with a detailed technical report described as "full Shazeer." **Cohere** introduced **Compass**, a foundation embedding model for indexing and searching multi-aspect enterprise data like emails and invoices. The open-source **IDEFICS 2-8B** model continues the reproduction of Google's unreleased Flamingo multimodal model. **Rewind** pivoted to a multi-platform app called Limitless, moving away from spyware. Reddit discussions highlighted **Apple MLX** outperforming **Ollama** with **Mistral Instruct** on M2 Ultra GPUs, GPU choices for LLMs and Stable Diffusion, and AI-human comparisons by Microsoft Research's Chris Bishop. Former PayPal CEO Dan Schulman predicted **GPT-5** will drastically reduce job scopes by 80%. **Mistral** CEO Arthur Mensch criticized the obsession with AGI as "creating God."


Whole months can happen in a few days in AI - just as Feb 15 saw Sora, Gemini 1.5, and a bunch of other launches, the ides of April saw huge launches from:

Reka Core

A new GPT-4-class multimodal foundation model...


... with an actually useful technical report...


... being "full Shazeer"


Cohere Compass

Cohere describes it as "our new foundation embedding model that allows indexing and searching on multi-aspect data. Multi-aspect data can best be explained as data containing multiple concepts and relationships. This is common within enterprise data — emails, invoices, CVs, support tickets, log messages, and tabular data all contain substantial content with contextual relationships."


IDEFICS 2-8B

Continued work from last year's IDEFICS, a fully open-source reproduction of Google's unreleased Flamingo multimodal model.


Rewind pivots to Limitless

It’s a web app, Mac app, Windows app, and a wearable.

Spyware is out, Pendants are in.



Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence. Comment crawling works now but has lots to improve!

AI Models and Performance

LLM and AI Developments

Industry and Career

Tools and Resources

Hardware and Performance

Memes and Humor


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

AI Models and Architectures

AI Capabilities and Benchmarks

Open Source and Democratizing AI

Industry and Ecosystem


AI Discord Recap

A summary of Summaries of Summaries

  1. Advancements in Large Language Models (LLMs): There is significant excitement and discussion around new releases and capabilities of LLMs across various platforms and organizations. Key examples include:
  2. Optimizations and Techniques for LLM Training and Inference: Extensive discussions revolve around optimizing various aspects of LLM development, including:
  3. Open-Source Initiatives and Community Collaboration: The AI community demonstrates a strong commitment to open-source development and knowledge sharing, as evidenced by:
  4. Datasets and Data Strategies for LLM Development: Discussions highlight the importance of data quality, curation, and strategic approaches to training data, including:
  5. Misc

PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

Stable Diffusion 3 Sparks Excitement: AI enthusiasts are buzzing with anticipation over Stable Diffusion 3 (SD3), discussing its potential for efficiency gains and debating the merits of SD Forge for optimizing performance on less powerful GPUs.

Pixart-Sigma Pushing VRAM Boundaries: The application of Pixart-Sigma with T5 conditioning within ComfyUI provokes discussions around VRAM usage, with participants noting that T5 maintains VRAM usage under 10.3GB even in 8-bit compression mode.

AI Tools for Content Creation Get Spotlight: Query exchanges regarding AI tools like ControlNet, Lora, and expansion techniques like outpainting hint at a need for consistent color generation and background extension in creative workflows.

Debating CPU vs. GPU Efficiency for AI: Community members exchange troubleshooting tips on GPU memory optimization and flag features like --lowvram for those running Stable Diffusion on less potent machines, highlighting the significant speed difference between CPU and GPU processing.

Artists Seek Tech-Driven Collaborations: The trend of fusing AI with artistic tools continues as a digital artist seeks input and tutorial assistance on a painting app combining AI features, with project details found on GitHub.


Perplexity AI Discord


Unsloth AI (Daniel Han) Discord

Geohot Hacks Back P2P to 4090s: "geohot" ingeniously implemented peer-to-peer support into NVIDIA 4090s, enabling enhanced multi-GPU setups; details are available on GitHub.

Unsloth Gains Multi-GPU Momentum: Interest spiked on Multi-GPU support in Unsloth AI, with Llama-Factory touted as a worthy investigation route for integration.

A Hugging Face of Encouragement: Unsloth AI garnered attention from Hugging Face CEO Clement Delangue by securing a follow on an unspecified platform, suggesting potential collaborative undertones.

Linguistic Labyrinth of AI PhD Life: A PhD student outlined their challenging exploration in developing an instruction-following LLM for their mother tongue, highlighting the project complexity beyond mere translation and fine-tuning.

Million-Dollar Mathematical AI: The community engaged with the prospect of a $10M Kaggle AI prize to create an LLM capable of acing the International Math Olympiad and unpacked the Beal Conjecture's $1M bounty for a proof or counterexample at AMS.

Resourceful VRAM Practices & Strategic Finetuning: AI engineers converged on efficient use of VRAM for training robust LLMs like Mistral, sharing best practices such as the "30% less VRAM" update from Unsloth and the nuanced approach of initiating finetuning with shorter examples.

Cultural Conquest of Linguistic Datasets: Strategies to amplify low-resource language datasets were exchanged, including the use of translation data from platforms like HuggingFace.

Pioneering Podman for AI Deployment: Innovators showcased the deployment of Unsloth AI in Podman containers, streamlining local and cloud implementation, as seen in this demo: Podman Build - For cost-conscious AI.

Ghost 7B Alpha Raises the Benchmark: Ghost 7B Alpha was proclaimed for superior reasoning compared to other models, signaling fine-tuning prowess without expanding tokenizers, as discussed by enthusiasts.

Merging Minds on Model Compression: The melding of adapters, particularly QLoRA and 16bit-saving techniques for vLLM or GGUF, was deliberated, contemplating the intricacies of naive merging versus dequantization strategies.


Eleuther Discord

Pile-T5 Power: EleutherAI introduces Pile-T5, an enhanced T5 model variant produced through training with 2 trillion tokens from the Pile. It showcases significant performance improvements in both SuperGLUE and MMLU benchmarks and excels in code-related tasks, with resources including weights and scripts open-sourced on GitHub.

Entropic Data Filtering: The CVPR 2024 paper suggests a notable advancement in unpacking the importance of entropy in data filtering. An empirical study unveiled scaling laws capturing how data curation is fundamentally linked with entropy, enriching the community's understanding of heterogeneous & limited web data and its practical implications. Explore the study here.

Inside the Transformer Black Box: Google's Patchscopes framework endeavors to make LLMs' hidden representations more interpretable by generating explanations in natural language. Likewise, a paper introducing a toolkit for transformers to conduct causal manipulations showcases the value of pinpointing key model subcircuits during training, possibly offering pathways to avoid common training roadblocks. Details on the JAX toolkit can be found in this tweet from Stephanie Chan (@scychan_brains) and on the Patchscopes framework here.

MoE vs. Dense Transformers Debate: Discussions in the community probe the capacity and benefits of MoE versus dense transformer models. Key insights reveal MoEs' relative advantage sans VRAM constraints and question dense models' performance parity at comparable parameter budgets. There's a pronounced curiosity regarding the foundational attributes driving model behavior beyond the metrics.

NeoX Nuances Unveiled: Questions within the GPT-NeoX project brought up intricacies like oversized embedding matrices for GPU efficiency and peculiar weight decay behaviors potentially due to non-standard activations. A remark on rotary embeddings noted their partial application in NeoX, in contrast to other models. A corporate CLA is being devised to facilitate contributions to the project.


Nous Research AI Discord


OpenRouter (Alex Atallah) Discord

Mixtral Model Mix-Up: The community reported that Mixtral 8x22B:free is discontinued; users should transition to the Mixtral 8x22B standard model. Experimental models Zephyr 141B-A35B and Fireworks: Mixtral-8x22B Instruct (preview) are available for testing; the latter is a fine-tune of Mixtral 8x22B.

Token Transaction Troubles: A user's issue with purchasing tokens was deemed unrelated to OpenRouter; they were advised to contact Syrax directly for resolution.

Showcasing Rubik's Research Assistant: Users interested in testing the new Rubiks.ai research assistant can join the beta test with a 2-month free premium. This tool includes access to models like Claude 3 Opus and GPT-4 Turbo, among others; testers should use the code RUBIX and provide feedback. Explore Rubik's AI.

Dynamic Routing Deliberation: There's a buzz around improving Mixtral 8x7B Instruct (nitro) speeds via dynamic routing to the highest transaction-per-second (t/s) endpoint. There were varying opinions on the performance of models like Zephyr 141b and Firextral-8x22B as well.

WizardLM-2 Series Spells Excitement: The newly announced WizardLM-2 series with model sizes including 8x22B, 70B, and 7B, has garnered community interest. Attention is particularly focused on the expected performance of WizardLM-2 8x22B on OpenRouter.


LM Studio Discord

Models On The Fritz: Users reported problems with model loading in LM Studio across different versions and operating systems, including "Error loading model." messages and issues persisting after downgrading and turning off GPU offload. One user resorted to a full removal and reinstall of LM Studio after ongoing frustrations with model performance.

Attention to Hardware: There was considerable discussion surrounding hardware requirements for running AI models effectively, highlighting the necessity of high-tier equipment for an experience on par with GPT-4, as well as the underutilization of Threadripper Pro CPUs. Meanwhile, ROCm support was called into question, with specific mention of the Radeon 6600 being unsupported in LM Studio.

Quantized conundrums and innovative inferences: Quantization was a hot topic, with users noting performance changes and debate on whether the trade-off in output quality is worthwhile. Meanwhile, a paper titled "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" was circulated, illuminating cutting-edge methodologies for Transformers handling prolonged inputs effectively.

Navigating Through New Model Territories: Users shared experiences and recommendations for an assortment of new models like Command R+ and Mixtral 8x22b, focusing on different setups, performance, and even introducing a commands-based tool missionsquad.ai. Notably, Command R+ has been praised for outpacing similar models in analyzing lengthy documents.

Beta Blues and Docker Distributions: While some users grappled with troubles in the 0.2.19 beta on macOS, others spearheaded initiatives like creating and maintaining a Docker image for Autogen Studio, hosted on GitHub. Concurrently, disappointment was voiced over Linux beta releases missing out on ROCm support, highlighting the niche hurdles technical users encounter.


OpenAI Discord


Latent Space Discord

Collaboration Beyond Borders with Scarlet AI: The new Scarlet platform enhances task management with collaborative AI agents that provide automated context and support tool orchestration, eyeing an integration with Zapier NLA. A work-in-progress demo was mentioned and Scarlet's website provides details on their offerings.

Whispers of Perplexity's Vanishing Act: The removal of Perplexity "online" models from LMSYS Arena spurred debate on the model's efficacy and potential integrations into engineering tools. This instance underscores a shared interest in the integration efficiency of AI models.

AI Wrangles YouTube's Wild West: Engineers examined strategies to transcribe and summarize YouTube content for AI applications, debating the merits of various tools including Descript, youtube-dl, and Whisper with diarization. The discussion reflects ongoing endeavors to streamline content processing for AI model training.

Limitless Eyes the Future of Personalized AI: The rebranding of Rewind to Limitless introduces a wearable with personalized AI capabilities, sparking discussions around local processing and data privacy. This highlights a peak interest in the security implications of new AI-powered wearable technologies.

Navigating Vector Space in Semantic Search: Insightful discussions on the complexities of semantic search, including vector size, memory usage, and retrieval performance, culminated in a proposed hackathon to delve deeper into embedding models. The discourse reflects a keen focus on optimizing speed, efficiency, and performance in AI-powered search applications.
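The vector size vs. memory trade-off discussed above can be made concrete with a back-of-envelope estimate. The sketch below is my own illustration (not from the discussion): it computes the raw memory footprint of a flat, in-memory embedding index under different dimension and precision choices.

```python
def index_memory_gib(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw memory for a flat (brute-force) vector index, in GiB.

    bytes_per_value is 4 for float32 embeddings, 1 for int8-quantized ones.
    Ignores index overhead (IDs, graph links, metadata).
    """
    return num_vectors * dim * bytes_per_value / 2**30

# 10M documents: 1536-dim float32 vs 384-dim int8 embeddings
full = index_memory_gib(10_000_000, 1536)      # ~57.2 GiB
small = index_memory_gib(10_000_000, 384, 1)   # ~3.6 GiB
print(f"{full:.1f} GiB vs {small:.1f} GiB")
```

Numbers like these are why dimension reduction and quantization dominate these conversations: a 16x memory reduction often costs only a few points of retrieval accuracy.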


Modular (Mojo 🔥) Discord

Earn Your Mojo Stripes: Engage with the Mojo community by contributing to the modularml/mojo repo or creating cool open source projects to attain the prestigious Mojician role. Those with merged PRs can DM Jack to match their GitHub contributions with Discord identities.

Mojo's Python Aspirations: The community is buzzing with the anticipation of Mojo extending full support to Python libraries, aiming to include Python packages with C extensions. Meanwhile, efforts to enhance Mojo's Reference usability are in motion, with Chris Lattner hinting at a proposal that could simplify mutability management.

Code Generation and Error Handling Forefront: GPU code generation tactics and the potential integration of an "extensions" feature resembling Swift's implementation for superior error handling have sparked technical debates among members, indicating a future direction for Mojo's development.

Community Code Collaborations Spike: A flurry of activity surrounds the llm.mojo project, where performance boost techniques such as vectorization and parallelization might benefit from collective wisdom, including maintaining synchronized C and Mojo code bases within a single repository.

Nightly News on Mojo Updates: The Mojo team addresses standard library discrepancies in package naming with an upcoming release fix. Additionally, the idea of updating StaticTuple to Array and supporting AnyType garnered interest, and a call for Unicode support contributions to the standard library arose, along with discussions on proper item assignment syntax.

Twitter Dispatch for Modular Updates: Keep an eye on Modular's Twitter for updates and announcements, including a recent series of six tweets from the organization shedding light on their ongoing initiatives.


CUDA MODE Discord

P2P Gets a Speed Boost: Tinygrad enhances P2P support on NVIDIA 4090 GPUs, reaching 14.7 GB/s AllReduce performance after modifying NVIDIA's driver, as detailed here, while PyTorch tackles namespace build complexities with the nightly build showing slower performance versus torch.matmul.

Massive Row Sorting Challenge Awaits CUDA Competitors: tspeterkim_89106 throws down the gauntlet with a CUDA implementation of the One Billion Row Challenge that runs in 16.8 seconds on a V100, inviting others to beat a 6-second record on a 4090 GPU.

New Territories in Performance Optimization: CUDA discussions orbit around the merits of running independent matmuls in separate CUDA streams, leveraging stable_fast over torch.compile, and the pursuit of high-efficiency low-precision computations demonstrated by stable_fast challenging int8 quant tensorrt speeds.

Recording and Sharing CUDA Expertise: CUDA MODE is recruiting volunteers to record and share content through YouTube, where lecturing material is also maintained on GitHub, and highlighting potential shifts to live streaming to manage growing member scales.

HQQ and LLM.c: Striding Towards Efficiency: Updates in HQQ implementation on gpt-fast push token generation speeds with torchao int4 kernel support, while LLM.c confronts CUDA softmax integration challenges, explores online softmax algorithm efficiencies, and juggles the dual goals of peak performance and educational clarity.
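For readers unfamiliar with the online softmax algorithm LLM.c is exploring, here is a minimal pure-Python sketch of the idea (an illustration of the algorithm, not LLM.c's CUDA kernel): softmax is computed in a single pass by tracking a running maximum and a running sum that is rescaled whenever a new maximum appears.

```python
import math

def online_softmax(xs):
    """One-pass, numerically stable softmax.

    Maintains a running max m and running sum s of exp(x - m);
    when a larger value arrives, the old sum is rescaled by exp(old_m - new_m),
    so no second pass over the data is needed.
    """
    m = float("-inf")  # running maximum
    s = 0.0            # running sum of exp(x - m)
    for x in xs:
        if x > m:
            s = s * math.exp(m - x) + 1.0
            m = x
        else:
            s += math.exp(x - m)
    return [math.exp(x - m) / s for x in xs]

probs = online_softmax([1.0, 2.0, 3.0])
```

Fusing the max and sum passes like this is what makes the algorithm attractive for GPU kernels, where re-reading a long logit vector from memory is the bottleneck.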


Cohere Discord

Connector Confusion Cleared: A member resolved an issue with Cohere's connector API, learning that connector URLs must end with /search to avoid errors.

Fine-Tuning Finesse: Cohere's base models are available for fine-tuning through their dashboard, as confirmed in a dashboard link, with options expanded to Amazon Bedrock and AWS.

New Cohere Tools on the Horizon: Updates were shared on new Cohere capabilities, specifically named Coral for chatbot interfaces, and upcoming releases for AWS toolkits for connector implementations.

Model Performance Discussed: Dialogue around Cohere models like command-r-plus touched on their performance on different hardware, including Nvidia's latest graphics cards and TPUs.

Learning Avenues for AI Newbies: New community members seeking educational resources were directed to free offerings like LLM University and provided with a link to Cohere's educational documentation.

Command-R Rocks the Core: Command-R received accolades for being a newly integrated core module by Cohere, highlighting its significance.

Quant and AI Converge: An invitation for beta testing of Quant Based Finance was posted, appealing to those interested in financial analysis powered by AI, with a link here.

Rubiks.AI Rolls Out: An invite for beta testing of Rubiks.AI, a new advanced research assistant and search engine, was shared, offering early access to models like Claude 3 Opus and GPT-4 Turbo, available at Rubiks.AI.


HuggingFace Discord

A Bundle of Multi-Model Know-how: Engineers exchanged tips on deploying multiple A1111 models on a GPU, highlighting resource allocation for parallel model runs. Discussions in NLP explored lightweight embedding options such as all-MiniLM-L6-v2, with paraphrase-multilingual-MiniLM-L12-v2 suggested for academic purposes.

Cognitive Collision: AI Models Straddle Realities: The cool-finds channel shared links to PyTorch and Blender integration for real-time data pipelines, while a Medium post introduced LlamaIndex's document retrieval enhancements. The Grounding DINO model for zero-shot object detection and how it utilizes this in Transformers trended in computer-vision.

Community-Sourced AI Timeline & Mental Models: Project RicercaMente, aiming to map data science evolution through key papers, was touted in cool-finds and NLP, inviting collaboration from the community. Meanwhile, a Counterfactual Inception method was presented to address hallucinations in AI responses, detailed in a paper on arXiv and a related GitHub project.

Training Trials & Tribulations: A U-Net training plateau after 11 epochs led a user to consider Deep Image Prior for image cleaning tasks, shared in computer-vision. In diffusion-discussions, there was an exploration of multimodal embeddings and a clarification about an overstretched token limit warning in a Gradio chatbot for image generation.

Crossing Streams: Events & Education: Upcoming LLM Reading Group sessions focusing on groundbreaking research, including OpenAI's CLIP model and Zero-Shot Classification, were advertised in today-im-learning and reading-group. Resources for those starting out in NLP were also suggested, featuring beginner's guides and transformer overview videos.


LAION Discord

Scam Notice: No LAION NFTs Exist: There have been repeated warnings about a scam involving a fraudulent Twitter account claiming LAION is offering NFTs, but the community has confirmed LAION is solely focused on free, open-source AI resources and does not engage in selling anything.

When to Guide the Diffusion: An arXiv paper introduced findings that the best image results from diffusion models occur when guidance is applied at particular noise levels, emphasizing the middle stages of generation while avoiding early and late phases.
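The paper's core idea, applying guidance only during a middle interval of the sampling trajectory, can be sketched as a simple schedule. The interval bounds and scale below are illustrative placeholders, not the paper's tuned values.

```python
def guided_scale(step: int, total_steps: int, scale: float = 7.5,
                 lo: float = 0.25, hi: float = 0.75) -> float:
    """Classifier-free guidance scale for one sampling step.

    Returns the full guidance scale only inside the middle fraction
    [lo, hi] of the trajectory and 1.0 (no guidance) elsewhere.
    lo/hi/scale are placeholder values for illustration.
    """
    frac = step / total_steps
    return scale if lo <= frac <= hi else 1.0

# A 50-step schedule: unguided early steps, guided middle, unguided tail.
schedule = [guided_scale(t, 50) for t in range(50)]
```

In a real sampler, this scale would multiply the difference between the conditional and unconditional noise predictions at each step.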

Innovations from AI Audio to Ethics Discussions: Discussions ranged from the introduction of new AI models, like Hugging Face's Parler TTS for sound generation, to debates over the proper use of 'ethical datasets' and the implications of such political language in AI research.

Stable Diffusion 3's Dilemma: Community insight suggested Stable Diffusion 3 faces a risk of quality decline due to its rigorous prompt censorship; anticipation for further refinement to address this possible issue was shared, particularly on Reddit.

Troubleshooting Diffusion Models: A GitHub repository was shared by a member who faces a training issue with their diffusion model, which is outputting random noise and solid black during inference, despite attempts to adjust regularization and learning rate.


LangChain AI Discord

Be Cautious of Spam: Several channels have reported spam messages containing links to adult content, falsely advertising TEEN/ONLYFANS PORN with a potential phishing risk. Members are advised to avoid engaging with suspicious links or content.

RAG Operations Demand Precision: Users are encountering issues with document splitting during Retrieval-Augmented Generation (RAG) operations on legal contracts, where section contents are mistakenly linked to preceding sections, compromising retrieval accuracy.

LangChain Gets Parallel: Utilizing LangChain's RunnableParallel class allows for the parallel execution of tasks, enhancing efficiency in LangGraph operations—an approach worth considering for those optimizing for performance.
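The fan-out pattern behind LangChain's RunnableParallel can be sketched with the standard library alone (a stdlib analogue for illustration, not LangChain's API): each named task runs concurrently against the same input, and results come back keyed by name.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(tasks, value):
    """Run each named callable on the same input concurrently.

    Returns {name: result} -- the same output shape RunnableParallel
    produces when you .invoke() it with a single input.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, value) for name, fn in tasks.items()}
        return {name: fut.result() for name, fut in futures.items()}

# e.g. fetching context and reformulating a query at the same time
result = run_parallel(
    {"doubled": lambda x: x * 2, "squared": lambda x: x ** 2},
    5,
)
print(result)  # {'doubled': 10, 'squared': 25}
```

In LangGraph, the same idea applies at the graph level: independent branches that share an input can execute simultaneously rather than sequentially.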

Emerging AI Tools and Techniques: A variety of resources, tutorials, and projects have been shared, including Meeting Reporter, how-to guides for RAG, and Personalized recommendation systems, to equip AI professionals with cutting-edge knowledge and practical solutions.

Watch and Learn: A series of YouTube tutorials has been highlighted, focusing on the implementation of chat agents using Vertex AI Agent Builder and integrating them with communication platforms like Slack, valuable for those interested in AI-infused app development.


OpenAccess AI Collective (axolotl) Discord

OpenAccess AI Goes Open-Source: NVIDIA Linux GPU support with P2P gets a boost from open-gpu-kernel-modules on GitHub, offering a tool for enhanced GPU functionality.

Fireworks AI Ignites with Instruct MoE: Promising results from Fireworks AI's Mixtral 8x22b Instruct OH, previewed here, although a hiccup with DeepSpeed ZeRO-3 was addressed by pulling updates from DeepSpeed's main branch.

DeepSpeed's Contributions Clarified: While it doesn't accelerate model training, DeepSpeed ZeRO-3 shines at fitting larger models, with a successful workaround integrating updates from DeepSpeed's official GitHub repository for MoE models.

A Harmonious Relationship with AI: Advances in AI-music creation gain spotlight with a tune crafted by AI, available for a listen at udio.com.

Tools for Advanced Model Training: Engaging discussions revolve around model merging, the use of LISA and DeepSpeed, and their effects on model performance. This tool was cited for extracting Mixtral model experts, alongside hardware prerequisites.

Dynamic Weight Unfreezing: Conversations emerge around dynamic weight unfreezing tactics for GPU-constrained users, alongside an unofficial GitHub implementation for Mixture-of-Depths, accessible here.

RTX 4090 GPUs and P2P: Success in enabling P2P memory access with tinygrad on RTX 4090s led to discussions about removing barriers to P2P usage and community eagerness regarding this achievement.

Combatting Model Repetitiveness: Persistent shortcomings with a model producing repetitive outputs guide members towards exploring diverse datasets and finetuning methods. Mergekit emerges as a go-to for model surgery with configuration insights gleaned from WestLake-10.7B-v2.

Fine-Tuning and Prompt Surgery: One delves into Axolotl's finetuning intricacies, troubleshooting IndexError and prompt formatting woes with guidance from the Axolotl GitHub repo and successful configuration adjustments.

Mistral V2 Outshines: Exceptional first-epoch results with Mistral v2 instruct overshadow others, demonstrating aptitude in diverse tasks including metadata extraction, outperforming models like qwen with new automation capabilities.

DeepSpeed Docker Deep Dive: Distributed training via DeepSpeed necessitates a custom Docker build and streamlined SSH keys for passwordless node intercommunication. Launching containers with the correct environment variables is essential, as detailed in this Phorm AI Code Search link.

DeepSpeed and 🤗 Accelerate Collaboration: DeepSpeed integrates smoothly with 🤗 Accelerate without overriding custom learning rate schedulers, with the push_to_hub method complementing the ease of Hugging Face model hub repository creation.


OpenInterpreter Discord

Malware Scare and Command Line Riddles: Engineers have raised an issue about Avast antivirus detections and confusion stemming from the ngrok/ngrok/ngrok command line in OpenInterpreter; updates to the documentation were suggested to clear user concerns.

Tiktoken Gets a PR For Building Progress: A GitHub pull request aiming to resolve a build error by updating the tiktoken version for OpenInterpreter suggests improvements are on the way; review the changes here.

Persistence Puzzle: Emergence of Assistants API: The integration of the Assistants API for data persistence has been discussed, with community members creating Python assistant modules for better session management; advice on node operations implementation is being sought.

OS Mode Odyssey on Ubuntu: The Open Interpreter's OS Mode on Ubuntu has generated troubleshooting conversations, with a focus on downgrading to Python 3.10 for platform compatibility and configuring accessibility settings.

Customize It Yourself: O1's User-Driven Innovation: The O1 community showcases their creativity through personal modifications and enhancements such as improved batteries and custom cases; a custom GPT model trained on Open Interpreter's documentation is lending a hand to ChatGPT Plus users.


LlamaIndex Discord

LlamaIndex Migrates PandasQueryEngine: The latest LlamaIndex update (v0.10.29) relocated PandasQueryEngine to llama-index-experimental, necessitating import path adjustments and providing error messages for guidance on the transition.

AI Application Generator Garners Attention: In partnership with T-Systems and Marcus Schiesser, LlamaIndex launched create-tsi, a command line tool to generate GDPR-compliant AI applications, stirring the community's interest with a promotional tweet.

Redefining Database and Retrieval Strategies: Community exchanges delved into ideal vector databases for similarity searches, contrasting Qdrant, pg vector, and Cassandra, along with discussions on leveraging hybrid search for multimodal data retrieval, referencing the LlamaIndex vector stores guide.

Tutorials and Techniques to Enhance AI Reasoning: Articles and a tutorial shared showcased methods to fortify document retrieval with memory in Colbert-based agents and integrating small knowledge graphs to boost RAG systems, as highlighted by WhyHow.AI.

Community Commendations and Technical Support: Community appreciation for articles on advancing AI reasoning was voiced, while LlamaIndex users tackled technical woes and encouraged proactive contribution to documentation, reinforcing the platform's dedication to knowledge sharing and support.


tinygrad (George Hotz) Discord

Relevant links for further exploration and understanding included:


Interconnects (Nathan Lambert) Discord

Hugging Face Collections Streamline Artifact Organization: Hugging Face collections have been introduced to aggregate artifacts from a blog post on open models and datasets. The collections offer ease of re-access and come with an API as seen in the Hugging Face documentation.

The "Incremental" Update Debate and Open Data Advocacy: Community members are divided on the importance of the transition from Claude 2 to Claude 3, and there's a push to remember the value of open data initiatives that may be getting overshadowed. Meanwhile, AI release announcements appear in force, with Pile-T5 and WizardLM 2 amongst the front runners.

Synthesis of Machine Learning Discourse: Conversations touched on obligations with ACL revision uploads, the benefits of "big models," and distinctions between critic vs reward models—with Nato's RewardBench project being a point of focus. A tweet from @andre_t_martins provided clarity on the ACL revision uploads process.

Illuminating Papers and Research: Key papers highlighted include "CodecLM: Aligning Language Models with Tailored Synthetic Data" and "LLaMA: Open and Efficient Foundation Language Models" with Hugging Face identifier 2302.13971. Synthetic data's role and learning from stronger models were key takeaways from the discussions.

Graphs Garner Approval, and Patience Is Proposed for Bots: In the realm of newsletters, graphs won praise for their clarity, with a commitment to future enhancement and integration into a Python library. A lone mention was made of an experimental bot that might benefit from patience rather than premature intervention.


LLM Perf Enthusiasts AI Discord

Haiku's Speed Hiccup: Engineers discussed Haiku's slower total response times in contrast to its throughput, suggesting Bedrock's potential as an alternative despite speed concerns there as well.

Claude's RP Constraints: Concerns arose as Claude refuses to engage with roleplay prompts such as being a warrior maid or sending fictional spam mail, even after using various prompting techniques.

Jailbreak Junction: Amid discussions, a tweet by @elder_plinius was shared about a universal jailbreak for Claude 3 to enable edgier content which bypasses the strict content filters like those in Gemini 1.5 Pro. The community appears to be evaluating its implications; the tweet is available here.

Code Competence Claims: A newer version of an unnamed tool is lauded for its enhanced coding abilities and speed, with a member considering reactivating their ChatGPT Plus subscription to further test these upgrades.

Claude's Contextual Clout: Despite improvements in other tools, Claude maintains its distinction with long context window code tasks, implying limitations in ChatGPT's context window size handling.


DiscoResearch Discord

Mixtral Mastery or Myth?: Community discussions touched on uncertainties in the training and finetuning efficiency of Mixtral MoE models, with some suspecting undisclosed techniques behind its performance. Interest was shown in weekend ventures of finetuning with en-de instruct data, and Envoid's "fish" model was mentioned as a curious case for RP/ERP applications in Mixtral, albeit untested due to hardware limitations.

Llama-Tokenizer Tinkering Tutorial Tips: Efforts to optimize a custom Llama-tokenizer for small hardware utilization led to shared resources such as the convert_slow_tokenizer.py from Hugging Face and the convert.py script with --vocab-only option from llama.cpp GitHub. Additionally, there's a community call for copyright-free EU text and multimodal data for training large open multimodal models.

Template Tendencies of Translators: The occiglot/7b-de-en-instruct model showcased template sensitivity in evaluations, with performance on German RAG tasks varying with template correctness, as shown by the correct template usage documented on Hugging Face.
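The template sensitivity above can be illustrated with a minimal sketch: instruct-tuned models only behave as trained when role tags and delimiters match exactly. The ChatML-style template below is an assumption for illustration; the actual template expected by occiglot/7b-de-en-instruct may differ.

```python
# Minimal sketch of why template correctness matters for evals: the same
# user request rendered with and without the trained-on chat template.
def format_chatml(messages):
    """Render messages in a ChatML-style format (assumed for illustration)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

msgs = [{"role": "user", "content": "Übersetze ins Englische: Guten Morgen"}]
correct = format_chatml(msgs)
sloppy = "User: Übersetze ins Englische: Guten Morgen\nAssistant:"  # wrong template
```

With the wrong template the model still produces output, which is why template bugs tend to silently degrade benchmark scores rather than error out.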

Training Parables from StableLM and MiniCPM: Insights on pretraining methods were highlighted, referencing StableLM's use of the ReStruct dataset inspired by this ExpressAI GitHub repo, and MiniCPM's preference for mixing in data like OpenOrca and EvolInstruct during cooldown phases, as detailed in Chapter 5 of their study.


Mozilla AI Discord

Burnai Ignites Interest: Rust enthusiasts in the community are pointing out the underutilized potential of the Burnai project for optimized inference, sparking questions about Mozilla's lack of engagement despite Rust having originated at Mozilla.

Llamafile Secures McAfee's Trust: The llamafile 0.7 binary has successfully made it onto McAfee's whitelist, marking a win for the project's security reputation.

New Collaborator Energizes Discussions: A new participant has joined the fold, eager to dive into collaborations and knowledge-sharing, signaling a fresh perspective on the horizon.

Curiosity Piques for Vulkan and tinyblas: Intrigue is brewing over potential Vulkan compatibility in the anticipated v0.8 release and the benefits of upstreaming tinyblas for ROCm applications, indicating a concerted focus on performance enhancements.

Help Wanted for Model Packaging: Demand for guidance on packaging custom models into llamafile has led to community exchanges, giving rise to contributions like a GitHub Pull Request on container publishing.


Alignment Lab AI Discord


AI21 Labs (Jamba) Discord


Datasette - LLM (@SimonW) Discord


Skunkworks AI Discord


PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #general-chat (1129 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #general (879 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (23 messages🔥):

Link mentioned: Meta Releases AI on WhatsApp, Looks Like Perplexity AI: Meta has quietly released its AI-powered chatbot on WhatsApp, Instagram, and Messenger in India, and various parts of Africa.


Perplexity AI ▷ #pplx-api (26 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (431 messages🔥🔥🔥):

<ul>
  <li><strong>Unsloth Multi-GPU Update Query</strong>: There were inquiries regarding updates on Multi-GPU support for Unsloth AI. Suggestions to look into Llama-Factory with Unsloth integration were mentioned.</li>
  <li><strong>Geohot Adds P2P to 4090s</strong>: A significant update where "geohot" has hacked P2P back into NVIDIA 4090s was shared, along with a relevant <a href="https://github.com/tinygrad/open-gpu-kernel-modules">GitHub link</a>.</li>
  <li><strong>Upcoming Unsloth AI Demo and Q&A Event Alert</strong>: An announcement for a live demo of Unsloth AI with a Q&A session by Analytics Vidhya was shared. Those interested were directed to join via a posted <a href="https://us06web.zoom.us/webinar/register/WN_-uq-XlPzTt65z23oj45leQ">Zoom link</a>.</li>
  <li><strong>Mistral Model Fusion Tactics Discussed</strong>: There was a discussion about the practicality of merging MOE experts into a single model, with skepticism regarding output quality. Some considered fine-tuning Mistral for narrow tasks and removing lesser-used experts as a potential compression method.</li>
  <li><strong>Hugging Face CEO Follows Unsloth on Platform X</strong>: Clement Delangue, co-founder and CEO of Hugging Face, now follows Unsloth on X, sparking hopes for future collaborations between the two AI communities.</li>
</ul>

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (38 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (313 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (54 messages🔥):

Links mentioned:


Eleuther ▷ #announcements (1 messages):

Links mentioned:


Eleuther ▷ #general (123 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (534 messages🔥🔥🔥):

Links mentioned:


Eleuther ▷ #scaling-laws (10 messages🔥):

Link mentioned: Tweet from Pratyush Maini (@pratyushmaini): 1/ 🥁Scaling Laws for Data Filtering 🥁 TLDR: Data Curation cannot be compute agnostic! In our #CVPR2024 paper, we develop the first scaling laws for heterogeneous & limited web data. w/@goyalsach...


Eleuther ▷ #interpretability-general (6 messages):

Links mentioned:


Eleuther ▷ #lm-thunderdome (1 messages):

Link mentioned: GitHub - EQ-bench/EQ-Bench: A benchmark for emotional intelligence in large language models: A benchmark for emotional intelligence in large language models - EQ-bench/EQ-Bench


Eleuther ▷ #gpt-neox-dev (23 messages🔥):


Nous Research AI ▷ #ctx-length-research (1 messages):


Nous Research AI ▷ #off-topic (21 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (29 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (382 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (68 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #rag-dataset (5 messages):

Link mentioned: RAG/Long Context Reasoning Dataset: no description found


Nous Research AI ▷ #world-sim (66 messages🔥🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (9 messages🔥):

Link mentioned: Rubik's AI - AI research assistant & Search Engine: no description found


OpenRouter (Alex Atallah) ▷ #general (480 messages🔥🔥🔥):

Links mentioned:


LM Studio ▷ #💬-general (237 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (87 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (30 messages🔥):


LM Studio ▷ #📝-prompts-discussion-chat (11 messages🔥):


LM Studio ▷ #🎛-hardware-discussion (53 messages🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (6 messages):


LM Studio ▷ #autogen (4 messages):

Links mentioned:


LM Studio ▷ #amd-rocm-tech-preview (38 messages🔥):


OpenAI ▷ #ai-discussions (397 messages🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (36 messages🔥):


OpenAI ▷ #prompt-engineering (11 messages🔥):


OpenAI ▷ #api-discussions (11 messages🔥):


Latent Space ▷ #ai-general-chat (228 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-in-action-club (143 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #general (26 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (6 messages):


Modular (Mojo 🔥) ▷ #ai (1 messages):

docphaedrus: https://www.youtube.com/watch?v=1cQbu2zXTKk

Podman Pull- On basic terminal commands


Modular (Mojo 🔥) ▷ #🔥mojo (218 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-projects (23 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-blogs-vids (4 messages):

Links mentioned:


Modular (Mojo 🔥) ▷ #🏎engine (3 messages):


Modular (Mojo 🔥) ▷ #nightly (55 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #general (29 messages🔥):

Links mentioned:


CUDA MODE ▷ #cuda (77 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #torch (15 messages🔥):

Links mentioned:


CUDA MODE ▷ #cool-links (1 messages):

Link mentioned: Just enough CUDA to be dangerous: Listen to this episode from PyTorch Developer Podcast on Spotify. Ever wanted to learn about CUDA but not sure where to start? In this sixteen minute episode I try to jam in as much CUDA knowledge as ...


CUDA MODE ▷ #beginner (5 messages):

Link mentioned: Courses: no description found


CUDA MODE ▷ #pmpp-book (9 messages🔥):


CUDA MODE ▷ #youtube-recordings (6 messages):


CUDA MODE ▷ #torchao (2 messages):

Links mentioned:


CUDA MODE ▷ #ring-attention (3 messages):

Link mentioned: Ring Attention Explained | Coconut Mode: Near infinite context window for language models.


CUDA MODE ▷ #off-topic (2 messages):

Link mentioned: Tweet from kache (dingboard.com) (@yacineMTB): IT CHANGES EVERYTHING!!!!!!


CUDA MODE ▷ #hqq (12 messages🔥):

Link mentioned: HQQ 4 bit llama 2 7b · pytorch-labs/gpt-fast@551af74: export MODEL_REPO=meta-llama/Llama-2-7b-hf scripts/prepare.sh $MODEL_REPO python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-hqq --groupsize 64 python generate.py --...


CUDA MODE ▷ #triton-viz (3 messages):


CUDA MODE ▷ #llmdotc (135 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #recording-crew (12 messages🔥):

Links mentioned:


Cohere ▷ #general (308 messages🔥🔥):

Links mentioned:


Cohere ▷ #project-sharing (2 messages):

Links mentioned:


HuggingFace ▷ #general (172 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (6 messages):

<ul>
  <li><strong>Decoding CLIP's Concept:</strong> A blog by <a href="https://www.linkedin.com/in/matthewbrems/">Matthew Brems</a> aims to simplify the understanding of <strong>OpenAI's CLIP model</strong>, covering what it is, its workings, and its significance. This multimodal model from OpenAI offers a novel approach to computer vision by training on both images and text. <a href="https://www.kdnuggets.com/2021/03/beginners-guide-clip-model.html">Beginner's Guide to CLIP Model</a>.</li>
  <li><strong>Applying Zero-Shot Classification:</strong> After learning about Zero-Shot Classification with the <strong>bart-large-mnli model</strong>, a demo was created showcasing its application to classify 3D assets from <a href="https://thebasemesh.com/">thebasemesh.com</a>, a free resource of base meshes for creative projects. Watch the exploration on <a href="https://www.youtube.com/watch?v=jJFvOPyEzTY">YouTube</a>.</li>
  <li><strong>Cost-Effective AI with Podman:</strong> A demonstration video was shared showing how <strong>Podman</strong> is used to build containerized generative AI applications, which offers a cost-efficient alternative for local and cloud deployment. The technology promises control and choice regarding deployment options. Tutorial on <a href="https://youtu.be/3iEhFKIDXp0?si=nQJt-PBJUB960gpU">YouTube</a>.</li>
</ul>
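The zero-shot classification demo above rests on the NLI trick used by bart-large-mnli: each candidate label is turned into an entailment hypothesis against the input text. The stdlib-only sketch below shows that framing with a toy word-overlap scorer standing in for the NLI model; in practice you would use the transformers zero-shot-classification pipeline, and the hypothesis template and asset descriptions here are invented for illustration.

```python
# Sketch of NLI-based zero-shot classification: every candidate label
# becomes a hypothesis, and the label with the highest entailment score wins.
def build_hypotheses(labels, template="This 3D asset is a {}"):
    return [template.format(label) for label in labels]

def classify(premise, labels, entailment_score):
    """Return the label whose hypothesis the scorer rates most entailed."""
    scored = [(label, entailment_score(premise, hyp))
              for label, hyp in zip(labels, build_hypotheses(labels))]
    return max(scored, key=lambda pair: pair[1])[0]

def toy_score(premise, hypothesis):
    """Naive word-overlap stand-in for a real NLI entailment probability."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1)

label = classify("a low poly chair mesh", ["chair", "car", "tree"], toy_score)
```

Swapping `toy_score` for a real NLI model is what the bart-large-mnli pipeline does under the hood, which is why it needs no task-specific training data.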

Links mentioned:


HuggingFace ▷ #cool-finds (10 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (17 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (7 messages):

Link mentioned: LLM Reading Group (March 5, 19; April 2, 16, 30; May 14; 28): Come and meet some of the authors of seminal papers in LLM/NLP research and hear them talk about their work


HuggingFace ▷ #computer-vision (7 messages):

Links mentioned:


HuggingFace ▷ #NLP (14 messages🔥):

Link mentioned: GitHub - EdoPedrocchi/RicercaMente: Open source project that aims to trace the history of data science through scientific research published over the years: Open source project that aims to trace the history of data science through scientific research published over the years - EdoPedrocchi/RicercaMente


HuggingFace ▷ #diffusion-discussions (6 messages):

Links mentioned:


LAION ▷ #general (147 messages🔥🔥):

Links mentioned:


LAION ▷ #announcements (1 messages):


LAION ▷ #research (46 messages🔥):

Links mentioned:


LAION ▷ #learning-ml (1 messages):


LangChain AI ▷ #general (142 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #langserve (3 messages):

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


LangChain AI ▷ #langchain-templates (4 messages):

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


LangChain AI ▷ #share-your-work (17 messages🔥):

Links mentioned:


LangChain AI ▷ #tutorials (5 messages):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (64 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (19 messages🔥):

Link mentioned: GitHub - astramind-ai/Mixture-of-depths: Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models": Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" - astramind-ai/Mixture-of-depths


OpenAccess AI Collective (axolotl) ▷ #general-help (39 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #community-showcase (1 messages):

b.nodnarb: Thanks for the post, <@915779530122207333> !


OpenAccess AI Collective (axolotl) ▷ #docs (9 messages🔥):


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (5 messages):

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search: Understand code, faster.


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (25 messages🔥):

Links mentioned:


OpenInterpreter ▷ #general (79 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (72 messages🔥🔥):

Link mentioned: GitHub - rbrisita/01 at linux: The open-source language model computer. Contribute to rbrisita/01 development by creating an account on GitHub.


OpenInterpreter ▷ #ai-content (1 messages):

aime_bln: https://api.aime.info


LlamaIndex ▷ #announcements (1 messages):


LlamaIndex ▷ #blog (8 messages🔥):


LlamaIndex ▷ #general (113 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (5 messages):


tinygrad (George Hotz) ▷ #general (31 messages🔥):

Link mentioned: hotfix: bump line count to 7500 for NV backend · tinygrad/tinygrad@e14a9bc: You like pytorch? You like micrograd? You love tinygrad! ❤️ - hotfix: bump line count to 7500 for NV backend · tinygrad/tinygrad@e14a9bc


tinygrad (George Hotz) ▷ #learn-tinygrad (72 messages🔥🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (5 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #news (11 messages🔥):


Interconnects (Nathan Lambert) ▷ #ml-questions (13 messages🔥):


Interconnects (Nathan Lambert) ▷ #random (4 messages):


Interconnects (Nathan Lambert) ▷ #reads (1 messages):


Interconnects (Nathan Lambert) ▷ #sp2024-history-of-open-alignment (1 messages):

Link mentioned: [lecture artifacts] aligning open language models - a natolambert Collection: no description found


Interconnects (Nathan Lambert) ▷ #posts (2 messages):


LLM Perf Enthusiasts AI ▷ #claude (25 messages🔥):

Link mentioned: Tweet from Pliny the Prompter 🐉 (@elder_plinius): JAILBREAK ALERT! Just discovered a universal jailbreak for Claude 3. Malware, hard drug recipes, bomb-making, the whole nine yards. Haiku at high temps seems to be the best combo but in my limited ...


LLM Perf Enthusiasts AI ▷ #openai (4 messages):


DiscoResearch ▷ #mixtral_implementation (8 messages🔥):


DiscoResearch ▷ #general (4 messages):

Links mentioned:


DiscoResearch ▷ #discolm_german (2 messages):

Link mentioned: Stable LM 2 1.6B Technical Report: We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruc...


Mozilla AI ▷ #llamafile (9 messages🔥):

Links mentioned:


Alignment Lab AI ▷ #ai-and-ml-discussion (2 messages):


Alignment Lab AI ▷ #general-chat (4 messages):


Alignment Lab AI ▷ #join-in (1 messages):

There are no appropriate messages to summarize.

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


Alignment Lab AI ▷ #oo2 (1 messages):

aslawliet: Is the project still alive?


AI21 Labs (Jamba) ▷ #jamba (6 messages):

Link mentioned: ajibawa-2023/Code-Jamba-v0.1 · Hugging Face: no description found


Datasette - LLM (@SimonW) ▷ #ai (5 messages):

Link mentioned: PasswordGPT: no description found


Skunkworks AI ▷ #off-topic (1 messages):

Link mentioned: LLMs in Prod w/ Portkey, Flybridge VC, Noetica, LastMile · Luma: Unlock the Secrets to Scaling Your Gen AI App to Production While it's easy to prototype a Gen AI app, bringing it to full-scale production is hard. We are bringing together practitioners &....