Frozen AI News archive

Apple's OpenELM beats OLMo with 50% of its dataset, using DeLighT

**Apple** advances its AI presence with the release of **OpenELM**, its first relatively open large language model, available in sizes from **270M to 3B** parameters and featuring a novel layer-wise scaling architecture inspired by the **DeLighT** paper. Meanwhile, **Meta's Llama 3** family pushes context-length boundaries, with models supporting over **160K tokens** and an **8B-Instruct model with 262K context length** released on Hugging Face, alongside performance improvements in quantized versions. A new paper on AI alignment highlights **KTO** as the best-performing method, with sensitivity to training-data volume noted. In AI ethics and regulation, former **Google** CEO **Eric Schmidt** warns about the risks of open-source AI empowering bad actors and geopolitical rivals, while a U.S. proposal aims to enforce "Know Your Customer" rules to end anonymous cloud usage.


Apple's AI emergence continues apace ahead of WWDC. We've covered OLMo before, and OpenELM looks like Apple's first actually-open LLM release (weights and code), sharing some novel research in the efficient-architecture direction.


It's not totally open, but it's pretty open. As Sebastian Raschka put it:

Let's start with the most interesting tidbits:

  • OpenELM comes in 4 relatively small and convenient sizes: 270M, 450M, 1.1B, and 3B
  • OpenELM performs slightly better than OLMo even though it's trained on 2x fewer tokens
  • The main architecture tweak is a layer-wise scaling strategy

But:

"Sharing details is not the same as explaining them, which is what research papers were aimed to do when I was a graduate student. For instance, they sampled a relatively small subset of 1.8T tokens from various publicly available datasets (RefinedWeb, RedPajama, The PILE, and Dolma). This subset was 2x smaller than Dolma, which was used for training OLMo. What was the rationale for this subsampling, and what were the criteria?"


The layer-wise scaling comes from DeLighT, a 2021 paper that makes the standard transformer 2.5-5x deeper in layer count while matching models 2-3x larger by parameter count. Those claims sound paradoxical, but the authors' main trick is to vary block depth and width between the input and the output rather than keeping them uniform.
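
To make the idea concrete, here is a minimal sketch of what layer-wise scaling can look like: the number of attention heads and the FFN expansion ratio are linearly interpolated across depth instead of held constant. The bounds and dimensions below are illustrative assumptions, not OpenELM's published hyperparameters.

```python
# Layer-wise scaling sketch (OpenELM-style, DeLighT-inspired): rather than a
# uniform stack, per-layer width grows with depth. All bounds below are
# illustrative, not OpenELM's published values.

def layer_wise_config(num_layers, model_dim=1280, head_dim=64,
                      min_heads=4, max_heads=20,
                      min_ffn_mult=1.0, max_ffn_mult=4.0):
    """Return per-layer attention/FFN sizes via linear interpolation."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0.0 at the input block, 1.0 at the output
        heads = max(1, round(min_heads + t * (max_heads - min_heads)))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append({"layer": i,
                        "num_heads": heads,
                        "attn_dim": heads * head_dim,
                        "ffn_hidden": int(ffn_mult * model_dim)})
    return configs

for cfg in layer_wise_config(num_layers=6):
    print(cfg)
```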



Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, and r/Singularity. Comment crawling works now but still has lots of room for improvement!

LLaMA Developments

AI Ethics & Regulation

Hardware Developments

Memes & Humor


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Here is a summary of the key topics and insights from the provided tweets:

Meta Llama 3 Release and Impact

Phi-3 Model Release and Reception

Extending LLM Context Windows

Cohere Toolkit Release

OpenAI Employee Suspension and GPT-5 Speculation

Other Noteworthy Topics


AI Discord Recap

A summary of Summaries of Summaries

  1. Extending LLM Context Lengths

    • Llama 3 Performance and Context Length Innovations: Discussions centered around Llama 3's capabilities, with some expressing mixed opinions on its code recall and configuration compared to GPT-4. However, innovations in extending Llama 3's context length to 96k tokens for the 8B model using techniques like PoSE (Positional Skip-wisE) and continued pre-training with 300M tokens generated excitement, as detailed in this tweet thread (see the PoSE sketch after this list).
    • The EasyContext project aims to extrapolate LLM context lengths to 1 million tokens with minimal hardware requirements.
  2. Optimizing LLM Training and Deployment

    • Nvidia's Nsight Compute CLI is utilized for kernel profiling to optimize CUDA code for LLM training.
    • Finetuning LLMs for Domain-Specific Gains: Interest grew in finetuning large language models for domain-specific improvements, with examples like Meditron for medical applications. Discussions also covered data synthesis strategies using tools like Argilla's Distilabel, and the challenges of multi-document, long-context finetuning. Cost-performance tradeoffs were debated, such as spending $2,368 for 4 epochs vs $41,440 for 50 epochs with potentially minor gains.
    • PyTorch introduces Torchtitan, a library dedicated to aiding LLM training from scratch.
    • The Mixture of Depths paper proposes accelerating transformer training using a modified MoE routing mechanism.
  3. Open-Source LLM Ecosystem Expansion
  4. Evaluating and Benchmarking LLMs
    • On the Judgemark benchmark, Llama-3-70b shows promise for fine-tuning disco-judge applications.
    • Discussions around the effectiveness of validation loss as a performance indicator for LLMs.
    • The Low-Cost Language Models survey evaluates CPU-friendly LLMs on Python code generation tasks.
    • Debates on the transparency of Nightshade's autoencoder capabilities and the need for publishing findings openly.
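
Since PoSE comes up in several items above, here is a toy sketch of its core trick: train on a short window but relabel position ids with random skips so they span a much longer target context. The lengths and chunk count are illustrative, not the paper's settings.

```python
import random

# Toy PoSE (Positional Skip-wisE) sketch: split a short training window into
# chunks and insert random position-id skips between them, so the model sees
# position ids reaching toward the long target context.

def pose_position_ids(train_len=2048, target_len=96_000, num_chunks=2):
    bounds = sorted(random.sample(range(1, train_len), num_chunks - 1))
    chunks = list(zip([0] + bounds, bounds + [train_len]))
    budget = target_len - train_len        # total positions available to skip
    offset, ids = 0, []
    for start, end in chunks:
        ids.extend(range(start + offset, end + offset))
        skip = random.randint(0, budget)   # skip inserted before the next chunk
        offset += skip
        budget -= skip
    return ids                             # len(ids) == train_len

ids = pose_position_ids()
print(len(ids), max(ids))  # 2048 tokens, but positions can approach 96k
```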

PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


Perplexity AI Discord


CUDA MODE Discord

CUDA Collective Comes Together: Members focused on honing their CUDA skills by optimizing various kernels and algorithms, including matrix multiplication and flash attention. Threads spanned from leveraging the NVIDIA Nsight Compute CLI User Guide for kernel profiling to debates on the efficiency of low-bit quantization methods.

PyTorch Tangles with Compatibility and Extensions: A snag was hit with flash-attn compatibility in PyTorch 2.3.0, resulting in an undefined symbol error, which participants hoped to see rectified promptly. PyTorch AO ignited enthusiasm by supporting custom CUDA extensions, facilitating performance tuning using torch.compile.

Greener Code with C++: An announcement about a bonus talk from the NVIDIA C++ team on converting llm.c to llm.cpp teased opportunities for clearer, faster code.

The Matrix of Memory and Models: Discussions delved deep into finer points of CUDA best practices, contemplating burst sizes for memory coalescing around 128 bytes as explored in Chapter 6, section 3.d of the CUDA guide, and toying with the concept of reducing overhead in packed operations.

Recording Rendezvous: Volunteers stepped up for screen recording, offering detailed, actionable advice and pointing to Existential Audio - BlackHole for lossless sound capture, highlighting the careful nuances needed for a refined technical setup.


LM Studio Discord


Nous Research AI Discord

Pushing The Envelope on Model Context Limits: Llama 3 models are breaking context barriers, with one variant reaching a 96k context for the 8B model using PoSE and continued pre-training with 300M tokens. The efficacy of Positional Skip-wisE (PoSE) and RoPE scaling were key topics, with a paper on PoSE's context window extension mentioned alongside discussions on raising the RoPE base during fine-tuning for lengthier contexts.
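
As a quick illustration of the RoPE lever mentioned here: the rotary base sets the frequency spectrum, and raising it stretches the longest wavelengths so distant positions remain distinguishable. A sketch with illustrative base values:

```python
import math

# RoPE frequency sketch: inverse frequencies follow base**(-2i/d). A larger
# base stretches the slowest rotation's wavelength, which is the usual
# long-context fine-tuning lever. Base values are examples, not any model's config.

def rope_inv_freq(head_dim=128, base=10_000.0):
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

for base in (10_000.0, 500_000.0):
    slowest = rope_inv_freq(base=base)[-1]
    wavelength = 2 * math.pi / slowest
    print(f"base={base:>9,.0f} -> longest wavelength ~ {wavelength:,.0f} positions")
```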

LLM Performance and Cost Discussions Engage Community: Engineers expressed skepticism about validation loss as a performance indicator and shared a cost comparison of training epochs, highlighting a case where four epochs cost $2,368 versus $41,440 for fifty epochs with minor performance gains. Another engineer is considering combining several 8B models into a mixture of experts based on Gemma MoE and speculated on potential enhancements using DPO/ORPO techniques.
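
For readers unfamiliar with the "combine several 8B models into a mixture of experts" idea, here is a toy sketch of the general recipe: FFN experts lifted from donor models sit behind a learned top-k router. The shapes, top-k, and plain-Linear experts are illustrative assumptions, not the Gemma MoE setup.

```python
import torch
import torch.nn as nn

# Toy MoE layer: a learned router sends each token to its top-k experts
# (here tiny stand-in FFNs; in the discussed recipe, FFNs lifted from donors).

class MoEFFN(nn.Module):
    def __init__(self, experts, d_model, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(experts)
        self.router = nn.Linear(d_model, len(experts))
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        gates = self.router(x).softmax(dim=-1)         # (tokens, n_experts)
        weights, idx = gates.topk(self.top_k, dim=-1)  # (tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

experts = [nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
           for _ in range(4)]
print(MoEFFN(experts, d_model=64)(torch.randn(10, 64)).shape)  # (10, 64)
```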

The Saga of Repository Archival: Concerns were voiced about the sudden disappearance of Microsoft’s WizardLM repo, sparking a debate on the importance of archiving, especially in light of Microsoft's investment in OpenAI. Participants underscored the need for backups, drawing from instances such as the recent reveal of WizardLM-2, accessible on Hugging Face and GitHub.

Synthetic Data Generation: A One-Stop Shop: Argilla’s Distilabel was recommended for creating diverse synthetic data, with practical examples and repositories such as the distilabel-workbench illustrating its applications. The conversation spanned single document data synthesis, multi-document challenges, and strategies for extended contexts in language models.

Simulated World Engagements Rouse Curiosity: Websim's capabilities to simulate CLI commands and full web pages have captivated users, with example simulations shared, such as the EVA AI interaction profile on Websim. Speculation about the revival of World-Sim ran in parallel, and members looked forward to its reintroduction with a "pay-for-tokens" model.


OpenAI Discord


Stability.ai (Stable Diffusion) Discord

AI Rollout Must Be Crystal Clear: Valve's new content policy requires developers to disclose AI usage on Steam, particularly highlighting the need for transparency around live-generated AI content and mechanisms that ensure responsible deployment.

Copyright Quandary in Content Creation: Conversations bubbled up over the legal complexities when generating content with public models such as Stable Diffusion; there's a necessity to navigate copyright challenges, especially on platforms with rigorous copyright enforcement like Steam.

Art Imitates Life or... Itself?: An inquiry from Customluke about how to create a model or a LoRA to replicate their art style with Stable Diffusion sparked suggestions, with tools like Dreambooth and kohya_ss surfacing for model and LoRA creation respectively.

Selecting the Better Suited AI Flavor: A vocal group of users find SD 1.5 superior to SDXL for their needs, citing sharper results and a smoother training process, evidence that the choice of base model significantly impacts output quality.

Polishing Image Generation: Tips were shared for improving image-generation results, recommending alternatives such as Forge and epicrealismXL for those dissatisfied with the image quality they were getting through tools like ComfyUI.


HuggingFace Discord


Eleuther Discord


OpenRouter (Alex Atallah) Discord

Mixtral Muddle: A provider of Mixtral 8x7b was returning blank responses, leading to its temporary removal from OpenRouter. Auto-detection methods for such failures are under consideration.

Soliloquy's Subscription Surprise: The Soliloquy 8B model transitioned to a paid service, charging $0.1 per 1M tokens. Further information and discussions are available at Soliloquy 8B.

DBRX AI Achieves AI Astonishment: Fprime-ai announced a significant advancement with their DBRX AI on LinkedIn, sparking interest and discussions in the community. The LinkedIn announcement can be read here.

Creative Model Melee: Community members argued about the best open-source model for role-play creativity, with WizardLM2 8x22B and Mixtral 8x22B emerging as top contenders due to their creative capabilities.

The Great GPT-4 Turbo Debate: Microsoft's influence on the WizardLM project incited a heated debate, leading to a deep dive into the incident and into the performance and sustainability of models like GPT-4, Llama 3, and WizardLM. Resources shared include an incident summary and a miscellaneous OpenRouter model list.


LlamaIndex Discord

Create-llama Simplifies RAG Setup: The create-llama v0.1 release brings new support for @ollama and vector database integrations, making it easier to deploy RAG applications with llama3 and phi3 models, as detailed in their announcement tweet.

LlamaParse Touted in Hands-on Tutorial and Webinar: A hands-on tutorial showcases how LlamaParse, @JinaAI_ embeddings, @qdrant_engine vector storage, and Mixtral 8x7b can be used to create sophisticated RAG applications, available here, while KX Systems hosts a webinar to unlock complex document parsing capabilities with LlamaParse (details in this tweet).

AWS Joins Forces with LlamaIndex for Developer Workshop: AWS collaborates with @llama_index to provide a workshop focusing on LLM app development, integrating AWS services and LlamaParse; more details can be found here.

Deep Dive into Advanced RAG Systems: The community engaged in robust discussions on improving RAG systems and shared a video on advanced setup techniques, addressing everything from sentence-window retrieval to integrating structured Pydantic output (Lesson on Advanced RAG).

Local LLM Deployment Strategies Discussed: There was active dialogue on employing local LLM setups to circumvent reliance on external APIs, with guidance provided in the official LlamaIndex documentation (Starter Example with Local LLM), showcasing strategies for resolving import errors and proper package installation.
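
For reference, a minimal local-LLM sketch in the spirit of the linked starter example. It assumes a recent llama-index with the Ollama integration installed (pip install llama-index llama-index-llms-ollama) and a running Ollama server with llama3 pulled; import paths have shifted between releases, so treat them as assumptions.

```python
# Minimal local-LLM starter sketch (assumptions: recent llama-index, the
# llama-index-llms-ollama package, and an Ollama server with llama3 pulled).

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama(model="llama3", request_timeout=120.0)
# Note: embeddings still default to OpenAI unless a local embedding model is
# configured, a common source of the import/API errors discussed above.

documents = SimpleDirectoryReader("data").load_data()  # any local docs folder
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What do these documents cover?"))
```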


LAION Discord

Llama 3's Mixed Reception: Community feedback on Llama 3 is divided, with some highlighting its inadequate code recall abilities compared to expectations set by GPT-4, while others speculate the potential for configuration enhancements to bridge the performance gap.

Know Your Customer Cloud Conundrum: The proposed U.S. "Know Your Customer" policies for cloud services spark concern and discussion, emphasizing the necessity for community input on the Federal Register before the feedback window closes.

Boost in AI Model Training Efficiency: Innovations in vision model training are making waves with a weakly supervised pre-training method that races past traditional contrastive learning, achieving 2.7 times faster training as elucidated in this research. The approach shuns contrastive learning's heavy compute costs for a multilabel classification framework, yielding a performance on par with CLIP models.

The VAST Landscape of Omni-Modality: Enthusiasm surfaced for finetuning VAST, a Vision-Audio-Subtitle-Text Omni-Modality Foundation Model. The project marks a stride towards omni-modality, with resources available at its GitHub repository.

Nightshade's Transparency Troubles: The guild debates the effectiveness and transparency of Nightshade with a critical lens on autoencoder capabilities and reluctances in the publishing of potentially controversial findings.


OpenInterpreter Discord

Mac Muscle Meets Interpreter Might: Open Interpreter's New Computer Update has significantly improved local functionality, particularly with native Mac integrations. The implementation allows users to control Mac's native applications using simple commands such as interpreter --os, as detailed in their change log.

Eyes for AI: Community members highlighted the Moondream tiny vision language model, providing resources like the Img2TxtMoondream.py script. Discussions also featured LLaVA, a multimodal model hosted on Hugging Face, which is grounded in the powerful NousResearch/Nous-Hermes-2-Yi-34B model.

Loop Avoidance Lore: Engineers have been swapping strategies to mitigate looping behavior in local models, considering solutions ranging from tweaking temperature settings and prompt editing to more complex architectural changes. An intriguing concept, the frustration metric, was introduced to tailor a model's responses when stuck in repetitive loops.
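
The "frustration metric" was only sketched in conversation; one way to make it concrete (our illustration, not anyone's shipped code) is to score recent n-gram repetition and bump the sampling temperature once it crosses a threshold:

```python
from collections import Counter

# Toy "frustration metric": fraction of repeated n-grams in recent output.
# When it exceeds a threshold, raise temperature to kick the model out of the loop.

def frustration(tokens, n=4):
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    repeats = sum(c - 1 for c in Counter(grams).values())
    return repeats / len(grams)            # 0.0 = no repetition, 1.0 = pure loop

def next_temperature(base_temp, tokens, boost=0.5, threshold=0.2):
    f = frustration(tokens)
    return base_temp + boost * f if f > threshold else base_temp

history = "the cat sat on the mat the cat sat on the mat".split()
print(frustration(history), next_temperature(0.7, history))
```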

Driving Dogs with Dialogue: A member inquired about the prospect of leveraging Open Interpreter for commanding the Unitree GO2 robodog, sparking interest in possible interdisciplinary applications. Technical challenges, such as setting dummy API keys and resolving namespace conflicts with Pydantic, were also tackled with shared solutions.

Firmware Finality: The Open Interpreter 0.2.5 New Computer Update has officially graduated from beta, including the fresh enhancements mentioned earlier. A query about the update's beta status led to an affirmative response after a version check.


OpenAccess AI Collective (axolotl) Discord

CEO's Nod to a Member's Tweet: A participant was excited about the CEO of Hugging Face acknowledging their tweet; networking and recognition are alive and well in the community.

Tech Giants Jump Into Fine-tuning: With examples like Meditron, discussion on fine-tuning language models for specific uses is heating up, highlighting the promise for domain-specific improvements and hinting at an upcoming paper on continual pre-training.

Trouble in Transformer Town: An 'AttributeError' surfaced in transformers 4.40.0, tripping up a user, serving as a cautionary tale that even small updates can break workflows.

Mixing Math with Models: Despite some confusion, inquiries were made about integrating DeepSpeed ZeRO-3 with FFT (full fine-tuning); keep an eye out for this complex pairing.

Optimizer Hunt Heats Up: FSDP (Fully Sharded Data Parallel) compatibility with optimizers remains a hot topic, with findings that AdamW and SGD are in the clear while paged_adamw_8bit does not support FSDP offloading, prompting a quest for alternatives within the OpenAccess-AI-Collective/axolotl resources.


Cohere Discord

Upload Hiccups and Typographic Tangles: Users in the Cohere guild tackled issues with the Cohere Toolkit on Azure, pointing to the paper-clip icon for uploads; despite this, upload problems persisted with no root cause found. The Cohere typeface's licensing on GitHub provoked discussion; it is not under the MIT license and is slated for replacement.

Model Usage Must-Knows: Discussion clarified that Cohere's Command R+ weights are openly available but not licensed for commercial use, and that the training data is not shared.

Search API Shift Suggestion: The guild mulled over the potential switch from Tavily to the Brave Search API for integrating with the Cohere-Toolkit, citing potential benefits in speed, cost, and accuracy in retrieval.

Toolkit Deployment Debates: Deployment complexities of the Cohere Toolkit on Azure were deliberated, where selecting a model deployment option is crucial and the API key is not needed. Conversely, local addition of tools faced issues with PDF uploads and sqlite3 version compatibility.

Critical Recall on 'Hit Piece': Heated discussions emerged over the criticism of a "hit piece" against Cohere, with dialogue focused on the responsibility of AI agents and their real-world actions. A push for critical accountability emerged, with members reinforcing the need to back up critiques with substantial claims.


tinygrad (George Hotz) Discord


Modular (Mojo 🔥) Discord

Strong Performer: Hermes 2.5 Edges Out Hermes 2: Enhanced with code instruction examples, Hermes 2.5 demonstrates superior performance across various benchmarks when compared to Hermes 2.

Security in the Limelight: Amidst sweeping software and feature releases by Modular, addressing security loopholes becomes critical, emphasizing protection against supply-chain attacks like the XZ incident, with the prevalence of open-source code in software development forecast to hit 96% by 2024.

Quantum Complexity Through A Geometric Lens: Members discussed how the geometric concept of the amplituhedron could simplify quantum particle scattering amplitudes, with machine learning being suggested as a tool to decipher increased complexities in visualizing quantum states as systems scale.

All About Mojo: Dialogue around the Mojo Programming Language covered topics like assured memory cleanup by the OS, the variance between def and fn functions with examples found here, and the handling of mixed data type lists via Variant that requires improvement.

Moving Forward with Mojo: ModularBot flagged an issue filed on GitHub about Mojo and urged members to use issues for better tracking of concerns, for instance about __copyinit__ semantics via a GitHub Gist; it also reported a cleaner code update with more insertions than deletions, achieving better efficiency.


LangChain AI Discord

A Tricky Query for Anti-Trolling AI Design: A user proposed designing an anti-trolling AI and sought suggestions on how the system could effectively target online bullies.

Verbose SQL Headaches: Participants shared experiences with open-source models like Mistral and Llama3 generating overly verbose SQL responses and encountered an OutputParserException, with links to structured output support and examples of invoking SQL Agents.

RedisStore vs. Chat Memory: The community clarified the difference between stores and chat memory in the context of LangChain integrations, emphasizing the specific use of RedisStore for key-value storage and Redis Chat Message History for session-based chat persistence.
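
A hedged sketch of that distinction, assuming the langchain_community import paths and constructor arguments current at the time of writing (both have moved between releases):

```python
# RedisStore = generic key-value byte store; RedisChatMessageHistory = a
# per-session chat log. Import paths and kwargs are assumptions; check your
# installed langchain version.

from langchain_community.storage import RedisStore
from langchain_community.chat_message_histories import RedisChatMessageHistory

kv = RedisStore(redis_url="redis://localhost:6379")
kv.mset([("doc:1", b"cached blob or embedding")])  # arbitrary key-value data

history = RedisChatMessageHistory(session_id="user-42",
                                  url="redis://localhost:6379")
history.add_user_message("Hi!")                    # session-scoped chat turns
history.add_ai_message("Hello! How can I help?")
print(history.messages)
```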

Techie Tutorial on Model Invocation: There was a discussion on the correct syntax when integrating prompts into LangChain models via JavaScript, with recommendations for using ChatPromptTemplate and pipe methods for chaining prompts.

Gemini 1.5 Access with a Caveat: Users discussed the integration of Gemini 1.5 Pro with LangChain, highlighting that it necessitates ChatVertexAI instead of ChatGoogleGenerativeAI and requires configuring the GOOGLE_APPLICATION_CREDENTIALS environment variable for proper access.
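
In code, the caveat looks roughly like this; the package name (langchain-google-vertexai) and model string are assumptions based on the discussion:

```python
import os

# Gemini 1.5 Pro via the VertexAI chat wrapper (not ChatGoogleGenerativeAI),
# authenticated through a service-account file; model name is illustrative.
from langchain_google_vertexai import ChatVertexAI

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"

llm = ChatVertexAI(model_name="gemini-1.5-pro-preview-0409")
print(llm.invoke("Summarize RLHF in one sentence.").content)
```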


Latent Space Discord

Apple Bites the Open Source Apple: Apple has stepped into the open source realm, releasing a suite of models with parameters ranging from 270M to 3B, with the 270M parameter model available on Hugging Face.

Dify Platform Ups and Downs: The open-source LLM app development platform Dify is gaining traction for combining AI workflows and model management, although concerns have arisen about its lack of loops and context scopes.

PyTorch Pumps Up LLM Training: PyTorch has introduced Torchtitan, a library dedicated to aiding the training of substantial language models like llama3 from scratch.

Video Gen Innovation with Sora: OpenAI's Sora, a video-generation model that crafts videos up to a minute long, is getting noticed, with user experiences and details explored in an FXGuide article.

MOD Layers for Efficient Transformer Training: The 'Mixture of Depths' paper was presented, proposing an accelerated training methodology for transformers by alternately using new MOD layers and traditional transformer layers, introduced in the presentation and detailed in the paper's abstract.


Mozilla AI Discord


DiscoResearch Discord

Llama Beats Judge in Judging: On the Judgemark benchmark, Llama-3-70b showcased impressive performance, demonstrating its potential for fine-tuning purposes in disco-judge applications, as it supports at least 8k context length. The community also touched on collaborative evaluation efforts, with references to advanced judging prompt design to assess complex rubrics.

Benchmarking Models and Discussing Inference Issues: Phi-3-mini-4k-instruct unexpectedly ranked lower on the eq-bench leaderboard despite promising scores in published evaluations. In model deployment, discussions highlighted issues like slow initialization and inference times for DiscoLM_German_7b_v1 and potential misconfigurations that could be remedied using device_map='auto'.
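
The suggested fix is essentially a one-liner in transformers; a minimal sketch (requires accelerate, and fp16 is our assumption for fitting 2x V100s):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/DiscoLM_German_7b_v1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 fits 2x V100 far better than fp32
    device_map="auto",          # place layers on available GPUs, not CPU
)

inputs = tok("Wie geht es dir?", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```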

Tooling API Evaluation and Hugging Face Inquiries: Community debates highlighted TGI for its API-first, low-latency approach and praised vLLM as a user-friendly library optimized for cost-efficient deployment. Queries about Hugging Face's batch-generation capabilities sparked discussion, with community involvement evident in a GitHub issue exchange.

Gratitude and Speculation in Model Development: Despite deployment issues, members have expressed appreciation for the DiscoLM model series, while also speculating about the potential of constructing an 8 x phi-3 MoE model to bolster model capabilities. DiscoLM-70b was also a hot topic, with users troubleshooting errors and sharing usage experiences.

Success and Popularity in Model Adoption: The adaptation of the Phi-3-mini-4k model, referred to as llamafication, yielded a respectable EQ-Bench Score of 51.41 for German language outputs. Conversation also pinpointed the swift uptake of the gguf model, indicated by a notable number of downloads shortly after its release.


Interconnects (Nathan Lambert) Discord

Claude Displays Depth and Structure: In a rich discussion, the behavior and training of Claude were considered "mostly orthogonal" to Anthropic's vision, revealing unexpected depth and structural understanding through RLAIF training. Comparisons were made to concepts like "Jungian individuation" and conversation threads highlighted Claude's capabilities.

Debating the Merits of RLHF vs. KTO: A comparison between Reinforcement Learning from Human Feedback (RLHF) and Kahneman-Tversky Optimization (KTO) sparked debate, considering their suitability for different commercial deployments.

Training Method Transition Yields Improvements: An interview was mentioned where a progression in training methods from Supervised Fine-Tuning (SFT) to Direct Preference Optimization (DPO), and then to KTO, led to improved performance based on user feedback.

Unpacking the Complexity of RLHF: The community acknowledged the intricacies of RLHF, especially as they relate to varying data sources and their impact on downstream evaluation metrics.

Probing Grad Norm Spikes: A request for clarity on the implications of gradient-norm spikes during pretraining was made, emphasizing the potential adverse effects, but specifics were not delivered in the responses.


Skunkworks AI Discord

Moondream Takes On CAPTCHAs: A video guide showcases fine-tuning the Moondream Vision Language Model for better performance on a CAPTCHA image dataset, aimed at improving its image recognition capabilities for practical applications.

Low-Cost AI Models Make Cents: The document "Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation" was shared, covering evaluations of CPU-friendly language models and introducing a novel dataset with 60 programming problems. The use of a Chain-of-Thought prompt strategy is highlighted in the survey article.
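
For flavor, a generic Chain-of-Thought prompt of the kind such surveys evaluate (our illustration, not the paper's exact prompt):

```python
# Generic Chain-of-Thought prompt sketch for Python code generation.

PROBLEM = "Write a Python function that returns the n-th Fibonacci number."

cot_prompt = f"""You are a careful Python programmer.
Problem: {PROBLEM}

Let's think step by step:
1. Restate the problem and list edge cases (n = 0, n = 1, negative n).
2. Choose an approach (iterative, O(n) time, O(1) space).
3. Only then write the final function.
"""
print(cot_prompt)
```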

Meet, Greet, and Compute: AI developers are invited to a meetup at Cohere space in Toronto, which promises networking opportunities alongside lightning talks and demos — details available on the event page.

Arctic Winds Blow for Enterprises: Snowflake Arctic is introduced via a new video, positioning itself as a cost-effective, enterprise-ready Large Language Model to complement the suite of AI tools tailored for business applications.


Datasette - LLM (@SimonW) Discord


Alignment Lab AI Discord


The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (1265 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (18 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (86 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (10 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (7 messages):


Perplexity AI ▷ #announcements (1 messages):


Perplexity AI ▷ #general (531 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (8 messages🔥):


Perplexity AI ▷ #pplx-api (10 messages🔥):

Link mentioned: Supported Models: no description found


CUDA MODE ▷ #general (13 messages🔥):


CUDA MODE ▷ #cuda (40 messages🔥):

Links mentioned:


CUDA MODE ▷ #torch (9 messages🔥):

Link mentioned: CUDA semantics — PyTorch 2.3 documentation: no description found


CUDA MODE ▷ #announcements (1 messages):


CUDA MODE ▷ #algorithms (47 messages🔥):

Link mentioned: GitHub - catid/bitnet_cpu: Experiments with BitNet inference on CPU: Experiments with BitNet inference on CPU. Contribute to catid/bitnet_cpu development by creating an account on GitHub.


CUDA MODE ▷ #beginner (6 messages):


CUDA MODE ▷ #pmpp-book (5 messages):


CUDA MODE ▷ #youtube-recordings (1 messages):

poker6345: ppt can be shared


CUDA MODE ▷ #torchao (2 messages):

Links mentioned:


CUDA MODE ▷ #ring-attention (1 messages):

iron_bound: https://www.harmdevries.com/post/context-length/


CUDA MODE ▷ #off-topic (1 messages):

iron_bound: https://github.com/adam-maj/tiny-gpu


CUDA MODE ▷ #llmdotc (377 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #massively-parallel-crew (25 messages🔥):

Link mentioned: Existential Audio - BlackHole: no description found


LM Studio ▷ #💬-general (218 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (75 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (7 messages):


LM Studio ▷ #🎛-hardware-discussion (64 messages🔥🔥):

Links mentioned:


LM Studio ▷ #amd-rocm-tech-preview (30 messages🔥):


Nous Research AI ▷ #ctx-length-research (22 messages🔥):

Links mentioned:


Nous Research AI ▷ #off-topic (4 messages):


Nous Research AI ▷ #interesting-links (7 messages):

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):

  • **Announcements Channel Upgrade**: The "Announcements" channel has evolved! It can now be followed and integrated into other servers for streamlined updates.

Nous Research AI ▷ #general (212 messages🔥🔥):

  • **Discussing Context Window Expansion**: Members are intrigued by work on language-model context window expansion, referencing models with over 8k tokens of context and highlighting the possibility of extending models into the tens of millions of tokens using techniques such as **PoSE (Positional Skip-wisE)** and ring attention.
  • **Authorization of the AI Safety and Security Board**: A tweet from Andrew Curran (@AndrewCurran_) sparked discussions with the announcement of the AI Safety and Security Board by the Department of Homeland Security, prompting mixed reactions.
  • **WizardLM and Microsoft's Model Removals**: Speculation arose when Microsoft's WizardLM repo vanished, with some pointing towards a strategic move by Microsoft in response to its investments in OpenAI and products outperforming their offerings. Members shared concerns and emphasized the value of creating archives or backups for such repositories.
  • **AI Dialogue Systems**: There's a mention of using GPT to generate dialogue and create high-quality training data through "heated discussion between professors and student" role-plays; these dialogues can lead to better question generation or more accurate answers.
  • **LLM Frontend Choices**: Multiple tools and interfaces for working with Large Language Models were brought up, including **LibreChat, LM Studio, and OpenRouter**. Members seem to be exploring various options for the best tool fit.

Links mentioned:


Nous Research AI ▷ #ask-about-llms (50 messages🔥):

Links mentioned:


Nous Research AI ▷ #project-obsidian (1 messages):

deoxykev: Does anybody know of work extending moondream’s input size?


Nous Research AI ▷ #rag-dataset (3 messages):

Links mentioned:


Nous Research AI ▷ #world-sim (68 messages🔥🔥):

Links mentioned:


OpenAI ▷ #ai-discussions (170 messages🔥🔥):

Link mentioned: apple/OpenELM · Hugging Face: no description found


OpenAI ▷ #gpt-4-discussions (42 messages🔥):


OpenAI ▷ #prompt-engineering (20 messages🔥):


OpenAI ▷ #api-discussions (20 messages🔥):


Stability.ai (Stable Diffusion) ▷ #general-chat (246 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #general (208 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (3 messages):


HuggingFace ▷ #cool-finds (8 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (9 messages🔥):

Links mentioned:


HuggingFace ▷ #computer-vision (2 messages):


HuggingFace ▷ #NLP (6 messages):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (6 messages):


HuggingFace ▷ #gradio-announcements (1 messages):

  • **Gradio Bolsters Custom Component Capabilities**: Version 4.28.0 of Gradio introduces significant enhancements for custom components, including Tailwind styling, support for any Vite plugin and preprocessors, and a refined custom-component CLI that utilizes the vanilla Gradio SDK in Spaces.
  • **Streamlined Development and New Features**: Additional features accompany the custom-components upgrade, such as setting a maximum upload size, persistent reloads in dev mode to maintain front-end state, and re-organized documentation to better represent the Gradio ecosystem.
  • **Comprehensive Release with More Improvements**: This is just a highlight of the update; more details can be found in the full changelog on the Gradio website.

Eleuther ▷ #general (52 messages🔥):

Links mentioned:


Eleuther ▷ #research (144 messages🔥🔥):

Links mentioned:


Eleuther ▷ #interpretability-general (1 messages):

main.ai: https://twitter.com/sen_r/status/1783497788120248431


Eleuther ▷ #lm-thunderdome (21 messages🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Link mentioned: Lynn: Llama 3 Soliloquy 8B by lynn | OpenRouter: Soliloquy-L3 is a fast, highly capable roleplaying model designed for immersive, dynamic experiences. Trained on over 250 million tokens of roleplaying data, Soliloquy-L3 has a vast knowledge base, ri...


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):


OpenRouter (Alex Atallah) ▷ #general (215 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #blog (4 messages):


LlamaIndex ▷ #general (117 messages🔥🔥):

Links mentioned:


LAION ▷ #general (78 messages🔥🔥):

Links mentioned:


LAION ▷ #research (12 messages🔥):

Links mentioned:


OpenInterpreter ▷ #general (70 messages🔥🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (2 messages):


OpenInterpreter ▷ #ai-content (1 messages):

8i8__papillon__8i8d1tyr: https://www.youtube.com/watch?v=WeH3h-o1BgQ


OpenAccess AI Collective (axolotl) ▷ #general (56 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (3 messages):


OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages):


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (10 messages🔥):

Links mentioned:


Cohere ▷ #general (63 messages🔥🔥):

Links mentioned:


Cohere ▷ #project-sharing (6 messages):


tinygrad (George Hotz) ▷ #general (62 messages🔥🔥):

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (6 messages):

Link mentioned: tinygrad docs: no description found


Modular (Mojo 🔥) ▷ #💬︱twitter (1 messages):

ModularBot: From Modular: https://twitter.com/Modular/status/1783575774085410911


Modular (Mojo 🔥) ▷ #✍︱blog (1 messages):

Link mentioned: Modular: Preventing supply chain attacks at Modular: We are building a next-generation AI developer platform for the world. Check out our latest post: Preventing supply chain attacks at Modular


Modular (Mojo 🔥) ▷ #ai (2 messages):


Modular (Mojo 🔥) ▷ #🔥mojo (36 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-blogs-vids (5 messages):

Links mentioned:


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (6 messages):

Links mentioned:


Modular (Mojo 🔥) ▷ #🏎engine (3 messages):


Modular (Mojo 🔥) ▷ #nightly (11 messages🔥):


LangChain AI ▷ #general (44 messages🔥):

Links mentioned:


LangChain AI ▷ #langchain-templates (1 messages):


LangChain AI ▷ #share-your-work (5 messages):

Links mentioned:


Latent Space ▷ #ai-general-chat (35 messages🔥):

Links mentioned:


Latent Space ▷ #llm-paper-club-west (12 messages🔥):

Links mentioned:


Latent Space ▷ #ai-in-action-club (1 messages):


Mozilla AI ▷ #llamafile (40 messages🔥):

Links mentioned:


DiscoResearch ▷ #disco_judge (6 messages):

Links mentioned:


DiscoResearch ▷ #general (4 messages):

  • **API Focus and Library Ease Discussed**: TGI is presented as API-first and prioritizes low latency, while vLLM is acclaimed as an easy-to-use library emphasizing cost-effective, high-throughput deployment.
  • **Batch Generation Inquiry at Hugging Face**: A debate found its way to Hugging Face regarding batch-generation capabilities ([GitHub Issue #1008](https://github.com/huggingface/text-generation-inference/issues/1008#issuecomment-1742588516)), revealing community-driven problem-solving.
  • **DiscoLM Inference Speed Woes**: A member reported slow initialization and inference times for DiscoLM_German_7b_v1 on a high-performance computing system, contrasting with much faster times on a local setup without GPUs.
  • **Potential Misconfiguration in DiscoLM**: Another member suggested ensuring correct model loading with `device_map='auto'`, expecting a significant speed improvement when using 2x V100 GPUs for inference.

Link mentioned: Batch generate? · Issue #1008 · huggingface/text-generation-inference: System Info Hi, i like to ask if it is possible to do batch generation? client = Client("http://127.0.0.1:8081",timeout = 60) gen_t = client.generate(batch_text,max_new_tokens=64) generate c...


DiscoResearch ▷ #discolm_german (7 messages):


Interconnects (Nathan Lambert) ▷ #ml-questions (12 messages🔥):

Link mentioned: Tweet from j⧉nus (@repligate): definitely have no doubt there are various ways to do RL/generation-discrimination/synthetic data/self-play-esque training on top of teacher-forcing that makes the models smarter, but especially more ...


Interconnects (Nathan Lambert) ▷ #random (4 messages):


Skunkworks AI ▷ #general (1 messages):

Link mentioned: Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation: Large Language Models (LLMs) have become the go-to solution for many Natural Language Processing (NLP) tasks due to their ability to tackle various problems and produce high-quality results. Specifica...


Skunkworks AI ▷ #off-topic (4 messages):

Links mentioned:


Datasette - LLM (@SimonW) ▷ #ai (2 messages):

Link mentioned: apple/OpenELM · Hugging Face: no description found


Alignment Lab AI ▷ #general-chat (1 messages):

venadore: trying to get llama 3 to do topic complexity classification, not half bad