Frozen AI News archive

Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM

**Snowflake Arctic** is a notable new foundation language model released under Apache 2.0, claiming superiority over **Databricks** in data warehouse AI applications and adopting a mixture-of-experts architecture inspired by **DeepSeekMoE** and **DeepSpeed-MoE**. The model employs a 3-stage curriculum training strategy similar to the recent **Phi-3** paper. In AI image and video generation, **Nvidia** introduced the **Align Your Steps** technique improving image quality at low step counts, while **Stable Diffusion 3** and **SD3 Turbo** models were compared for prompt understanding and image quality. **Adobe** launched an AI video upscaling project enhancing blurry videos to HD, though with some high-resolution artifacts. **Apple** released open-source on-device language models with code and training logs, diverging from typical weight-only releases. The **Llama-3-70b** model ties for first place on the LMSYS leaderboard for English queries, and **Phi-3** (4B params) outperforms **GPT-3.5 Turbo** in the banana logic benchmark. Fast inference and quantization of **Llama 3** models were demonstrated on MacBook devices.


This one takes a bit of parsing but is a very laudable effort from Snowflake, which to date has been fairly quiet in the modern AI wave. Snowflake Arctic is notable for a few reasons, but probably not the confusing/unrelatable chart they chose to feature above the fold:

[image: Snowflake's above-the-fold benchmark chart]

"Enterprise Intelligence" one could warm to, esp if it explains why they have chosen to do better on some domains than others:

[image: "Enterprise Intelligence" benchmark comparison]

What this chart really shows, in not very subtle ways, is that Snowflake is basically claiming to have built an LLM that is better in almost every way than Databricks', their main rival in the data warehouse wars. (This has got to smell offensive to Jonathan Frankle and his merry band of Mosaics?)

Downstream users don't care that much about training efficiency, but the other thing that should catch your eye is the model architecture - taking the right cue from DeepSeekMoE and DeepSpeed-MoE that more experts = better:

[image: Arctic's Dense-MoE hybrid architecture diagram]

No mention is made of the "shared expert" trick that DeepSeek used.
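
For the curious, here is a minimal sketch of the shared-expert idea (a hypothetical PyTorch module, not Arctic's or DeepSeek's actual code): every token always passes through a shared expert, while a learned router adds contributions from a top-k subset of the remaining experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertMoE(nn.Module):
    """Hypothetical sketch of DeepSeekMoE-style shared-expert routing."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        make_ffn = lambda: nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.shared = make_ffn()                      # always active for every token
        self.experts = nn.ModuleList(make_ffn() for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, d_model]
        weights = F.softmax(self.router(x), dim=-1)       # routing probabilities
        topw, topi = weights.topk(self.top_k, dim=-1)
        topw = topw / topw.sum(dim=-1, keepdim=True)      # renormalize top-k gates
        routed = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            gate = (topw * (topi == e)).sum(dim=-1, keepdim=True)  # [tokens, 1]
            routed = routed + gate * expert(x)  # dense loop for clarity; real MoE dispatches sparsely
        return self.shared(x) + routed
```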

There's also mention of a 3-stage curriculum:

[image: Arctic's 3-stage curriculum]

which echoes a similar strategy seen in the recent Phi-3 paper:

[image: the corresponding curriculum figure from the Phi-3 paper]

Finally, the model is released as Apache 2.0.

Honestly a great release, with perhaps the only poor decision being that the Snowflake Arctic cookbook is being published on Medium dot com.


Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity. Comment crawling works now but has lots to improve!

AI Image/Video Generation

Other Image/Video AI

Language Models and Chatbots

AI Hardware and Infrastructure

AI Ethics and Societal Impact

Humor/Memes


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

OpenAI and NVIDIA Partnership

Llama 3 and Phi 3 Models

Snowflake Arctic Model

Retrieval Augmented Generation (RAG) and Long Context

AI Development Tools and Applications

Industry News


AI Discord Recap

A summary of Summaries of Summaries

1. Llama 3 and Phi-3 Releases Spark Excitement and Comparisons: The release of Meta's Llama 3 (8B and 70B variants) and Microsoft's Phi-3 models generated significant buzz, with discussions comparing their performance, architectures like RoPE, and capabilities like Phi-3's function_call tokens. Llama 3's impressive scores on benchmarks like MMLU and Human Eval were highlighted.

2. Advancements in RAG Frameworks and Multimodal Models: Improvements to Retrieval-Augmented Generation (RAG) frameworks using LangChain's LangGraph were discussed, featuring techniques like Adaptive Routing and Corrective Fallback. The release of Apple's OpenELM-270M and interest in models like moondream for multimodal tasks were also covered.

3. Open-Source Tooling and Model Deployment: The open-sourcing of Cohere's Toolkit for building RAG applications was welcomed, while Datasette's LLM Python API usage for text embedding was explored. Discussions on batching prompts efficiently involved tools like vLLM, TGI, and llm-swarm.

4. Specialized Models and Niche Applications: The medical Internist.ai 7b model's impressive performance, even surpassing GPT-4 in evaluations, generated excitement. Unique projects like the AI-powered text RPG Brewed Rebellion and the 01 project for embedding AI into devices were also showcased.


PART 1: High level Discord summaries

Unsloth AI (Daniel Han) Discord


LM Studio Discord

Phi and TinyLlamas Take the Spotlight: Members have been experimenting with phi-3 in LM Studio, using models like PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed to navigate quantization differences, with Q4 outshining Q2 in text generation. Meanwhile, a suite of TinyLlamas models has garnered attention on Hugging Face, presenting opportunities to play with Mini-MoE models in the 1B to 2B range, and the community is abuzz with the rollout of Apple's OpenELM, despite its token limitations.

Navigating GPU Waters: GPU topics were center stage, with discussions on VRAM-intensive phi-3-mini-128k models, strategies for avoiding errors like "(Exit code: 42)" by upgrading to LM Studio v0.2.21, and addressing GPU offload errors. Further, technical advice flowed freely, recommending Nvidia GPUs for AI applications despite some members' qualms with the brand, and a nod towards 32GB RAM upgrades for robust LLM experimentation.

Tech Tangles in ROCm Realm: The AMD and NVIDIA mixed GPU environment provoked errors with ROCm installs, with temp fixes including removing NVIDIA drivers. However, heyitsyorkie underscored that ROCm within LM Studio is still in tech preview, signaling expected bumps. Wisdom from the community suggested solutions like driver updates, for instance, Adrenalin 24.3.1 for an rx7600, to iron out compatibility and performance concerns.

Mac Mileage Varies for LLMs: Mac users chimed in, suggesting that a minimum of 16GB RAM is ideal for running LLMs smoothly, although the M1 chip on an 8GB RAM setup can handle smaller models if not overloaded with parallel tasks.

Local Server Lore: Strategy-sharing for accessing LM Studio's local servers highlighted the use of Mashnet for remote operations and the potential role of Cloudflare in facilitating connections, updating the tried-and-true "localhost:port" setup.


Perplexity AI Discord


Nous Research AI Discord

Bold RoPE Discussions: The community debated the capabilities of Rotary Position Embedding (RoPE) in models like Meta's Llama 3, including its effectiveness in fine-tuning versus pretraining and misconceptions about its ability to generalize in longer contexts. The paper on "Scaling Laws of RoPE-based Extrapolation" (arXiv:2310.05209) sparked conversations on scaling RoPE and the challenges of avoiding catastrophic forgetting with increased RoPE base.
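
For intuition, a small sketch (assumed shapes; not the paper's code) of where the RoPE base enters the position encoding. Raising the base stretches the rotation wavelengths so distant positions remain distinguishable, which is exactly the knob the extrapolation scaling-law work studies:

```python
import torch

def rope_angles(positions: torch.Tensor, head_dim: int, base: float) -> torch.Tensor:
    """Rotation angles for Rotary Position Embedding: outer product of
    positions with the inverse frequencies derived from `base`."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    return torch.outer(positions.float(), inv_freq)  # [seq_len, head_dim // 2]

# A base of 10,000 is the classic choice; long-context finetunes raise it
# (Llama 3, for instance, ships with a RoPE theta of 500,000).
short_ctx = rope_angles(torch.arange(8_192), head_dim=128, base=10_000.0)
long_ctx = rope_angles(torch.arange(96_000), head_dim=128, base=500_000.0)
```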

AutoCompressors Enter the Ring: A new preprint on AutoCompressors presented a way for transformers to manage up to 30,720 tokens and improve perplexity (arXiv:2305.14788). Jeremy Howard's thoughts on Llama 3 and its finetuning strategies echoed through the guild (Answer.AI post), and a Twitter thread unveiled its successful context extension to 96k using advanced methods (Twitter Thread).

LLM Education and Holographic Apple Leans Out: The guild discussed a game aimed at teaching players about LLM prompt injections (Discord Invite). In hardware inklings, Apple reportedly reduced its Vision Pro shipments by 50% and is reassessing its headset strategy, sparking speculation about the 2025 lineup (Tweet by @SawyerMerritt).

Snowflake's Hybrid Model and Model Conversation: Snowflake Arctic 480B's launch of a unique Dense + MoE hybrid model led to analytical banter over its architecture choices, with a nod to its attention sinks designed for context scaling. Meanwhile, discussion of GPT-3 dynamics led to skepticism about whether OpenAI's models actually power the Rabbit R1.

Pydantic Models for Credible Citations: Pydantic models garnished with validators were touted as a way to ensure proper citations in LLM contexts; the discussion referenced several GitHub repositories (GitHub - argilla-io/distilabel) and tools like lm-format-enforcer for maintaining credible responses.
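
A minimal sketch of that pattern, assuming Pydantic v2 and a hypothetical set of retrieved source ids; the validator rejects any citation of a source the model was never shown, so a failed parse can be fed back to the model as a retry:

```python
from pydantic import BaseModel, field_validator

SOURCE_IDS = {"doc-1", "doc-2", "doc-3"}  # hypothetical ids of retrieved chunks

class CitedAnswer(BaseModel):
    answer: str
    citations: list[str]

    @field_validator("citations")
    @classmethod
    def citations_must_exist(cls, cited: list[str]) -> list[str]:
        unknown = set(cited) - SOURCE_IDS
        if unknown:
            raise ValueError(f"cited unknown sources: {sorted(unknown)}")
        return cited

# Validating raw LLM output; a ValidationError here becomes a retry prompt.
CitedAnswer.model_validate({"answer": "Refunds take 5 days.", "citations": ["doc-2"]})
```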

Stream Crafting with WorldSim: Guildmates swapped experiences with WorldSim and suggested the potential for Twitch streaming shared world simulations. They also shared a custom character tree (Twitter post) and conversed about the application of category theory involving types ontology and morphisms (Tai-Danae Bradley’s work).


CUDA MODE Discord

PyTorch 2.3: Triton and Tensor Parallelism Take Center Stage: PyTorch 2.3 enhances support for user-defined Triton kernels and improves Tensor Parallelism for training Large Language Models (LLMs) up to 100 billion parameters, all validated by 426 contributors (PyTorch 2.3 Release Notes).
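
A minimal sketch of the headline feature (the kernel is the standard Triton vector-add example, not taken from the release notes): a user-defined Triton kernel can now live inside a torch.compile region without graph breaks:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def triton_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

@torch.compile  # as of 2.3, compile can trace through user-defined Triton kernels
def fused(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return triton_add(x, y) * 2.0

a = torch.randn(4096, device="cuda")
print(fused(a, a).shape)
```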

Pre-Throttle GPU Ponders During Power Plays: Engaging discussions occurred around GPU power-throttling behavior in architectures like the A100 and H100, with anticipation that the B100's design may affect computational efficiency and power dynamics.

CUDA Dwellers Uncover Room for Kernel Refinements: Members shared strategies for optimizing CUDA kernels, including the avoidance of atomicAdd and capitalizing on warp execution advancements post-Volta, which allow threads in a warp to execute diverse instructions.

Accelerated Plenoxels Poses as a CUDA-Sharpened NeRF: Enthusiasm was directed towards Plenoxels for its efficient CUDA implementation of NeRF, as well as expressions of interest in GPU-accelerated SLAM techniques and optimization for kernels targeting attention mechanisms in deep learning models.

PyTorch CUDA Strides, Flash-Attention Quirks, and Memory Management: Source code showing memory-efficient handling of tensor multiplications touched on its similarity to the COO sparse-matrix representation, and highlighted a potential issue where Triton kernels crash when accessing expanded tensor indices outside their original range.


Eleuther Discord

Pre-LayerNorm Debate: An engineer highlighted an analysis that pre-layernorm may hinder the deletion of information in a residual stream, possibly leading to norm increases with successive layers.
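
A toy sketch of the structure under debate (simplified, assumed): in a pre-LN block the residual stream is only ever added to, never rescaled, so erasing existing information requires a layer to emit an exact negation:

```python
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Pre-layernorm block: normalize the input to the sublayer, but leave
    the residual stream itself untouched. Each block can only add to x,
    which is why the stream's norm tends to grow with depth."""
    def __init__(self, d_model: int):
        super().__init__()
        self.ln = nn.LayerNorm(d_model)
        self.ff = nn.Linear(d_model, d_model)

    def forward(self, x):
        return x + self.ff(self.ln(x))  # x is never normalized in place
```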

Tokenizer Version Tussle: Changes between Huggingface tokenizer versions 0.13 and 0.14 are causing inconsistencies, resulting in a token misalignment during model inference, raising concern among members working on NeoX.

Poetry's Packaging Conundrum: After a failed attempt to use Poetry for package management in NeoX development, owing to its troublesome binary and version management, a member decided it was too complex to adopt.

Chinchilla's Confidence Quandary: A community member questioned the accuracy of the confidence interval in the Chinchilla paper, suspecting an oversampling of small transformers and debating the correct cutoff for stable estimates.

Mega Recommender Revelations: Facebook published a paper on a 1.5-trillion-parameter HSTU-based Generative Recommender system, which members highlighted for its 12.4% performance improvement and its potential implications. Here is the paper.

Penzai's Puzzling Practices: Users find penzai's usage non-intuitive, sharing workarounds and practical examples for working with named tensors. Discussion includes using untag+tag methods and the function pz.nx.nmap for tag manipulation.

Evaluating Large Models: A user working on a custom task reported high perplexity and is seeking advice on the CrossEntropyLoss implementation, while another discussion arose over the num_fewshot settings for benchmarks to match the Hugging Face leaderboard.
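
For the perplexity question, a common-case sketch (assuming causal-LM logits and labels padded with -100); forgetting the one-token shift, or averaging the loss over padding, are the usual causes of inflated numbers:

```python
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """logits: [batch, seq, vocab]; labels: [batch, seq] token ids, -100 = pad."""
    shift_logits = logits[:, :-1, :].contiguous()  # position t predicts token t+1
    shift_labels = labels[:, 1:].contiguous()
    loss = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,  # exclude padding from the mean
    )
    return loss.exp()  # perplexity = exp(mean cross-entropy in nats)
```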


Stability.ai (Stable Diffusion) Discord


OpenRouter (Alex Atallah) Discord

Mixtral 8x7b Blank Response Crisis: The Mixtral 8x7b service experienced an issue with blank responses, leading to the temporary removal of a major provider and planning for future auto-detection capabilities.

Model Supremacy Debates Rage On: In discussions, members compared smaller AI models like Phi-3 to larger ones such as Wizard LM and reported that FireFunction from Fireworks (Using function calling models) might be a better alternative due to OpenRouter's challenges in function calling and adhering to 'stop' parameters.

Time-Outs in The Stream: Various users reported an overflow of "OPENROUTER PROCESSING" notifications designed to maintain active connections, alongside issues with completion requests timing out with OpenAI's GPT-3.5 Turbo on OpenRouter.

The Quest for Localized AI Business Expansion: A member’s search for direct contact information signaled an interest in establishing closer business connections for AI models in China.

Language Barriers in AI Discussions: AI Engineers compared language handling across AI models such as GPT-4, Claude 3 Opus, and L3 70B, noting particularly that GPT-4's performance in Russian left something to be desired.


HuggingFace Discord

Llama 3 Leapfrogs into Lead: The new Llama 3 language model has been introduced, trained on a whopping 15T tokens and fine-tuned with 10M human-annotated samples. It offers 8B and 70B variants, with the 70B scoring over 80 on the MMLU benchmark, and showcases impressive coding capabilities with Human Eval scores of 62.2 for the 8B model and 81.7 for the 70B model; find out more through Demo and Blogpost.

Phi-3: Mobile Model Marvel: Microsoft's Phi-3 Instruct model variants gain attention for their compact size and context options (4k and 128k) and their superior performance over models such as Mistral 7B and Llama 3 8B Instruct on standard benchmarks. Notably designed for mobile use, Phi-3 features 'function_call' tokens and demonstrates advanced capabilities; learn more and test them out via Demo and AutoTrain Finetuning.

OpenELM-270M and RAG Refreshment: Apple's OpenELM-270M model is making a splash on HuggingFace, along with advancements in the Retrieval-Augmented Generation (RAG) framework, which now includes Adaptive Routing and Corrective Fallback features using Langchain's LangGraph. These and other conversations signify continued innovation in the AI space; details on RAG enhancements are found here, and Apple's OpenELM-270M is available here.

Batching Discussions Heat Up: The necessity for efficient batching during model inference spurred interest among the community. Aphrodite, TGI, and other libraries are recommended for superior batching speeds, with reports of success using arrays for concurrent prompt processing, e.g. prompt = ["prompt1", "prompt2"].
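
As a concrete illustration of that list-of-prompts pattern with vLLM's offline API (the model id is only an example):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # example model id
params = SamplingParams(temperature=0.7, max_tokens=128)

# A list of prompts is batched internally via continuous batching.
outputs = llm.generate(["prompt1", "prompt2"], params)
for out in outputs:
    print(out.outputs[0].text)
```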

Trouble with Virtual Environments: A member's challenges with setting up Python virtual environments on Windows sparked discussions and advice. The recommended commands for Windows are python3 -m venv venv followed by venv\Scripts\activate, with the suggestion to try WSL for improved performance.


LlamaIndex Discord

Trees of Thought: The development of LLMs with tree search planning capabilities could bring significant advancements to agentic systems, as disclosed in a tweet by LlamaIndex. This marks a leap from sequential state planning, suggesting potential strides in AI decision-making models.

Watching Knowledge Dance: A new dynamic knowledge graph tool developed using the Vercel AI SDK can stream updates and was demonstrated by a post that can be seen on the official Twitter. This visual technology could be a game-changer for real-time data representation.

Hello, Seoul: The introduction of the LlamaIndex Korean Community is expected to foster knowledge sharing and collaborations within the Korean tech scene, as announced in a tweet.

Boosting Chatbot Interactivity: Enhancements to chatbot User Interfaces using create-llama have emerged, allowing for expanded source information components and promising a more intuitive chat experience, with credits to @MarcusSchiesser and mentioned in a tweet.

Embeddings Made Easy: A complete tutorial on constructing a high-quality RAG application combining LlamaParse, JinaAI embeddings, and Mixtral 8x7b is now available and can be accessed through LlamaIndex's Twitter feed. This guide could be key for engineers looking to parse, encode, and store embeddings effectively.

Advanced RAG Rigor: In-depth learning is needed for configuring advanced RAG pipelines, with suggestions like sentence-window retrieval and auto-merging retrieval being considered for tackling complex question structures, as pointed out with an instructional resource.

VectorStoreIndex Conundrum: Confusion about embeddings and LLM model selection for a VectorStoreIndex was clarified; gpt-3.5-turbo and text-embedding-ada-002 are the defaults unless overridden in Settings, as stated in various discussions.
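
A short sketch of both override styles (import paths assume llama-index 0.10+; the model names are illustrative):

```python
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Global override of the gpt-3.5-turbo / text-embedding-ada-002 defaults:
Settings.llm = OpenAI(model="gpt-4")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Or override per query engine without touching the globals:
# index = VectorStoreIndex.from_documents(documents)
# query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4"))
```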

Pydantic Puzzles: Integration of Pydantic with LlamaIndex encountered hurdles with structuring outputs and Pyright's dissatisfaction with dynamic imports. The discussions haven't concluded with an alternative to # type:ignore yet.

Request for Enhanced Docs: Requests were made for more transparent documentation on setting up advanced RAG pipelines and configuring LLMs like GPT-4 in LlamaIndex, with a reference made to altering global settings or passing custom models directly to the query engine.


OpenAI Discord

AI Hunts for True Understanding: A debate centered on whether AI can achieve true understanding, with the Turing completeness of autoregressive models like Transformers being a key point. The confluence of logic's syntax and semantics was considered a potential enabler of meaning-driven operations by the model.

From Syntax to Semantics: Conversations revolved around the evolution of language in the AI landscape, forecasting the emergence of new concepts to improve clarity for future communication. The limitations of language’s lossy nature on accurately expressing ideas were also highlighted.

Apple's Pivot to Open Source?: Excitement and speculation surrounded Apple's OpenELM, an efficient, open-source language model introduced by Apple, stirring discussions on the potential impact on the company's traditionally proprietary approach to AI technology and the broader trend towards openness.

Communication, Meet AI: Members highlighted the importance of effective flow control in AI-mediated communication, exploring technologies like voice-to-text and custom wake words. Discussing the interplay between AI and communication highlighted the need for mechanisms for interruption and recovery in virtual assistant interactions.

RPG Gaming with an AI Twist: The AI-powered text RPG Brewed Rebellion was shared, illustrating the growing trend of integrating AI into interactive gaming experiences, particularly in narrative scenarios like navigating internal politics within a corporation.

Engineering Better AI Behavior: Engineers shared tips on prompt crafting, emphasizing the use of positive examples for better results and pointing out that negative instructions often fail to rein in creative outputs from AI like GPT.

AI Coding Challenges in Gaming and Beyond: Challenges abound when prompting GPT for language-specific coding assistance, as raised by an engineer working on SQF language for Arma 3. Issues such as the model's pretraining biases and limited context space were discussed, sparking recommendations for alternative models or toolchains.

Dynamic AI Updates and Capabilities: Queries on AI updates and capabilities surfaced, including how to create a GPT expert in Apple Playgrounds and whether new GPT versions could rival the likes of Claude 3. Additionally, the utility of GPT's built-in browser versus dedicated options like Perplexity AI Pro and You Pro was contrasted, and anticipation for models with larger context windows was noted.


LAION Discord


OpenAccess AI Collective (axolotl) Discord

Bold Llama Ascends New Heights: Discussions captivated participants as Llama-3 shows potential to scale up to a colossal 128k context window through a blend of tuning and augmented training. Interest also percolates around Llama 3's pretrained learning rate, with speculation that an infinite LR schedule might be in the works to accompany upcoming model variants.

Snowflake's New Release Causes Flurry of Excitement: The Snowflake Arctic 480B Dense + Hybrid MoE model made waves, flaunting a 4K context window and Apache 2.0 licensing. This generated animated conversations on its intrinsic capabilities and how it could synergize with Deepspeed.

Medical AI Takes A Healthy Leap Forward: The Internist.ai 7b model, meticulously designed by medical professionals, reportedly outshines GPT-3.5 and scores well on the USMLE examination. It spurred conversation about the promise of specialized AI models, with members captivated by its performance and the audacious claim that it outperforms numerous other 7b models.

Crosshairs on Dataset and Model Training Tangles: Technical discussions dove into the practicalities of Hugging Face datasets, optimizing data usage, and the interplay between optimizers and Fully Sharded Data Parallel (FSDP) setups. On the same thread, members experienced turbulence with FSDP when it comes to dequantization and full fine-tunes, indicative of deeper compatibility and system issues.

ChatML's New Line Quirk Raises Eyebrows: Participants identified a glitch in ChatML and possibly FastChat concerning erratic new line and space insertion. The issue throws a spotlight on the importance of refined token configurations, as it could skew training outcomes for AI models.


tinygrad (George Hotz) Discord


Modular (Mojo 🔥) Discord

Bold Moves in Benchmarking: The engineering community awaits Mojo's performance benchmarks, comparing its prowess against languages like Rust and Python amidst skepticism from Rust enthusiasts. Lobsters carries a heated debate on Mojo's claims of being safer and faster, which is central to Mojo's narrative in tech circles.

Quantum Conundrums and ML Solutions: Quantum computing discussions touched on the nuances of quantum randomness with mentions of the Many-Worlds and Copenhagen interpretations. There's a buzz about harnessing geometric principles and ML in quantum algorithms to handle qubit complexity and improve calculation efficiency.

Patching Up Mojo Nightly Builds: The Mojo community logs a null string bug in GitHub (#2392) and enjoys a fresh nightly compiler release with improved overloading for function arguments. Simultaneously, SIMD's adaptation to EqualityComparable reveals both pros and cons, sparking a search for more efficient stdlib types.

Securing Software Supply Chains: Modular's blog post highlights the security protocols in place for Mojo's safe software delivery in light of the XZ supply chain attack. With secure transport and signing systems like SSL/TLS and GPG, Modular puts a firm foot forward in protecting its evolving software ecosystem.

Discord Community Foresees Swag and Syntax Swaps: Mojo's developer community enjoys a light-hearted suggestion for naming variables and anticipates future official swag; meanwhile, API development sparks discussions on performance and memory management. The MAX engine query redirects to specific channels, ensuring streamlined communication.


Latent Space Discord

A New Angle on Transformers: Engineers discussed enhancing transformer models by incorporating inputs from intermediate attention layers, paralleling the Pyramid network approach in CNN architectures. This tactic could potentially lead to improvements in context-aware processing and information extraction.

Ethical Tussle over 'TherapistAI': Controversy arose over levelsio's TherapistAI, with debates highlighting concerns about AI posing as a replacement for human therapists. This sparked discussions on responsible representations of AI capabilities and ethical implications.

Search for Semantic Search APIs: Participants reviewed several semantic search APIs; however, options like Omnisearch.ai fell short in web news scanning effectiveness compared to traditional tools like newsapi.org. This points to a gap in the current offerings of semantic search solutions.

France Bets on AI in Governance: Talks revolved around France's experimental integration of Large Language Models (LLMs) into its public sector, noting the country's forward-looking stance. Discussions also touched upon broader themes such as interaction of technology with the sociopolitical landscape.

Venturing Through Possible AI Winters: Members debated the sustainability of AI venture funding, spurred by a tweet concerning the ramifications of a bursting AI bubble. The conversations involved speculations on the impact of economic changes on AI research and venture prospects.


LangChain AI Discord

LangChain AI Fires Up Chatbot Quest: Discussions centered around utilizing pgvector stores with LangChain for enhancing chatbot performance, including step-by-step guidance and specific methods like max_marginal_relevance_search_by_vector. Members also fleshed out the mechanics behind SelfQueryRetriever and strategized on building conversational AI graphs with methods like createStuffDocumentsChain. The LangChain GitHub repository is pointed out as a resource along with the official LangChain documentation.
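
A hedged sketch of the pgvector + MMR flow under discussion (the connection string and collection name are placeholders; assumes langchain_community's PGVector store and OpenAI embeddings):

```python
from langchain_community.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
store = PGVector(
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/ragdb",  # placeholder
    collection_name="chatbot_docs",  # placeholder
    embedding_function=embeddings,
)

# Maximal-marginal-relevance search straight from an embedding vector,
# trading off relevance against diversity of the returned chunks.
query_vec = embeddings.embed_query("How do I cancel my subscription?")
docs = store.max_marginal_relevance_search_by_vector(query_vec, k=4, fetch_k=20)
```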

Template Woes for the Newly Hatched LLaMA-3: One member sought advice on prompt templates for LLaMA-3, citing gaps in the official documentation, reflecting the collective effort to catch up with the latest model releases.

Sharing AI Narratives and Tools: The community showcased several projects: the adaptation of RAG frameworks using LangChain's LangGraph, with an article available on Medium; a union-centric, text-based RPG "Brewed Rebellion," playable here; "Collate", a service for transforming saved articles into digest newsletters, available at collate.one; and BlogIQ, a content creation helper for bloggers, found on GitHub.

Training Day: Embeddings Faceoff: AI practitioners looking to sharpen their knowledge on embedding models could turn to an educational YouTube video shared by a member, aimed at demystifying the best tools in the trade.


Cohere Discord


OpenInterpreter Discord


Interconnects (Nathan Lambert) Discord

Blind Test Ring Welcomes Phi-3-128K: Phi-3-128K has been ushered into blind testing, with strategic opening interactions like "who are you" and LMSys mechanisms preventing the model from disclosing its name, preserving blind-test integrity.

Instruction Tuning Remains a Hotbed: Despite the rise of numerous benchmarks for assessing large language models, such as LMentry, M2C, and IFEval, the community still holds strong opinions about the lasting relevance of instruction-following evaluations, highlighted in Sebastian Ruder's newsletter.

Open-Source Movements Spice Up AI: The open-sourcing of Cohere's chat interface drew attention and can be found on GitHub, which led to humorous side chats including jokes about Nathan Lambert's perceived influence in the AI space and musings over industry players' opaque motives.

AI Pioneers Shun Corporate Jargon: The term "pick your brain" faced disdain within the community, emphasizing the discomfort of industry experts in being approached with corporate cliches during peak times of innovation.

SnailBot Notifies with Caution: The deployment of SnailBot prompted discussions around notification etiquette, while access troubles with the "Reward is Enough" publication sparked troubleshooting conversations, highlighting the necessity of hassle-free access to scientific resources.


Mozilla AI Discord


DiscoResearch Discord

Batch Your Bots: Discord users investigated how to batch prompts efficiently in Local Mixtral and compared tools like vLLM and the open-sourced TGI. While some preferred using TGI as an API server for its low latency, others highlighted the high throughput and direct Python usage that comes with vLLM in local Python mode, with resources like llm-swarm suggested for scalable endpoint management.
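
For the TGI-as-API-server route, a minimal sketch (assumes a TGI container already serving on port 8080; the request shape follows TGI's /generate endpoint):

```python
import requests

# e.g. started with:
#   docker run --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference \
#       --model-id mistralai/Mixtral-8x7B-Instruct-v0.1
resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={"inputs": "Summarize the following text: ...",
          "parameters": {"max_new_tokens": 64}},
    timeout=60,
)
print(resp.json()["generated_text"])
```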

Dive into Deutsch with DiscoLM: Interaction with DiscoLM in German sparked discussions about prompt nuances, such as using "du" versus "Sie", and how to implement text summarization constraints like word counts. Members also reported challenges with model outputs and expressed interest in sharing quantizations for experimental models, especially in light of the high benchmarks scored by models like Phi-3 on tests like Ger-RAG-eval.

Grappled Greetings: Users debated the formality in prompting language models, acknowledging the variable impact on responses when initiating with formal or informal forms in German.

Summarization Snafus: The struggle is real when trying to cap off model-generated text at a specific word or character limit without abrupt endings. The conversation mirrored the common desire for fine-tuned control over output.

Classify with Confidence: Arousing community enthusiasm was the possibility of implementing a classification mode for live inference in models to match the praised benchmark performance.


Datasette - LLM (@SimonW) Discord


Skunkworks AI Discord


LLM Perf Enthusiasts AI Discord


Alignment Lab AI Discord


AI21 Labs (Jamba) Discord


PART 2: Detailed by-Channel summaries and links

Unsloth AI (Daniel Han) ▷ #general (774 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (13 messages🔥):

Link mentioned: No more Fine-Tuning: Unsupervised ICL+: A new Paradigm of AI, Unsupervised In-Context Learning (ICL) of Large Language Models (LLM). Advanced In-Context Learning for new LLMs w/ 1 Mio token contex...


Unsloth AI (Daniel Han) ▷ #help (186 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (3 messages):

Link mentioned: AI Unplugged 8: Llama3, Phi-3, Training LLMs at Home ft DoRA.: Insights over Information


Unsloth AI (Daniel Han) ▷ #suggestions (75 messages🔥🔥):

Link mentioned: Fix: loading models with resized vocabulary by oKatanaaa · Pull Request #377 · unslothai/unsloth: This PR is intended to address the issue of loading models with resized vocabulary in Unsloth. At the moment loading models with resized vocab fails because of tensor shapes mismatch. The fix is pl...


LM Studio ▷ #💬-general (298 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (73 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (9 messages🔥):


LM Studio ▷ #🎛-hardware-discussion (112 messages🔥🔥):


LM Studio ▷ #langchain (1 messages):

vic49.: Yeah, dm me if you want to know how.


LM Studio ▷ #amd-rocm-tech-preview (56 messages🔥🔥):


Perplexity AI ▷ #announcements (2 messages):


Perplexity AI ▷ #general (467 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (8 messages🔥):


Perplexity AI ▷ #pplx-api (14 messages🔥):

Link mentioned: Supported Models: no description found


Nous Research AI ▷ #ctx-length-research (11 messages🔥):

Link mentioned: Scaling Laws of RoPE-based Extrapolation: The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest. The mainstream approach to addressing extrapolation with ...


Nous Research AI ▷ #off-topic (17 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (16 messages🔥):

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):


Nous Research AI ▷ #general (181 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (53 messages🔥):

Links mentioned:


Nous Research AI ▷ #project-obsidian (3 messages):

Link mentioned: Why Not Both Por Que No Los Dos GIF - Why Not Both Por Que No Los Dos Yey - Discover & Share GIFs: Click to view the GIF


Nous Research AI ▷ #bittensor-finetune-subnet (1 messages):

paradox_13: What are the miner rates?


Nous Research AI ▷ #rag-dataset (75 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #world-sim (102 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #general (11 messages🔥):


CUDA MODE ▷ #triton (1 messages):

Link mentioned: PyTorch 2.3 Release Blog: We are excited to announce the release of PyTorch® 2.3 (release note)! PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kerne...


CUDA MODE ▷ #cuda (32 messages🔥):

Links mentioned:


CUDA MODE ▷ #torch (9 messages🔥):

Link mentioned: CUDA semantics — PyTorch 2.3 documentation: no description found


CUDA MODE ▷ #algorithms (4 messages):


CUDA MODE ▷ #beginner (6 messages):


CUDA MODE ▷ #pmpp-book (5 messages):


CUDA MODE ▷ #youtube-recordings (5 messages):


CUDA MODE ▷ #torchao (1 messages):

Link mentioned: Effort Engine: no description found


CUDA MODE ▷ #off-topic (1 messages):

iron_bound: https://github.com/adam-maj/tiny-gpu


CUDA MODE ▷ #llmdotc (353 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #massively-parallel-crew (4 messages):


Eleuther ▷ #general (28 messages🔥):

Link mentioned: Hashes — EleutherAI: no description found


Eleuther ▷ #research (324 messages🔥🔥):

Links mentioned:


Eleuther ▷ #scaling-laws (3 messages):


Eleuther ▷ #interpretability-general (6 messages):


Eleuther ▷ #lm-thunderdome (19 messages🔥):

Link mentioned: lm-evaluation-harness/lm_eval/api/metrics.py at 3196e907fa195b684470a913c7235ed7f08a4383 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


Eleuther ▷ #gpt-neox-dev (27 messages🔥):


Stability.ai (Stable Diffusion) ▷ #general-chat (354 messages🔥🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (1 messages):


OpenRouter (Alex Atallah) ▷ #general (323 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #announcements (1 messages):

- **Llama 3 leaps into action**: Boasting training on 15T tokens and fine-tuning on 10M human annotated samples, **Llama 3** comes in 8B and 70B versions as both Instruct and Base. The 70B variant has notably become the best open LLM on the MMLU benchmark with a score over 80, and its coding abilities shine with scores of 62.2 (8B) and 81.7 (70B) on Human Eval, now available on Hugging Chat with [Demo](https://huggingface.co/chat/models/meta-llama/Meta-Llama-3-70B-Instruct) and [Blogpost](https://huggingface.co/blog/llama3).
- **Phi-3's MIT Makeover**: The recently rolled-out **Phi-3** Instruct variants, designed with contexts of 4k and 128k and trained on 3.3T tokens, demonstrate superior performance over Mistral 7B or Llama 3 8B Instruct on standard benchmarks. This model also features specialized "function_call" tokens and is optimized for mobile platforms, including Android and iPhones, with resources available via [Demo](https://huggingface.co/chat/models/microsoft/Phi-3-mini-4k-instruct) and [AutoTrain Finetuning](https://x.com/abhi1thakur/status/1782807785807159488).
- **Open Source Bonanza**: HuggingFace unveils **FineWeb**, a massive 15-trillion-token web dataset for research, alongside the latest updates to Gradio and Sentence Transformers for developers. Notably, **The Cauldron**, a large collection of vision-language datasets, emerges to assist in instruction fine-tuning, detailed at [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) and [Sentence Transformers v2.7.0](https://huggingface.co/posts/tomaarsen/476985886331959).
- **HuggingChat Breaks into iOS**: The HuggingChat app lands on Apple devices, bringing the power of conversational AI to iPhones, as announced in the latest post available [here](https://huggingface.co/posts/fdaudens/628834201033253).
- **Content to Quench Your AI Thirst**: Explore the versatility of transformer agents with the blog post "Jack of All Trades, Master of Some", and get the low-down on deploying open models on Google Cloud in the upcoming HuggingCast, while the Open Chain of Thought Leaderboard offers a new competitive stage for researchers, as introduced at [Leaderboard CoT](https://huggingface.co/blog/leaderboard-cot).

HuggingFace ▷ #general (276 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (3 messages):


HuggingFace ▷ #cool-finds (9 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (15 messages🔥):

Links mentioned:


HuggingFace ▷ #computer-vision (3 messages):


HuggingFace ▷ #NLP (7 messages):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (4 messages):


LlamaIndex ▷ #blog (7 messages):


LlamaIndex ▷ #general (188 messages🔥🔥):

Links mentioned:


OpenAI ▷ #ai-discussions (128 messages🔥🔥):

Links mentioned:


OpenAI ▷ #gpt-4-discussions (18 messages🔥):


OpenAI ▷ #prompt-engineering (14 messages🔥):


OpenAI ▷ #api-discussions (14 messages🔥):


LAION ▷ #general (126 messages🔥🔥):

Links mentioned:


LAION ▷ #research (5 messages):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (85 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (8 messages🔥):


OpenAccess AI Collective (axolotl) ▷ #general-help (4 messages):


OpenAccess AI Collective (axolotl) ▷ #datasets (1 messages):

aillian7: Is there a format for ORPO that i can use for a conversational use case?


OpenAccess AI Collective (axolotl) ▷ #community-showcase (9 messages🔥):

Link mentioned: internistai/base-7b-v0.2 · Hugging Face: no description found


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (10 messages🔥):

Links mentioned:


tinygrad (George Hotz) ▷ #general (61 messages🔥🔥):

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (31 messages🔥):


Modular (Mojo 🔥) ▷ #💬︱twitter (2 messages):


Modular (Mojo 🔥) ▷ #✍︱blog (1 messages):

Link mentioned: Modular: Preventing supply chain attacks at Modular: We are building a next-generation AI developer platform for the world. Check out our latest post: Preventing supply chain attacks at Modular


Modular (Mojo 🔥) ▷ #ai (4 messages):


Modular (Mojo 🔥) ▷ #🔥mojo (21 messages🔥):


Modular (Mojo 🔥) ▷ #community-blogs-vids (15 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (5 messages):

Link mentioned: Issues · modularml/mojo: The Mojo Programming Language. Contribute to modularml/mojo development by creating an account on GitHub.


Modular (Mojo 🔥) ▷ #📰︱newsletter (1 messages):

Zapier: Modverse Weekly - Issue 31 https://www.modular.com/newsletters/modverse-weekly-31


Modular (Mojo 🔥) ▷ #🏎engine (4 messages):


Modular (Mojo 🔥) ▷ #nightly (32 messages🔥):

Links mentioned:


Latent Space ▷ #ai-general-chat (77 messages🔥🔥):

- **Transformer Architecture Tweaks Under Discussion**: Members were discussing an approach to improve transformer models by taking inputs from intermediate attention layers in addition to the last attention layer, likening the method to the Pyramid network in CNN architectures.
- **TherapistAI Sparks Controversy**: A member highlighted the controversy surrounding levelsio's [TherapistAI](https://twitter.com/meijer_s/status/1783032528955183532) on Twitter, criticizing its potentially misleading suggestion that it could replace a real therapist.
- **Semantic Search Solution Inquiry**: A discussion about finding a good semantic search API like [newsapi.org](https://newsapi.org) led to recommendations including [Omnisearch.ai](https://omnisearch.ai/), though it wasn't a fit for scanning the web for news.
- **France Steps Towards LLMs in the Public Sector**: There was a conversation regarding France's experimental incorporation of LLMs into public administration, with insights and opinions shared about France's innovation and political climate, linking to a [tweet about the topic](https://twitter.com/emile_marzolf/status/1783072739630121432).
- **AI Winter Predictions Stir Discussion**: Users deliberated over the state and future of AI venture funding prompted by a [tweet on AI bubble effects](https://x.com/schrockn/status/1783174294865887521?s=46&t=90xQ8sGy63D2OtiaoGJuww), reflecting on the implications of a potential bubble burst for AI innovation.

Links mentioned:


LangChain AI ▷ #general (47 messages🔥):

Links mentioned:


LangChain AI ▷ #langchain-templates (1 messages):


LangChain AI ▷ #share-your-work (6 messages):

Links mentioned:


LangChain AI ▷ #tutorials (1 messages):


Cohere ▷ #general (42 messages🔥):

Links mentioned:


Cohere ▷ #project-sharing (6 messages):


OpenInterpreter ▷ #general (32 messages🔥):

Links mentioned:


OpenInterpreter ▷ #O1 (14 messages🔥):

Links mentioned:


OpenInterpreter ▷ #ai-content (1 messages):

8i8__papillon__8i8d1tyr: https://mlflow.org/


Interconnects (Nathan Lambert) ▷ #news (3 messages):


Interconnects (Nathan Lambert) ▷ #ml-questions (17 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (12 messages🔥):

Link mentioned: Tweet from Nick Frosst (@nickfrosst): we open sourced our chat interface. https://github.com/cohere-ai/cohere-toolkit/?tab=readme-ov-file


Interconnects (Nathan Lambert) ▷ #posts (10 messages🔥):


Mozilla AI ▷ #llamafile (25 messages🔥):

Links mentioned:


DiscoResearch ▷ #general (7 messages):

Links mentioned:


DiscoResearch ▷ #discolm_german (10 messages🔥):


Datasette - LLM (@SimonW) ▷ #llm (7 messages):

Links mentioned:


Skunkworks AI ▷ #general (1 messages):

burnytech: Hi!


Skunkworks AI ▷ #off-topic (2 messages):

Link mentioned: Toronto Local & Open-Source AI Developer Meetup · Luma: Local & open-source AI developer meetup is coming to Toronto! Join the Ollamas and friends at the Cohere space! Special thank you to abetlen (Andrei), the…


LLM Perf Enthusiasts AI ▷ #general (1 messages):

jeffreyw128: https://twitter.com/wangzjeff/status/1783215017586012566


LLM Perf Enthusiasts AI ▷ #opensource (1 messages):


LLM Perf Enthusiasts AI ▷ #openai (1 messages):


Alignment Lab AI ▷ #general-chat (1 messages):

neilbert.: Congrats! You are now Laurie Anderson!


AI21 Labs (Jamba) ▷ #general-chat (1 messages):

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.