Frozen AI News archive

Welcome /r/LocalLlama!

**Sakana** released a paper on evolutionary model merging. **OpenInterpreter** launched their **O1 devkit**. Discussions highlight **Claude Haiku**'s underrated performance with 10-shot examples. On **Reddit's IPO**, AINews introduces Reddit summaries starting with /r/LocalLlama, covering upcoming subreddits like r/machinelearning and r/openai. **Aether Research** released **Cerebrum 8x7b** based on **Mixtral**, matching **GPT-3.5 Turbo** and **Gemini Pro** on reasoning tasks, setting a new open-source reasoning SOTA. **Moistral 11B v1** finetuned model from Cream-Phi-2 creators was released. A creative writing benchmark uses **Claude Opus** as judge. Hobbyists explore **1.58 BitNet** ternary quantization and **1-bit LLMs** training. Nvidia's **Blackwell (h200)** chip supports **FP4 precision** quantization. **LMDeploy v0.2.6+** enables efficient vision-language model deployment with models like **Qwen-VL-Chat**. Users seek GUIs for LLM APIs with plugin and RAG support. Pipelines for synthetic training data generation and fine-tuning language models for chat are discussed.

Canonical issue URL

It's a quiet news day - Sakana shipped an evolutionary model merging paper, OpenInterpreter launched their O1 devkit, and people are talking about how Claude Haiku is underrated if you make 10-shot examples.

But on the occasion of Reddit's successful IPO today, it's a good time to FINALLY introduce Reddit summaries to AINews! just starting with /r/LocalLlama for now, and we'll be summarizing the comments soon, but next we have r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence mapped out. Let us know if we're missing any major alpha drop subreddits.


Table of Contents

[TOC]


REDDIT: /r/LocalLlama

Model Releases and Benchmarks

Quantization and Performance Optimization

Deployment and Serving

Training Data and Fine-Tuning

Hardware and Compute Resources

Memes and Humor

PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

Intel and AI Capacity

Debugging and Counterintuitive Code

Microsoft and OpenAI

Q-Star Energy-Based Model for Dialog Generation

Advice and Observations

Memes and Humor


PART 0: Summary of Summaries of Summaries

we are concluding that Claude Opus is just the best model for top level summaries so we're discontinuing the A/B/C tests (see archives for our struggles/record). We'll be exposing parallel runs for all 3 + more models (incl Gemini 1.5!!) as this problem is topologically similar to our personalization app we'll be launching.

1. Grok-1: The Behemoth Unleashed

2. Innovations in Retrieval-Augmented Generation (RAG)

3. Scaling Strategies and Efficiency for Large Language Models

4. Multilingual Challenges and Benchmarking for Language Models

5. Misc


PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord


Perplexity AI Discord

Pro Perks or Perplexing Problems?: Perplexity AI has granted Pro users unlimited daily queries on Claude 3 Opus, but users are raising concerns about the actual extent of "unlimited" in light of context limits. Clarification on what "unlimited" entails, in terms of use and context, is a hot topic among the community.

AI Parenting Prospects: A vibrant community discussion unfolded over the role of AI in simplifying complex concepts for children, underscoring the importance of an AI's developmental appropriateness and its potential in educational support.

Perplexity Amongst the Engineers: Despite plans to deprecate the sonar-medium-online model, it seems to be running post-deadline, causing user confusion. Engineers debate API behavior, with discussions around the maxtokens parameter and observations of different news results when queried through browsers versus the API.

In Search of Truth and Tech Jobs: Users shared their experiences using Perplexity AI's Claude 3 Opus for creative writing experiments, cleanest options query, probing North Korea's political dynamics, speculating about living on Mars, and scraping job postings. Questions abound as to the variability and reliability of provided links in search results.

Cautious Optimism on Corporate Collaborations: Speculation grows around Apple and Google's potential AI integrations, as details on generative AI collaborations are keenly discussed by members who share thoughts on tech giants' strategies and the future of AI commercialization.


Unsloth AI (Daniel Han) Discord


LM Studio Discord


Nous Research AI Discord


Eleuther Discord


OpenAI Discord


HuggingFace Discord


LlamaIndex Discord


Latent Space Discord


LAION Discord

Codex Decoded in Copilot: Microsoft Codex can now be accessed for free within the Copilot app, integrating Jupyter Notebooks and libraries like simpy and matplotlib, enabling a more resourceful coding environment.

DALL-E 3 Dataset's New Home: Confusion about the DALL-E 3 dataset being removed from Hugging Face was resolved; it's been relocated and is available at this direct link.

Grok-1 Joins the AI Fray: OpenAI's Grok-1, an impressive 314B parameter model, has hit the scene with a splash, performing notably well in various benchmarks. Its release on GitHub piqued interest and comparison with models like Mixtral and LLaMA, and is up for exploration here.

Efficient Ways to Better LLMs: An arXiv paper discussed cost-effective methods such as learning rate warming and replay of previous data for updating LLMs without full re-training.

Speculative GPT-4 Gossip: Speculation abounds on GPT-4 being a 1.8 trillion-parameter mixture of experts (MoE) model, following a hint from Nvidia. The authenticity of GPT-4's details remains unconfirmed and the topic was sparked by a tweeted image.


CUDA MODE Discord

Photonics Chips Blaze Past Traditional Silicon: Anastasia's video on photonic chips stimulated chatter about technology that's a thousand times faster than traditional chips, alongside mentions of resources like the Asianometry channel for enthusiasts seeking in-depth knowledge on silicon photonics and light-based networks.

Triton Debugging Gets Visual: Engineers shared a new visualizer tool for simplifying Triton debugging, and a set of Triton Puzzles for deepening knowledge, available for trials on Google Colab.

CUDA Communities Unpack Scheduler Mysteries: Intense discussions delved into the nuances of CUDA's warp schedulers and memory management tactics, sparking a conversation about the intricacies of ProducerProvides, ConsumerTakes, async work, and stream synchronization.

Reconfigurable Computing in Academia: Members gazed into the academic niche of reconfigurable computing for efficient ML, driven by Prof. Mohamed Abdelfattah's work and an ECE 5545 course syllabus, despite some confusion over textbook specifics resolved by referencing the course's first lecture video.

Catching Up with CUDA: Fresh CUDA enthusiasts were offered guidance with book recommendations like "Programming Massively Parallel Processors", available here on Amazon, and encouragement to harness frameworks like torch for stepping into ML/DL realms.

Thoughtful Threads on Striped and Flash Attention: A healthy debate on attention mechanisms saw discussions about memory requirements contrasting Ring Attention and Flash Attention, including recommendations to consult specific literature (Striped Attention paper) and code (GitHub implementation) for clarification.

AI and Systems Collide at MLSys 2024: Engineers swapped details about the MLSys 2024 conference, emphasizing its critical role at the convergence of Machine Learning and Systems for facing emerging AI challenges (MLSys Conference).

Gearing Up for a GTC Gathering: Gautier's biggest AI enthusiasts are organizing meetups for GTC 2023, discussing visiting plans and sharing contact information while acknowledging some high-spirited humor around the constraints of attending such exclusive events.


OpenRouter (Alex Atallah) Discord

LLaMa Models Play Nice with Prompts: The LLaMa models are confirmed to work well with prompts structured in "system", "user", and "assistant" roles, useful for those utilizing the OpenAI JavaScript library.

Script Breaks Down Books for AI Segmentation: An innovative script has been developed that deconstructs books for AI-driven segment generation, with notable improvements in generative quality when instruction-based data is utilized, revealed through testing with Airoboros 70B and comparing against lzlv 70B.

Demand for In-Depth Usage Analytics Rises: Discussions highlighted the community's need for detailed usage analytics akin to those provided by OpenAI, revealing a specific interest in insights such as daily or weekly usage costs, broken down by models and applications.

Models Play Hard to Get: Recent changes in model behavior have been noted, with a particular decrease in a model's willingness to perform tasks, accompanying questions about access to beta models like sonnet:beta and opus:beta. The company confirmed that there should be general access.

API for the People, by the People: One user plans to debut a public API and seeks to have it included in OpenRouter’s listings, prompting a positive response from the platform eager for further detail exchanges through direct messages.


LangChain AI Discord

API Evolution Sparks Curiosity: Engineers are questioning the future of LangChain's astream_log given the beta status of astream_events; concerns revolve around potential deprecation or the distinction in use cases between the two.

Rubik's AI Awaits Eager Testers: Beta testers are being summoned for Rubik's AI, a promising research assistant offering access to Claude 3 Opus, GPT-4 Turbo, and Mistral Large. Those interested can join the waitlist.

LangChain JavaScript Streaming Stumbles: Reports have surfaced of streaming issues with RemoteRunnable in JavaScript, unlike its functionality in Python. The community is looking for insights or fixes, with suggestions to follow up on GitHub and LangChain's security guidelines.

Community Showcases Diverse AI Creations: Innovators have introduced various AI tools: an AI chatbot for data analysis (Haste171/langchain-chatbot), Living Bookmarks bot managing Raindrop.io bookmarks, a call for interviews on productivity with NeuroFusion, a popular AI-based scraper Scrapegraph-ai, and Lyzr.ai's Automata for simulating sales roles (GitHub Repo).

AI Learning Made Accessible: Didactic resources on creating a personalized nutrition AI with privacy focus using Langchain's Pebblo are shared in a YouTube tutorial (Nutriheal Demo), along with documentation for locally deploying AI solutions, harnessing generic UI for AI assistants, and developing 'plan-and-execute' style AI agents with strategic abilities (Langgraph Tutorial).


Interconnects (Nathan Lambert) Discord

Model Mystery Unveiled Through API: An arXiv paper discusses how queries to API-protected large language models (LLMs) could leak proprietary information such as model size – an unintended "softmax bottleneck". Concerns were raised about the accuracy of these findings, especially when models use technologies like MoE, which could skew size estimations.

Open Source Definitions Stir Drama: A Twitter conversation sparked predictions of drama in the machine learning community over what should be considered "open source". This sparked conversations about including data in the definition of open-source software, with a push towards establishing a pragmatic consensus on the term's boundaries. Meanwhile, there is dissatisfaction with EleutherAI's social media engagement strategy.

Grok-1 Joins The Model Party: xAI introduced Grok-1, a 314 billion parameter MoE model, raising discussions around its release, performance metrics, which were rumored to surpass those of Falcon, and its marketing strategy. Skepticism was voiced over torrent-based distribution affecting the reputation and policies around open-source AI models, leading to a tongue-in-cheek idea about physically shipping models via mail.


Alignment Lab AI Discord


LLM Perf Enthusiasts AI Discord


DiscoResearch Discord


Datasette - LLM (@SimonW) Discord


Skunkworks AI Discord


PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):

Link mentioned: Introducing Stable Video 3D: Quality Novel View Synthesis and 3D Generation from Single Images — Stability AI: When we released Stable Video Diffusion, we highlighted the versatility of our video model across various applications. Building upon this foundation, we are excited to release Stable Video 3D. This n...


Stability.ai (Stable Diffusion) ▷ #general-chat (988 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #announcements (1 messages):


Perplexity AI ▷ #general (795 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (35 messages🔥):


Perplexity AI ▷ #pplx-api (64 messages🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (853 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #announcements (1 messages):

Link mentioned: GitHub - unslothai/unsloth: 2-5X faster 70% less memory QLoRA & LoRA finetuning: 2-5X faster 70% less memory QLoRA & LoRA finetuning - unslothai/unsloth


Unsloth AI (Daniel Han) ▷ #random (25 messages🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (568 messages🔥🔥🔥):

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (21 messages🔥):

Links mentioned:


LM Studio ▷ #💬-general (301 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (138 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (12 messages🔥):

Link mentioned: andrewcanis/c4ai-command-r-v01-GGUF · Hugging Face: no description found


LM Studio ▷ #🎛-hardware-discussion (480 messages🔥🔥🔥):

Links mentioned:


LM Studio ▷ #🧪-beta-releases-chat (4 messages):

Link mentioned: GitHub - lmstudio-ai/configs: LM Studio JSON configuration file format and a collection of example config files.: LM Studio JSON configuration file format and a collection of example config files. - lmstudio-ai/configs


LM Studio ▷ #langchain (1 messages):


LM Studio ▷ #avx-beta (5 messages):


LM Studio ▷ #amd-rocm-tech-preview (5 messages):

Link mentioned: GitHub - brknsoul/ROCmLibs: Prebuild Windows ROCM Libs for gfx1031 and gfx1032: Prebuild Windows ROCM Libs for gfx1031 and gfx1032 - brknsoul/ROCmLibs


LM Studio ▷ #crew-ai (1 messages):


Nous Research AI ▷ #off-topic (56 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (16 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (656 messages🔥🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (25 messages🔥):

Links mentioned:


Nous Research AI ▷ #bittensor-finetune-subnet (18 messages🔥):


Nous Research AI ▷ #rag-dataset (100 messages🔥🔥):

Link mentioned: scratchTHOUGHTS/commanDUH.py at main · EveryOneIsGross/scratchTHOUGHTS: 2nd brain scratchmemory to avoid overrun errors with self. - EveryOneIsGross/scratchTHOUGHTS


Eleuther ▷ #general (273 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (245 messages🔥🔥):

Links mentioned:


Eleuther ▷ #scaling-laws (11 messages🔥):


Eleuther ▷ #interpretability-general (13 messages🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (31 messages🔥):

Links mentioned:


Eleuther ▷ #gpt-neox-dev (3 messages):


OpenAI ▷ #ai-discussions (193 messages🔥🔥):

Link mentioned: Enterprise privacy: no description found


OpenAI ▷ #gpt-4-discussions (34 messages🔥):


OpenAI ▷ #prompt-engineering (79 messages🔥🔥):


OpenAI ▷ #api-discussions (79 messages🔥🔥):


HuggingFace ▷ #general (96 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (12 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (12 messages🔥):

Links mentioned:


HuggingFace ▷ #NLP (18 messages🔥):

Link mentioned: Introduction - Hugging Face NLP Course: no description found


LlamaIndex ▷ #blog (7 messages):


LlamaIndex ▷ #general (303 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (4 messages):

Link mentioned: RAG with LlamaParse, Qdrant and Groq | Step By Step: In this video, I will show you how to create a effective RAG with LlamaParse, Qdrant and Groq. I will explain what LlamaParse is and briefly walk you through...


Latent Space ▷ #ai-general-chat (202 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (2 messages):

Link mentioned: Suno, an AI music generator | Hacker News: no description found


Latent Space ▷ #llm-paper-club-west (20 messages🔥):


Latent Space ▷ #ai-in-action-club (36 messages🔥):

Link mentioned: AI In Action: Weekly Jam Sessions: 2024 Topic,Date,Facilitator,Resources,@dropdown UI/UX patterns for GenAI,1/26/2024,nuvic,<a href="https://maggieappleton.com/squish-structure">https://maggieappleton.com/squish-struct...


LAION ▷ #general (168 messages🔥🔥):

Links mentioned:


LAION ▷ #research (13 messages🔥):

Links mentioned:


CUDA MODE ▷ #general (43 messages🔥):

Links mentioned:


CUDA MODE ▷ #triton (7 messages):

Link mentioned: Google Colaboratory: no description found


CUDA MODE ▷ #cuda (68 messages🔥🔥):

Links mentioned:


CUDA MODE ▷ #suggestions (5 messages):

Links mentioned:


CUDA MODE ▷ #jobs (1 messages):

vim410: Depends. But yes.


CUDA MODE ▷ #beginner (5 messages):

Link mentioned: no title found: no description found


CUDA MODE ▷ #pmpp-book (6 messages):


CUDA MODE ▷ #ring-attention (14 messages🔥):

Links mentioned:


CUDA MODE ▷ #off-topic (5 messages):

Link mentioned: MLSys 2024: no description found


CUDA MODE ▷ #gtc-meetup (9 messages🔥):

Link mentioned: I Snuck Into A Secret Arms-Dealer Conference: Get an exclusive video every month at https://www.patreon.com/Boy_BoyWe made this in collaboration with the legendary Australian political satire group The C...


OpenRouter (Alex Atallah) ▷ #general (159 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #general (95 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #langserve (45 messages🔥):

Links mentioned:


LangChain AI ▷ #share-your-work (11 messages🔥):

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #other-papers (8 messages🔥):

Link mentioned: Logits of API-Protected LLMs Leak Proprietary Information: The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. In this work, we show that even with a conservative assumption...


Interconnects (Nathan Lambert) ▷ #ml-drama (19 messages🔥):

Link mentioned: Tweet from Stella Biderman (@BlancheMinerva): @natolambert @felix_red_panda You're wrong though :P


Interconnects (Nathan Lambert) ▷ #random (63 messages🔥🔥):

Links mentioned:


Alignment Lab AI ▷ #general-chat (6 messages):


Alignment Lab AI ▷ #oo (32 messages🔥):

Link mentioned: keirp/hungarian_national_hs_finals_exam · Datasets at Hugging Face: no description found


LLM Perf Enthusiasts AI ▷ #general (1 messages):


LLM Perf Enthusiasts AI ▷ #claude (7 messages):

Link mentioned: Tweet from roon (@tszzl): anthropic is controlled opposition to put the fear of god in the members of technical staff


LLM Perf Enthusiasts AI ▷ #reliability (16 messages🔥):

Links mentioned:


LLM Perf Enthusiasts AI ▷ #openai (1 messages):

res6969: https://x.com/leopoldasch/status/1768868127138549841?s=46


DiscoResearch ▷ #general (21 messages🔥):

Links mentioned:


DiscoResearch ▷ #discolm_german (4 messages):


Datasette - LLM (@SimonW) ▷ #ai (20 messages🔥):

Links mentioned:


Datasette - LLM (@SimonW) ▷ #llm (1 messages):

obra: Is it possible to recover the seed used by the openai models for a previous api request?


Skunkworks AI ▷ #general (17 messages🔥):


Skunkworks AI ▷ #off-topic (1 messages):

pradeep1148: https://www.youtube.com/watch?v=ZlJbaYQ2hm4