Frozen AI News archive

Qwen 2 beats Llama 3 (and we don't know how)

**Alibaba** released the **Qwen 2** models under the Apache 2.0 license, claiming to outperform **Llama 3** among open models, with multilingual support across **29 languages** and strong benchmark scores such as **MMLU 82.3** and **HumanEval 86.0**. **Groq** demonstrated ultra-fast inference on **Llama-3 70B** at **40,792 tokens/s**, roughly 4 Wikipedia articles' worth of text in 200ms. Research on **sparse autoencoders (SAEs)** for interpreting **GPT-4** neural activity introduced new training methods, metrics, and scaling laws. **Meta AI** announced the **No Language Left Behind (NLLB)** model, capable of high-quality translation between **200 languages**, including low-resource ones. The Qwen team notes, *"Our post-training phase is designed with the principle of scalable training with minimal human annotation,"* highlighting techniques like rejection sampling for math and execution feedback for coding.


AI News for 6/5/2024-6/6/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (408 channels, 2450 messages) for you. Estimated reading time saved (at 200wpm): 304 minutes.

With Qwen 2 being Apache 2.0, Alibaba is now claiming to universally beat Llama 3 for the open models crown:

(image: Qwen2 vs. Llama 3 benchmark comparison table)

There are zero details on the dataset, so it's hard to get any idea of how they pulled this off, but they do drop some hints on post-training:

Our post-training phase is designed with the principle of scalable training with minimal human annotation.

Specifically, we investigate how to obtain high-quality, reliable, diverse and creative demonstration data and preference data with various automated alignment strategies, such as rejection sampling for math and execution feedback for coding.

These collective efforts have significantly boosted the capabilities and intelligence of our models, as illustrated in the following table.
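
The blog shares no code for these strategies, but rejection sampling for math is simple to sketch: sample several candidate solutions per problem, keep only those whose final answer a verifier accepts, and train on the survivors. A toy illustration, with a hypothetical `generate` stand-in rather than Qwen's actual pipeline:

```python
import random

def generate(problem: str) -> str:
    """Hypothetical stand-in for an LLM sampling one worked solution."""
    return f"...so the answer is {random.choice(['4', '5'])}"

def final_answer(solution: str) -> str:
    """Pull the final answer token off the end of a solution string."""
    return solution.rsplit(" ", 1)[-1]

def rejection_sample(problem: str, reference: str, n: int = 8) -> list[str]:
    """Keep only sampled solutions whose final answer matches the reference."""
    samples = [generate(problem) for _ in range(n)]
    return [s for s in samples if final_answer(s) == reference]

demos = rejection_sample("What is 2 + 2?", reference="4")  # accepted demonstrations
```

Execution feedback for coding follows the same shape, with the verifier replaced by running the generated code against unit tests.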

They also published a post on Generalizing an LLM from 8k to 1M Context using Qwen-Agent.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Qwen2 Open-Source LLM Release

Groq's Inference Speed on Large LLMs

Sparse Autoencoder Training Methods for GPT-4 Interpretability

Meta's No Language Left Behind (NLLB) Model

Pika AI's Series B Funding

Other Noteworthy Developments


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but still has lots of room to improve!

LLM Developments and Applications

AI Developments and Concerns

AI Assistants and Interfaces

AI Content Generation


AI Discord Recap

A summary of Summaries of Summaries

  1. LLM and Model Performance Innovations:

    • Qwen2 Attracts Significant Attention with models ranging from 0.5B to 7B parameters, appreciated for their ease of use and rapid iteration capabilities, supporting innovative applications with 128K token contexts.

    • Stable Audio Open 1.0 Generates Interest leveraging components like autoencoders and diffusion models, as detailed on Hugging Face, raising community engagement in custom audio generation workflows.

    • ESPNet Competitive Benchmarks Shared for Efficient Transformer Inference: Discussions around the newly released ESPNet showed promising transformer efficiency, pointing towards enhanced throughput on high-end GPUs (H100), as documented in the ESPNet Paper.

    • Seq1F1B Promotes Efficient Long-Sequence Training: The pipeline scheduling method introduces significant memory savings and performance gains for LLMs, as per the arxiv publication.

  2. Fine-tuning and Prompt Engineering Challenges:

    • Model Fine-tuning Innovations: Fine-tuning discussions highlight the use of gradient accumulation to manage memory constraints, and custom pipelines such as using FastLanguageModel.for_inference for Alpaca-style prompts, as demonstrated in a Google Colab notebook.

    • Chatbot Query Generation Issues: Debugging Cypher queries using Mistral 7B emphasized the importance of systematic evaluation and iterative tuning methods in successful model training.

    • Adapter Integration Pitfalls: Critical challenges with integrating trained adapters pointed to a need for more efficient adapter loading techniques to maintain performance, supported by practical coding experiences.

  3. Open-Source AI Developments and Collaborations:

    • Prometheus-2 Evaluates RAG Apps: Prometheus-2 offers an open-source alternative to GPT-4 for evaluating RAG applications, valued for its affordability and transparency, detailed on LlamaIndex.

    • OpenDevin Launch Sparks Collaboration Interest: an open-source system for autonomous software engineering inspired by Cognition's Devin, with documentation available via webinar and GitHub.

    • Gradient Accumulation Strategies Improve Training: Discussions on Unsloth AI emphasized using gradient accumulation to handle memory constraints effectively and reduce training times, as highlighted in shared YouTube tutorials.

    • Mojo Rising as a Backend Framework: Developers shared positive experiences using Mojo for HTTP server development, highlighting its static typing and compile-time computation features, with examples on GitHub.

  4. Deployment, Inference, and API Integrations:

    • Perplexity Pro Enhances Search Abilities: The recent update added step-by-step search processes via an intent system, enabling more agentic execution, as discussed within the community around Perplexity Labs.

    • Discussion on Modal's Deployment and Privacy: Queries about using Modal for LLM deployments included concerns about its fine-tuning stack and privacy policies, with additional support provided through Modal Labs documentation.

    • OpenRouter Technical Insights and Limits: Users explored technical specifications and capabilities, including assistant message prefill support and handling function calls through the Instructor tool.

  5. AI Community Discussions and Events:

    • Stable Diffusion 3 Speculation: Community buzz surrounds the anticipated release, with speculation about features and timelines, as detailed in various Reddit threads.

    • Human Feedback Foundation Event on June 11: Upcoming discussions on integrating human feedback into AI, featuring speakers from Stanford and OpenAI with recordings available on their YouTube channel.

    • Qwen2 Model Launches with Overwhelming Support: The release garnered excitement for its multilingual capabilities and improved benchmark scores, with the models available for hands-on evaluation on platforms like Hugging Face.

    • Call for JSON Schema Support in Mozilla AI: Requests for JSON schema inclusion in the next version to ease application development were prominently noted in community channels.

    • Keynote on Robotics AI and Foundation Models: Investment interests in "ChatGPT for Robotics" amid foundation model companies underscore the strategic alignment detailed in Newcomer's article.


PART 1: High level Discord summaries

LLM Finetuning (Hamel + Dan) Discord


OpenAI Discord


Unsloth AI (Daniel Han) Discord

Gradient Accumulation to the Rescue: Engineers agreed that gradient accumulation can alleviate memory constraints and improve training times, but warned of potential pitfalls with larger batch sizes due to unexpected memory allocation behaviors.
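
For context, gradient accumulation runs forward/backward on several micro-batches and steps the optimizer only once per group, so peak memory is set by the micro-batch while gradients average over the larger effective batch. A minimal PyTorch sketch (model, data, and the accumulation factor are all illustrative):

```python
import torch

accum_steps = 8  # effective batch = micro-batch size (4) * accum_steps

model = torch.nn.Linear(128, 2)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = [(torch.randn(4, 128), torch.randint(0, 2, (4,))) for _ in range(32)]

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = torch.nn.functional.cross_entropy(model(x), y)
    (loss / accum_steps).backward()  # scale so gradients average over the big batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # one optimizer step per accumulated group
        optimizer.zero_grad()
```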

Tackling Inferential Velocity with Alpacas: An engineer shared a code snippet leveraging FastLanguageModel.for_inference to utilize Alpaca-style prompts for sequence generation in LLMs, which sparked interest alongside discussions about a shared Google Colab notebook.
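
Based on that description, the pattern looks roughly like the sketch below; the checkpoint and prompt text are illustrative, and `FastLanguageModel.for_inference` switches the model onto Unsloth's faster generation path:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable the fast inference path

alpaca_prompt = (
    "Below is an instruction that describes a task.\n\n"
    "### Instruction:\n{}\n\n### Response:\n"
)
inputs = tokenizer(alpaca_prompt.format("Name the planets."), return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```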

Adapter Merging Mayday: Challenges with integrating trained adapters causing significant dips in performance led to calls for more efficient adapter loading techniques to maintain training efficiency.

Qwen2 Models Catch Engineers' Eyes: Excitement bubbles over the release of Qwen2 models, with engineers keen on the smaller-sized models ranging from 0.5B to 7B for their ease of use and faster iteration capabilities.

Quest for Solutions in the Help Depot: Conversations in the help channel emphasized a need for a VRAM-saving lora-adapter file conversion process, quick intel on a bug potentially slowing down inference, strategies for mitigating GPU memory overloads, and clarifications on running gguf models and implementing a RAG system, with references to Mistral documentation.


Stability.ai (Stable Diffusion) Discord


LM Studio Discord


HuggingFace Discord

Moderation is Key: The community debated moderation strategies in response to reports of inappropriate behavior. Professionalism in handling such issues is crucial.

Gradio API Challenges: Integrating Gradio with React Native and Node.js raised questions within the community. It's built with Svelte, so users were directed to investigate Gradio's API compatibility.
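
For reference, every Gradio app exposes its endpoints over HTTP, and the Python `gradio_client` shows the shape of the calls a React Native or Node.js client (for example via the official `@gradio/client` JS package) would mirror. A sketch with a purely illustrative Space and endpoint name:

```python
from gradio_client import Client

client = Client("someuser/some-space")  # hypothetical Hugging Face Space
result = client.predict("a watercolor fox", api_name="/predict")  # endpoint varies per app
print(result)
```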

Text with Stability: Discussion around Stable Diffusion models for text generation pointed members towards solutions like AnyText and TextDiffuser-2 from Microsoft for robust output.

When Compute Goes Peer-to-Peer: The conversation turned to peer-to-peer compute for distributed machine learning, with tools like Petals and experiences with privacy-conscious local swarms offering promising avenues.

Human Feedback in AI: The Human Feedback Foundation is making strides in incorporating human feedback into AI, with an event on June 11th and a trove of educational sessions on their YouTube channel.

Small Datasets, Big Challenges: In computer vision discussions, dealing with small datasets and unrepresentative validation sets was a pressing concern. Solutions include using diverse training data and maybe even transformers despite their longer training times.

Swin Transformer Tests: There was a query about applying the Swin Transformer to CIFAR datasets, highlighting the community's interest in experimenting with contemporary models in various scenarios.

Deterministic Models Turn Down the Heat: A single message highlighted lowering temperature settings to 0.1 to achieve more deterministic model behavior, prompting reflection on model tuning approaches.
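
For anyone wanting that knob in code, the setting maps directly onto Hugging Face `transformers` generation arguments; a minimal sketch with an illustrative model:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # illustrative small model
out = generator(
    "The capital of France is",
    max_new_tokens=8,
    do_sample=True,
    temperature=0.1,  # near-deterministic; do_sample=False gives fully greedy decoding
)
print(out[0]["generated_text"])
```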

Sample Input Snafus: Confusion over text embeddings and proper structuring of sample inputs for models like text-enc 1 and text-enc 2 surfaced, along with a discussion on the challenges posed by added kwargs in a dictionary format.

Re-parameterising with Results: A member successfully re-parameterised Segmind's ssd-1b into a v-prediction/zsnr refiner model and lauded it as a new favorite, hinting at a possible trend toward 1B mixture of experts models.

A Helping Hand for Projects: In a stretch of community aid, members offered personal assistance through DMs for addressing dataset questions, adding to the guild's collaborative environment.


Eleuther Discord

KAN Skepticism Expressed: Kolmogorov-Arnold Networks (KANs) were deemed less efficient than traditional neural networks by guild members, with concerns about their scalability and interpretability. However, there's interest in more efficient implementations of KANs, such as those using ReLU, evidenced by a shared ReLU-KAN architecture paper.

Expanding the Data Curation Toolbox: Participants debated the utility of influence functions in data quality evaluation, with the LESS algorithm being mentioned as a potentially more scalable alternative for selecting high-quality training data.

Breakthroughs in Efficient Model Training: Innovations in model training were widely shared, including Nvidia's new open weights available on GitHub, the exploration of MatMul-free models (arXiv) for increased efficiency, and Seq1F1B's promise for more memory-efficient long-sequence training (arXiv).

Quantization Technique May Boost LLM Performance: The novel QJL method presents a promising avenue for large language models by compressing KV cache requirements through a quantization process (arXiv).

Brain-Data Speech Decoding Adventure: A guild member reported experimenting with Whisper tiny.en embeddings and brain implant data to decode speech, requesting peer suggestions to optimize the model by adjusting layers and loss functions while facing the constraint of a single GPU for training.


Perplexity AI Discord


CUDA MODE Discord


Interconnects (Nathan Lambert) Discord


Modular (Mojo 🔥) Discord


Latent Space Discord


LlamaIndex Discord

Prometheus-2 Pitches for RAG App Judging: Prometheus-2 is presented as an open-source alternative to GPT-4 for evaluating RAG applications, sparking interest due to concerns about transparency and affordability.

LlamaParse Pioneers Knowledge Graph Construction: A posted notebook demonstrates how LlamaParse can execute first-class parsing to develop knowledge graphs, paired with a RAG pipeline for node retrieval.

Configuration Overload in LlamaIndex: AI engineers are expressing difficulty with the complexity of configuring LlamaIndex for querying JSON data and are seeking guidance, as well as discussing issues with Text2SQL queries not balancing structured and unstructured data retrieval.

Exploring LLM Options for Resource-Limited Scenarios: Discussions on alternative setups for those with hardware limitations veer towards smaller models like Microsoft Phi-3 and experimenting with platforms like Google Colab for heavier models.

Scoring Filters Gain Customizable Edges: Engineers are discussing the capability of LlamaIndex to filter results by customizable thresholds and performance score, indicating a need for fine-tuned precision in search results.


Cohere Discord


Nous Research AI Discord

Qwen2 Leaps Ahead: The launch of the Qwen2 models marks a significant evolution from Qwen1.5, now featuring support for 128K token context lengths, 27 additional languages, and pretrained as well as instruction-tuned models in various sizes. They are available on platforms like GitHub, Hugging Face, and ModelScope, along with a dedicated Discord server.

Map Event Prediction Discussion: A user inquired about predicting true versus false event points on a map with temporal data, leading to a conversation about relevant commands and techniques, although specific methods were not provided.

Update on Mistral API and Model Storage: Mistral's introduction of a fine-tuning API and associated costs sparked discussion, with a focus on practical implications for development and experimentation. The API, including pricing details, is explained in their fine-tuning documentation.

Mobile Text Input Gets a Makeover: WorldSim Console updated their mobile platform, fixing text-input bugs, improving input reliability, and adding features such as enhanced copy/paste and cosmetic customization options.

Music Exploration in Off-Topic: One member shared links to explore "Wakanda music", though this might have limited technical relevance for the engineer audience. Among the shared links were music videos like DG812 - In Your Eyes and MitiS & Ray Volpe - Don't Look Down.


OpenRouter (Alex Atallah) Discord

Server Management Made Easy with Pilot: The Pilot bot is revolutionizing how Discord servers are managed by offering features such as "Ask Pilot" for intelligent server insights, "Catch Me Up" for message summarization, and weekly "Health Check" reports on server activity. It's free to use and improves community growth and engagement, accessible through their website.

AI Competitors in Role-Playing Realm: The WizardLM 8x22b model is currently gaining popularity in the role-playing community, but Dolphin 8x22 is emerging as a potential rival, pending user tests comparing their effectiveness.

Gemini Flash Sparks Image Output Curiosity: Inquiries about whether Gemini Flash can render images spurred clarification that while no Large Language Model (LLM) presently offers direct image outputs, they can theoretically use base64 or call external services like Stable Diffusion for image generation.

Tool Tips for Handling Function Calls: For handling specific function calls and formatting, Instructor is recommended as a powerful tool, facilitating automated command execution and improving user workflows.
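
As a sketch of what that looks like in practice, Instructor patches an OpenAI-compatible client so completions are validated against a Pydantic schema (the `Command` model here is illustrative):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Command(BaseModel):
    name: str
    arguments: dict[str, str]

client = instructor.from_openai(OpenAI())  # patched client returns typed objects
cmd = client.chat.completions.create(
    model="gpt-4o",
    response_model=Command,  # Instructor validates (and retries) against this schema
    messages=[{"role": "user", "content": "Schedule a meeting tomorrow at 3pm."}],
)
print(cmd.name, cmd.arguments)
```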

Technical Discussions Amidst Model Enthusiasm: A member's query regarding prefill support in OpenRouter led to a confirmation that it's possible, particularly with the usage of reverse proxies; meanwhile, excitement is building around GLM-4 due to its support for the Korean language, hinting at the model's potential in multilingual applications.
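
On the prefill point, the trick is to end the message list with a partial assistant turn that the model then continues. A sketch against OpenRouter's OpenAI-compatible endpoint (the model slug is illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="<OPENROUTER_API_KEY>")
resp = client.chat.completions.create(
    model="anthropic/claude-3-opus",             # illustrative model slug
    messages=[
        {"role": "user", "content": "List three uses for a brick."},
        {"role": "assistant", "content": "1."},  # prefill: generation continues from here
    ],
)
print(resp.choices[0].message.content)
```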


MLOps @Chipro Discord


OpenAccess AI Collective (axolotl) Discord

Data Feast for AI Enthusiasts: Engineers lauded the accessibility of 15T datasets, humorously noting the conundrum of abundance in data but scarcity in computing resources and funding.

GPU Banter Amidst Hardware Discussions: The suitability of 4090s for pretraining massive datasets sparked a facetious exchange, jesting about the limitations of consumer GPUs for such demanding tasks.

Finetuning Fun with GLM and Qwen2: The community shared tips and configurations for finetuning GLM 4 9b and Qwen2 models, noting that Qwen2's similarity to Mistral simplifies the process.

Quest for Reliable Checkpointing: The use of Hugging Face's TrainingArguments and EarlyStoppingCallback featured in talks about checkpoint strategies, specifically for capturing both the most recent and best performing states based on eval_loss.
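
That combination maps directly onto Hugging Face `Trainer` configuration; a sketch of the relevant arguments, assuming `model`, `train_ds`, and `eval_ds` are defined elsewhere:

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="ckpts",
    evaluation_strategy="steps",   # evaluate every eval_steps...
    eval_steps=500,
    save_strategy="steps",         # ...and checkpoint on the same schedule
    save_steps=500,
    save_total_limit=2,            # keep only the most recent checkpoints on disk
    load_best_model_at_end=True,   # restore the best checkpoint when training stops
    metric_for_best_model="eval_loss",
    greater_is_better=False,       # lower eval_loss is better
)
trainer = Trainer(
    model=model,                   # assumed defined elsewhere
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```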

Error Hunting in AI Code: Troubleshooting the "returned non-zero exit status 1" error prompted members to suggest pinpointing the failing command, scrutinizing stdout and stderr, and checking for permission or environment variable issues.
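
That triage is straightforward to script: run the failing command with captured output so the stderr that explains the non-zero status is visible (the command here is illustrative):

```python
import subprocess

result = subprocess.run(
    ["python", "train.py"],  # illustrative failing command
    capture_output=True,
    text=True,
)
if result.returncode != 0:   # the "returned non-zero exit status 1" case
    print("exit code:", result.returncode)
    print("stdout:", result.stdout)
    print("stderr:", result.stderr)  # usually names the missing file, permission, or env var
```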


LAION Discord


LangChain AI Discord


tinygrad (George Hotz) Discord


OpenInterpreter Discord

Need for Speed with Graphics: Members are seeking advice on rendering graphics output with interpreter.computer.run, specifically for visualizations like those produced by matplotlib, without success thus far.

OS Mode Mayhem: Conversations highlighted troubles in getting --os mode to operate correctly with local models from LM Studio, including issues with local LLAVA models not starting screen recording.

Vision Quest on M1 Mac: Engineers expressed frustration about hardware constraints on vision models for M1 Mac, indicating a strong interest in free and accessible AI solutions, given the high costs associated with OpenAI's offerings.

Integration Anticipation for Rabbit R1: Excitement is brewing over integrating Rabbit R1 with OpenInterpreter, particularly the upcoming webhook feature, to enable practical actions.

Bash Model Request Open: A call for suggestions for an open model suitable for handling bash commands has yet to be answered, leaving an open gap for potential recommendations.


AI Stack Devs (Yoko Li) Discord

Curiosity for AI Town's Development Status: Members in AI Stack Devs sought an update on the project, with one expressing interest in progress, while another apologized for not contributing yet due to a lack of time.

Tileset Troubles in AI Town: An engineering challenge surfaced around parsing spritesheets for AI Town, with a proposal to use the provided level editor or Tiled, supported by conversion scripts from the community.

Learning to Un-Censor LLMs: A member shared insights from a Hugging Face blog post on abliteration, which uncensors LLMs, featuring instruct versions of the third generation of Llama models. They followed up by inquiring about applying this technique to OpenAI models.

Unanswered OpenAI Implementation Query: Despite sharing the study on abliteration, a call for knowledge on how to implement the technique with OpenAI models went unanswered in the thread.



Datasette - LLM (@SimonW) Discord


Torchtune Discord

Megatron's Checkpoint Conundrum: Engineers enquired about Megatron's compatibility with fine-tuning libraries, noting its unique checkpoint format. It was agreed that converting Megatron checkpoints to Hugging Face format and utilizing Torchtune for fine-tuning was the best course of action.


Mozilla AI Discord


YAIG (a16z Infra) Discord


The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

LLM Finetuning (Hamel + Dan) ▷ #general (66 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-1 (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #asia-tz (5 messages):

- **Jeremyhoward loves Hainan**: *"I love hainan! 😄"*. Later, Blaine shared his love for Shenzhou Peninsula mentioning nearby beaches and passion fruits.
- **Anmol from India seeks chatbot pricing advice**: Anmol asked for advice on pricing an enterprise customer service chatbot. He expressed hope that someone with experience could assist him.
- **Hanoi to Germany transition**: Hehehe0803 introduced themselves from Hanoi, Vietnam, currently living in Germany. They mentioned joining late and expressed hope to connect with others.

LLM Finetuning (Hamel + Dan) ▷ #🟩-modal (8 messages🔥):

- **Modal Privacy Policy Sought**: A user inquired about the privacy policy of Modal. Another user provided a link to a Google search for further information: [Privacy Policy Modal Labs](https://www.google.com/search?q=privacy+policy+modal+labs).

- **Confusion on LLM Inference Setup**: A user asked about setting up a server to run an LLM and expose an endpoint, referencing a [Modal example script on GitHub](https://github.com/modal-labs/modal-examples/blob/main/06_gpu_and_ml/llm-serving/text_generation_inference.py#L240C20-L240C30). They were unsure about how to get the base URL for calling the endpoint from REST clients like Postman.

- **Praise for Modal from a GPU Enthusiast**: A user who typically trains locally with multiple GPUs tried Modal and found it "super cool". They expressed their appreciation with emojis: 👍👏.

- **Dataset Handling Issue with Axolotl Configs**: A user experienced issues with Modal's insistence on passing a dataset, which overrode their existing axolotl configuration. They mentioned hacking the `train.py` to remove the dataset code, which resolved the issue for them.

Link mentioned: modal-examples/06_gpu_and_ml/llm-serving/text_generation_inference.py at main · modal-labs/modal-examples: Examples of programs built using Modal. Contribute to modal-labs/modal-examples development by creating an account on GitHub.


LLM Finetuning (Hamel + Dan) ▷ #learning-resources (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #jarvis-labs (2 messages):

Link mentioned: Debugging and Troubleshooting | Jarvislabs: Some common troubleshooting tips for updating Cuda, Freeing up the disk space and many more.


LLM Finetuning (Hamel + Dan) ▷ #hugging-face (3 messages):


LLM Finetuning (Hamel + Dan) ▷ #replicate (9 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #langsmith (29 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #workshop-3 (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #workshop-4 (4 messages):


LLM Finetuning (Hamel + Dan) ▷ #clavie_beyond_ragbasics (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #jason_improving_rag (150 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #jeremy_python_llms (6 messages):


LLM Finetuning (Hamel + Dan) ▷ #yang_mistral_finetuning (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #axolotl (13 messages🔥):

Link mentioned: Dependency Resolution - pip documentation v24.1.dev1: no description found


LLM Finetuning (Hamel + Dan) ▷ #zach-accelerate (12 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #freddy-gradio (2 messages):


LLM Finetuning (Hamel + Dan) ▷ #charles-modal (12 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #langchain-langsmith (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #credits-questions (22 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #strien_handlingdata (1 messages):


LLM Finetuning (Hamel + Dan) ▷ #fireworks (28 messages🔥):


LLM Finetuning (Hamel + Dan) ▷ #emmanuel_finetuning_dead (98 messages🔥🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #braintrust (29 messages🔥):

Links mentioned:


LLM Finetuning (Hamel + Dan) ▷ #europe-tz (2 messages):

- **Local Roots Shout-Out**: One user mentioned living in London but originally being from Portugal. Another user opted to keep their origin a secret with a *"🤐"* emoji.

LLM Finetuning (Hamel + Dan) ▷ #announcements (2 messages):

Link mentioned: no title found: no description found


LLM Finetuning (Hamel + Dan) ▷ #predibase (3 messages):

Link mentioned: Quickstart | Predibase: Predibase provides the fastest way to fine-tune and serve open-source LLMs. It's built on top of open-source LoRAX.


LLM Finetuning (Hamel + Dan) ▷ #openpipe (4 messages):


LLM Finetuning (Hamel + Dan) ▷ #openai (69 messages🔥🔥):


- **OpenAI credits applied retroactively**: Several users noted that their credits were applied to their existing API balance, making it similar to adding funds via a credit card. [Members discussed](https://platform.openai.com/settings/organization/billing/overview) potential improvements for those new to the API.
- **Finalizing Tier 2 API status for students**: OpenAI granted Tier 2 API status to those who filled out the form in time, allowing them to utilize the additional credits. Users should stay tuned for updates if they missed the initial registration.
- **Late submission form for credits**: To rectify earlier submission errors, [a new form for additional credit requests](https://maven.com/parlance-labs/fine-tuning/1/forms/f2d68f) has been shared and needs to be correctly filled out.
- **Internal thoughts during fine-tuning**: There was an in-depth discussion regarding how to handle "internal thoughts" in long multi-turn conversations during OpenAI model fine-tuning. Delimiters and separate examples were proposed as potential solutions.
- **Public acknowledgment and kudos**: The group appreciated the efforts of OpenAI team members for their swift and effective support, highlighted in a [Twitter post](https://x.com/TheZachMueller/status/1798674326633247143) expressing gratitude.

Links mentioned:


OpenAI ▷ #ai-discussions (266 messages🔥🔥):

- **GPTs at their limits with advanced programming questions**: A user noted that their programming questions have become more specific and complex as their project advanced, leading to struggles with GPT models. They expressed concern that these models may be "pushing their limits for programming assistance 😁".
- **GPTs sometimes fail at simple corrections**: Another user pointed out a problem where the GPT could not correct an incorrect math equation despite being prompted, showcasing issues with basic logical consistency in the model.
- **Continuous Learning and Real-time Adjustments**: Discussion involved the idea that making models agentic and capable of continuous learning could be costly and pose regulatory challenges. Continuous learning could also lead to issues with personality drift and potential security risks.
- **Generative AI's current and future impact**: There was debate about the immediate usefulness and future potential of generative AI, with some users highlighting its potential to assist or significantly change job structures, while others were skeptical of its broader economic impacts.
- **Community discussions on AI advances and resource requirements**: Users conversed about the computational power required for training AI models, referencing specific hardware like A100 and H100 GPUs, and speculating on developments with upcoming models like GPT-5.

OpenAI ▷ #gpt-4-discussions (20 messages🔥):

Link mentioned: Tweet from OpenAI (@OpenAI): All users will start to get access to GPT-4o today. In coming weeks we’ll begin rolling out the new voice and vision capabilities we demo’d today to ChatGPT Plus.


OpenAI ▷ #prompt-engineering (6 messages):


OpenAI ▷ #api-discussions (6 messages):


Unsloth AI (Daniel Han) ▷ #general (148 messages🔥🔥):

- **Gradient Accumulation Insights**: Members discussed how *gradient accumulation* can help with memory issues and batch size. "It'll decrease the time compared to small batch size", but it gets tricky with larger batch sizes due to memory allocation quirks.
- **Addressing CUDA Memory Issues**: *"When increasing batch size, the sequences' different lengths slow down the process."* Suggested using "gradient accumulation" or "non-power-of-2 batch sizes" to mitigate memory spikes.
- **Training and Merge Issues**: Members faced issues with *merging trained adapters* leading to significant performance degradation. There's a call for effective loading of adapters to continue training without losing efficiency.
- **Using Alpaca Prompts for Inference**: A detailed code snippet was shared for using *FastLanguageModel.for_inference* with Alpaca-style prompts to generate sequence completions after fine-tuning. This came from [a shared Colab link](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing).
- **Excitement Over Qwen2 Models**: Enthusiasm about the Qwen2 model release, with members particularly interested in the small models (0.5B to 7B) for their ease of training and use. Discussions touched on the promise of "easy to train, easy to iterate, and can run everywhere."

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (9 messages🔥):


Unsloth AI (Daniel Han) ▷ #help (54 messages🔥):

- **Feature Request for Lora-Adapter File Handling**: A user expressed the need for an unsloth lora-adapter file conversion process that doesn't require VRAM. They mentioned struggles with saving a ~7GB adapter for llama-3-70b in the current format.
- **Persistent Bug and Faster Inference**: A user detailed a bug causing persistent logging but mentioned that once fixed, it might result in slight performance improvements. "Once it's fixed you might get to claim slightly faster inference, since it won't be printing to console every iteration 😄".
- **Handling CUDA Out of Memory Issues**: Another member shared the usage of `torch.cuda.empty_cache()` to handle GPU memory issues (a minimal retry sketch follows this list). Inference using lm_head was consuming more memory than expected, leading to a CUDA out-of-memory error.
- **Running gguf Models**: There was a discussion on running gguf models using llama-cpp-python, and the lack of support in transformers for running gguf directly. Another user suggested running gguf binaries directly via llama.cpp.
- **RAG System Confusion**: There was confusion about Mistral AI offering a RAG system; it was clarified that while Mistral does not offer RAG, there is [documentation for implementing it](https://docs.mistral.ai/guides/rag/). 
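
A common shape for the `torch.cuda.empty_cache()` suggestion above is a retry wrapper; a hypothetical helper (clearing the cache releases PyTorch's cached allocator blocks but cannot reclaim memory held by live tensors):

```python
import torch

def generate_safely(model, inputs):
    """Retry generation once after clearing the CUDA cache on an OOM error."""
    try:
        return model.generate(**inputs)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()         # hand cached blocks back to the allocator
        return model.generate(**inputs)  # retry; still fails if memory is truly exhausted
```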

Links mentioned:


Unsloth AI (Daniel Han) ▷ #community-collaboration (2 messages):

Link mentioned: Join the VirtualValleyAI Discord Server!: Check out the VirtualValleyAI community on Discord - hang out with 72 other members and enjoy free voice and text chat.


Stability.ai (Stable Diffusion) ▷ #general-chat (180 messages🔥🔥):

Links mentioned:


LM Studio ▷ #💬-general (64 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (78 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (3 messages):


LM Studio ▷ #⚙-configs-discussion (3 messages):


LM Studio ▷ #🎛-hardware-discussion (26 messages🔥):

Link mentioned: The Story of Snapdragon X Elite: Two lawsuits & a mystery: The Story of Snapdragon X Elite | In this video we will take a look at the exciting history of Qualcomm's new Arm SoC that aims to ...


LM Studio ▷ #🧪-beta-releases-chat (1 messages):


HuggingFace ▷ #general (120 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (1 messages):

qasim_30: There is paper out there "7 billion is all you need"


HuggingFace ▷ #cool-finds (4 messages):

Link mentioned: DAIEF/q-learning-Taxi-v3 · Hugging Face: no description found


HuggingFace ▷ #i-made-this (12 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (4 messages):

Links mentioned:


HuggingFace ▷ #computer-vision (10 messages🔥):


HuggingFace ▷ #NLP (1 messages):


HuggingFace ▷ #diffusion-discussions (7 messages):


Eleuther ▷ #general (104 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (21 messages🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (3 messages):


Eleuther ▷ #multimodal-general (1 messages):


Perplexity AI ▷ #general (111 messages🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (5 messages):

Links mentioned:


Perplexity AI ▷ #pplx-api (1 messages):


CUDA MODE ▷ #general (3 messages):


CUDA MODE ▷ #torch (2 messages):

Link mentioned: Function at::_weight_int4pack_mm — PyTorch main documentation: no description found


CUDA MODE ▷ #algorithms (1 messages):

Link mentioned: AI Unplugged 12: MoRA. DPO vs PPO. CoPE Contextual Position Encoding. S3D Self Speculative Decoding.: Insights over Information


CUDA MODE ▷ #cool-links (22 messages🔥):

- **KANs rival MLPs with torch.compile**: A [tweet by Thomas Ahle](https://x.com/thomasahle/status/1798408687981297844) highlighted how torch.compile makes KANs as fast as MLPs, praising the performance improvement. This drew attention and comments from several users surprised and impressed by this claim.
- **Repository on GitHub**: The [GitHub repository](https://github.com/thomasahle/kanmlps) linked in the discussion provides resources for KANs and MLPs. Users are actively compiling and profiling these implementations to understand the performance benefits.
- **Practical profiling experiences**: Users shared their experiences and results while profiling the compiled KANs, noting 1.5-2x speedups after compilation (a minimal compile-and-time sketch follows this list). One user mentioned compiling the `.forward` function with significant speed improvements.
- **Concerns over operator fusion and kernels**: There were technical discussions on potential downsides like losing operator fusion and questions about generating Triton kernels. Users are profiling different implementations to verify and compare results, referencing [specific code locations on GitHub](https://github.com/thomasahle/kanmlps/blob/main/models.py#L101).
- **Request for further collaboration**: There was a suggestion to invite Thomas Ahle to join the discussion and share insights about compile testing results. Users are interested in ensuring the implementations match academic papers and seeking verification outputs.
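
For anyone wanting to reproduce the flavor of those profiling runs, the pattern is simply to compile the module and time it against eager mode; a minimal sketch with illustrative sizes (a KAN layer would slot in the same way):

```python
import time
import torch

mlp = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 512)
)
compiled = torch.compile(mlp)  # fuses elementwise ops into fewer kernels

x = torch.randn(1024, 512)
compiled(x)  # warm-up: the first call triggers compilation and should not be timed

for name, fn in [("eager", mlp), ("compiled", compiled)]:
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(100):
            fn(x)
    print(f"{name}: {time.perf_counter() - start:.3f}s")
```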

Links mentioned:


CUDA MODE ▷ #pmpp-book (1 messages):

piotr.mazurek: Chapter 4, exercise 9, anyone know if this is the correct solution here?


CUDA MODE ▷ #torchao (1 messages):


CUDA MODE ▷ #off-topic (1 messages):

Link mentioned: What kind of bug would make machine learning suddenly 40% worse at NetHack?: One day, a roguelike-playing system just kept biffing it, for celestial reasons.


CUDA MODE ▷ #irl-meetup (1 messages):

Link mentioned: AI_dev Europe 2024 Schedule: Check out the schedule for AI_dev Europe 2024


CUDA MODE ▷ #llmdotc (52 messages🔥):

Links mentioned:


CUDA MODE ▷ #bitnet (1 messages):


CUDA MODE ▷ #arm (3 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ideas-and-feedback (1 messages):


Interconnects (Nathan Lambert) ▷ #news (33 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (12 messages🔥):

Link mentioned: Tweet from Kevin Roose (@kevinroose): Interesting update to the OpenAI whistleblower story: After denying it on the record, Microsoft is now admitting that they tested an early version of GPT-4 in India without the approval of a joint saf...


Interconnects (Nathan Lambert) ▷ #random (4 messages):

Link mentioned: no title found: no description found


Interconnects (Nathan Lambert) ▷ #memes (13 messages🔥):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #rl (2 messages):

Link mentioned: Tweet from Arash Ahmadian (@aahmadian_): 🤔Can we explicitly teach LLMs to self-improve using RLHF? Introducing “Self-Improving Robust Preference Optimization” (SRPO) which trains models that are self-improving and robust to eval tasks! w/...


Modular (Mojo 🔥) ▷ #general (7 messages):

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (1 messages):

ModularBot: From Modular: https://twitter.com/Modular/status/1798760653806817352


Modular (Mojo 🔥) ▷ #ai (4 messages):


Modular (Mojo 🔥) ▷ #🔥mojo (34 messages🔥):


Modular (Mojo 🔥) ▷ #nightly (11 messages🔥):

Links mentioned:


Latent Space ▷ #ai-general-chat (54 messages🔥):

Links mentioned:


LlamaIndex ▷ #blog (2 messages):


LlamaIndex ▷ #general (43 messages🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (5 messages):


Cohere ▷ #general (45 messages🔥):


Cohere ▷ #announcements (2 messages):

Links mentioned:


Nous Research AI ▷ #off-topic (9 messages🔥):

Links mentioned:


Nous Research AI ▷ #interesting-links (2 messages):

Link mentioned: Hello Qwen2: GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you: Pretrained and instruction...


Nous Research AI ▷ #general (29 messages🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (1 messages):

quantumalchemy: Hermes pro mistral v0.3 ?


Nous Research AI ▷ #rag-dataset (1 messages):

Link mentioned: Fine-tuning | Mistral AI Large Language Models: Every fine-tuning job comes with a minimum fee of $4, and there's a monthly storage fee of $2 for each model. For more detailed pricing information, please visit our pricing page.


Nous Research AI ▷ #world-sim (2 messages):


OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

Link mentioned: Pilot - The co-owner for your Discord server.: Pilot takes the work out of running a server. Get AI-enhanced advice, insights, and more to help you grow and manage your community.


OpenRouter (Alex Atallah) ▷ #general (40 messages🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #일반 (1 messages):

voidnewbie: I'm excited that GLM-4 supports Korean


MLOps @Chipro ▷ #events (6 messages):

Links mentioned:


MLOps @Chipro ▷ #general-ml (23 messages🔥):


OpenAccess AI Collective (axolotl) ▷ #general (17 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (1 messages):

josharian: i just experienced this exact behavior as well.


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (11 messages🔥):

Links mentioned:


LAION ▷ #general (21 messages🔥):

Link mentioned: Hello Qwen2: GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you: Pretrained and instruction...


LangChain AI ▷ #general (14 messages🔥):

Links mentioned:


LangChain AI ▷ #tutorials (3 messages):

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (16 messages🔥):


OpenInterpreter ▷ #general (10 messages🔥):


OpenInterpreter ▷ #O1 (2 messages):


AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (2 messages):


AI Stack Devs (Yoko Li) ▷ #ai-town-dev (3 messages):


AI Stack Devs (Yoko Li) ▷ #local-ai-stack (2 messages):

Link mentioned: Uncensor any LLM with abliteration: no description found


Datasette - LLM (@SimonW) ▷ #llm (6 messages):


Torchtune ▷ #general (3 messages):


Mozilla AI ▷ #llamafile (1 messages):


YAIG (a16z Infra) ▷ #ai-ml (1 messages):

oliver.jack: Weekend listening:

https://youtu.be/4jPg4Se9h5g?si=ULVqGQa6AvI8Ch3o




{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}