Frozen AI News archive

not much happened today

This week in AI news highlights **Ollama 0.4** supporting **Meta's Llama 3.2 Vision** models (11B and 90B), with applications like handwriting recognition. **Self-Consistency Preference Optimization (ScPO)** was introduced to improve model consistency without human labels. Discussions touched on **model scaling**, the **resurgence of neural networks**, and **AMD's multi-GPU bandwidth** challenges. The importance of **skip connections** in **Transformers** was emphasized. In healthcare, **less regulation plus AI** could revolutionize disease treatment and aging research. Tools like **LlamaParse** and **Gemini** aid automated resume insights. **Gitpod Flex** demonstrated a zero-trust architecture for secure development environments. Research includes surveys on **Small Language Models (SLMs)**, **number understanding** in LLMs, and **DTrOCR**, which uses a **GPT-2 decoder** for OCR. Multi-agent systems in prediction markets were discussed by **TogetherCompute** and **LangChainAI**. Community events include a **NeurIPS Happy Hour**, **NLP seminars**, and courses on **Agent Memory** treating LLMs as operating systems.
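The skip-connection point above can be illustrated with a minimal sketch of a pre-norm Transformer block. This is a toy NumPy illustration, not code from any of the discussions; the stand-in `attn` and `mlp` sublayers are hypothetical placeholders for real learned sublayers:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def transformer_block(x, attn, mlp):
    # Pre-norm block: each sublayer's output is *added* back onto its
    # input (the skip connection), so features and gradients can flow
    # around the sublayer unchanged.
    x = x + attn(layer_norm(x))   # skip connection around attention
    x = x + mlp(layer_norm(x))    # skip connection around the MLP
    return x

# Toy stand-in sublayers (real ones would have learned weights)
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)) * 0.01
attn = lambda h: h @ W
mlp = lambda h: np.maximum(h @ W, 0.0)

x = rng.normal(size=(4, 8))
y = transformer_block(x, attn, mlp)
assert y.shape == x.shape
```

Note the identity path: if both sublayers output zero, the block returns its input exactly, which is the property that makes deep stacks of such blocks trainable.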

Canonical issue URL

a quiet week is all we need.

AI News for 11/6/2024-11/7/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (217 channels, and 1985 messages) for you. Estimated reading time saved (at 200wpm): 222 minutes. You can now tag @smol_ai for AINews discussions!

An anon on Reddit thinks he has figured out AGI but ends up writing a kind-of-coherent literature review of Liquid Neural Networks and related work. The comments are mandatory.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Models and Architectures

AI Tools and Applications

AI Research and Publications

AI Community and Events

AI in Business and Industry

Memes/Humor

Miscellaneous


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. LLM Selector: Analyzing Models Across 12 Benchmarks for Optimal Use

Theme 2. Integration of Liquid Time Constant Networks with Spiking Dynamics

Theme 3. Qwen 2.5 Coder: Stealth Updates & Future Directions

Theme 4. WebRL: Evolving Agents via Self-Developed Curriculum Reinforcement Learning

Theme 5. Open Source Models Revealing Significantly Lower Refusal Rates

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Claude 3.5 Sonnet New Update Effect on Code and Text Output

Theme 2. Nvidia's New GPUs: Reduced VRAM Limits Local AI Training

Theme 3. Anthropic's Secretive ClaudeAI Prompt Management Exposed

Theme 4. ChatGPT and ClaudeAI's New Limitations on Code Output


AI Discord Recap

A summary of Summaries of Summaries by O1-mini

1. AI Model Innovations and Releases

2. Performance Optimization and Resource Management

3. Integration with Platforms and Tools

4. AI Applications in Diverse Domains

5. AI Fine-Tuning and Customization


PART 1: High-level Discord summaries

Nous Research AI Discord


Perplexity AI Discord


OpenRouter (Alex Atallah) Discord


Eleuther Discord


Unsloth AI (Daniel Han) Discord


HuggingFace Discord


OpenAI Discord


GPU MODE Discord


Notebook LM Discord


Interconnects (Nathan Lambert) Discord


LM Studio Discord


Stability.ai (Stable Diffusion) Discord


Latent Space Discord


Modular (Mojo 🔥) Discord


Cohere Discord


LlamaIndex Discord


DSPy Discord


OpenInterpreter Discord


tinygrad (George Hotz) Discord


OpenAccess AI Collective (axolotl) Discord


Torchtune Discord


LLM Agents (Berkeley MOOC) Discord


Gorilla LLM (Berkeley Function Calling) Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LAION Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Nous Research AI ▷ #announcements (1 message):

  • Nous Chat
  • Hermes 3
  • User Interface Enhancements

Link mentioned: NOUS CHAT | Talk to Hermes: Experience natural, intelligent conversations with Hermes, the open-source LLM by Nous Research.


Nous Research AI ▷ #general (215 messages🔥🔥):

  • TEE HEE HE wallet updates
  • Nous Research overview
  • Hermes 405B performance
  • Development discussions
  • Future of AI training data

Links mentioned:


Nous Research AI ▷ #ask-about-llms (17 messages🔥):

  • Haiku 3.5 Performance
  • Haiku 3.5 Pricing
  • Open Source AI Interfaces
  • Evaluating PyTorch Models

Nous Research AI ▷ #research-papers (1 message):

  • Ferret-UI
  • Gemma-2B MLLM
  • Llama-3-8B MLLM
  • UI-centered reasoning
  • Mobile UI comprehension

Links mentioned:



Nous Research AI ▷ #rag-dataset (6 messages):

  • YouTube summarizer project
  • Interactive chat sessions
  • Model recommendations
  • Hugging Face resources

Perplexity AI ▷ #general (221 messages🔥🔥):

  • Perplexity Pro subscription
  • Claude model capabilities
  • Image saving on Mac OS
  • Mobile device specifications
  • Mac OS login issues

Links mentioned:


Perplexity AI ▷ #sharing (13 messages🔥):

  • Chernobyl's Radiation-Eating Fungi
  • Physical buttons return
  • Michigan as a Climate Sanctuary
  • Future of AI
  • Best value speakers and amp

OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

  • Completion API migration
  • Scheduled downtime for database upgrade

OpenRouter (Alex Atallah) ▷ #general (224 messages🔥🔥):

  • Hermes Resurgence
  • Claude API Changes
  • Mistral's New Features
  • OpenRouter API Issues
  • Chinese AI Models Pricing

Links mentioned:

  • Model Deprecations - Anthropic: no description found
  • Mistral Moderation API: We are introducing our new moderation service enabling our users to detect undesirable text content along several policy dimensions.
  • Mistral Batch API: Lower cost API for AI builders.
  • Mistral AI API | Mistral AI Large Language Models: Our Chat Completion and Embeddings APIs specification. Create your account on La Plateforme to get access and read the docs to learn how to use...


OpenRouter (Alex Atallah) ▷ #beta-feedback (5 messages):

  • Customer Provider Keys
  • Integration Beta Features

Eleuther ▷ #general (57 messages🔥🔥):

  • Research opportunities in Switzerland
  • Jazz piano generation models
  • Music evaluation methods
  • AI and musical upscaling
  • Magenta's Music Transformer

Link mentioned: Listen to Transformer: An app to make it easier to explore and curate output from a music transformer


Eleuther ▷ #research (116 messages🔥🔥):

  • Flash Attention Techniques
  • Momentum Decay in Optimizations
  • NaNoGPT Updates
  • Evaluation Data Contamination
  • Advanced Attention Mechanisms

Links mentioned:


Eleuther ▷ #gpt-neox-dev (1 message):

  • NeoX vs LitGPT benchmarks
  • Pretraining setups

Unsloth AI (Daniel Han) ▷ #general (40 messages🔥):

  • Fine-tuning Smollm2
  • Feature contribution analysis
  • Gemma2 model errors
  • 8bit and 4bit support
  • AI Unplugged newsletter

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (51 messages🔥):

  • NIM API Payment Options
  • AI Model Usage Feedback
  • Discord Scam Discussions
  • Exam Preparation
  • Mathematics Interview Questions

Unsloth AI (Daniel Han) ▷ #help (53 messages🔥):

  • Model Output Handling
  • VRAM Consumption Comparison
  • Fine-Tuning Approach
  • Token Count Mismatch in Models
  • Concurrent Inference Setup

Unsloth AI (Daniel Han) ▷ #community-collaboration (2 messages):

  • torch.compile and gradient checkpointing
  • product metrics tracking
  • software distribution in container ecosystem

Unsloth AI (Daniel Han) ▷ #research (1 message):

  • Training signal from LLM
  • Judge evaluation in training

HuggingFace ▷ #general (119 messages🔥🔥):

  • Serverless Inference Endpoint for Hermes3
  • Using Hugging Face Models for Arabic Language
  • Quantization Techniques for Vision Models
  • Suggestions for Token Classification Architectures
  • Using API for Hugging Face Spaces

Links mentioned:


HuggingFace ▷ #today-im-learning (1 message):

  • Building AI model for cybersecurity
  • Challenges in AI for cybersecurity

HuggingFace ▷ #cool-finds (2 messages):

  • Hunyuan3D-1 Framework
  • Grape Leaf Disease Detection App

Links mentioned:


HuggingFace ▷ #i-made-this (13 messages🔥):

  • Formula 1 telemetry chatbot
  • TinyLlama model conversion
  • Harmony questionnaire harmonization
  • USDA FoodData Central dataset
  • Text-to-3D generation

Links mentioned:


HuggingFace ▷ #computer-vision (4 messages):

  • OmniParser model
  • User engagement
  • Model performance
  • UI Interaction datasets

Link mentioned: microsoft/OmniParser · Hugging Face: no description found


HuggingFace ▷ #NLP (1 message):

  • MaskGCT
  • F5-TTS
  • AI phone caller
  • audio chunk streaming

HuggingFace ▷ #diffusion-discussions (2 messages):

  • Integration Discussions
  • Automatic1111 SD3.5 Support

OpenAI ▷ #ai-discussions (85 messages🔥🔥):

  • SearchGPT performance
  • Best OpenAI models for coding
  • AI self-awareness discussions
  • Role of AI in jobs
  • Using AI as a study tool

OpenAI ▷ #gpt-4-discussions (4 messages):

  • Canvas document deletion
  • Loss of saved GPTs
  • Sidebar pinning issues
  • Custom GPT features improvement

OpenAI ▷ #prompt-engineering (3 messages):

  • File Handling Issues
  • Direct Messaging for Support

OpenAI ▷ #api-discussions (3 messages):

  • File format issues
  • Direct messaging for troubleshooting

GPU MODE ▷ #general (32 messages🔥):

  • Installing PyTorch
  • Kernel Development on Windows
  • CUDA Compiling Issues
  • Using Visual Studio for CUDA
  • Runtime Errors in CUDA

GPU MODE ▷ #triton (4 messages):

  • FP16 x FP16 Performance
  • A100 GPU Efficiency
  • CUDA Cores and GEMVs

GPU MODE ▷ #torch (6 messages):

  • Torch Script Debugging
  • Performance Overhead of .tolist() Call
  • C++ API Limitations

GPU MODE ▷ #cool-links (1 message):

marksaroufim: https://x.com/_seemethere/status/1838319643347554756


GPU MODE ▷ #beginner (8 messages🔥):

  • Compute heavy operations in LLMs
  • Resources for GEMM optimization
  • CUDA and GPU learning materials

Links mentioned:


GPU MODE ▷ #off-topic (3 messages):

  • Digital Audio Primer
  • Discord AV1 Embed Tool
  • Video Embedding Constraints

Links mentioned:


GPU MODE ▷ #arm (1 message):

marksaroufim: https://discord.gg/zCkRcp6e?event=1304137506014629969


GPU MODE ▷ #liger-kernel (19 messages🔥):

  • Generalized JSD kernel improvements
  • Engineering blog acknowledgments
  • S1ro nickname origin
  • LinkedIn profiles

GPU MODE ▷ #🍿 (6 messages):

  • Automated Deployments via Heroku
  • Utilizing Raspberry Pi for Discord Bots
  • Goals for Project Popcorn
  • Server Connectivity
  • GPU Implementation Plans

Link mentioned: Project Popcorn: Generate SOTA kernels with LLMs in public: Project Popcorn: Generate SOTA kernels with LLMs in public - 🍿.md


GPU MODE ▷ #thunderkittens (6 messages):

  • ThunderKittens Contribution List
  • Beginner Contributions to Kernels

Links mentioned:


Notebook LM Discord ▷ #use-cases (24 messages🔥):

  • Using Google Docs for Research
  • Podcast Reuse Policy
  • Fun Prompts for Podcast Hosts
  • Dyslexia and Day Trading
  • Understanding Terms of Service

Link mentioned: GitHub - robbiemu/llama-gguf-optimize: Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices.: Scripts and tools for optimizing quantizations in llama.cpp with GGUF imatrices. - robbiemu/llama-gguf-optimize


Notebook LM Discord ▷ #general (59 messages🔥🔥):

  • NotebookLM Performance Issues
  • PDF Integration Challenges
  • Sharing Options and Bugs
  • Saved Notes Accessibility
  • Project Innovations for NotebookLM

Link mentioned: Mastering the Digital SAT: Expert Tips for Reading & Writing with Special Guest Riley | Episode 9: Visit our website for free SAT and GRE test preparation: https://campusgoals.com/Welcome to Episode 9 of "Mastering the SAT with Alex and Taylor," your go-to...


Interconnects (Nathan Lambert) ▷ #news (15 messages🔥):

  • Anthropic and Palantir partnership
  • U.S. government classification levels
  • Concerns about defense collaboration
  • Reddit community for security clearances

Links mentioned:


Interconnects (Nathan Lambert) ▷ #other-papers (5 messages):

  • 8-bits pretraining
  • Tim Dettmers' paper
  • YouTube talks on pretraining

Interconnects (Nathan Lambert) ▷ #ml-questions (1 message):

  • Synthetic Data Generation
  • SFT Data Scaling
  • Instruction Data Usage
  • T0 Comparisons

Link mentioned: Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent: In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activation param...


Interconnects (Nathan Lambert) ▷ #random (3 messages):

  • MLST podcast episode with Chollet
  • Chollet's views on memorization
  • Frustrations with Tim's style

Interconnects (Nathan Lambert) ▷ #memes (8 messages🔥):

  • Election polling as prompt engineering
  • Bias in polling methodologies
  • Monetary gains from polling insights

Link mentioned: Tweet from Siqi Chen (@blader): so the french guy who yolo’d $30m on trump on polymarket noticed polls asking “who are your neighbors voting for?” instead of “who are you voting for?” had trump overperforming. so he commissioned hi...


Interconnects (Nathan Lambert) ▷ #posts (32 messages🔥):

  • Tim's move to CMU
  • Quantization techniques
  • Character.AI's inference optimization
  • GPU efficiency
  • Community discussions on quantization

Link mentioned: Optimizing AI Inference at Character.AI: At Character.AI, we're building toward AGI. In that future state, large language models (LLMs) will enhance daily life, providing business productivity and entertainment and helping people with e...


LM Studio ▷ #general (41 messages🔥):

  • Ollama's llama 3.2 Vision
  • MLX Engine updates
  • NVIDIA's Call for Feedback
  • LM Studio CPU Mode
  • Unsupported Models in LM Studio

Links mentioned:


LM Studio ▷ #hardware-discussion (20 messages🔥):

  • Single Slot RTX 4090
  • Mac M2 Pro Memory Usage
  • Large Model Performance
  • Experimental Context Settings
  • MacBook M4 Reviews

Stability.ai (Stable Diffusion) ▷ #general-chat (55 messages🔥🔥):

  • Stable Diffusion Models
  • Outpainting Techniques
  • User Interface Generation
  • Local Installation Guides
  • Discord Community Interaction

Links mentioned:


Latent Space ▷ #ai-general-chat (38 messages🔥):

  • Llama 3.2 Vision
  • Aide Open Source IDE
  • Claude AI Limitations
  • Training Open Language Models
  • Codebuff CLI Tool

Links mentioned:


Modular (Mojo 🔥) ▷ #mojo (28 messages🔥):

  • No Bounds Check Decorator
  • Mojo Standard Library Development
  • Mojo Interoperability with Python and C/C++
  • Mojo as a Python Superset
  • Timeline for Mojo Features

Links mentioned:


Cohere ▷ #discussions (21 messages🔥):

  • Cohere Reranker API
  • Command-R-Plus Issues
  • AWS Bedrock with SpringAI

Cohere ▷ #announcements (1 message):

  • Open-source fine-tuning
  • Hugging Face integration
  • SageMaker deployment

Link mentioned: GitHub - cohere-ai/cohere-finetune: A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models: A tool that facilitates easy, efficient and high-quality fine-tuning of Cohere's models - cohere-ai/cohere-finetune


Cohere ▷ #questions (1 message):

mrdragonfox: <@1303804989629534333> [email protected]


Cohere ▷ #api-discussions (3 messages):

  • AWS Bedrock
  • SpringAI Embeddings

LlamaIndex ▷ #blog (2 messages):

  • Automated Resume Insights
  • Context Refinement in RAG Systems

LlamaIndex ▷ #general (23 messages🔥):

  • Ollama Llama Vision Integration
  • Open Source Chatbot UI
  • Chainlit with LlamaIndex for Multimodal RAG
  • Resources for Llama-Parse
  • Isolated Instrumentation in Workflows

Link mentioned: GitHub - mckaywrigley/chatbot-ui: Come join the best place on the internet to learn AI skills. Use code "chatbotui" for an extra 20% off.: Come join the best place on the internet to learn AI skills. Use code "chatbotui" for an extra 20% off. - mckaywrigley/chatbot-ui


DSPy ▷ #show-and-tell (2 messages):

  • Future of Dott.ai
  • Steve's vision

Links mentioned:


DSPy ▷ #general (19 messages🔥):

  • DSPy Docstring Issues
  • Understanding DSPy Framework
  • EMNLP 2024 Presentation
  • Modular Language Model Optimization

Links mentioned:


OpenInterpreter ▷ #general (11 messages🔥):

  • Understanding OS Mode
  • Discord Event Timing
  • Viewer Limitations
  • OmniParser Tool
  • Using OS Mode with gpt-4o

Links mentioned:


OpenInterpreter ▷ #ai-content (3 messages):

  • Python 3.13 Compatibility Issues
  • Conda Environment Solutions

tinygrad (George Hotz) ▷ #general (9 messages🔥):

  • Dedicated Transformer ASIC
  • Sohu Hardware Architecture
  • Discussion on Availability of Custom Hardware

Link mentioned: Tweet from Rohan Paul (@rohanpaul_ai): MASSIVE 🤯 First dedicated transformer ASIC (Application-Specific Integrated Circuit) just dropped Custom chip burns transformer architecture directly into silicon, making AI models run 10x faster t...


tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):

  • Using multiple GPUs
  • Sharding models
  • ThreadPoolExecutor issues

OpenAccess AI Collective (axolotl) ▷ #general (8 messages🔥):

  • ScheduleFree SOAP
  • Optimizer Hyperparameters
  • Model Merging and MOEs
  • CAME Optimizer

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (3 messages):

  • Adding special tokens
  • Fine-tuning with LORA

Torchtune ▷ #general (4 messages):

  • LR Scheduler Usage
  • Ichigo Project Implementation
  • Adding Scheduler to Recipes

Links mentioned:


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (2 messages):

  • LLM Course Cohorts

Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 message):

  • Function extraction
  • Dataset files






{% else %}

The full channel-by-channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}