AI News for 12/17/2024-12/18/2024. We checked 7 subreddits, 433 Twitters and 32 Discords (215 channels, and 4542 messages) for you. Estimated reading time saved (at 200wpm): 497 minutes. You can now tag @smol_ai for AINews discussions!
You are reading AINews generated by o1-mini-2024-09-12. As is tradition on new frontier model days, we try to publish multiple issues for A/B testing/self-evaluation. Check our archives for the o1-2024-12-17 version. We are sorry for the repeat sends yesterday (platform bug), but today's is on purpose.
{% if medium == 'web' %}
Table of Contents
[TOC]
{% else %}
The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!
{% endif %}
AI Twitter Recap
all recaps done by Claude 3.5 Sonnet, best of 4 runs.
Here are the key discussions organized by topic:
OpenAI o1 API Launch and Features
- o1 model released to API with function calling, structured outputs, vision support, and developer messages. The model uses 60% fewer reasoning tokens than o1-preview and includes a new "reasoning_effort" parameter (a minimal call sketch follows this list).
- Performance Benchmarks: @aidan_mclau noted o1 is "insanely good at math/code" but "mid at everything else". Benchmark results show o1 scoring 0.76 on LiveBench Coding, compared to Sonnet 3.5's 0.67.
- New SDKs: Released beta SDKs for Go and Java. Also added WebRTC support for the realtime API, with 60% lower prices.
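A minimal call sketch for the new parameter, assuming the official `openai` Python SDK (1.x) and API access to o1; the "reasoning_effort" values and the developer-message role follow the announcement above, but treat the specifics as illustrative:

```python
# Sketch only: assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="o1",
    reasoning_effort="low",  # announced parameter controlling thinking time ("low"/"medium"/"high")
    messages=[
        # "developer messages" are the o1-era counterpart to system prompts
        {"role": "developer", "content": "You are a terse math assistant."},
        {"role": "user", "content": "Factor x^2 - 5x + 6."},
    ],
)
print(resp.choices[0].message.content)
```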
Google Gemini Updates
- @sundarpichai confirmed that Gemini Exp 1206 is Gemini 2.0 Pro, showing improved performance on coding, math, and reasoning tasks.
- Gemini 2.0 deployment accelerated for Advanced users in response to feedback.
Model Development & Architecture
- Discussion around model sizes and training: debate about whether o1-preview's size matches o1's, and how both relate to GPT-4o.
- Meta's new research on training transformers directly on raw bytes using dynamic patching based on entropy.
Industry & Business
- @adcock_brett reported successful deployment of commercial humanoid robots at a client site, with rapid transfer from HQ.
- New LlamaReport tool announced for converting document databases into human-readable reports using LLMs.
Memes & Humor
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Hugging Face's 3B Llama Model: Outperforming the 70B with Search
- Hugging Face researchers got 3b Llama to outperform 70b using search (Score: 668, Comments: 123): Hugging Face researchers achieved a breakthrough by making the 3B Llama model outperform the 70B Llama model in MATH-500 accuracy using search techniques. The graph demonstrates that the 3B model surpasses the 70B model under certain conditions, with accuracy measured across generations per problem, highlighting the model's potential efficiency and effectiveness compared to larger models (a minimal sketch of the underlying idea follows this list).
- Inference Time and Model Size Optimization: Users discuss the potential of finding an optimal balance between inference time and model size, suggesting that smaller models can be more efficient if they perform adequately on specific tasks, especially when the knowledge is embedded in prompts or fine-tuned for particular domains.
- Reproducibility and Dataset References: Concerns are raised about the reproducibility of the results due to the non-publication of the Diverse Verifier Tree Search (DVTS) model, with a link provided to the dataset used (Hugging Face Dataset) and the DVTS implementation (GitHub).
- Domain-Specific Limitations: There is skepticism about the applicability of the method outside math and code domains due to the lack of PRMs trained on other domains and datasets with step-by-step labeling, questioning the generalizability of the approach.
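For intuition, here is a minimal best-of-N sketch of the test-time search idea behind these results; `generate` and `score` are hypothetical stand-ins for a sampler and a process reward model (PRM), not the linked DVTS implementation:

```python
# Illustrative only: `generate` samples one candidate solution from a small model,
# `score` is a verifier/PRM that rates a (problem, solution) pair.
def best_of_n(problem, generate, score, n=16):
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=lambda c: score(problem, c))
```

DVTS goes further by branching and pruning partial solutions tree-style, but the underlying compute-for-accuracy trade is the same.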
Theme 2. Moonshine Web: Faster, More Accurate than Whisper
- Moonshine Web: Real-time in-browser speech recognition that's faster and more accurate than Whisper (Score: 193, Comments: 25): Moonshine Web claims to provide real-time in-browser speech recognition that is both faster and more accurate than Whisper.
- Moonshine Web is open source under the MIT license, with ongoing efforts to integrate it into transformers as seen in this PR. The ONNX models are available on the Hugging Face Hub, although there are concerns about the opacity of the ONNX web runtime.
- Discussion highlights include skepticism about the real-time capabilities and accuracy claims of Moonshine compared to Whisper models, specifically v3 large. Users are curious about the model's ability to perform speaker diarization and its current limitation to English only.
- Moonshine is optimized for real-time, on-device applications, with support added in Transformers.js v3.2. The demo source code and online demo are available for testing and exploration.
Theme 3. Granite 3.1 Language Models: 128k Context & Open License
- Granite 3.1 Language Models: 128k context length & Apache 2.0 (Score: 144, Comments: 22): Granite 3.1 Language Models now feature a 128k context length and are available under the Apache 2.0 license, indicating significant advancements in processing larger datasets and accessibility for developers.
- Granite Model Performance: The Granite 3.1 3B MoE model is reported to have a higher average score on the Open LLM Leaderboard than the Falcon 3 1B, contradicting claims that MoE models perform similarly to dense models with equivalent active parameters. This is despite having 20% fewer active parameters than its competitors.
- Model Specifications and Licensing: The Granite dense models (2B and 8B) and MoE models (1B and 3B) are trained on over 12 trillion and 10 trillion tokens, respectively, with the dense models supporting tool-based use cases and the MoE models designed for low latency applications. The models are released under the Apache 2.0 license, with the 8B model noted for its performance in code generation and translation tasks.
- Community Insights and Comparisons: The Granite Code models are praised for their underrated performance, particularly the Granite 8B Code model, which competes with the Qwen2.5 Coder 7B. Discussions also highlight the potential for MoE models to facilitate various retrieval strategies and the importance of familiar enterprise solutions like Red Hat's integration of Granite models.
Theme 4. Moxin LLM 7B: A Fully Open-Source AI Model
- Moxin LLM 7B: A fully open-source LLM - Base and Chat + GGUF (Score: 131, Comments: 5): Moxin LLM 7B is a fully open-source large language model trained on text and coding data from SlimPajama, DCLM-BASELINE, and the-stack-dedup, achieving superior zero-shot performance compared to other 7B models. It features a 32k context size, supports long-context processing with grouped-query attention, sliding window attention, and a Rolling Buffer Cache, with comprehensive access to all development resources available on GitHub and Hugging Face.
- Moxin LLM 7B is praised for being an excellent resource for model training, with its clean and accessible code and dataset, as noted by Stepfunction. The model's comprehensive development resources are highlighted as a significant advantage.
- TheActualStudy commends the model for integrating Qwen-level context, Gemma-level tech, and Mistral-7B-v0.1 performance. This combination of advanced methods and data is regarded as impressive.
- Many_SuchCases mentions exploring the GitHub repository and notes the absence of some components like intermediate checkpoints, suggesting that these might be uploaded later.
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT
Theme 1. Imagen v2 Quality Elevates Image Generation Benchmark
- New Imagen v2 is insane (Score: 680, Comments: 119): Imagen 3 is establishing new benchmarks in image quality with its release, referred to as Imagen v2. The post highlights the impressive advancements in the technology without providing additional context or details.
- Access and Usage: Users discuss accessing Imagen 3 through the Google Labs website, suggesting the use of VPNs for regions with restrictions. There is a mention of free access with some daily usage quotas on labs.google/fx/tools/image-fx.
- Artistic Concerns: There is significant concern among artists about Imagen 3's impact on the art industry, with fears of reduced need for human artists and the overshadowing of traditional art by AI-generated images. Some users express the belief that this shift may lead to the privatization of creative domains and the erosion of artistic labor.
- Model Confusion and Improvements: Some confusion exists regarding the naming and versioning of Imagen 3, with users clarifying it as Imagen3 v2. Users note significant improvements in image quality, with early testers expressing satisfaction with the results compared to previous versions.
Theme 2. NotebookLMās Conversational Podcast Revolution
- OpenAI should make their own NotebookLM application, it's mindblowing! (Score: 299, Comments: 75): NotebookLM produces highly natural-sounding AI-generated podcasts, surpassing even Huberman's podcast in conversational quality. The post suggests that OpenAI should develop a similar application, as it could significantly impact the field.
- NotebookLM's voice quality is praised but still considered less natural compared to human hosts, with Gemini 2.0 offering live chat capabilities with podcast hosts, enhancing its appeal. Users note issues with feature integration across different platforms, highlighting limitations in using advanced voice modes and custom projects.
- The value of conversational AI for tasks like summarizing PDFs is debated, with some seeing it as revolutionary in terms of time savings and adult learning theory, while others find the content shallow and lacking depth. The Gemini model is noted for its large context window, making it well-suited for handling extensive information.
- Google's hardware advantage is emphasized, with their investment in infrastructure and energy solutions allowing them to offer more cost-effective AI models compared to OpenAI. This positions Google to potentially outperform OpenAI in the podcast AI space, leveraging their hardware capabilities to reduce costs significantly.
Theme 3. Gemini 2.0 Surpasses Others in Academic Writing
- Gemini 2.0 Advanced is insanely good for academic writing. (Score: 166, Comments: 39): Gemini 2.0 Advanced excels in academic writing, offering superior understanding, structure, and style compared to other models, including ChatGPT. The author considers switching to Gemini 2.0 until OpenAI releases an improved version.
- Gemini 2.0 Advanced is identified as Gemini Experimental 1206 on AI Studio and is currently available without a paid version, though users exchange data for access. The naming conventions and lack of a central AI service from Google cause some confusion among users.
- Gemini 2.0 Advanced demonstrates significant improvements in academic writing quality, outperforming GPT-4o and Claude in evaluations. It provides detailed feedback, often critiquing responses with humor, which users find both effective and entertaining.
- Users discuss the availability of Gemini 2.0 Advanced through subscriptions, with some confusion over its listing as "2.0 Experimental Advanced, Preview gemini-exp-1206" in the Gemini web app. The model's performance in academic contexts is praised, with users expressing hope that it will push OpenAI to address issues in ChatGPT.
Theme 4. Veo 2 Challenges Sora with Realistic Video Generation
- Google is challenging OpenAI's Sora with the newest version of its video generation model, Veo 2, which it says makes more realistic-looking videos. (Score: 124, Comments: 34): Google is competing with OpenAI's Sora by releasing Veo 2, a new version of its video generation model that claims to produce more realistic videos.
- Veo 2's Availability and Performance: Several commenters highlight that Veo 2 is still in early testing and not widely available, which contrasts with claims of its release. Despite this, some testers on platforms like Twitter report impressive results, particularly in areas like physics and consistency, outperforming Sora.
- Market Strategy and Accessibility: There is skepticism about the release being a marketing strategy to counter OpenAI. Concerns about the lack of public access and API availability for both Veo 2 and Sora are prevalent, with a noted confirmation of a January release on aistudio.
- Trust in Video Authenticity: The discussion touches on the potential erosion of trust in video authenticity due to advanced generation models like Veo 2. Some propose solutions like personal AIs for verifying media authenticity through blockchain registers to address this issue.
AI Discord Recap
A summary of Summaries of Summaries by o1-2024-12-17
Theme 1. Challenges in AI Extensions and Projects
- Codeium Extension Breaks Briefly in VSCode: The extension only displays autocomplete suggestions for a split second, making it unusable. Reverting to version 1.24.8 restores proper functionality, according to multiple user reports.
- Windsurf Performance Crumbles Under Heavy Load: Some users experience over 10-minute load times and sporadic "disappearing code" or broken Cascade functionality. Filing support tickets is the top recommendation until a stable fix arrives.
- Bolt Users Cry Foul Over Wasted Tokens: They jokingly proposed a "punch the AI" button after receiving irrelevant responses that deplete credits. Many called for improved memory controls in upcoming releases.
Theme 2. New and Upgraded Models
- OpenAI o1 Dazzles With Function Calling: This successor to o1-preview introduces a new "reasoning_effort" parameter to control how long it thinks before replying. It also features noticeably lower latency through OpenRouter.
- EVA Llama Emerges as a Storytelling Specialist: Targeted at roleplay and narrative tasks, it reportedly excels at multi-step storytelling. Early adopters praise its creative outputs and user-friendly design.
- Major Price Cuts on Fan-Favorite Models: MythoMax 13B dropped by 12.5% and the QwQ reasoning model plunged 55%. These discounts aim to widen community access for experimentation.
Theme 3. GPU & Inference Pitfalls
- AMD Driver Updates Slash Performance: Users saw tokens-per-second plummet from 90+ to around 20 when upgrading from driver 24.10.1 to 24.12.1. Rolling back fixes the slowdown, reinforcing caution with fresh GPU driver releases.
- Stable Diffusion on Ubuntu Hits Snags: Tools like ComfyUI or Forge UI often demand in-depth Linux know-how to fix compatibility issues. Many still recommend an NVIDIA 3060 with 16GB VRAM as a smoother baseline.
- TinyGrad, Torch, and CUDA Memory Confusion: Removing checks like IsDense(y) && IsSame(x, y) solved unexpected inference failures, but introduced new complexities. This led developers to reference official CUDA Graphs discussions for potential solutions.
Theme 4. Advanced Fine-Tuning & RAG Techniques
- Fine-Tuning Llama 3.2 With 4-bit Conversions: Many rely on load_in_4bit=true to balance VRAM usage and model accuracy. Checkpoints can be reused, and resource constraints are minimized through partial-precision settings.
- Depth AI Indexes Codebases at Scale: It attains 99% accuracy answering technical queries, though indexing 180k tokens may take 40 minutes. Rival solutions like LightRAG exist, but Depth AI is praised for simpler setup.
- Gemini 2.0 Adds Google Search Grounding: A new configuration allows real-time web lookups to refine answers. Early reviews highlight improved factual precision in coding and Q&A scenarios.
Theme 5. NotebookLM and Agentic Workflows
- NotebookLM Revamps Its 3-Panel UI: The update removed "suggested actions" due to low usage, but developers promise to reintroduce similar features with better design. Plans include boosted "citations" and "response accuracy" based on user feedback.
- Multilingual Prompts Spark Wide Engagement: Users tried Brazilian Portuguese and Bangla queries, discovering that explicitly telling NotebookLM the language context makes interactions more fluid. This showcases its capability for inclusive global communication.
- Controlling Podcast Length Remains Elusive: Even with time specifications in prompts, final outputs often exceed or ignore constraints. Most rely on flexible length ranges to strike a balance between deep coverage and listener engagement.
PART 1: High level Discord summaries
Codeium (Windsurf) Discord
- Codeium Extension AutoComplete Issues: Users reported that the Codeium extension in VSCode displays autocomplete suggestions only briefly, rendering it unusable. Reverting to version 1.24.8 restores functionality.
- Multiple suggestions to remedy the issue were discussed, focusing on version rollback as a potential solution.
- Windsurf Performance and Error Handling: Windsurf is experiencing significant performance lags, with instance load times exceeding 10 minutes and frequent error messages disrupting workflows.
- Users called for clearer communication from Codeium regarding bugs like "disappearing code" and Cascade functionality failures.
- Flex Credits Usage Concerns: Several users inquired about whether flex credits roll over, noting issues with credits being deducted during service outages.
- Concerns were raised about the impact of frequent error messages and service downtime on credit usage.
- Connection Issues with Codeium Server: Members reported difficulties connecting to the Codeium server, comparing experiences and seeking assistance.
- A recommendation was made to file support tickets for further investigation and potential fixes.
- Prompting with o1 in AI Applications: A user shared a link to a course on o1 prompting that covers its applications in coding and reasoning tasks.
- Another user requested a summary of the course content due to its complexity.
Cursor IDE Discord
- Cursor 0.44.2 Update Stabilizes Editor: The Cursor team rolled back to version 0.44.2 after addressing bugs in v0.44, leading to enhanced stability.
- Users highlighted new features like the terminal and various bug fixes improving the overall experience.
- PyQt/PySide6 Setup Hits Snags: Developers faced issues with missing files like "QtWebEngineCore.dll" when setting up PySide6, causing application failures.
- Recommendations included verifying the correct Python version and following detailed installation steps to resolve the issues.
- O1 Pro Boosts Bug Fix Efficiency: O1 Pro users reported successful bug resolutions with fewer prompts compared to earlier versions.
- Despite the added cost, many found O1 Pro's performance beneficial for their workflows.
- Kepler Browser Focuses on Privacy: Development on the Kepler Community browser emphasizes privacy and lightweight functionality.
- The developer is encouraging open-source collaboration, inviting contributions to enhance user privacy features.
- Cursorās Copy-Paste Functionality Frustrates: Users reported that Cursorās copy-paste sometimes pastes terminal text as plain text instead of code.
- Suggestions included using Ctrl + Shift + V and properly targeting terminal outputs to improve usability.
aider (Paul Gauthier) Discord
- o1 API Access Controversy: Discussions highlighted frustrations among Tier 5 subscribers regarding access to the o1 API, with concerns about the $15 per million tokens pricing compared to the $200 o1 pro subscription.
- Members debated the justification of the pricing structure, noting that while some find it reasonable, others believe it is prohibitively expensive for their use cases.
- Aider vs. Sonnet Performance: Aider's latest updates have surpassed Sonnet in effectiveness, achieving a benchmark score of 84.2, comparable to Sonnet's performance.
- Users observed that while Aider excels in editor mode, Gemini models encounter difficulties with JavaScript tasks, leading to a preference for Aider in certain coding scenarios.
- Upcoming Models: Veo 2 and R1: Anticipation surrounds the release of Veo 2 and R1, with members discussing how these models might influence OpenAIās market position amidst growing competition.
- Conversations indicated that the introduction of newer models could render existing ones like Sora less competitive, sparking debates on their ongoing effectiveness.
- Gemini 2.0 Google Search Integration: Gemini 2.0 Flash Experimental models on Vertex AI now support Google Search grounding, enabled through specific configurations detailed in a recent GitHub pull request (a minimal sketch follows this list).
- This integration enhances the model's ability to perform grounded searches, aligning with the latest advancements in Gemini capabilities.
- Depth AI Codebase Understanding: Depth AI impresses users with its ability to generate a comprehensive knowledge graph of codebases, achieving 99% accuracy in answering technical queries.
- While setup is straightforward, indexing larger projects ranging from 200k to 1.5 million tokens can take considerable time, as one user reported a 40-minute indexing for a 180k token repository.
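As referenced above, here is a minimal grounding sketch, assuming the `google-genai` Python SDK and an API key; it uses the developer-API path rather than Vertex AI, and the model id and tool wiring are illustrative rather than the exact configuration from the pull request:

```python
# Sketch only: assumes `pip install google-genai` and a valid key in GOOGLE_API_KEY.
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

resp = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental 2.0 model id at time of writing
    contents="What changed in the latest Gemini release?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # enable Search grounding
    ),
)
print(resp.text)
```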
OpenAI Discord
- 12 Days of OpenAI Updates: OpenAI is celebrating the 12 Days of OpenAI by encouraging members to secure the role in <#customize> to stay informed and participate in the festivities. This initiative aims to keep the community engaged with ongoing updates and events.
- On Day 10, a linked YouTube video showcased the day's celebrations, prompting members to explore the exciting content related to the events.
- OpenAI vs Google: AI Advancements: The ai-discussions channel sparked debates on OpenAI and Google's competitive advancements in AI, with many members asserting that Google is currently surpassing OpenAI in AI development. Concerns emerged that OpenAI might be restricting model releases for strategic gains.
- Participants speculated that Google's swift innovation trajectory could significantly shape the future AI landscape, affecting how technologies evolve and are adopted.
- DALL·E vs Midjourney: Image Generation Showdown: Members compared OpenAI's DALL·E with Midjourney and Google's Imagen, often criticizing DALL·E for its recognizable "AI-generated" outputs despite its free access. Discussions highlighted Midjourney's pricing and superior production quality as key factors.
- Users expressed frustration over DALL·E's limitations, while acknowledging Midjourney's strengths, reflecting a preference for higher-quality image generation models even at a cost.
- Custom GPTs Functionality: In the gpt-4-discussions channel, members questioned the effectiveness of prompting ChatGPT with the instruction "you are now a manager to train me", aiming to enhance response quality.
- Additionally, frustrations were voiced regarding the inability to edit custom GPTs, prompting concerns about limited customization options for users.
- Channel Posting Etiquette Enforcement: Discussions in prompt-engineering and api-discussions channels focused on enforcing channel posting etiquette, with members criticizing others for posting in multiple channels as spam and advising message deletions from incorrect channels.
- Members also highlighted challenges in identifying the appropriate channels for seeking help, emphasizing the importance of adhering to specified guidelines to maintain order and streamline discussions.
Nous Research AI Discord
- Falcon Models Show Promise: The Falcon3 models, especially the 7B and 10B variants, are exhibiting robust performance. Recent updates have introduced tool-use support, enhancing their capabilities for complex interactions.
- Engineers are keen on testing these models across various applications, noting the improved functionality post-update.
- Innovative Prompt Chaining Strategies: Prompt chaining is being utilized to refine model outputs by sequentially processing responses through multiple models. Techniques like structured output and tree structures are being explored to enhance creative tasks such as storytelling (a minimal sketch follows this list).
- These strategies aim to iteratively improve response quality, as discussed in the Langflow documentation.
- OpenAI's Safety Practices Under Scrutiny: Concerns have been raised about OpenAI's safety protocols, especially after a demonstration revealed a jailbreak for their models during a GPT-4o vs o1 preview comparison. This has sparked debates on the alignment between OpenAI's safety claims and actual model vulnerabilities.
- The discussion highlights the need for more transparent safety evaluations, as referenced in Democratize Intelligence's tweet.
- Function Calling on Local Models Explored: A query on the best libraries and methods for function calling on small local models indicates a focus on optimizing AI performance locally. This interest points to ongoing efforts to enhance model efficiency without relying on external APIs.
- The conversation underscores the importance of suitable libraries for effective local model deployment.
- Ensuring Consistency in LLM Outputs: Discussions are focused on the consistency of LLM outputs, particularly for long and very long text generations. Members are seeking recommendations for top papers that address these challenges in maintaining output quality over extended lengths.
- This interest reflects a broader concern within the engineering community about sustaining model reliability in extensive applications.
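A minimal prompt-chaining sketch in the spirit of the strategies above; it assumes an OpenAI-compatible `client` object and a placeholder model name, and simply feeds each response into the next prompt:

```python
# Illustrative only: each step's output becomes context for the next prompt.
def chain(client, prompts, model="gpt-4o-mini"):
    context = ""
    for prompt in prompts:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": f"{context}\n\n{prompt}".strip()}],
        )
        context = resp.choices[0].message.content  # refined output carried forward
    return context

# e.g. chain(client, ["Draft a story outline about a lighthouse.",
#                     "Rewrite the outline with a twist ending.",
#                     "Expand it into a three-paragraph story."])
```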
Notebook LM Discord Discord
- 3-panel UI Changes in NotebookLM: The new 3-panel UI removes the "suggested actions" feature from NotebookLM, addressing low utilization due to its limited discoverability and functionality.
- The development team plans to reintroduce similar functionalities with improved design, focusing on enhancing citations and response accuracy, and has encouraged users to provide feedback for upcoming releases.
- Multilingual Functionality Enhancements: Members are leveraging NotebookLM's interactive functions to facilitate conversations in languages like Brazilian Portuguese and Bangla, improving engagement through multilingual prompts.
- One user highlighted that expressing multilingual capabilities in prompts simplifies discussions, fostering more inclusive and diverse interactions within the tool.
- Interactive Mode Rollout Challenges: The rollout of interactive mode in NotebookLM is experiencing delays and inconsistent access, with some users facing issues like audio generation lag and unexpected resets.
- Feedback indicates the need for a more reliable deployment strategy to ensure all users with the new UI can access interactive features seamlessly.
- Podcast Length Customization Strategies: Users are exploring templates to control podcast episode lengths, aiming to maintain deep content exploration without sacrificing engaging dialogue.
- Discussions revealed a preference for flexible timing ranges over fixed durations, highlighting the complexity in implementing precise podcast length controls.
- Knowledge Base Generation with NotebookLM: Members are investigating NotebookLM's capability to generate a knowledge base akin to retrieval augmented generation (RAG), seeking insights and alternative solutions.
- A shared YouTube video demonstrated using NotebookLM as a knowledge base, aligning with users' needs for structured information retrieval.
Unsloth AI (Daniel Han) Discord
- Fine-tuning Llama 3.2 with 4-bit Conversion: A member is exploring how to effectively fine-tune the Llama 3.2 model with added datasets, discussing options for loading previous checkpoints. Another member emphasized that settings like load_in_4bit=true allow automatic conversion for models not uploaded by Unsloth (a loading sketch follows this list).
- This approach aims to enhance model performance while managing resource constraints, as detailed in the Unsloth Tutorial.
- Optimizing Batch Size and VRAM Management: Discussions about the optimal batch size revealed that larger sizes may improve training stability and accuracy but require more VRAM. Members agreed that increasing gradient accumulation is a viable alternative for those with limited VRAM.
- This balance is crucial for efficient training workflows, ensuring both model performance and resource utilization are maximized.
- Debate on Open Source Reasoning Models like QwQ: Members debated the effectiveness of open source reasoning models such as QwQ, noting that while reproducing reasoning is straightforward, creating a successful model remains challenging. Skepticism was expressed about the necessity of reinforcement learning (RL) in current model designs.
- Suggestions were made that pure supervised fine-tuning (SFT) with high-quality datasets might suffice, potentially simplifying model development processes.
- Multi-GPU and Mac Support in Unsloth: Unsloth Pro now supports multi-GPU setups, enhancing the model training experience for both local and cloud environments. However, support for M4 MAX GPUs on Macs remains unavailable, with a speculative timeline around Q2 2025.
- Community contributions are encouraged to expedite Mac support, addressing the limitations faced by users without NVIDIA hardware.
- DiLoCo Research and Distributed Training Techniques: A member shared their research on DiLoCo (Distributed Low-Communication Training of Language Models), presenting their findings to the group. This sparked interest and encouraged broader dissemination for additional feedback.
- References were made to the DiLoCo Presentation and related ArXiv papers for deeper insights into distributed training methodologies.
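A hedged loading sketch for the 4-bit path discussed above; the Hub repo id is an assumption for illustration, and the LoRA hyperparameters are arbitrary defaults rather than Unsloth's recommendations:

```python
# Sketch only: assumes `pip install unsloth` and a CUDA GPU.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",  # assumed repo id for illustration
    max_seq_length=2048,
    load_in_4bit=True,  # converts to 4-bit on load to fit modest VRAM
)

# Attach LoRA adapters before fine-tuning on the added dataset.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```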
OpenRouter (Alex Atallah) Discord
- OpenAI o1 Model Rolls Out with Enhanced Features: The new OpenAI o1 model is now live, succeeding the o1-preview with features like function calling and reduced latency.
- It introduces a new `reasoning_effort` API parameter for controlling the model's thinking time before answering, enhancing user interactivity.
- Structured Outputs Normalization Expands: OpenRouter now normalizes structured outputs for 46 models across 8 companies, streamlining result formatting (a request sketch follows this list).
- A tutorial was shared to demonstrate its practical usage.
- EVA Llama Launches as Storytelling Specialist: The EVA Llama model has been released, focusing on roleplay and storytelling, alongside updates for Grok 2 and Cohere models.
- Details about EVA Llama can be explored here.
- Significant Price Drops on Popular Models: MythoMax 13B sees a 12.5% price reduction, while the QwQ reasoning model experiences a 55% price drop, enhancing affordability.
- These reductions aim to make the models more accessible to the community.
- OpenRouter Introduces Provider Pages Analytics: Provider pages now offer detailed analytics, allowing users to view model hosting charts by clicking on provider names.
- An example can be seen with the DeepInfra provider page, providing comprehensive insights.
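A hedged request sketch of structured outputs through OpenRouter; the endpoint follows OpenRouter's OpenAI-compatible API, while the model id and JSON schema are placeholders:

```python
# Sketch only: assumes `pip install requests` and OPENROUTER_API_KEY set.
import os
import requests

payload = {
    "model": "openai/gpt-4o-mini",  # placeholder model id
    "messages": [{"role": "user", "content": "Name a city and its population."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["city", "population"],
                "additionalProperties": False,
            },
        },
    },
}

r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
)
print(r.json()["choices"][0]["message"]["content"])  # JSON conforming to the schema
```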
Eleuther Discord
- Debating Warmup Phase Formulas: Discussions centered around Kevin's formula (1 - beta1^step) for approximating the warmup phase have highlighted the lack of support from current LR schedulers.
- Members shared their implementations, raising concerns about off-by-one errors when using lambdaLR (a sketch follows this list).
- Leveraging Meta-Learning to Mitigate Overfitting: The community explored whether Meta-Learning strategies could effectively reduce overfitting in supervised learning models, seeking specific application examples.
- While theoretical frameworks supporting this approach exist, participants noted a scarcity of practical implementations within current models.
- Advancements in Neural Network Compression: Members delved into compression methods such as depthwise compression and pruning techniques like OATS, which integrates sparse and low-rank matrices.
- Concerns were voiced regarding potential performance degradation and data coverage loss, especially for models trained on memorization tasks.
- Exploring the Grokking Phenomenon in AI: The grokking phenomenon was a focal point, discussing its significance and the current absence of effective methods to induce it in AI models.
- Participants expressed that while grokking is acknowledged, most research efforts remain concentrated on large language models, limiting broader exploration.
- Questioning the Integration of Koopman Operator Theory: There was skepticism regarding the applicability of Koopman operator theory to neural networks, questioning the benefits of modeling neural layers as dynamical systems.
- Critics argued that the theory primarily rephrases the use of residual connections without introducing substantial innovations.
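A small PyTorch sketch of the warmup formula and the off-by-one trap mentioned above; beta1 = 0.9 is an assumption, and the `step + 1` guards against LambdaLR first evaluating the lambda at step 0, which would zero the learning rate:

```python
import torch

model = torch.nn.Linear(8, 8)
beta1 = 0.9
opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(beta1, 0.999))

# Kevin's formula (1 - beta1**step) as an LR multiplier; note the +1.
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lambda step: 1 - beta1 ** (step + 1))

for step in range(5):
    opt.step()   # real code would run forward/backward first
    sched.step()
    print(step, opt.param_groups[0]["lr"])
```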
Stability.ai (Stable Diffusion) Discord
- Effective Lora Training: A user shared practical steps for creating a Lora: start with a strong dataset, choose an appropriate model, train the Lora, then test it. They emphasized research on creating quality datasets for optimal results.
- Emphasizing the importance of dataset quality, the user highlighted that thorough research is crucial for achieving optimal training outcomes.
- Preferred Stable Diffusion Models: Users discussed their preferred models for Stable Diffusion, with some favoring the "flux" model while others recommend "InvokeAI" for its usability.
- There's a consensus on the necessity of having an NVIDIA GPU, with suggestions like a 3060 with 16GB VRAM for smoother performance.
- Challenges Running SD on Ubuntu: Users expressed frustrations with running SDXL on Ubuntu, citing compatibility issues with ComfyUI and Forge UI.
- Effective operation of SDXL may require in-depth familiarity with the Ubuntu system to navigate these compatibility challenges.
- Optimal Image Resolution for Generation: A beginner inquired about the optimal image resolution for generation, seeking a balance between quality and processing time.
- Recommendations included experimenting with around 1024x1024 resolution and utilizing hires.fix for enhanced quality output.
- AI Generated Content Metrics: There was a discussion about the techniques and metrics used in model training, specifically with the Pony model and its scoring system.
- Users noted how this unique approach impacts image generation and influences community perceptions.
Perplexity AI Discord
- Custom Web Sources enhance Perplexity: Perplexity now offers custom web sources in Perplexity Spaces to tailor search queries to specific use cases.
- The launch video demonstrates the new customization capabilities.
- Perplexity Pro Subscriptions launched: Perplexity Pro subscriptions are now available, offering 1 to 12-month gifting options that provide access to 3x more sources and latest AI models.
- Users are leveraging these subscriptions to enhance their search capabilities and stay updated with the newest artificial intelligence developments.
- AI Model Performance under scrutiny: Community members are evaluating the performance of AI models in Perplexity Pro, attempting to improve search quality and suggesting alternatives like Claude 3.5 Sonnet.
- Questions have been raised regarding the advancements claimed with models like GPT-4o, leading to discussions on selecting optimal architectures.
- Meta aims to block OpenAI's for-profit ventures: Meta has voiced intentions to block OpenAI from pursuing for-profit business models, which could significantly influence future AI developments in the industry.
- This move has sparked debates on market competition and the potential reshaping of AI innovation dynamics.
- Users face Rate Limits in Perplexity: Several users reported encountering rate limits while using Perplexity, prompting discussions on the necessity for personalized rate limit enhancements.
- There is speculation on the benefits of higher subscription tiers in mitigating these restrictions, with users sharing their experiences.
GPU MODE Discord
- CUDA Memory Copy Issues: A member reported that removing the condition IsDense(y) && IsSame(x, y) from the code resolves unexpected behavior during LLM model inference, highlighting that CudaCopy initiates CUDA kernels. Refer to Reduce time to first kernel when using CUDA graphs for more details (a capture sketch follows this list).
- Discussions also touched on the lack of official documentation for CUDA graphs supporting cudaMemcpyAsync, raising concerns about handling asynchronous memory operations within CUDA implementations.
- Megatron-LM's Training Efficiency: Megatron-LM's efficiency remains under scrutiny as members plan to enhance training throughput in distributed setups. Insights from Gensyn and Christine Yip's active community were suggested for optimizing distributed training.
- The conversation emphasized the importance of leveraging community resources to address scalability challenges and improve overall training performance with Megatron-LM.
- Custom Vision Encoder Integration: A member proposed developing a custom vision encoder to better handle small pixel-scale images within existing language models, arguing that flexibility in encoder pairing outweighs the benefits of pretrained VLMs.
- The potential for integrating the encoder with various LLMs was discussed, highlighting the adaptability and improved performance in specialized image processing tasks.
- RTX 3090 Finetuning Experiments: Experiments using an RTX 3090 for finetuning were shared, with discussions on the optimal setup employing bf16 or QLora+int8 precision. An example from WandB confirmed that 8bit Lora is effective for 8B models on this GPU.
- Members explored the balance between computational efficiency and model performance, aiming to identify the best finetuning practices for large-scale models on consumer-grade hardware.
- Axolotl Lora Configuration Success: The Axolotl Lora config for llama-3-vision was validated to work seamlessly with 2x A6000 GPUs, demonstrating reliable performance in multi-GPU environments.
- There is ongoing interest in securing compute sponsors to facilitate larger-scale experiments, contingent upon the success of initial configurations.
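For context on why kernel-launching copies matter during capture, here is a minimal PyTorch CUDA Graphs sketch; it assumes a CUDA device, and the warmup-on-a-side-stream pattern follows the PyTorch documentation rather than the code under discussion:

```python
import torch

static_x = torch.randn(1024, device="cuda")
static_y = torch.empty_like(static_x)

# Warm up on a side stream before capture, as the PyTorch docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_y.copy_(static_x.mul(2.0))
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_y.copy_(static_x.mul(2.0))  # device-to-device copy + kernel, both captured

static_x.fill_(3.0)
g.replay()  # replays the captured work; static_y becomes 6.0 everywhere
```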
LM Studio Discord
- LM Studio Setup and Compatibility: Users shared their LM Studio setups, including RTX 4060 laptops and M3 Max with 96GB RAM, highlighting the applicationās versatility.
- A user encountered an "unknown model architecture" error when loading Llama 3.2 11B Vision in LM Studio.
- Qwen QwQ Excels in Roleplay Applications: Discussions recommended Qwen QwQ as a strong candidate for roleplay LLM tasks, with multiple users lauding its performance.
- One member noted that Qwen2 demonstrates exceptional performance in Python programming contexts.
- AMD GPU Drivers Causing Llama Performance Drops: Users reported that AMD GPUs using 24.12.1 drivers are experiencing "Safetensors header is unexpectedly large" errors, leading one to revert to 24.10.1.
- Llama 3.2 3B model performance dropped from 90+ tok/s on driver 24.10.1 to 20 tok/s on the newer driver.
- LM Studio Lacks Mobile Support: A member expressed the need to use LM Studio on mobile devices but found that no mobile app is currently available.
- Alternate solutions were suggested, yet direct mobile compatibility remains unavailable.
- High RAM Needed for Large Model Inference: Running a 70B model requires 70GB of VRAM or main memory, as discussed by users.
- It was recommended to have 10-20% extra VRAM for operational flexibility when operating at q8.
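A back-of-envelope check of that estimate, assuming roughly 1 byte per parameter at q8 and the 10-20% headroom noted above:

```python
params = 70e9            # 70B parameters
bytes_per_param = 1.0    # q8 ~ 8 bits per parameter
overhead = 0.15          # midpoint of the 10-20% headroom for KV cache/buffers
print(f"~{params * bytes_per_param * (1 + overhead) / 1e9:.0f} GB")  # ~80 GB
```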
Stackblitz (Bolt.new) Discord
- Seamless Switch: Firebase to Supabase Migration: A user in #prompting sought the optimal strategy to transition their entire site from Firebase to Supabase, highlighting the need for comprehensive migration practices.
- The community is actively sharing strategies and best practices to ensure data integrity and minimize downtime during the migration process.
- Bootstrap Battles with create-mf-app: A member discussed challenges in #prompting when integrating create-mf-app with Bootstrap, noting conflicts with Tailwind that lead to unstable setups.
- Solutions proposed include standardized integration methods to harmonize the use of both frameworks without compromising project stability.
- Bolt Pilot Seeks Testers: In #prompting, a member introduced Bolt Pilot, a new GPT for Bolt, and requested the community to test its functionalities for improvements.
- Feedback from early testers is crucial for optimizing Bolt Pilotās performance and feature set before a broader release.
- Bolt's Token Drain Frustrates Users: In #discussions, numerous users expressed dissatisfaction with Bolt's excessive token usage, with suggestions like adding a "punch the AI" button to mitigate waste.
- Members are sharing experiences of receiving irrelevant responses, prompting discussions on optimizing token allocation for better efficiency.
- Enhancing Bolt with Payment Integrations: There was a conversation in #discussions about the complexity of implementing payment integrations such as Stripe and PayPal into Bolt.
- Users emphasized the necessity for dynamic billing features and expressed interest in upcoming updates that would support these integrations.
Cohere Discord
- Cohere Toolkit Deployment Issues: A member deployed the Cohere Toolkit using AWS instructions but encountered an intermittent `stream ended unexpectedly` error.
- Another member recommended checking the docker logs to diagnose the issue, suggesting that deeper insights might be found in the application logs.
- Findr App Launch on Product Hunt: Findr officially launched on Product Hunt, aiming to provide humans with infinite memory and a searchable digital brain.
- The team is seeking support through their promotional tweet, receiving positive feedback from the community.
- Multimodal Embed-v3 Rate-limit Increase: In response to community feedback, the rate limit for the Multimodal Image Embed endpoint increased from 40 images/min to 400 images/min for production keys.
- Trial rate limits remain at 5 images/min, and other endpoints like Chat have their own specific rate limits, as detailed in the API Keys and Rate Limits - Cohere documentation.
- Cohere Reranker Performance: A developer reported that the Cohere Reranker with ContextualCompressionRetriever sometimes fails to select the most relevant chunks, leading to incorrect answers (a wiring sketch follows this list).
- Despite accurate chunking in their RAG application, the reranking behavior appears random, causing confusion among users.
- Embedding Models Dimensionality Challenges: A user inquired about creating separate vector stores for embeddings from text-embedding-3-large (3072 dimensions) and Cohere Embed v3 (1024 dimensions).
- The dimensionality differences may impact the storage strategy when integrating embeddings for text, tables, and images.
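A hedged LangChain sketch of the reranker wiring described above; it assumes the `langchain-cohere` package, an already-populated vector store named `vectorstore`, and an illustrative rerank model id:

```python
# Sketch only: assumes `pip install langchain langchain-cohere` and COHERE_API_KEY set.
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0", top_n=4)  # illustrative model id
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}),  # assumed vector store
)
docs = retriever.invoke("What does the contract say about termination?")
```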
Modular (Mojo 🔥) Discord
- Mojo REPL Troubles on Archcraft: A user reported issues entering the Mojo REPL on Archcraft Linux, citing a missing mojo-lld library.
- The community discussed potential linker errors related to mojo-lld and the necessary installation steps to resolve the issue.
- Var Keyword Debate in Mojo Docs: Updates in the Mojo documentation sparked a debate over the necessity of the `var` keyword in variable declarations.
- Members suggested making `var` optional, while discussing its impact on struct definitions and code clarity.
- Clarifying Mojo Kernel Terminology: The term "kernel" in Mojo was clarified to refer to functions running on accelerators rather than traditional OS kernels.
- Discussions highlighted the optimization of code blocks for hardware and the distinction between compute kernels and OS kernels.
- Custom Ops Loading Issues in Max: Issues were reported when loading the mandelbrot custom op in Max, specifically related to unregistered Mojo kernels.
- Members pointed out the need for proper registration of custom ops to ensure smooth execution within Mojo.
- Enhancements for Custom Op Handling: A feature request was made to improve error messages and handling for missing custom ops in Max.
- This includes directing users to relevant documentation when errors occur, enhancing the overall user experience.
OpenInterpreter Discord
- Open Interpreter's Persistent Pitfalls: Multiple users reported ongoing issues with Open Interpreter, particularly errors related to the `--conversations` command, leading to loss of valuable conversations.
- Members are actively seeking solutions to these persistent errors, emphasizing the need for reliable conversation management.
- Upgrading to Open Interpreter 1.x: A user inquired about upgrading from Open Interpreter 0.34 to the latest 1.x version, sparking discussions on the availability of OS mode in the new release.
- Members strategized potential improvements and shared insights on the new features expected in Open Interpreter 1.0.
- Innovating AI Applications and Models: Discussions focused on leveraging AI for projects like Raspberry Pi setups and integrating voice-to-speech models for home automation.
- Users explored methods to connect smaller models with larger systems to enhance overall functionality.
- Truffle-1: The New AI Powerhouse: A member introduced the Truffle-1, a personal computing stack capable of running multiple models with 64GB unified memory, available for $500 deposit and $115 monthly. More details can be found on the Truffle website.
- The Truffle-1 promises infinite inference time and supports writing and sharing apps, with units set to ship in January.
- Using OS Mode Locally in Open Interpreter: A user asked about the feasibility of using OS mode locally with Open Interpreter, which led to discussions on available configuration options.
- Members shared configuration tips to help users experiencing issues with local OS mode setups.
tinygrad (George Hotz) Discord
- Benchmark Showdown: TinyGrad OpenCL vs PyTorch CUDA: A member requested benchmarks comparing TinyGradās OpenCL implementation with PyTorchās CUDA for various Llama models.
- This highlights an ongoing interest in performance comparisons between different AI frameworks within the community.
- Mergeable Shapes: Tackling ShapeTracker Complexity: Discussion emerged on the complexity of proving the mergeability of two arbitrary ShapeTrackers in Lean, with a user stating it's impossible to have a simple criterion like a matrix determinant.
- They emphasized the presence of coincidences in strides and shapes that complicate mergeability checks.
- Layout Algebra Unveiled in CuTe: Members inquired whether mergeability is equivalent to composition in CuTe's layout algebra, referencing a note on the algebra of CuTe Layouts.
- This discussion touched on the fundamental abstractions in NVIDIA's CUTLASS library and the mathematical treatment of layout operations.
- NP-Hard Challenges in Layout Injectivity: Concerns were raised about proving conditions related to injectivity in layout algebra, with suggestions that such checks might be NP hard.
- Participants emphasized the difficulties in establishing sufficient conditions in layout algebra due to potential stride interferences.
- Symbolic Superiority: Functions vs Layouts: A member pointed out that symbolic integer functions are strictly more powerful than layouts in terms of checking necessity and sufficiency.
- This aligns with discussions on algorithm complexities in merging views and supports ongoing research directions.
Torchtune Discord
- FSDP Normalization Scaling: Discussions revealed that FSDP's normalization by `world_size` must be addressed, and scaling by `world_size` can correct an average operation issue.
- A member suggested opening a PR #2172 to implement this fix, focusing on the `scale_grads` function (see the sketch after this list).
- Explicit Scaling in Training: The community highlighted the importance of explicit scaling of the loss within the training recipe rather than hiding logic elsewhere, to simplify comprehension.
- After evaluations, members agreed to clarify the scaling process in both training and optimization hooks.
- Bug Identification Across Frameworks: It was identified that a similar bug affecting the reduction by a factor of `1/world_size` might exist across various libraries, including `trl` and Hugging Face's trainer.
- Members commended the Hugging Face team for recognizing and addressing these issues in their training framework, as noted in linked GitHub issues.
- Handling No Sync in Hugging Face: Members discussed how Hugging Face handles no sync scenarios by avoiding gradient accumulation normalization while properly computing loss.
- Specific implementation details are available in the trainer.py file.
- Evolutionary Algorithms in ML: Evolutionary algorithms are gaining traction in machine learning discussions, highlighting their potential applications.
- A member pointed out their significance, suggesting further exploration into their use cases within the community.
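An illustrative stand-in for the fix discussed above: the name scale_grads comes from the PR, but this body is a sketch of the idea (multiply gradients back by world_size to undo FSDP's mean-reduce), not torchtune's actual code:

```python
import torch
import torch.distributed as dist

def scale_grads(model: torch.nn.Module, factor: float) -> None:
    """Multiply every gradient by `factor`, e.g. world_size to undo an averaging all-reduce."""
    for p in model.parameters():
        if p.grad is not None:
            p.grad.mul_(factor)

# After backward(), inside the training recipe:
# scale_grads(model, float(dist.get_world_size()))
```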
DSPy Discord
- AI Reshaping the Knowledge Economy: AI and Knowledge Economy introduces a framework analyzing how AI transforms the knowledge economy by reallocating roles between "workers" and "solvers". Basic autonomous AI displaces humans, while advanced autonomous AI benefits larger, more productive firms.
- As autonomous agents gain traction, they predominantly benefit the most knowledgeable individuals, allowing efficient management of routine work, while less knowledgeable individuals benefit from non-autonomous AI like chatbots.
- Coconut - Continuous Thought Paradigm: The paper Training Large Language Models to Reason in a Continuous Latent Space from Meta proposes Coconut, a new reasoning paradigm that uses the last hidden state of LLMs for reasoning instead of the traditional language space.
- This approach seeks to overcome limitations of language-based reasoning by exploring unrestricted latent spaces, potentially enhancing LLMsā performance on complex reasoning tasks.
- TypedReAct Enigma Solved: A member shared a new implementation of TypedReAct, questioning whether to submit a PR, but noted potential deprecation issues with TypedChainOfThought in upcoming versions.
- Another member suggested that removing the "Typed" prefix would resolve compatibility issues, emphasizing that built-in ReAct is effective without the typing.
- RouteLLM Maintenance Concerns: A member expressed concerns about the lack of maintenance for RouteLLM, indicating interest in potential DSPy integration.
- The conversation highlighted the importance of supporting development for models with reduced oversight.
- DSPy Evolution with Reasoning Models: A member inquired about how DSPy might evolve with the rise of reasoning models, emphasizing fine-tuning at the branching level.
- This perspective shifts focus from traditional prompting to process reward mechanisms, indicating a potential paradigm shift in model training.
Nomic.ai (GPT4All) Discord
- GPT4All Struggles with Jinja Templates: Users reported that GPT4All is experiencing significant issues with Jinja templates, which are essential for model functionality. Current problems include incorrect spacing, new line errors, and unsupported functions like "none" and "[1:]".
- Efforts to address these template issues are ongoing, but detailed solutions have yet to be implemented.
- Demand for Docker Deployment of GPT4All: A request was made for a Docker version of GPT4All featuring a web UI, aiming to simplify deployment processes.
- As of now, the community has not provided specific resources or existing solutions to fulfill this demand.
- CLI Access to Local Documents in GPT4All: Users are encountering difficulties using local documents with the GPT4All CLI, as the old CLI no longer supports it officially.
- However, it was noted that the server API allows programmatic access to local documents when enabled through the GUI.
LlamaIndex Discord
- AI SDR Automates Lead Generation with LlamaIndex: An agentic AI SDR built using LlamaIndex showcased its capability in automated lead generation, linking to multiple GitHub features.
- This tool emphasizes LlamaIndexās integration capabilities, enhancing efficiency in lead generation workflows.
- Crash Course Teaches Agent Building with LlamaIndex: A crash course led by LlamaIndex focuses on building agents with function calling to manage real-time data queries (a minimal sketch follows this list).
- Participants also learn to create an agentic RAG that routes intelligently between vector and summary tools, and how to implement ReAct.
- OpenAIAgent Faces Concurrency Execution Limits: A member reported that `OpenAIAgent` function execution remains non-concurrent even after async modifications in an asynchronous environment.
- This highlights a limitation in OpenAIAgent's execution model, affecting asynchronous operations.
- Community Engages on RAG Evaluation Strategies: Discussions on RAG evaluation are active, with a member inviting peers to DM for in-depth conversations.
- Participants are exploring effective evaluation strategies within the AI community.
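A minimal sketch of the tool-using agent pattern the course covers, assuming llama-index 0.10+ with the OpenAI integration; the tool, model choice, and use of the ReAct loop here are illustrative, not the course's exact code:

```python
# Sketch only: assumes `pip install llama-index` and OPENAI_API_KEY set.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

agent = ReActAgent.from_tools(
    tools=[FunctionTool.from_defaults(fn=multiply)],  # plain function exposed as a tool
    llm=OpenAI(model="gpt-4o-mini"),
    verbose=True,
)
print(agent.chat("What is 12.3 * 4.5?"))
```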
Gorilla LLM (Berkeley Function Calling) Discord
- BFCL Leaderboard Functionality Down: A user reported that the BFCL Leaderboard function call demo is stuck on "Loading Model Response…".
- Another member confirmed a certificate issue is causing the model endpoint to be down.
- Gorilla Benchmark for Structured Outputs: A user inquired about using the Gorilla benchmark to evaluate structured outputs from the model, specifically asking about subtasks for generating text according to a provided JSON schema or Pydantic model.
LLM Agents (Berkeley MOOC) Discord
- Appreciation in MOOC Channel: A member expressed gratitude: Thank you for that! in the mooc-questions channel.
- This expression highlights positive engagement within the LLM Agents (Berkeley MOOC) discussions.
- Positive Feedback in MOOC Discussions: A thank you message was shared in mooc-questions, stating: Thank you for that!
- Such acknowledgments indicate active participation and satisfaction among AI Engineers in the guild.
Axolotl AI Discord
- New Engineer Joining for Reinforcement Learning: A new engineer is set to join in January to assist with Reinforcement Learning.
- Their expertise will enhance the teamās capabilities in Reinforcement Learning, contributing to ongoing projects.
- Support for KTO Project Enhanced: The new engineer will provide support for the kto project starting in January.
- This assistance is anticipated to positively impact the development of the kto project.
Mozilla AI Discord
- Developer Hub Update Released: A significant update for the Developer Hub was announced, detailing improvements and new features. You can view the full announcement here.
- Community feedback is encouraged to enhance the user experience.
- Blueprints Initiative for Open-Source AI: The Blueprints initiative aims to assist developers in creating open-source AI solutions. More details can be found in the thread.
- This initiative serves as a resource for developers to kickstart their projects effectively.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
{% if medium == 'web' %}
Codeium (Windsurf) ▷ #discussion (60 messages🔥🔥):
Codeium Extension Issues, Windsurf Performance Problems, Flex Credits Concerns, Connection to Codeium Server, Prompting with o1
- Codeium Extension suffers from Autocomplete Issues: Multiple users reported that the Codeium extension in VSCode is showing autocomplete suggestions only for a fraction of a second, making it unusable.
- Suggestions to remedy this issue included reverting to version 1.24.8, which appears to restore functionality.
- Windsurf Performance Lagging: Users have expressed frustration with Windsurf becoming extremely slow or failing to load altogether, with one user waiting over 10 minutes for it to open.
- Another reported frequent error messages disrupting their workflow and asked for potential fixes.
- Concerns about Flex Credits Usage: Several users inquired whether flex credits roll over, as they struggle with frequent error messages and service outages affecting their usage.
- Users reported issues with credits being deducted even when experiencing service downtime.
- Connection Issues with Codeium Server: Discussions surfaced regarding difficulties connecting to the Codeium server, with users sharing their experiences and requesting assistance in resolving the issue.
- A suggestion was made to file support tickets for further investigation and possible fixes.
- Prompting with o1 in AI Applications: A user shared a link about the o1 prompting, which discusses how it can effectively perform coding and reasoning tasks, urging others to explore its capabilities.
- Another user asked for a summary of this course content due to the complexity of information provided.
Links mentioned:
- Reasoning with o1: Learn how to use and prompt OpenAI's o1 model for complex reasoning tasks.
- Hello There GIF - Hello there - Discover & Share GIFs: Click to view the GIF
Codeium (Windsurf) ā· #windsurf (678 messagesš„š„š„):
Windsurf vs Cursor, Model Performance Comparisons, Error Handling in Windsurf, AI Integration in Development, Coding Performance and Tools
- Windsurf vs Cursor: Users are discussing the differences between Windsurf and Cursor, highlighting Cursorās $20 plan as providing better value with features like unlimited requests, compared to Windsurfās higher pricing and credit system.
- Some prefer to keep both options open for comparison, while others favor Cursor for cost-effectiveness.
- Model Performance Comparisons: The discussion reveals that Codeiumās 4o-mini and Haiku models are generally regarded as more efficient and cost-effective, with comparisons also made to other models such as Llama 3.1 and GPT.
- Participants mention that 4o-mini can perform analogous tasks effectively and has recently added the ability to accept images.
- Error Handling in Windsurf: Users report various errors and bugs with Windsurf, including ādisappearing codeā and issues with Cascadeās functionality not working as expected.
- Some are experiencing internal errors during file operations, and thereās a call for clearer communication from Codeium regarding these issues.
- AI Integration in Development: Participants express interest in how various AI tools, including Copilot and Codeium, integrate with their coding workflows, discussing the effectiveness of these tools for autocomplete and code suggestions.
- Analyses about the effectiveness of these tools indicate a general consensus on the importance of experimenting with different models to find the best fit.
- Coding Performance and Tools: Conversations around best practices in using AI for coding reflect a need for clarity on when to use chat mode versus write mode in tools like Windsurf.
- Suggestions emphasize the importance of using absolute paths and defining clear goals for prompts to improve the effectiveness of AI-assisted coding.
Links mentioned:
- LiveBench: no description found
- Reddit - Dive into anything: no description found
- Cannot use windsurf as git editor | Feature Requests | Codeium: git config --global core.editor 'windsurf --wait ' throws error on rebases hint: Waiting for your editor to close the file... [1119/144632.
- Windsurf - Focus Follows Mouse (as a configuration option) | Feature Requests | Codeium: There is an open GitHub PR for VSCode which is, on the surface, more than 4 years old, however it is way older than that.
- Productionizing and scaling Python ML workloads simply | Ray: Ray manages, executes, and optimizes compute needs across AI workloads. It unifies infrastructure and enables any AI workload. Try it for free today.
Cursor IDE ā· #general (707 messagesš„š„š„):
Cursor Update 0.44.2, Development tools in Cursor, PyQt and PySide6 issues, O1 Pro usage, Kepler Community browser
- Cursor Update 0.44.2 Released: The Cursor team has rolled back to version 0.44.2 after addressing bugs in the previous version 0.44, with users reporting stability improvements.
- Users have discussed their experiences with the update, including new features like the terminal and bug fixes.
- Challenges with PyQt and PySide6: Users experienced issues related to missing files like āQtWebEngineCore.dllā when setting up PySide6, leading to problems in their applications.
- Recommendations were made to ensure the correct Python version is installed and to troubleshoot installation steps.
- O1 Pro Enhancements: Users discussed the benefits of using O1 Pro, reporting successful bug resolutions in a fraction of the prompts used compared to earlier versions.
- The cost of O1 Pro was noted, with some users finding value in its performance despite the additional expense.
- Kepler Community Browser Development: One user shared their progress on developing the Kepler Community browser, emphasizing its focus on privacy and lightweight functionality.
- The developer expressed a commitment to open-source collaboration, inviting others to contribute to the project aimed at enhancing user privacy.
- Cursor's Copy-Paste Functionality: Users reported frustrations with Cursor's handling of copied terminal text, which sometimes pastes as plain text instead of code.
- Suggestions included using Ctrl + Shift + V for pasting and targeting terminal outputs effectively to improve usability.
Links mentioned:
- Settings | Cursor - The AI Code Editor: You can manage your account, billing, and team settings here.
- Downloads | Cursor - The AI Code Editor: Choose your platform to download the latest version of Cursor.
- Poetry - Python dependency management and packaging made easy: no description found
- Python Environment Manager - Visual Studio Marketplace: Extension for Visual Studio Code - View and manage Python environments & packages.
- uv: no description found
- GitHub - ultrasev/cursor-reset: Mac utility to reset Cursor editor's device identification system. Helps resolve account restrictions and trial-related issues.: Mac utility to reset Cursor editor's device identification system. Helps resolve account restrictions and trial-related issues. - ultrasev/cursor-reset
- GitHub - ZackPlauche/add-cursor-to-win-context-menu: Contribute to ZackPlauche/add-cursor-to-win-context-menu development by creating an account on GitHub.
- WARNING: Cursor v0.44 breaks all devcontainers v0.394.0: How did you forcibly disable Cursor from updating? Iām stuck in a world where upon restarts of Cursor, it will always update to v0.44.0 now. The added issue is even if I disable the ādevcontainerā ex...
- Danger Alert GIF - Danger Alert Siren - Discover & Share GIFs: Click to view the GIF
- Changelog | Cursor - The AI Code Editor: New updates and improvements.
- index.html - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
- style.css - Pastebin.com: Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
- GitHub - TheGalaxyStars/KEPLER-COMMUNITY: Explore freely, leave no trace.: Explore freely, leave no trace. Contribute to TheGalaxyStars/KEPLER-COMMUNITY development by creating an account on GitHub.
aider (Paul Gauthier) ā· #general (264 messagesš„š„):
o1 API access, Benchmark Performance, Refund and Support Experiences, Gemini vs. Sonnet, Aider Functionality
- Controversy over o1 API access and pricing: Discussions revealed mixed experiences regarding access to the o1 API, with some users expressing frustration over not receiving it despite being Tier 5 subscribers.
- A member noted the pricing structure, highlighting that $15 per 1 million tokens for the API is considered high compared to the $200 subscription for o1 pro, which some find justifiable.
- Performance Comparisons of Aider and Sonnet: Users compared the performance of Aider and Sonnet, reporting that Aiderās latest updates show it to be more effective, with o1 achieving a benchmark of 84.2, rivaling Sonnet.
- Others discussed that o1 functions well in editor mode, while Gemini models struggled with JavaScript, suggesting that Aider has performed better in certain coding tasks.
- Refund Process for Subscription Services: Several members shared their experiences with the refund process for the o1 pro subscription, noting that responses can be delayed but refunds do eventually occur.
- While some reported long wait times for refunds, others claimed it was quicker; in particular, one member received a refund within hours of the request.
- Expectations from Upcoming Models: Members expressed anticipation for upcoming models like Veo 2 and R1, noting that competition is growing and could impact OpenAIās market position.
- Conversations suggested that as newer models come out, existing models like Sora may fall behind, sparking debates on their effectiveness and performance.
- Aider's Improved Functionality: Users noted improvements in Aider functionality, specifically discussing the ability to see all files without needing to /add them manually, highlighting the potential need for a .aiderignore file.
- A member discussed the efficiency of using Aider's editor capabilities, particularly with Gemini models, while raising concerns about the editing limitations with JavaScript.
Links mentioned:
- Tweet from Andrew Ng (@AndrewYNg): OpenAI just announced API access to o1 (advanced reasoning model) yesterday. I'm delighted to announce today a new short course, Reasoning with o1, built with @OpenAI, and taught by @colintjarvis,...
- Tweet from Poonam Soni (@CodeByPoonam): Google just dropped Veo 2 and it's INSANE. Spoiler: OpenAI Sora is now falling behind. 10 Wild Examples of what it's capable of: (Don't miss the 5th one)
- Linting and testing: Automatically fix linting and testing errors.
- o1 - API, Providers, Stats: The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason using ...
- Options reference: Details about all of aiderās settings.
aider (Paul Gauthier) ā· #questions-and-tips (18 messagesš„):
Aider Support for Gemini Flash 2, Working with /architect and /ask Modes, Managing Code Refactoring, File Upload Issues, Google Search Grounding in Gemini 2.0
- Aider does not support special features for Gemini Flash 2: A member raised a question about Aiderās support for Gemini Flash 2ās grounding feature, but it was clarified that Aider doesnāt do anything special in the API for this.
- Another member mentioned that the Gemini models support Google Search grounding, with specific requirements related to the model and pricing involved.
- Using /architect and /ask for Project Planning: Members discussed how to effectively utilize the /architect and /ask modes for defining project plans that could be implemented through /code mode.
- One member suggested requesting Aider to create a todo.md file for task tracking, enhancing workflow organization.
- Challenges with Code Refactoring and Task Management: A member expressed that as projects grow larger, maintaining clean code becomes difficult when using Claude for feature development.
- Participants shared that without supervision, the generated code could become messy and might require refactoring steps along the way.
- Issues with File Uploads on Aider: A member reported not receiving a file dropdown when attempting to add files to Aider, raising concerns about usability in the new version.
- Another user confirmed that this bug has been fixed in the main branch and provided instructions to update.
- Integration of Google Search in Gemini 2.0: A member detailed that Gemini 2.0 Flash Experimental models on Vertex AI support Google Search grounding enabled by specific configurations.
- They shared a relevant GitHub pull request that enhances support for this functionality along with the YAML configuration needed for setup; a hedged LiteLLM sketch follows this list.
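For readers who want to experiment, here is a minimal sketch of what a grounded call might look like through LiteLLM, assuming the googleSearch() tool support from the linked PR (#7257) is available in your installed version; the model id and prompt are illustrative:

```python
from litellm import completion

# Hedged sketch: pass Google Search grounding as a tool, per the
# googleSearch() support described for Gemini 2.0 models in PR #7257.
response = completion(
    model="vertex_ai/gemini-2.0-flash-exp",  # illustrative model id
    messages=[{"role": "user", "content": "What changed in Gemini 2.0?"}],
    tools=[{"googleSearch": {}}],  # the grounding tool from the PR
)
print(response.choices[0].message.content)
```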
Links mentioned:
- Repository map: Aider uses a map of your git repository to provide code context to LLMs.
- FAQ: Frequently asked questions about aider.
- yamad - Overview: yamad has 85 repositories available. Follow their code on GitHub.
- GitHub - yamadashy/repomix: š¦ Repomix (formerly Repopack) is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, and Gemini.: š¦ Repomix (formerly Repopack) is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) o.....
- Add support for Gemini 2.0 GoogleSearch tool by samling Ā· Pull Request #7257 Ā· BerriAI/litellm: TitleAdd googleSearch() tool to valid tool list for Gemini/VertexAI models to support Gemini 2.0 grounding.Relevant issuesEnhances #7188Typeš New Featureā TestChangesAdd googleSearch() too...
aider (Paul Gauthier) ā· #links (11 messagesš„):
Depth AI, LightRAG, Codebase Indexing, AI Assistant Deployment
- Depth AI impresses with code understanding: Users have been enjoying Depth AI for its ability to construct a comprehensive knowledge graph of their codebase, answering deep technical questions with 99% accuracy.
- Many found the setup easy, although indexing larger projects (200k - 1.5 mil tokens) may take some time, with one user noting a 180k token repo took 40 minutes.
- LightRAG discussed as an alternative: A user suggested trying LightRAG, described as a simple and fast retrieval-augmented generation tool, during discussions about Depth AI.
- However, another user expressed preference for Depth AI, labeling it easier to set up and potentially more effective.
- Indexing duration causes mixed reactions: While one user reported successful indexing of their project with Depth AI, another mentioned their medium-sized project has been indexing for 4 hours.
- The time taken to index appears to vary significantly based on the token size, emphasizing the importance of patience.
- Concerns over Depth AI output: One user expressed frustration when Depth AI returned āno output was generatedā for their queries after completing the indexing.
- This raises questions about the reliability of output despite successful indexing.
Links mentioned:
- Depth AI - AI that deeply understands your codebase: Chat with your codebase or build customised AI assistants. Deploy them wherever you work ā Slack, Github Copilot, Jira and more.
- GitHub - HKUDS/LightRAG: "LightRAG: Simple and Fast Retrieval-Augmented Generation": "LightRAG: Simple and Fast Retrieval-Augmented Generation" - HKUDS/LightRAG
OpenAI ā· #annnouncements (1 messages):
12 Days of OpenAI, Stay Updated Role
- Join the 12 Days of OpenAI Fun: OpenAI is encouraging members to stay in the loop during the 12 Days of OpenAI by picking up the Stay Updated role via the server's Customize page.
- This is a fantastic way to receive updates and be involved in the ongoing festivities.
- Day 10 Celebration Features: The announcement highlights the Day 10 celebration through a linked YouTube video that showcases the ongoing events.
- Members are encouraged to check it out for exciting content related to the dayās activities.
OpenAI ā· #ai-discussions (220 messagesš„š„):
OpenAI vs Google AI advancements, Experiences with different AI models, AI and safety concerns, AI for personal assistance, DALLĀ·E vs Midjourney for image generation
- OpenAI and Googleās Competitive Landscape: Discussion centers around the competition between OpenAI and Google, with many participants believing Google is currently outperforming OpenAI in AI advancements.
- Concerns were raised about how OpenAI may be holding back models for competitive strategy, while others speculated that Googleās rapid innovation could define future AI landscapes.
- Diverse Experiences with AI Models: Members shared their opinions on different AI models with many siding in favor of OpenAIās GPT models for programming and math, while also highlighting some dissatisfaction with Gemini 2.0 Flashās performance.
- Users expressed how agents could significantly enhance life for people with disabilities by performing tasks autonomously, reflecting a desire for practical applications of AI.
- AI Safety and Ethical Concerns: Participants debated the effectiveness and ethics of current AI safety measures, with some claiming that current solutions may limit creativity and usefulness.
- Emphasis was placed on finding a balance between ensuring safety and allowing for the exploration of AI capabilities, with some noting over-censorship as a potential issue.
- Interest in Personal AI Assistants: A significant discussion point was the desire for personal AI assistants that can manage tasks autonomously and simplify daily life, particularly for elderly users recovering from health issues.
- Conversations focused on how such technology could improve life quality, with references to Googleās ongoing developments in this area.
- Comparing Image Generation Models: Users compared OpenAIās DALLĀ·E with Midjourney and Googleās Imagen, often lamenting the limitations and quality of DALLĀ·E despite its no-cost access.
- Discontent was expressed about DALLĀ·Eās output being easily recognizable as āAI-generated,ā while users highlighted Midjourneyās pricing and production quality as factors for consideration.
Link mentioned: GitHub - AlignAGI/Alignment: Promoting global awareness and action for ethical AI alignment and safeguarding humanity against AI self-replication risks. Includes research, frameworks, and open-source resources.: Promoting global awareness and action for ethical AI alignment and safeguarding humanity against AI self-replication risks. Includes research, frameworks, and open-source resources. - AlignAGI/Aligā¦
OpenAI ā· #gpt-4-discussions (3 messages):
Custom GPTs functionality, Manager role in training
- Training Role of ChatGPT Clarified: A member questioned whether the instruction āyou are now a manager to train meā functions effectively when prompting ChatGPT to assume a particular role.
- Is this the key to unlocking better responses?
- Limitations on Editing Custom GPTs: Another member expressed frustration regarding the inability to edit custom GPTs, signaling a potential flaw in the system.
- Are we stuck without options?
OpenAI ā· #prompt-engineering (4 messages):
Channel Posting Etiquette, Seeking Help in Appropriate Channels
- Channel Posting Etiquette Under Fire: A member criticized another for posting in multiple channels, labeling it as spam and instructing them to delete the messages from all but the correct channel.
- This comment emphasized the importance of using designated channels to maintain order and avoid confusion.
- Searching for Help in the Right Place: One member expressed uncertainty about the appropriate channel, stating they were just trying to find the best input.
- This inquiry underscores the challenges users face when navigating channel guidelines for assistance.
OpenAI ā· #api-discussions (4 messages):
Channel Overposting, Seeking Help, Proper Channel Usage, Spam Concerns
- Channel Overposting Sparks Debate: A member questioned why a post was shared in four channels, highlighting concerns about spam.
- They suggested deleting the post from other channels to streamline the discussion.
- Member Seeking Guidance: Another member expressed uncertainty about the appropriate channel for their inquiry, stating they were just looking for help.
- This raised questions about channel organization and member awareness.
- Call for Proper Channel Usage: In response, a member emphasized that the correct channel for such posts is specified, recommending adherence to guidelines.
- They offered assistance after the post is removed from other locations.
Nous Research AI ā· #general (210 messagesš„š„):
Falcon Model Performance, Prompt Chaining Techniques, OpenAI Safety Discussions, Feedback and Evaluation Systems, API and Tool-Use Support in Models
- Falcon Models Show Promise: The Falcon3 models, particularly the 7B and 10B versions, are demonstrating strong performance, with users expressing interest in testing their capabilities for various applications.
- Recent updates have added tool-use support, enhancing their functionality, especially in contexts requiring complex interactions.
- Innovative Prompt Chaining Strategies: Discussion on prompt chaining highlighted its utility for enhancing model output by using a series of models to process and refine responses iteratively.
- Techniques such as structured output and tree structures are suggested to improve storytelling and other creative tasks.
- OpenAIās Safety Credibility Under Scrutiny: Concerns were raised about OpenAIās focus on safety practices, particularly in light of a demonstration that showcased a jailbreak for their models during a comparison between GPT-4o and o1 preview.
- This has led to an ongoing conversation regarding the alignment between their safety claims and actual model vulnerabilities.
- Feedback and Rating Systems: Users are implementing evaluation frameworks to assess story quality generated by models using specific rubrics which detail various narrative elements.
- This systematic approach aims to produce higher-quality outputs through iterative feedback and assessment mechanisms.
- API and Local Model Performance: There is a discussion regarding the commonality of running inference without batching in local models, with users advocating for queuing requests for efficiency.
- The aim is to explore its integration into various applications, including market simulations, where real-world testing becomes essential; a minimal queueing sketch follows this list.
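As a rough illustration of the queueing idea, the following asyncio sketch serializes concurrent requests to a single unbatched local model; generate() is a hypothetical stand-in for your actual inference call:

```python
import asyncio

async def generate(prompt: str) -> str:
    # Hypothetical stand-in for an unbatched local inference call.
    await asyncio.sleep(0.1)
    return f"response to: {prompt}"

async def worker(queue: asyncio.Queue) -> None:
    # Drain requests one at a time so the model never sees concurrent calls.
    while True:
        prompt, fut = await queue.get()
        fut.set_result(await generate(prompt))
        queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    asyncio.create_task(worker(queue))
    futures = []
    for p in ["hello", "world"]:
        fut = asyncio.get_running_loop().create_future()
        await queue.put((p, fut))
        futures.append(fut)
    print(await asyncio.gather(*futures))

asyncio.run(main())
```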
Links mentioned:
- Welcome to Langflow | Langflow Documentation: Langflow is a new, visual framework for building multi-agent and RAG applications. It is open-source, Python-powered, fully customizable, and LLM and vector store agnostic.
- Scaling test-time compute - a Hugging Face Space by HuggingFaceH4: no description found
- Tweet from Democratize Intelligence (@demi_network): "It's not a question of alignment between the company and AI, it's a question of alignment between the company and you. Itās going to be very important who your AI works for.If your AI is ...
- tiiuae/Falcon3-7B-Instruct-1.58bit Ā· Hugging Face: no description found
- tiiuae/falcon-11B Ā· Hugging Face: no description found
- tiiuae/falcon-7b-instruct Ā· Hugging Face: no description found
- tiiuae/Falcon3-10B-Instruct Ā· Hugging Face: no description found
- Welcome to the Falcon 3 Family of Open Models!: no description found
- tiiuae/falcon-40b-instruct Ā· Hugging Face: no description found
- Tweet from xjdr (@_xjdr): this was one of the most interesting things i heard repeated from ~trusted sources a few times at NeurIPS (newsonnet being 400B dense)Quoting Aidan McLau (@aidan_mclau) @Heraklines1 @deedydas no not ...
- Safepine: no description found
- Reddit - Dive into anything: no description found
Nous Research AI ā· #ask-about-llms (13 messagesš„):
Function calling on local models, Bias in function fetching, Effectiveness of search engines, Hermes 3 405B model issues, Pink elephant problem in AI responses
- Exploring Function Calling Libraries: A query was raised on the best libraries and methods for function calling on small local models.
- This indicates an ongoing interest in optimizing AI performance on local systems.
- Bias Due to Model Recall: Discussion centered around the pitfalls of using language models for data recall, emphasizing that correctness is subjective based on the source and purpose.
- Concern was expressed that models might mistake biased information as truth if they leverage generic web searches.
- Search Engine Quality Debate: One member voiced frustration, suggesting that current search engines are plagued by SEO spam and untrustworthy news sites.
- They yearned for a superior search engine indexing all books and papers ever written.
- Hermes 3 405B Model Feedback: A user reported issues with the Hermes 3 405B model reverting prompts during responses, despite instructions not to do so.
- They noted a comparison with GPT-4O showed fewer issues, questioning whether rephrasing prompts might help.
- Pink Elephant Problem & Model Responses: The āpink elephant problemā was discussed, illustrating how instructing models about what NOT to do can inadvertently trigger that behavior.
- Research on enhancing model robustness against such pitfalls was mentioned, prompting a shift in user prompting strategy.
Nous Research AI ā· #research-papers (2 messages):
Signal and Noise in Inference, Consistency of LLM Output, Long Output Challenges
- Signal vs. Noise: Key to Clear Thinking: The importance of the signal to noise ratio was highlighted as vital for coherent and clear inference, similar to its role in the human brain.
- When will we hear about something like this? indicates anticipation for deeper discussions on this topic.
- Seeking Recommendations on LLM Consistency Papers: A member expressed interest in finding the best papers focused on the consistency of LLM output, especially for long and very long outputs.
- This prompts further exploration on the challenges faced by LLMs in maintaining output quality over extended text lengths.
Notebook LM Discord ā· #announcements (1 messages):
3-panel UI changes, Suggested actions removal, Workarounds for missing features
- 3-panel UI rollout removes suggested actions: The new 3-panel UI has eliminated the āsuggested actionsā feature that had been a part of NotebookLM, which included prompts like āExplainā and āCritiqueā. The previous setup was rarely utilized due to its limited discoverability and functionality.
- Many users have noticed this change, which follows feedback that highlighted the sparse use of the suggested actions.
- Plan to restore functionality with better design: The development team plans to restore much of the functionality from the suggested actions in a more intuitive way over the coming months. They aim to enhance the user experience by integrating new features that improve citations and response accuracy.
- Users are encouraged to share additional feedback as the improvements are implemented in the upcoming releases.
- Alternative workarounds introduced: In the interim, users can recreate the suggested actions by copying text from sources and asking for explanations or summaries directly in the chat. The āconvert all notes to sourceā feature allows users to create a new source from notes for more structured querying.
- This method maintains functionality by ensuring responses include clickable citations while focusing on user notes directly.
Notebook LM Discord ā· #use-cases (27 messagesš„):
Multilingual Functionality, Podcast Length Customization, Interactive AI Use Cases, Knowledge Base Generation, Creative Podcast Production
- Multilingual Functionality Experimentation: Members are excited about experimenting with the interactive function of NotebookLM to streamline conversations in different languages, particularly in Brazilian and Bangla.
- One user mentioned that explicitly asking for multilingual output in prompts makes it easier to engage in these discussions during the chat.
- Podcast Length Customization Template Discussion: A suggestion was made to create a timing template to control episode length, with a member expressing their desire for longer podcasts to explore content deeply without skipping engaging dialogue.
- Another member questioned how such a template would function, implying a need for a range rather than an exact duration.
- Creative Uses of Interactive AI: Various users discussed leveraging NotebookLM and similar tools for creative endeavors, including generating podcasts and engaging with niche topics that may not be widely covered.
- One user shared their approach of recording concise episodes while reviewing academic materials for an open-source prediction market platform.
- Knowledge Base Generation with NotebookLM: A member inquired whether NotebookLM could generate a knowledge base similar to retrieval augmented generation (RAG), asking for insights or alternative solutions.
- Another user pointed to a YouTube video showcasing the use of NBLM as a knowledge base, suggesting that it may be indicative of what the inquirer was looking for.
- AI-Powered Podcast Production Insights: A user shared their experience creating AI-generated podcasts, emphasizing the need to add personal commentary to avoid āAI slopā and maintain content quality.
- They expressed plans to enhance their podcast by not just relying on first drafts from NotebookLM but also engaging in interactive mode for refined content.
Links mentioned:
- Ask Gennie! Reverse Mortgage Q&A - What is a Reverse Mortgage for Seniors? What are the benefits of the reverse mortgages for elder people and retirees?: Ask Gennie! Mortgage Questions Answered with Experts from GenNext.Mortgage (NMLS #2326098) Ā· Episode
Notebook LM Discord ā· #general (194 messagesš„š„):
NotebookLM Podcast Features, Interactive Mode Rollout, Audio Overview Functionality, Source Integration and Updates, Case Study Preparation Using NotebookLM
- Challenges with Podcast Length Control: Users are struggling to set specific lengths for podcasts, with efforts to include audio length notes often being ignored.
- Some suggest using precise prompting techniques, but inconsistencies in output persist.
- Interactive Mode Rollout Issues: The interactive mode featureās rollout is slow and random; users with the new UI may not access the feature yet.
- Feedback indicates that audio generation frequently lags or fails, with some users experiencing resets to manage limitations.
- Syncing Google Docs as Sources: Users are uncertain if Google Docs linked as sources automatically sync with updates or require manual refreshes.
- Currently, sources do not auto-update, raising questions about future roadmap plans for auto-syncing files.
- Combining and Managing Notes: The new UI lacks the ability to combine selected notes, restricting operations to only single or all notes at once.
- This limitation has sparked discussions about potential UI improvements to facilitate better note management.
- Case Studies and Study Aids: Users share experiences utilizing NotebookLM for study aids, emphasizing the toolās assistance in organizing speaker notes.
- In-depth tips for exam preparation, particularly for case studies, highlight the importance of applying concepts through thorough resource integration.
Links mentioned:
- Noob GIF - Noob - Discover & Share GIFs: Click to view the GIF
- Upgrading to NotebookLM Plus - NotebookLM Help: no description found
- NotebookLM gets a new look, audio interactivity and a premium version: NotebookLM is introducing new features, and a premium version called NotebookLM Plus.
- Reddit - Dive into anything: no description found
Unsloth AI (Daniel Han) ā· #general (66 messagesš„š„):
Fine-tuning Llama 3.2, Batch Size and Training, Function Calling in Models, Multi-GPU Support in Unsloth, Overfitting in Machine Learning Models
- Fine-tuning Llama 3.2 and 4-bit Conversion: A member is exploring how to effectively fine-tune the Llama 3.2 model with added datasets, discussing options for loading previous checkpoints.
- Another member emphasized that settings like load_in_4bit=true allow automatic conversion for models not uploaded by Unsloth.
- Optimizing Batch Size for Training: A discussion arose about the optimal batch size, where a larger size may improve training stability and accuracy, although it requires more VRAM.
- Members agreed that increasing gradient accumulation could be an alternative for those with limited VRAM; both knobs appear in the sketch after this list.
- Function Calling and Models Understanding: Clarification was sought about model prompt formats including function calls, with some members noting that including special tokens directly is feasible.
- A resource link was shared, illustrating the prompt format for function calling in Llama models; a hedged sketch of that layout also follows this list.
- Multi-GPU Support for Unsloth Pro: A user inquired if multi-GPU support for Unsloth Pro is operational, specifically if it works with local setups or only through cloud platforms.
- The response confirmed that multi-GPU functionality is available, enhancing the model training experience.
- Addressing Overfitting in Fine-tuned Models: A member reported poor performance from their exported fine-tuned model on Hugging Face, suggesting potential overfitting.
- Another member advised that issues might stem from the model parameters or dataset quality rather than the fine-tuning framework itself.
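A minimal sketch tying the two training knobs together, assuming the Unsloth API as shown in its notebooks; the checkpoint path and hyperparameters are illustrative:

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments

# Hedged sketch: load_in_4bit=True quantizes on load, including for
# checkpoints that were not uploaded pre-quantized by Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/checkpoint-500",  # hypothetical resumed checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# A small per-device batch with gradient accumulation approximates a
# larger batch (2 x 8 = effective 16) without the extra VRAM cost.
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    max_steps=100,
)
```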
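And a hedged sketch of the Llama 3.x prompt layout with special tokens and a user-defined tool, loosely following the format documented in the linked meta-llama repo; the system wording and expected reply are illustrative, not canonical:

```python
# Illustrative Llama 3.x chat layout with special tokens; consult the
# linked text_prompt_format.md for the authoritative format.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You can call get_weather(city: str). Reply with a JSON object "
    "to invoke it.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "What's the weather in Paris?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
# A tool-call completion would then look something like:
# {"name": "get_weather", "parameters": {"city": "Paris"}}
```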
Links mentioned:
- Tutorial: How to Finetune Llama-3 and Use In Ollama | Unsloth Documentation: Beginner's Guide for creating a customized personal assistant (like ChatGPT) to run locally on Ollama
- llama-models/models/llama3_2/text_prompt_format.md at main Ā· meta-llama/llama-models: Utilities intended for use with Llama models. Contribute to meta-llama/llama-models development by creating an account on GitHub.
Unsloth AI (Daniel Han) ā· #off-topic (139 messagesš„š„):
Open Source Reasoning Models, Unsloth Model Training, Fine-Tuning with QwQ, DiLoCo Presentation, LORA vs Model Architecture
- Open Source Reasoning Models Debate: Members discussed the effectiveness of open source reasoning models like QwQ potentially outperforming traditional models, noting that while reproducing reasoning is easy, creating a successful model remains challenging.
- Thereās skepticism around the necessity of reinforcement learning (RL) in current model designs, with suggestions that pure supervised fine-tuning (SFT) coupled with high-quality datasets may suffice.
- Unsloth Training Experiences: A user detailed their experience with Unsloth for training models, encountering issues related to saving models in GGUF format due to dependencies on external repositories.
- The conversation included troubleshooting methods, highlighting the importance of proper installations and the need for specific files to be present for successful execution.
- Differences between Adapter and Model Explained: Users received clarification that models consist of a collection of weights affecting their parameters, while Low-Rank Adapters (LoRAs) only modify a small subset of these parameters.
- This discussion emphasized how LoRAs can be combined with models for efficient training without altering the entire architecture; a minimal PEFT sketch follows this list.
- DiLoCo Research Sharing: One member shared their research on DiLoCo (Distributed Low-Communication Training of Language Models) and created a presentation for their group, sparking interest from others in the channel.
- The member was encouraged to post their findings in a broader context for additional feedback.
- Training with LORA Output Size Queries: A user inquired about the expected output size when training a model with LoRA, noting that their output was significantly smaller than expected due to the nature of adapter training.
- Discussions followed about how to combine models and adapters effectively, with references to documentation on saving and quantizing models properly.
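To make the adapter-vs-model distinction concrete, here is a minimal PEFT sketch showing how a LoRA touches only a small subset of parameters; the base model id and target modules are illustrative:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # illustrative

# Only low-rank update matrices on the named projections are trained;
# the base weights stay frozen, which is why adapter files are tiny.
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the weights
```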
Links mentioned:
- Google Colab: no description found
- Hugging Face ā The AI community building the future.: no description found
- Fine-Tuning Ollama Models with Unsloth: In the previous two articles, we explored Host Your Own Ollama Service in a Cloud Kubernetes (K8s) Cluster and Run Your Own OLLAMA inā¦
- Saving to GGUF | Unsloth Documentation: Saving models to 16bit for GGUF so you can use it for Ollama, Jan AI, Open WebUI and more!
- Eule - a kaleinaNyan Collection: no description found
- kaleinaNyan/eule-qwen2.5instruct-7b-111224 Ā· Hugging Face: no description found
- Unsloth Notebooks | Unsloth Documentation: See the list below for all our notebooks:
- DiLoCo: Distributed Low-Communication Training of Language Models: DiLoCo: Distributed Low-Communication Training of Language Models OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training INTELLECT-1 Technical Report
- DiLoCo: Distributed Low-Communication Training of Language Models: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected acc...
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training: OpenDiLoCo is an open-source implementation and replication of the Distributed Low-Communication (DiLoCo) training method for large language models. We provide a reproducible implementation of the DiL...
Unsloth AI (Daniel Han) ā· #help (15 messagesš„):
Llama 3.2 training loss, M4 MAX GPU compatibility, Unsloth support on Mac
- Loss Discrepancy with Llama 3.2: A user reported that their loss is 3x higher when training the Llama 3.2 1B Instruct model using the llama template compared to the alpaca prompt, initially starting at 5.1.
- Another user sought clarification on whether the dataset was used correctly with the llama template.
- M4 MAX GPUs still uncharted territory: A user inquired about support for M4 MAX GPUs, noting that the current conda install instructions are only for CUDA.
- The response indicated that Unsloth is not currently supported on Macs.
- Mac Support Timeline Speculated: A member speculated that support for Macs should land around Q2 2025, but it depends on available time for development.
- Community contributions are encouraged to expedite this process.
- Limited Fine-Tuning Options on Mac: A user mentioned the lack of fast fine-tuning alternatives on Mac and questioned if that is still the case.
- The response confirmed the uncertainty, as that user does not have NVIDIA hardware.
OpenRouter (Alex Atallah) ā· #announcements (1 messages):
OpenAI o1 model, Structured outputs, EVA Llama model, Price reductions, Provider pages
- OpenAI o1 model launches with cool features: The new OpenAI o1 model is live, succeeding the o1-preview with features like function calling and reduced latency.
- It introduces a new reasoning_effort API parameter for controlling the model's thinking time before answering, enhancing user interactivity; a hedged example call follows this list.
- Structured outputs gain traction: OpenRouter now normalizes structured outputs for 46 models across 8 different companies, making it easier to get results in a preferred format.
- A tutorial on the feature was shared here, highlighting its relevance in practical usage; see the structured-output sketch after this list.
- New storytelling model EVA Llama joins the lineup: A new roleplay and storytelling model, EVA Llama, has been launched along with updates for Grok 2 and Cohere models.
- Users can explore EVA Llama details in more depth via this link.
- Exciting price drops on popular models: A 12.5% reduction has been implemented for the mythomax-l2-13b model, making it more accessible.
- In addition, thereās a whopping 55% price drop for the sought-after QwQ reasoning model, impressing the community with affordability.
- Provider pages offer insightful analytics: Users can now click on provider names to view model hosting charts, enhancing transparency about performance over time.
- An example was noted with DeepInfraās provider page, providing detailed insights.
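A minimal sketch of setting the new parameter with the OpenAI Python SDK, assuming a recent client version; the prompt is illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Hedged sketch: reasoning_effort trades answer latency for thinking time.
response = client.chat.completions.create(
    model="o1",
    reasoning_effort="medium",  # "low" | "medium" | "high"
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(response.choices[0].message.content)
```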
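And a hedged sketch of requesting a structured output through OpenRouter's normalized response_format, following the JSON-schema shape from OpenRouter's docs; the model choice and schema are illustrative:

```python
import requests

payload = {
    "model": "openai/gpt-4o-mini",  # illustrative; 46 models are supported
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "weather",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "temp_c": {"type": "number"},
                },
                "required": ["city", "temp_c"],
                "additionalProperties": False,
            },
        },
    },
}
r = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json=payload,
)
print(r.json()["choices"][0]["message"]["content"])
```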
Links mentioned:
- o1-preview - API, Providers, Stats: The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding.The o1 models are optimized for math, science, programming, and other STEM-related tasks...
- Tweet from OpenRouter (@OpenRouterAI): Structured outputs are very underrated. It's often much easier to constrain LLM outputs to a JSON schema than asking for a tool call.OpenRouter now normalizes structured outputs for- 46 models- 8 ...
- OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts
- EVA Llama 3.33 70b - API, Providers, Stats: EVA Llama 3.33 70b is a roleplay and storywriting specialist model. Run EVA Llama 3.33 70b with API
- MythoMax 13B - API, Providers, Stats: One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. #merge. Run MythoMax 13B with API
- QwQ 32B Preview - API, Providers, Stats: QwQ-32B-Preview is an experimental research model focused on AI reasoning capabilities developed by the Qwen Team. As a preview release, it demonstrates promising analytical abilities while having sev...
- Tweet from OpenRouter (@OpenRouterAI): OpenAI o1 is now live for all! Try its š§ on:- image inputs- structured outputs- function calling- a "reasoning effort" controlThe Chatroom link below has a couple challenges you can try with ...
OpenRouter (Alex Atallah) ā· #general (209 messagesš„š„):
Exposed OpenRouter keys, Chat details in API, Using OpenRouter API keys with PKCE, OpenRouter pricing structure, Model performance comparisons
- Reporting Exposed OpenRouter Keys: A user discovered exposed OpenRouter API keys on GitHub with limits over $100 and inquired where to report them, with a member suggesting [email protected].
- There was a discussion about the safety of sending these compromised keys over email.
- Retrieving Chat Details in API: An inquiry was made about viewing chat details of API calls, amid concerns about the inability to retrieve prompts or responses once the metadata is accessed.
- A suggestion was made for having a flag to see conversations as chats instead of stateless requests.
- Using OpenRouter with PKCE: A user discussed creating a web app using OpenRouter API keys via PKCE, considering the security of handling keys on the client-side versus the backend.
- Recommendations were made for managing API keys securely while maintaining a near-stateless architecture; a minimal PKCE sketch follows this list.
- OpenRouter Pricing and Costs: Clarifications were sought regarding the costs associated with OpenRouterās service, particularly if using own API keys incurred additional fees.
- It was noted that using custom keys incurs a 5% fee on top of the upstream providerās costs.
- Performance of Various Models: A user noted inconsistencies in model responses, particularly with QwQ, prompting discussions about the role of model sizes in instruction following.
- Users were encouraged to utilize higher-end models like Google Experimental 1206 or DeepSeek-v2 for more consistent coding assistance.
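For the PKCE point, a minimal sketch of generating the verifier/challenge pair client-side; this is standard RFC 7636 S256 and independent of any provider specifics:

```python
import base64
import hashlib
import secrets

# Standard PKCE (RFC 7636): the verifier stays on the client; only the
# S256 challenge is sent with the initial authorization request.
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
challenge = (
    base64.urlsafe_b64encode(hashlib.sha256(verifier.encode()).digest())
    .rstrip(b"=")
    .decode()
)
print(verifier, challenge)
```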
Links mentioned:
- OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts
- Integrations | OpenRouter: Bring your own provider keys with OpenRouter
- Model Spec (2024/05/08): no description found
- Limits | OpenRouter: Set limits on model usage
- LLM Rankings | OpenRouter: Language models ranked and analyzed by usage across apps
Eleuther ā· #general (1 messages):
Retail/E-commerce ad models, Runway, OpenAI Sora, Veo 2
- Seeking Effective Retail Ad Models: A member queried about effective models for creating retail/e-commerce ad content, including both video and copy formats.
- They specifically mentioned considering Runway, OpenAI Sora, and Veo 2, while inviting suggestions for other options.
- Exploring Alternatives for Ad Content: The discussion aimed to identify other potential models tailored for ad content beyond what was already mentioned.
- The memberās focus on gathering diverse options led to a broader conversation on existing technologies in the market.
Eleuther ā· #research (123 messagesš„š„):
Warmup phase for learning rates, Meta-Learning to reduce overfitting, Compression methods in neural networks, Grokking in large models, Koopman operator theory in neural networks
- Debating the Warmup Phase Formula: Kevinās formula for approximating the warmup phase, which is (1 - beta1^step), currently lacks support from LR schedulers, leading to discussions on its implementation.
- Members shared their implementations, expressing concerns about off-by-one errors related to step counts when using LambdaLR; a sketch addressing this follows the list.
- Utilizing Meta-Learning to Address Overfitting: A discussion emerged on whether Meta-Learning could assist in mitigating overfitting in supervised learning models with specific examples requested.
- The community noted that while theoretical frameworks exist, practical examples remain sparse.
- Exploring Compression Techniques in Neural Networks: Members explored ideas around compressing neural networks, with emphasis on depthwise compression and neural network pruning methods like OATS which combine sparse and low-rank matrices.
- Concerns were raised about the potential loss in data coverage and performance due to compression, particularly in regards to models trained for memorization tasks.
- Grokking as a Central Theme in AI Research: The phenomenon of grokking was discussed, focusing on its significance and the lack of compelling methods to induce it within AI models.
- There is a shared sentiment that while grokking is somewhat studied, the predominant research interest lies with large language models, overshadowing broader exploration.
- Skepticism Towards Koopman Theory Integration: The applicability of Koopman operator theory to neural networks was debated, with members expressing skepticism about its benefits and the legitimacy of framing neural layers as dynamical systems.
- Critics pointed out the potential for obfuscation in the paper, arguing it mainly translates to the utilization of residual connections rather than introducing significant innovations.
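A minimal sketch of the (1 - beta1^step) warmup discussed above as a LambdaLR multiplier, with a +1 on the step to sidestep the zero-LR-at-step-0 off-by-one; the model and hyperparameters are illustrative:

```python
import torch

model = torch.nn.Linear(8, 8)
beta1 = 0.9
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(beta1, 0.95))

# LambdaLR multiplies the base LR by the lambda's return value; step + 1
# avoids multiplying by 1 - beta1**0 == 0 on the very first step.
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lr_lambda=lambda step: 1 - beta1 ** (step + 1)
)

for _ in range(5):
    opt.step()
    sched.step()
    print(sched.get_last_lr())
```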
Links mentioned:
- Tweet from BlinkDL (@BlinkDL_AI): RWKV-7-World 0.1B (L12-D768) trained w/ ctx4k perfectly solves NIAH ctx16k 𤯠100% RNN and attention-free. RWKV is all you need. https://www.rwkv.com/ #RWKVQuoting BlinkDL (@BlinkDL_AI) RWKV-7 "Go...
- OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition: The recent paradigm shift to large-scale foundation models has brought about a new era for deep learning that, while has found great success in practice, has also been plagued by prohibitively...
- Are Emergent Abilities of Large Language Models a Mirage?: Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguin...
- Time-Delay Observables for Koopman: Theory and Applications: Nonlinear dynamical systems are ubiquitous in science and engineering, yet analysis and prediction of these systems remains a challenge. Koopman operator theory circumvents some of these issues by con...
- Tweet from BlinkDL (@BlinkDL_AI): RWKV-7 "Goose" šŖæ 0.4B trained w/ ctx4k automatically extrapolates to ctx32k+, and perfectly solves NIAH ctx16kš¤ÆOnly trained on the Pile. No finetuning. Replicable training runs. tested by ou...
- Common arguments regarding emergent abilities ā Jason Wei: This blog post doesnāt represent the positions of my employer (past, present, or future). Iāll review some common arguments that come up when discussing emergent abilities of large language models...
- Representing Neural Network Layers as Linear Operations via Koopman Operator Theory: The strong performance of simple neural networks is often attributed to their nonlinear activations. However, a linear view of neural networks makes understanding and controlling networks much more ap...
- Growing Neural Cellular Automata: Training an end-to-end differentiable, self-organising cellular automata model of morphogenesis, able to both grow and regenerate specific patterns.
- GitHub - Jamba15/SpectralTools: Spectral analysis and training of dense layers: Spectral analysis and training of dense layers . Contribute to Jamba15/SpectralTools development by creating an account on GitHub.
Eleuther ā· #lm-thunderdome (6 messages):
doc_to_text function arguments, Creating new configs, Overloading config fields
- Extra arguments for doc_to_text function: A user inquired whether itās possible to pass extra arguments to the doc_to_text function in a new task.
- Another member clarified that the main entry point for this is through configs.
- Creating different configs for prompts: A user explained they have a base config where functions are defined and are considering separate configs for different prompts.
- This would lead to the creation of different subtasks for each prompt, enhancing task customization.
- Overloading configs with included tasks: It was suggested that a new config can be created based on another using include: <other configs> to overload specific fields; a sketch follows this list.
- However, this approach would apply the overload across all included tasks in that group config, which could limit flexibility.
- Link to MMLU config example: A member shared that users can also add contents to a group config, but it will overload the included tasks overall.
- They provided a reference link to the MMLU config for further details.
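A hedged sketch of the include pattern as an lm-evaluation-harness YAML task config; the task and field names are hypothetical:

```yaml
# variant_task.yaml -- hypothetical; inherits everything from the base
# config, then overrides only the prompt-related field.
include: base_task.yaml
task: my_task_prompt_b
doc_to_text: "Question: {{question}}\nAnswer:"
```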
Eleuther ā· #gpt-neox-dev (9 messagesš„):
WANDB logging, Configuring WANDB run names, Pull Requests on features
- WANDB Logging for MFU and Performance Metrics: A member inquired about the possibility of logging MFU, batches/sec, and tokens/sec to WANDB during pretraining with neox, hinting that it would be beneficial for direct plots.
- Another member confirmed that while there isn't an option currently available, it may be implemented similarly to the existing logging method; a hedged sketch of such logging follows this list.
- Setting WANDB Run Names in Config: A user sought clarity on how to set a WANDB run name from the config, but encountered errors when attempting to add it directly.
- One member responded that this option isnāt currently available but promised to add it along with the metrics logging in a forthcoming PR.
- Pull Requests Planned on Features: A member expressed intent to submit a pull request (PR) for the non-parametric layernorm feature over the weekend.
- Another member offered assistance with the logging improvements but later assured they will handle the PR themselves.
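As a rough sketch of what such logging could look like if wired into the training loop, here is the plain WANDB API pattern; the metric names, run name, and values are illustrative placeholders, not neox options:

```python
import wandb

# Hypothetical: run name would come from the training config once supported.
run = wandb.init(project="neox-pretrain", name="my-run-name")

# Inside the training loop, alongside existing loss logging:
wandb.log(
    {
        "perf/tokens_per_sec": 1.2e6,  # placeholder value
        "perf/batches_per_sec": 4.5,   # placeholder value
        "perf/mfu": 0.42,              # placeholder value
    },
    step=100,
)
```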
Link mentioned: gpt-neox/megatron/logging.py at f5325805678c2b9e35aae4528283e0132c5f5bbc Ā· EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries - EleutherAI/gpt-neox
Stability.ai (Stable Diffusion) ā· #general-chat (122 messagesš„š„):
Lora Training Techniques, Current Models in Use, Running Stable Diffusion on Linux, Navigating Image Resolution and Performance, Understanding AI Generated Content and Models
- Effective Steps for Lora Training: A user shared practical steps for creating a Lora: start with a strong dataset, choose an appropriate model, train the Lora, then test it. They emphasized research on creating quality datasets for optimal results.
- Preferred Models for Stable Diffusion: Various users discussed their models of choice; some favor the āfluxā model while others recommend āInvokeAIā for its usability. Others pointed out the importance of having an NVIDIA GPU, suggesting a 3060 with 16GB VRAM for smoother performance.
- Stable Diffusion on Ubuntu Challenges: Users expressed frustrations with running SDXL on Ubuntu, citing issues with ComfyUI and Forge UIās Linux compatibility. It was noted that running SDXL effectively may require familiarity with the system.
- Choosing Image Resolution for Generative Models: A beginner asked about optimal image resolution for generation, finding a balance between quality and processing time. Recommendations included experimenting with around 1024x1024 resolution and using hires.fix for better quality output.
- Understanding AI Generated Content Metrics: Discussion emerged around the techniques and metrics used in model training, specifically with the Pony model and its scoring system. Users noted how this unique approach impacts image generation and community perceptions.
Links mentioned:
- Epoch Helper - v1.1 | Other Other | Civitai: source code - https://github.com/Monkellie/epochcalc # The Epoch Helper Tool This is a tool I created (AI Assisted) to help myself with calculation...
- stable-diffusion-webui-forge/webui-user.sh at main Ā· lllyasviel/stable-diffusion-webui-forge: Contribute to lllyasviel/stable-diffusion-webui-forge development by creating an account on GitHub.
- static FFmpeg binaries for macOS 64-bit Intel: Download static FFmpeg binaries for macOS 64-bit Intel. snapshots and release binaries are available. FFmpeg developers strongly encourage all users to use a current snapshot build instead of a releas...
Perplexity AI ā· #announcements (1 messages):
Custom Web Sources, Perplexity Spaces
- Introducing Custom Web Sources in Perplexity Spaces!: Perplexity now allows users to choose custom web sources for searches, enabling further tailoring of queries to specific use cases that matter most to you.
- Accompanying this announcement is a launch video showcasing the new feature.
- A New Level of Customization!: This update provides enhanced customization options for users, allowing them to curate their Perplexity experience more effectively based on their specific needs.
- By selecting the websites Perplexity searches, users can improve the relevance and quality of the information retrieved tailored to their preferences.
Perplexity AI ā· #general (108 messagesš„š„):
Perplexity Pro Subscriptions, New Features and Updates, User Experience with AI Models, Rate Limits and Performance, User Interface Suggestions
- Perplexity Pro Subscriptions Available: Users discussed the launch of Perplexity Pro subscriptions that allow gifting knowledge with options for 1 to 12 month duration, enhancing the user experience.
- This subscription provides additional features such as searching 3x as many sources and accessing the latest AI models.
- Call for New Features amidst User Expectations: Members expressed desires for new features from Perplexity, especially as competitors like Google and OpenAI release new models frequently, generating a sense of stagnation at Perplexity.
- Thoughts on potentially collaborating with firms like Meta for advancements were also shared, highlighting urgency in innovation.
- Concerns about Rate Limits: A user reported hitting rate limits while using Perplexity, receiving messages indicating they needed to sign up for higher personalized rate limits for better access.
- Other users speculated on the actual benefits of higher tiers in alleviating these restrictions and shared personal experiences with the rate limits.
- User Interface Enhancement Suggestions: One user suggested adding a snowfall effect to the Perplexity UI, receiving mixed feedback; some found it visually appealing while others preferred practicality.
- Members continued to discuss how interface aesthetics and usability could better meet their professional needs.
- Discussion on AI Model Performance: Conversations emerged around the performance of AI models, with some users feeling the Pro Search quality could be improved based on their experiences.
- A user proposed using Claude 3.5 Sonnet for better outcomes and questioned the claimed advancements with models like GPT-4o.
Links mentioned:
- Tweet from Perplexity Supply (@PPLXsupply): Give the gift of knowledge. Perplexity Pro gift subscriptions now available.
- Perplexity Pro Subscription | Perplexity Supply: Perplexity Supply exists to explore the relationship between fashion and intellect with thoughtfully designed products to spark conversations and showcase your infinite pursuit of knowledge.
- TikTok - Make Your Day: no description found
Perplexity AI ā· #sharing (4 messages):
Meta vs OpenAI Pro-Fit, Microbe Threat Warning, Plant Communication, Dopamine Precursors, Cell Revival
- Meta wants to block OpenAIās for-profit ventures: Meta has expressed its desire to prevent OpenAI from pursuing for-profit business models, as discussed in various forums.
- An intriguing discussion surrounds the impact this could have on future AI developments within the industry.
- Microbe Threat Warning surfaces: The community highlighted a recent warning regarding potential threats posed by microbes, an issue that could affect ecological balance.
- The discourse included references to preventive measures and the importance of awareness in addressing these microbial risks.
- Plants exhibit crying behavior: A shared article explored the concept of plants exhibiting ācryingā behavior, suggesting a unique form of communication among flora.
- The findings suggest implications for understanding plant responses to environmental stress, stirring curiosity about plant intelligence.
- Understanding dopamine precursors: A resource was linked concerning precursors to dopamine, shedding light on the biochemical pathways essential for mental health.
- This topic stirred interest in the community regarding its relevance to neurological studies and potential therapeutic outcomes.
- Reviving dead cells technology: Members discussed fascinating advancements in technology that allow for the revival of dead cells, posing significant bioethical questions.
- The implications of this technology on both medicine and ethics sparked lively debate within the group.
Link mentioned: YouTube: no description found
Perplexity AI ▷ #pplx-api (1 message):
Perplexity API, Web Search Feature, Cost Overview
- Inquiring about Web Search in Perplexity API: A member asked if the web search feature is included in the chat completion API call for their upcoming project using Perplexity.
- This raises important questions about the integration capabilities of Perplexity compared to other APIs they've worked with.
- Seeking Cost Overview for Perplexity: The same member expressed interest in finding a cost overview for using Perplexity's services.
- Understanding the pricing structure will be crucial for planning their project effectively.
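For reference, a hedged sketch of the usual integration path: Perplexity's chat completions endpoint is OpenAI-compatible, and web search is built into its "online" models rather than toggled per call. The model ID below is an assumption; current names and per-request pricing are listed in Perplexity's API docs.

```python
# Hedged sketch: Perplexity's API is OpenAI-compatible, so the standard
# openai client works against its base_url. Web search happens inside the
# "online" models themselves; there is no separate search flag to pass.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PPLX_API_KEY",
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="llama-3.1-sonar-small-128k-online",  # assumed/illustrative model ID
    messages=[{"role": "user", "content": "Summarize today's AI news."}],
)
print(response.choices[0].message.content)
```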
GPU MODE ▷ #general (41 messages🔥):
6D Parallelism Article, PC Troubleshooting, GPU Performance and Coil Whine, Multi-GPU Instances with NVLink, Coil Whine and Audio Experimentation
- Insight into 6D Parallelism: A detailed article on 6D parallelism highlights collective communications in training, aiming to provide clearer visuals and explanations compared to existing resources.
- It critiques the lack of depth in other writings that fail to address the complexity of combining various training approaches.
- Successful PC Boot After Wait: A user experienced initial issues with their new PC, reporting no signal to the monitor until the system finally booted after about a minute, once the diagnostic LEDs turned off.
- Another member suggested trying a single memory stick to troubleshoot the issue.
- Radeon Card Frustrations: A user expressed dissatisfaction with their Radeon card, noting it produces around 10 FPS more than an Nvidia 4060, but suffers from unacceptable coil whine.
- They concluded that both the Radeon and Nvidia cards have their drawbacks, with the coil whine from the Radeon being particularly troublesome.
- Seeking Multi-GPU Instances on VastAI: A user inquired about finding multi-GPU instances with NVLink on VastAI, concerned about the limited NVLink bandwidth visible in the listings.
- They speculated that another member might have experience with this subject based on past conversations.
- Coil Whine Musical Experiment Idea: Discussion arose about the notion of creating a program that uses coil whine from GPUs to play music, with members noting its pitch changes based on power draw.
- One member humorously suggested donating a GPU for this musical project, linking it to their experiences with coil whine.
Links mentioned:
- NCCL Source Code Study: no description found
- Visualizing 6D Mesh Parallelism: Plus some lore
GPU MODE ▷ #triton (1 message):
Kernel Computation Optimization, Memory Management in GPU, Output Concatenation Techniques
- Concatenation inside GPU Kernel: A user asked if there's an efficient way to concatenate outputs inside a loop within a GPU kernel, referencing previous successful summation methods.
- They suggested that writing out to global memory during the loop may be slow and asked whether syntax like `var[idx:idx+block_size] = value` is a feasible alternative (see the Triton sketch after this list).
- Seeking Efficient Memory Techniques: Another discussion point emerged regarding the speed of writing to global memory when concatenating outputs in GPU kernels.
- The user emphasized the need for a non-slow solution while running operations inside a loop, indicating a common concern among developers.
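For concreteness, a minimal Triton sketch of that pattern (shapes, names, and the doubling transform are illustrative, not the user's code): each loop iteration stores its block of results at a distinct offset via `tl.store`, the kernel-side equivalent of `out[idx:idx+block_size] = value`.

```python
# Minimal Triton sketch: "concatenate" by writing each loop iteration's
# BLOCK_SIZE result to its own slice of the output buffer in global memory.
import torch
import triton
import triton.language as tl

@triton.jit
def chunked_store_kernel(x_ptr, out_ptr, n_chunks, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    base = pid * n_chunks * BLOCK_SIZE
    offs = tl.arange(0, BLOCK_SIZE)
    for i in range(n_chunks):
        val = tl.load(x_ptr + base + i * BLOCK_SIZE + offs)
        # chunk i lands at its own offset, appending to the output
        tl.store(out_ptr + base + i * BLOCK_SIZE + offs, val * 2.0)

x = torch.randn(4 * 3 * 128, device="cuda")
out = torch.empty_like(x)
chunked_store_kernel[(4,)](x, out, 3, BLOCK_SIZE=128)
```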
GPU MODE ▷ #cuda (8 messages🔥):
CUDA Memory Copy Issues, Comparing A100 and H100 GPUs, AMP Related Differences
- CUDA Memory Copy Issues: A member reported that commenting out a specific code segment related to `IsDense(y) && IsSame(x, y)` resulted in correct functionality, while including it led to unexpected behavior during LLM model inference.
- They noted that `CudaCopy` will trigger CUDA kernels, raising questions on the handling of memory operations in this context.
- A100 vs H100 Training Discrepancies: A member inquired about the differences between A100 and H100 GPUs during training, specifically noting a 0.3% loss discrepancy in the first step of training with a single GPU task.
- This unexpected result prompted discussions and potential comparisons of performance metrics between the two GPU models.
- Questions on CUDA Graph Support: A member raised concerns regarding a lack of official documentation on why CUDA graphs could not support the `cudaMemcpyAsync` operation within their implementations (see the sketch after this list).
- This led to further discussions about asynchronous operations and their limitations with CUDA graphs.
- AMP Potential Influence: A member speculated if the discrepancies noticed between the A100 and H100 might be related to Automatic Mixed Precision (AMP) settings utilized during training.
- This opened up a dialogue on how AMP can affect training outcomes and whether adjustments are necessary for different GPU models.
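On the `cudaMemcpyAsync` question, the commonly recommended pattern in PyTorch is to keep I/O in static tensors, capture only the compute, and copy fresh data in before each replay; a minimal sketch under assumed shapes, not the member's actual code:

```python
# Static-buffer pattern for CUDA graphs in PyTorch: capture the compute once,
# then copy new inputs into the static tensor before each replay instead of
# issuing fresh memcpys inside the captured graph.
import torch

static_in = torch.zeros(1024, device="cuda")
static_out = torch.zeros(1024, device="cuda")

# warmup on a side stream before capture, as the PyTorch docs recommend
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_out.copy_(static_in * 2)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out.copy_(static_in * 2)

static_in.copy_(torch.randn(1024, device="cuda"))  # copy outside the graph
g.replay()                                         # reruns the captured work
```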
GPU MODE ▷ #torch (33 messages🔥):
Megatron-LM efficiency, Torch.compile warnings handling, Distributed training community, FlexAttention development, Keras/PyTorch contributions
- Megatron-LMās training efficiency questioned: A member inquired if Megatron-LM is still efficient for training, as they plan to enhance training throughput in a distributed setup.
- Another member suggested reaching out to Gensyn for insights, mentioning Christine Yip's active community for distributed training.
- Handling torch.compile warnings intelligently: A user sought guidance on managing torch.compile warnings when supporting various shapes, noting slow kernels when compiling with `dynamic=True` (see the sketch after this list).
- Another member proposed using `fn_compiled = torch.compile(fn)` to adaptively handle function calls without being tied to the decorator.
- Challenges in developing FlexAttention: A discussion highlighted the difficulties faced in upgrading PyTorch, with members sharing the extensive processes involved in image builds.
- One member acknowledged the aim to make FlexAttention robust while managing upgrade challenges due to their customized setups.
- Wrapping model invocations for better flexibility: A member suggested wrapping the model invocation rather than the torch.compile for better handling of shape variations.
- They also noted the possibility of using Python's warnings module to filter out specific warnings rather than suppressing all logs.
- Interest in contributions to Keras/PyTorch: An inquiry was raised about ongoing contributions to Keras and PyTorch, emphasizing community engagement.
- This may signal an interest in collaboration or getting involved in further development efforts.
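A minimal sketch combining both suggestions (the warning-message pattern below is illustrative, not an exact PyTorch string):

```python
# Compile once with dynamic shapes so varying sizes reuse one kernel, and
# filter a specific noisy warning instead of muting all logging.
import warnings
import torch

def fn(x):
    return torch.nn.functional.relu(x) * 2

fn_compiled = torch.compile(fn, dynamic=True)  # wrap instead of decorating

# illustrative pattern; match the actual warning text you want to hide
warnings.filterwarnings("ignore", message=".*recompil.*")

for n in (8, 16, 32):
    fn_compiled(torch.randn(n, 128))  # shape changes without recompiles
```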
Links mentioned:
- Reduce time to first kernel when using CUDA graphs: Iāve been profiling the inference stack that Iām using against vLLM and I found that in their case after calling graph replay, the first kernel gets executed almost instantly(left), whereas in my code...
- GitHub - pytorch/torchtitan: A native PyTorch Library for large model training: A native PyTorch Library for large model training. Contribute to pytorch/torchtitan development by creating an account on GitHub.
GPU MODE ▷ #cool-links (4 messages):
Raspberry Pi 5 Deployment, Edge Device Models, Esp32 / Xtensa LX7 Chips
- Raspberry Pi 5 with NVMe Speeds Up LLM Performance: The Raspberry Pi 5 has been overclocked to 2.8GHz with a 256GB NVMe to enhance data transfer speeds for deploying smaller 1.5B parameter models.
- Using Ollama compiled with OpenBLAS, the models are run locally on the Pi 5, streamlining edge device operations.
- Excitement for Esp32 / Xtensa LX7 Chips: There's anticipation for the Esp32 / Xtensa LX7 chips to enable a new scenario where LLMs are called remotely via API.
- One user expressed enthusiasm, stating it looks fun! as they explore different deployment strategies.
GPU MODE ▷ #jobs (1 message):
MatX LLM accelerator, Job openings in ML, ASIC roles
- MatX develops LLM accelerator ASIC: MatX is actively building an LLM accelerator ASIC aimed at enhancing machine learning performance and efficiency.
- They are currently seeking candidates for roles like low-level compute kernel author, compiler, and ML performance engineer to join their team.
- Hiring opportunities at MatX: MatX has multiple job openings listed on their website, including positions crucial for the development of their ASIC technology.
- Interested candidates can find more details about these opportunities at MatX Careers.
Link mentioned: Tweet from MatX | Jobs: no description found
GPU MODE ▷ #torchao (5 messages):
int4group scheme, Training process quantization, Tinygemm compute method
- Clarification on int4group flow: A member inquired whether in the int4group scheme, the weights remain quantized (int4) while activations stay in fp16, leading to a matmul of fp16 x int4 = fp16.
- An image was shared to visualize the process, confirming the understanding aligns with the described flow.
- No quantization for activations during training: A discussion about the training process questioned if there would be any fake quantization for activations.
- It was clarified that Tinygemm uses bf16 for compute, and at both QAT and inference time, activations remain unquantized.
- On-the-fly dequantization in kernels: A member confirmed that the dequantization of int4 weight occurs on the fly inside the matmul kernel.
- This aligns with expectations on processing flow, providing clarity on how the matmul kernel operates.
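A conceptual sketch of that flow (the names, the int8 container for int4 values, and the group size are all illustrative; tinygemm's real kernel fuses the dequantization into the matmul rather than materializing bf16 weights like this):

```python
# Conceptual sketch: int4 weights are dequantized group-wise on the fly and
# the matmul runs in bf16, so activations are never quantized.
import torch

def int4_groupwise_matmul(x_bf16, w_int4, scales, zeros, group_size=32):
    # w_int4: [out, in] stored as int8 holding values in [0, 15]
    out_f, in_f = w_int4.shape
    w = w_int4.to(torch.bfloat16).reshape(out_f, in_f // group_size, group_size)
    # per-group affine dequant: (q - zero) * scale
    w = (w - zeros.unsqueeze(-1)) * scales.unsqueeze(-1)
    return x_bf16 @ w.reshape(out_f, in_f).t()

out_f, in_f, g = 64, 128, 32
w_q = torch.randint(0, 16, (out_f, in_f), dtype=torch.int8)
scales = torch.rand(out_f, in_f // g, dtype=torch.bfloat16)
zeros = torch.full((out_f, in_f // g), 8.0, dtype=torch.bfloat16)
x = torch.randn(4, in_f, dtype=torch.bfloat16)
y = int4_groupwise_matmul(x, w_q, scales, zeros, g)  # bf16 output
```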
GPU MODE ▷ #rocm (1 message):
MigraphX in MI300X, ONNX frontend support, Opset compatibility
- Building MigraphX on MI300X for ONNX: Discussion centered around the possibility of building MigraphX on the MI300X for the ONNX frontend, suggesting it should be feasible.
- One member noted, "I didn't check the opset(11) supported, should be the latest one", indicating the need for further verification on compatibility.
- Inquiry on ONNX Opset Support: A question was raised regarding whether the operations for opset(11) are supported in the latest implementation.
- This indicates a potential gap in the current knowledge that may need further exploration by the team.
GPU MODE ▷ #thunderkittens (1 message):
kimishpatel: what i came here for
GPU MODE ▷ #arc-agi-2 (18 messages🔥):
Custom Vision Encoder, Chain of Thought Generation, Axolotl Configurations, Efficient Sampling Processes, Experimenting with Finetuning
- Custom Vision Encoder Discussion: A member suggested creating a custom vision encoder to integrate with existing language models, as current models may not handle small pixel-scale images effectively.
- The potential benefits of flexibility in pairing the encoder with various LLMs were highlighted, outweighing the improvements from pretrained VLMs.
- Exploring Chain of Thought Generation: Discussion centered around the granularity of chain of thought (CoT) implementation, questioning whether it merely explains core ideas or genuinely offers iterative thinking.
- One member proposed dual methods: a reasoning monologue before outputs and multiple templates for guided exploration based on riddle types.
- Axolotl Lora Configuration Success: A member confirmed that the example Axolotl Lora config for llama-3-vision works well with 2x A6000 GPUs.
- There was an interest in finding compute sponsors to support larger experiments once initial setups are validated.
- Decentralized Sampling Process for CoT Prompts: The potential for running sampling processes without training was discussed, aiming to improve CoT prompts through human-guided exploration.
- This decentralized approach could help in collecting datasets efficiently for future research.
- Experimenting with RTX 3090 Finetuning: A member mentioned their capability to run experiments on an RTX 3090 while inquiring about the best finetuning setup using bf16 or Qlora+int8.
- There was confirmation that 8bit Lora could indeed work for 8B models on the RTX 3090, referencing an example from WandB.
Links mentioned:
- augmxnt: Weights & Biases, developer tools for machine learning
- axolotl/examples/llama-3-vision/lora-11b.yaml at effc4dc4097af212432c9ebaba7eb9677d768467 · axolotl-ai-cloud/axolotl: Go ahead and axolotl questions. Contribute to axolotl-ai-cloud/axolotl development by creating an account on GitHub.
LM Studio ▷ #general (87 messages🔥🔥):
LM Studio setup, Qwen QwQ and roleplay LLMs, Model compatibility and errors, Using LM Studio on mobile, New developments in AI models
- LM Studio Setup and Compatibility: Users discussed their setups for LM Studio, mentioning various hardware configurations like RTX 4060 laptops and M3 Max with 96GB RAM, showcasing the applicationās versatility.
- A specific case highlighted issues with loading Llama 3.2 11b Vision in LM Studio, where one user encountered an "unknown model architecture" error.
- Qwen QwQ as a Roleplay LLM: Discussions led to suggesting Qwen QwQ as a suitable option for roleplay style applications, with several users expressing satisfaction with its performance.
- One member noted that Qwen2 performs exceptionally well with Python, indicating its robustness in programming contexts.
- Model Errors and Download Issues: An error message related to "Safetensors header is unexpectedly large" prompted conversations about potential file corruption during downloads.
- Reminders to ensure that models were downloaded correctly from within LM Studio were made, with some users reporting successful loads on their systems.
- Using LM Studio on Mobile Devices: A member expressed interest in accessing LM Studio from their phone while on the go, but discovered there is currently no mobile app available.
- Suggestions were made to explore alternatives, but direct mobile compatibility remains a limitation.
- Recent Developments in AI Models: Users inquired about developments in the application of o1-like CoT to open-source models and mentioned the Falcon3 bitnet models.
- The community highlighted ongoing interests and trends in AI model enhancements, speculating on future possibilities and accessibility.
Links mentioned:
- mlx-community/Llama-3.2-11B-Vision-Instruct-4bit Ā· Hugging Face: no description found
- Tweet from Haider. (@slow_developer): 🚨 NVIDIA Introduces Jetson Nano Super > compact AI computer capable of 70T operations per second > designed for robotics, it supports advanced models, including LLMs, and costs $249
- LM Studio Beta Releases: LM Studio Beta Releases
LM Studio ▷ #hardware-discussion (17 messages🔥):
3060ti confusion, AMD driver issues, Llama model performance, Inference hardware desires, RAM requirements for large models
- 3060ti vs Regular 3060: Discussion sparked over a possible 3060ti variant with 11GB, leading to a clarification that it might be referring to the regular 3060 with 12GB.
- Participants expressed confusion over their specifications, with one noting unusual performance behavior.
- AMD GPUs and Driver Problems: It was mentioned that the Radeon VII might be experiencing the same issues as other AMD GPUs with the 24.12.1 driver, which led one user to revert to 24.10.1.
- Issues included loading models forcing the GPU to 100% utilization without power usage, resulting in significant lag.
- Llama Model Performance Concerns: One user reported a stark decrease in performance on a simple Llama 3.2 3B model, from 90+ tok/s on 24.10.1 to 20 tok/s on the new driver.
- Another user suggested checking configurations to ensure llama.cpp is set to use CUDA to potentially improve performance.
- Desire for Powerful Hardware: A user expressed a desire for a powerful M4 MacBook Pro for inference, reflecting on their experience with an M2 MBA as a mere introduction.
- Another member humorously commented on the slippery slope of needing more powerful hardware by referencing gddr6x.
- RAM Requirements for Large Models: Users discussed the RAM requirements for running large models, noting that running a 70B model requires 70GB either in VRAM or main memory.
- It was emphasized that having 10-20% extra VRAM is ideal for context and operational flexibility when running at q8.
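The arithmetic behind that rule of thumb, as a quick sanity check:

```python
# Back-of-envelope check: bytes per parameter times parameter count, plus
# headroom for KV cache (context) and runtime overhead.
def vram_estimate_gb(params_billions, bytes_per_param=1.0, headroom=0.15):
    weights_gb = params_billions * bytes_per_param  # 1e9 params * 1 B ~= 1 GB
    return weights_gb * (1 + headroom)

print(f"70B @ q8: ~{vram_estimate_gb(70):.0f} GB")       # ~80 GB
print(f"70B @ q4: ~{vram_estimate_gb(70, 0.5):.0f} GB")  # ~40 GB
```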
Stackblitz (Bolt.new) ▷ #prompting (6 messages):
Migrating Firebase to Supabase, Using Bootstrap with create-mf-app, Google reCAPTCHA Issues, Testing ChatGPT Bolt Pilot, Vite Pre-Transform Errors
- Migrating from Firebase to Supabase: A user inquired about the best approach to migrate their entire site built on Firebase to Supabase.
- The discussion remains open for strategies and best practices for such a migration.
- Create-mf-app and Bootstrap Clash: A member seeks a consistent method to integrate create-mf-app with Bootstrap without conflicting with Tailwind.
- They noted that attempts to combine the two often lead to a chaotic setup.
- Google reCAPTCHA Troubleshooting: A user reported an initial error of "Invalid key type" while using Google reCAPTCHA due to selecting v3 instead of v2 as implemented by Bolt.new.
- After switching to v2, they still face issues with verification counts and not receiving emails from the contact form.
- Feedback Request for Bolt Pilot: A member announced their creation of Bolt Pilot, a new GPT for Bolt, and requested users to test its functionalities.
- They encouraged feedback on any areas needing readjustment for improvement.
- Vite Pre-Transform Error Reports: A user raised concerns about encountering numerous repeated `[vite] Pre-Transform` errors during development.
- This issue appears to be affecting others as well, inviting further discussion on potential resolutions.
Stackblitz (Bolt.new) ▷ #discussions (97 messages🔥🔥):
Token Waste Issues, Project Collaboration, Bolt.diy Importing Projects, Payment Integration Discussion, User Experience with Bolt
- Users frustrated with token waste: Many users expressed frustration over Bolt wasting tokens, with one even considering taking a break due to its behavior.
- Suggestions included adding a "punch the AI" button to stop token wastage, as members shared their experiences of receiving irrelevant responses.
- Collaborative project opportunities: A user is seeking collaboration on a project in Bolt, inviting others to join and build something great together.
- This prompted further discussions about sharing resources and working as a team on upcoming projects.
- Importing projects from Bolt.new: Users are curious about importing projects from bolt.new to bolt.diy, with a method discussed of downloading projects as zip files.
- Instructions were given on using the import folder feature to continue working on previously created projects.
- Payment Integration Features: There was a discussion about how complex it would be for Bolt to implement various payment integrations like Stripe and PayPal.
- Users highlighted the need for features like dynamic billing and expressed interest in potential future updates on this matter.
- User experience and bugs: Frustration with repeated bugs and placeholders emerged, causing project delays for users testing new features.
- Suggestions to address issues included turning off diffs and focusing on managing project files better.
Links mentioned:
- Vite + React + TS: no description found
- oTTomator Community: Where innovators and experts unite to advance the future of AI-driven automation
- GitHub - RealSput/Wenode: WebContainers, except it's a million times easier to use: WebContainers, except it's a million times easier to use - RealSput/Wenode
- GitHub - stackblitz-labs/bolt.diy: Prompt, run, edit, and deploy full-stack web applications using any LLM you want!: Prompt, run, edit, and deploy full-stack web applications using any LLM you want! - stackblitz-labs/bolt.diy
Cohere ▷ #discussions (42 messages🔥):
Maya Tool Use, Model Integration Challenges, Sleep Importance, Image Tool Development, Local Model Usage
- Push for Maya Tool Use: A member emphasized the need for maya tool use, stating, "we NEED maya tool use" to enhance their model's capabilities.
- Another member encouraged taking some rest, reminding them that rejuvenation fosters creativity.
- Challenges with Model Integration: There was a discussion about the integration of local models, specifically mentioning no native tool use, causing difficulty for a member.
- They expressed uncertainty about their approach, saying, "I got no idea wat I'm doing" amidst the technical challenges.
- Importance of Sleep Discussed: A member raised the question: "Why is sleep important?", leading to a general agreement on the necessity of rest for mental health.
- A light-hearted reminder was given, encouraging team members to rejuvenate and balance dedication with well-being.
- Image Tool Ideas Shared: Ideas were exchanged about creating a new image_tool that interacts with models for multi-step queries, maximizing output from images.
- This would allow the model to engage directly with tools, enhancing the response generation process when dealing with images.
- Technical Glitches Hindered Progress: A member humorously reported that their IDE crashed while loading 71k lines of JSON, causing a pause in their workflow.
- The group shared a laugh over the challenges faced while pushing for development under tight timelines like their goal for Christmas.
Cohere ▷ #announcements (1 message):
Rate-limit increase for Multimodal Embed-v3 Images, Trial vs Production rate limits, API key options and pricing
- Multimodal Embed-v3 Images get a 10x Boost!: Due to community feedback, the rate limit for the Multimodal Image Embed endpoint has increased from 40 images/min to 400 images/min for production keys.
- The trial rate limit will remain at 5 images/min to allow for free testing.
- Understanding Trial and Production Rate Limits: In addition to the significant boost for the Embed (Images) endpoint, various other endpoints have their distinct rate limits detailed in a provided chart.
- For instance, the Chat endpoint allows 20 images/min in trial and 500 images/min in production, highlighting the advantages of paying for production keys.
- Explore API Keys and Pricing Details: Cohere offers two kinds of API keys: evaluation keys, which are free, and production keys, which are paid and provide higher limits.
- Developers can create and manage their keys on the API keys page and check the pricing details on the pricing docs.
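For anyone batching against the new limits, simple client-side pacing usually suffices; a hedged sketch (the embed call itself is elided, and the image list is a placeholder):

```python
# Client-side pacing against the 400 images/min production limit.
import time

class RateLimiter:
    """Spaces calls so they never exceed a per-minute budget."""
    def __init__(self, calls_per_minute: int):
        self.min_interval = 60.0 / calls_per_minute
        self.last = 0.0

    def wait(self):
        elapsed = time.monotonic() - self.last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last = time.monotonic()

limiter = RateLimiter(400)  # use 5 for a trial key
images = ["img-0001.png", "img-0002.png"]  # placeholders
for image in images:
    limiter.wait()
    # co.embed(...)  # your Cohere image-embed call goes here
```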
Link mentioned: API Keys and Rate Limits ā Cohere: This page describes Cohere API rate limits for production and evaluation keys.
Cohere ▷ #questions (51 messages🔥):
Cohere Reranker Issues, Using Different Embedding Models, Cohere and Nvidia Dependency, TPU in AI Systems, Vector Store for Different Dimensionality
- Cohere Reranker doesn't consistently select relevant chunks: A developer reported that the Cohere Reranker used with the ContextualCompressionRetriever sometimes fails to select the most relevant chunks from the retrieved data, leading to incorrect answers (a setup sketch follows this list).
- Despite accurate chunking in their RAG application, the reranking behavior is random and often selects less relevant chunks, causing confusion.
- Questions on storing embeddings with different dimensions: A user asked whether they should create separate vector stores for embeddings generated by text-embedding-3-large and Cohere embed v3, given their different dimensions of 3072 and 1024, respectively.
- The concern was raised because the dimensionality difference might impact the storage strategy when integrating embeddings for text, tables, and images.
- Dependency of AI systems on Nvidia products: One participant noted that Nvidia is a core component for most AI systems due to the strong ecosystem provided by CUDA and NCCL.
- While AMD and TPU are alternatives, they are considered more niche in comparison to Nvidia's widespread adoption in the AI space.
- Exploration of TPU usage in AI: There was a discussion about TPUs and their effectiveness as fast vector processors, specifically for matrix multiplication in AI applications.
- While Anthropic utilizes TPUs significantly, the consensus was that the majority of systems still predominantly rely on Nvidia due to its robust ecosystem.
- Utilizing diverse computing architectures for AI: A participant shared their past experiences with FPGA for inference, indicating that multiple hardware options exist for AI processing.
- Discussion highlighted the need to consider how "turn-key" a solution should be, weighing the flexibility of alternative architectures against ease of implementation.
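As a starting point for debugging the reranker issue from the first item, here is a minimal sketch assuming the langchain-cohere integration (model names are current as of this writing, the documents are toys, and a COHERE_API_KEY is assumed); raising `top_n` and inspecting relevance scores is a common first diagnostic when relevant chunks get dropped:

```python
# Hedged sketch of the ContextualCompressionRetriever + CohereRerank setup.
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereEmbeddings, CohereRerank
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore.from_texts(
    ["chunk about billing", "chunk about refunds", "chunk about shipping"],
    embedding=CohereEmbeddings(model="embed-english-v3.0"),
)

compressor = CohereRerank(model="rerank-english-v3.0", top_n=2)
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
)
docs = retriever.invoke("how do refunds work?")
```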
Links mentioned:
- Structured Outputs ā Cohere: This page describes how to get Cohere models to create outputs in a certain format, such as JSON.
- Chat ā Cohere: Generates a text response to a user message and streams it down, token by token. To learn how to use the Chat API with streaming follow our [Text Generation guides](https://docs.cohere.com/v2/docs/cha...
- Chat ā Cohere: Generates a text response to a user message and streams it down, token by token. To learn how to use the Chat API with streaming follow our [Text Generation guides](https://docs.cohere.com/v2/docs/cha...
Cohere ▷ #cmd-r-bot (1 message):
setupisanoun: hey buddy
Cohere ▷ #projects (2 messages):
Product Hunt Launch, Findr App, Digital Memory
- Findr App Launches on Product Hunt: Findr has officially launched on Product Hunt, aiming to give humans infinite memory and a searchable digital brain.
- The team is requesting support, as shown in their promotional tweet.
- Support from the Community: Community members, including @meor.amer, expressed their congratulations on the launch of Findr.
- This shows a positive response from users who are interested in the app's innovative concept.
Link mentioned: Tweet from Nishkarsh (@Nish306): we've launched on Product Hunt. i would greatly appreciate your support https://www.producthunt.com/posts/findr-remember-everything we're giving humans infinite memory and a searchable digital brai…
Cohere ▷ #cohere-toolkit (3 messages):
Cohere Toolkit Deployment, AWS Stream Errors, Docker Logs Inspection
- Cohere Toolkit deployed but facing issues: A member successfully deployed the Cohere Toolkit using the provided AWS instructions, but encountered an intermittent `stream ended unexpectedly` error.
- This issue appears to happen randomly, with some messages working fine at times.
- Seeking insights on stream error: The same member inquired if anyone else has experienced the stream ended unexpectedly error with no concrete lead on what might be causing it.
- The issue persisted despite other functions appearing normal at times, prompting the request for shared experiences.
- Advice to check Docker logs: Another member suggested checking the docker logs to find more information about the error.
- This recommendation indicates that deeper insights might be found in the application logs related to the deployment.
Modular (Mojo 🔥) ▷ #general (22 messages🔥):
Mojo on Archcraft Linux issues, Installation of Max and Magic, Using the Mojo REPL, Python requirements in magic environment
- Congratulations on the new release!: Members celebrated the new release and shared examples available in the GitHub repo for Stable Diffusion.
- One member offered congratulations while providing helpful links for further exploration.
- User struggles with Mojo on Archcraft Linux: A user reported issues entering the Mojo REPL on Archcraft Linux, stating it could not find a dynamically linked library, potentially called `mojo-ldd`.
- Discussion ensued about errors related to `mojo-lld`, a linker, and its installation requirements.
- Max installation process issues: Another member mentioned that the Max installation process was unexpectedly killed, hindering their progress.
- They expressed difficulties in accessing the Mojo REPL despite being able to use Max and Magic.
- Externally managed environment error: The same user stated that while in the magic environment, attempts to install Python requirements led to an error indicating they were in an externally managed environment.
- They sought help for this issue, indicating they could not download the necessary requirements.
- Tip for problem-solving threads: One member suggested creating a new thread for problem solving to assist others facing similar issues.
- This approach was encouraged to ensure collaborative solutions and continuous assistance from the community.
Modular (Mojo 🔥) ▷ #mojo (57 messages🔥🔥):
Mojo Documentation Updates, Mojo Kernel Terminology, Compute Kernels vs OS Kernels, Discussion on `var` Keyword, Argmax and Argmin Removal
- Clarifying Mojo Documentation Updates: A team member discussed updates in the Mojo docs regarding variable declarations, particularly the use of the `var` keyword.
- Another member confirmed they are working on an update regarding the `var` requirement, which is still in progress.
- Understanding Mojo Kernel Terminology: Members discussed the term "kernel" in the context of Mojo, clarifying it refers to a function running on an accelerator, not an OS kernel.
- One member humorously noted that the term is used to sound sophisticated, while another explained it as specific blocks of code optimized for hardware.
- Distinction Between Compute Kernels and OS Kernels: A discussion emerged redefining compute kernels versus OS kernels, highlighting that Mojo could be used for userspace drivers but still requires improvements.
- Members agreed that while Mojo can help with compile and portability aspects, more work is needed before it matches the capabilities of OS kernels.
- Debate Over the `var` Keyword in Mojo: Members expressed differing views on the necessity of the `var` keyword, with suggestions that it be optional but still indicated in the code.
- One member discussed how the removal of `var` might affect structs, while others expressed a desire for clarity in its usage.
- Concern Over Removal of Argmax and Argmin Functions: A member inquired about the disappearance of `argmax` and `argmin` from `algorithm.reduction`, fearing the need to implement them from scratch.
- This sparked conversation about the updates and changes in the Mojo library, indicating the need for a changelog clarification.
Link mentioned: Mojo language basics | Modular Docs: Introduction to Mojo's basic language features.
Modular (Mojo 🔥) ▷ #max (13 messages🔥):
Custom ops in Mojo, Error handling and documentation, Feature request for custom op messages, Max GitHub repo issues, Session loading with custom ops
- Custom ops give trouble in Mojo: There are issues loading a custom op named mandelbrot in Mojo, specifically when trying to import the type.
- Errors indicate that the Mojo kernel for the mandelbrot custom op is not registered, hindering execution.
- Documentation updates are needed: Members discussed filing issues on the Max GitHub repo regarding the clarity of error messages, particularly the "custom op not found" message.
- Suggestions include improving error messages and potentially guiding users towards relevant documentation.
- Feature request for improved custom op handling: A member initiated a feature request for better handling of custom ops that are not found and clearer error messages.
- This request aims to address user experience issues by directing users to documentation when errors occur.
- Confirmed bug reporting: Members expressed gratitude for catching issues and confirmed that they would file bug reports, particularly regarding custom operations.
- Clear communication about GitHub issues is necessary for resolving existing problems with Mojo's handling of custom ops.
- Session loading confusion: There were discussions around loading sessions with custom ops, particularly using paths to kernel directories.
- One member mentioned underlining the specifics when calling `session.load` with `custom_ops_paths` for clarity.
Link mentioned: [Feature Request] Single compilation unit kernels and/or improved error messages · Issue #269 · modularml/max: What is your request? This is a 2-part request, but bundled since they both address the same UX issue. Part one is to make the "custom op not found" error message direct users to documentati…
OpenInterpreter ▷ #general (28 messages🔥):
Open Interpreter Errors, Latest OI Version, AI Applications and Models, Truffle-1 Computing Stack, Long-Term Memory in OI
- Persistent Open Interpreter Errors: Multiple users reported consistent issues when using Open Interpreter, particularly involving errors related to the `--conversations` command.
- One member expressed frustration over losing valuable conversations, raising questions on how to resolve these persisting issues.
- Inquiry about Latest OI Version: A user was curious about upgrading to version 1.x of Open Interpreter, mentioning they were still on 0.34 and heard about a newer 1.0 release.
- Discussions included whether the OS mode was available in the latest version, as members strategized improvements.
- Exploring AI Applications and Models: Discussions emerged on using AI for various applications, including a Raspberry Pi setup and possible voice-to-speech models for home automation.
- Users contemplated how to connect smaller models with larger, more capable systems to enhance functionality.
- Introduction of Truffle-1 AI Computer: One member shared details about the Truffle-1, a personal computing stack running multiple models with 64GB of unified memory, priced at a $500 deposit and $115 monthly.
- This personal agentic computing device aims to provide infinite inference time and supports writing and sharing apps, with units shipping in January.
- Local Use of OS Mode: A user inquired whether it was possible to use OS mode locally with Open Interpreter.
- This prompted further discussion on the configuration options available for users experiencing issues.
Links mentioned:
- Truffle: A Personal Agentic Computing stack, the Truffle-1 runs a mixture of models on-device with 64GB of unified memory
- Tweet from simp 4 satoshi (@iamgingertrash): To recap:> $500 deposit authorized today> $115/month for 12 months> Infinite inference time compute> Writing & Sharing your own apps> A glowing orb with a 64GB Orin> We're actual...
- First Of All All Things Are Possible GIF - First Of All All Things Are Possible Jot That Down - Discover & Share GIFs: Click to view the GIF
tinygrad (George Hotz) ▷ #general (27 messages🔥):
Benchmarks of Llama Models, Mergeability in ShapeTrackers, Layout Algebra in CuTe, Algorithm Complexity in Merging, Injectivity in Layout Algebra
- Request for Llama Model Benchmarks: A member asked if anyone had benchmarks for any Llama models comparing TinyGradās OpenCL with PyTorchās CUDA implementations.
- This highlights an ongoing interest in performance comparisons between AI frameworks.
- Challenges of Mergeability in ShapeTrackers: A user discussed the complexity of proving the mergeability of two arbitrary ShapeTrackers in Lean, stating it's not possible to have a simple criterion similar to a matrix determinant.
- They highlighted the existence of coincidences in strides and shapes that complicate the mergeability checks.
- Insights on CuTe Layout Algebra: A member inquired whether mergeability is equivalent to composition in CuTe's layout algebra, referencing an academic note on layout operations.
- This discussion touched on fundamental abstractions in NVIDIA's CUTLASS library and the mathematical treatment of layout operations.
- Complexity of Layout Algebra: Concerns about proving conditions related to injectivity in layout algebra were raised, with suggestions that such checks might be NP-hard.
- It emphasizes the difficulties in establishing sufficient conditions in layout algebra due to potential stride interferences.
- Symbolic Functions vs. Layouts: A member pointed out that symbolic integer functions are strictly more powerful than layouts in terms of checking necessity and sufficiency.
- This aligns with the discussions on algorithm complexities in merging views and supports ongoing research directions.
Links mentioned:
- A note on the algebra of CuTe Layouts: The core abstraction of NVIDIA's CUTLASS library for high-performance linear algebra is the CuTe Layout. In this technical note, we give a rigorous, mathematical treatment of the algebra of these l…
- Issues · tinygrad/tinygrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ - Issues · tinygrad/tinygrad
- cutlass/media/docs/cute/02_layout_algebra.md at main · NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines. Contribute to NVIDIA/cutlass development by creating an account on GitHub.
Torchtune ▷ #dev (25 messages🔥):
FSDP normalization, Scaling factors in loss computation, Bug reports on trl and HF trainer, Optimizer behavior with weight decay, Updates to PRs
- FSDP Normalization Needs Scaling: Discussions revealed that FSDP's normalization by `world_size` must be addressed, and that scaling by `world_size` can correct the unwanted averaging (a conceptual sketch follows this list).
- A member suggested opening a PR to implement this fix since it won't require extensive changes, mostly revolving around the `scale_grads` function.
- Explicit Scaling Preferred in Training: The community highlighted the importance of explicit scaling of the loss within the recipe rather than hiding logic elsewhere, to simplify understanding.
- After evaluations, members agreed to clarify the scaling process in both training and optimization hooks.
- Identifying Bugs Across Frameworks: It was pointed out that a similar bug, an unwanted reduction by a factor of `1/world_size`, might exist across various libraries, including `trl` and Hugging Face's trainer.
- Members commended the HF team for recognizing and addressing these issues in their training framework, as noted in linked GitHub issues.
- Handling No Sync Scenario: Members discussed how Hugging Face handles no sync scenarios by avoiding grad accumulation normalization while properly computing loss.
- Specific links were provided detailing their method of arriving at the number of items in the batch to facilitate accurate loss normalization.
- Updates Made to the PR: A member confirmed the addition of a scaling factor for the `optimizer_in_bwd` case in their existing PR to address potential issues.
- The functionality matters as it adjusts how optimizers apply weight decay and ensures better gradient handling in specific cases.
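For intuition, the fix under discussion looks roughly like the following (a conceptual sketch, not the PR's actual code; assumes an initialized process group): FSDP averages gradients across ranks, so normalizing by the global token count means first undoing the 1/world_size averaging, then dividing by the all-reduced number of tokens.

```python
# Conceptual sketch of world_size-aware gradient scaling.
import torch
import torch.distributed as dist

def scale_grads_by_tokens(model, local_num_tokens, world_size):
    # sum token counts across ranks so every rank divides by the same total
    total = torch.tensor([local_num_tokens], device="cuda", dtype=torch.float32)
    dist.all_reduce(total, op=dist.ReduceOp.SUM)
    # FSDP already multiplied grads by 1/world_size; multiply by world_size
    # to recover the sum, then normalize by the global token count
    scale = world_size / total.item()
    for p in model.parameters():
        if p.grad is not None:
            p.grad.mul_(scale)
```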
Links mentioned:
- torchtune/recipes/full_finetune_distributed.py at 3518492f43a8a5a462cbd604be4101268ff5bd52 · pytorch/torchtune: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.
- Fix gradient scaling to account for world_size normalization by mirceamironenco · Pull Request #2172 · pytorch/torchtune
- torchtune/torchtune/training/memory.py at main · pytorch/torchtune: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.
- Add DDP token averaging for equivalent non-parallel training similar to #34191 · Issue #34242 · huggingface/transformers: Feature request: Token averaging in gradient accumulation was fixed in #34191, but token averaging in DDP seems to have the same issue. Expected behavior: With all the tokens contributing to loss in...
- GitHub - pytorch/torchtune at 3518492f43a8a5a462cbd604be4101268ff5bd52: PyTorch native post-training library. Contribute to pytorch/torchtune development by creating an account on GitHub.
- transformers/src/transformers/trainer.py at 052e652d6d53c2b26ffde87e039b723949a53493 · huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. - huggingface/transformers
Torchtune ▷ #papers (2 messages):
Evolutionary Algorithms, Scale Up Evolution, Gradient Techniques
- Evolutionary Algorithms Take Center Stage: A member pointed out the interesting use of evolutionary algorithms, showcasing their significance in current discussions.
- This potential innovation invites further exploration into its application in machine learning.
- Sakana Aims to Compete with Gradient Techniques: Sakana is attempting to scale up its evolutionary methods to remain competitive with prevailing gradient techniques.
- This move suggests a growing interest in alternative optimization strategies within the community.
DSPy ▷ #show-and-tell (1 message):
collabin: https://youtu.be/BrvVheleOqc
DSPy ▷ #papers (4 messages):
AI and Knowledge Economy, Coconut - Chain of Continuous Thought, Autonomous vs Non-Autonomous AI
- AI Reshaping the Knowledge Economy: This paper introduces a framework analyzing how AI transforms the knowledge economy by reallocating roles between "workers" and "solvers". It highlights that basic autonomous AI displaces humans while advanced autonomous AI leads to larger, more productive firms.
- As autonomous agents gain traction, they predominantly benefit the most knowledgeable individuals, allowing them to efficiently manage routine work, while less knowledgeable individuals benefit from non-autonomous AI like chatbots.
- Introducing Coconut - Continuous Thought: A recent paper from Meta proposes a new reasoning paradigm called Coconut, which uses the last hidden state of LLMs for reasoning instead of the traditional language space. The authors argue that traditional methods may not effectively capture the reasoning process and introduce a concept termed "continuous thought".
- This approach seeks to overcome limitations of language-based reasoning by exploring unrestricted latent spaces, which could enhance LLMs' performance on complex reasoning tasks.
Links mentioned:
- Artificial Intelligence in the Knowledge Economy: The rise of Artificial Intelligence (AI) has the potential to reshape the knowledge economy by enabling problem solving at scale. This paper introduces a framework to analyze this transformation, inco...
- Training Large Language Models to Reason in a Continuous Latent Space: no description found
DSPy ▷ #general (11 messages🔥):
TypedReAct integration, RouteLLM maintenance concerns, DSPy evolution with reasoning models
- TypedReAct enigma solved: A member shared a new implementation of `TypedReAct`, questioning whether to submit a PR, but noted potential deprecation issues with `TypedChainOfThought` in upcoming versions.
- Another member suggested that simply removing the "Typed" prefix would resolve compatibility issues, emphasizing that the built-in `ReAct` is effective without the typing (a minimal sketch follows this list).
- RouteLLM's prolonged slumber: A member expressed concerns regarding the lack of maintenance for RouteLLM, indicating interest in potential DSPy integration.
- The conversation highlighted how critical it is to support development for models with reduced oversight.
- Discussing DSPy's future amidst reasoning models: A member inquired about discussions related to how DSPy might evolve with the rise of reasoning models, emphasizing fine-tuning at the branching level.
- This perspective shifts focus from traditional prompting methods to process reward mechanisms, indicating a potential paradigm shift in model training.
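The untyped replacement suggested above is roughly this shape, assuming DSPy's current `ReAct` API; the signature string and tool are toys:

```python
# Minimal sketch of DSPy's built-in ReAct with a toy tool.
import dspy

def search_docs(query: str) -> str:
    """Toy retrieval tool; swap in a real search function."""
    return f"results for {query}"

react = dspy.ReAct("question -> answer", tools=[search_docs])
# prediction = react(question="What is DSPy?")  # requires dspy.configure(lm=...)
```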
Link mentioned: Agents - DSPy: The framework for programming, rather than prompting, language models.
Nomic.ai (GPT4All) ▷ #general (12 messages🔥):
GPT4All issues, Jinja template functionality, Docker version of GPT4All, Command line interface concerns, Local documents in CLI
- GPT4All struggles with Jinja templates: Users expressed frustrations with GPT4All being "completely broken" for side-loading, citing issues with Jinja templates that are crucial for model functionality.
- Current problems identified include needing to space elements correctly, fixing newline issues, and unsupported constructs like `none` and `[1:]`.
- Docker version of GPT4All in demand: A user inquired about a version of GPT4All that runs from a Docker container with a web UI, indicating interest in easier deployment options.
- The community has not responded with specific resources or existing solutions for this request yet.
- CLI interactions without localdocs: One user is trying to access GPT4All models via the command line but is unable to use local documents with the current setup.
- Another user informed them that the old CLI is not officially supported, but the server API allows programmatic access to localdocs when enabled in the GUI.
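Since GPT4All's local server speaks the OpenAI chat-completions protocol (on port 4891 by default, per its docs), a short script can stand in for the old CLI; a hedged sketch, assuming the server and a LocalDocs collection are enabled in the GUI:

```python
# Hedged sketch: talk to GPT4All's OpenAI-compatible local server. The model
# name is illustrative; LocalDocs is applied server-side when enabled in the
# GUI, not via a request flag here.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4891/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="Llama 3.2 3B Instruct",  # assumed; use a model name shown in the GUI
    messages=[{"role": "user", "content": "What do my local docs say about X?"}],
)
print(resp.choices[0].message.content)
```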
LlamaIndex ▷ #blog (2 messages):
AI SDR, Agent Building Crash Course, LlamaIndex Function Calling, Agentic RAG, ReAct
- AI SDR Generates Leads with LlamaIndex: Check out this agentic AI SDR that generates leads for you, built using LlamaIndex.
- This tool is highlighted for its capability in automated lead generation linking to several GitHub features.
- Learn to Build Agents from Scratch: A crash course from @TRJ_0751 teaches how to build agents with LlamaIndex focusing on function calling to manage real-time data queries.
- Participants will also learn to create an agentic RAG that intelligently routes between vector and summary tools, as well as how to create ReAct.
Link mentioned: composio/python/examples/quickstarters at master · ComposioHQ/composio: Composio equip's your AI agents & LLMs with 100+ high-quality integrations via function calling - ComposioHQ/composio
LlamaIndex ▷ #general (4 messages):
OpenAIAgent concurrency, RAG evaluation discussions
- OpenAIAgent execution may not be concurrent: A member inquired whether `OpenAIAgent` function execution can be done concurrently in an asynchronous environment, noting it differs from parallel function calling.
- The investigation into this led to observing that even with modifications for async, the function executions remain non-concurrent.
- Utilizing async tools for function execution: Another member suggested using async entry points and async tools, stating that this approach should ensure proper execution.
- They provided a code snippet demonstrating how to implement a tool asynchronously with `OpenAIAgent` (a similar sketch follows this list).
- Looking for RAG evaluation discussions: A member expressed interest in discussing RAG evaluation and invited others to DM if they want to chat.
- This indicates an ongoing effort to engage with peers in the AI community on evaluation strategies.
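A sketch of that async-tool pattern, assuming LlamaIndex's `FunctionTool` and `OpenAIAgent` APIs (the tool body is a toy, and an OpenAI key is assumed in the environment):

```python
# Hedged sketch: register an async tool and drive the agent through its
# async entry point so I/O-bound tool calls can overlap.
import asyncio
from llama_index.core.tools import FunctionTool
from llama_index.agent.openai import OpenAIAgent

async def fetch_price(symbol: str) -> str:
    """Toy async tool; replace with a real I/O-bound call."""
    await asyncio.sleep(0.1)
    return f"{symbol}: 100.0"

tool = FunctionTool.from_defaults(async_fn=fetch_price)
agent = OpenAIAgent.from_tools([tool], verbose=True)

async def main():
    resp = await agent.achat("Compare prices for AAPL and MSFT")
    print(resp)

asyncio.run(main())
```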
Link mentioned: Single-Turn Multi-Function Calling OpenAI Agents - LlamaIndex: no description found
Gorilla LLM (Berkeley Function Calling) ▷ #discussion (3 messages):
BFCL Leaderboard, Function Call Demo Issues, Gorilla Benchmark for Structured Outputs
- BFCL Leaderboard functionality questioned: A user noted issues with the function call demo on the BFCL Leaderboard, stating it was stuck on "Loading Model Response…".
- In response, another member confirmed there is a certificate issue causing the model endpoint to be down.
- Inquiry about structured output evaluation: A user expressed interest in using the Gorilla benchmark for evaluating the quality of structured outputs from the model.
- They specifically asked if there are any subtasks dedicated to generating text according to a provided JSON schema or Pydantic model.
Link mentioned: Berkeley Function Calling Leaderboard V3 (aka Berkeley Tool Calling Leaderboard V3): no description found
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (1 message):
kallemickelborg: Thank you for that!
Axolotl AI ▷ #general (1 message):
New Engineer on Board, Reinforcement Learning Support
- New Engineer Joining in January: A new engineer is set to join in January to assist with Reinforcement Learning in general.
- They will also provide support with the kto project at that time.
- Support for RL and KTO: The new engineer's expertise will enhance the team's capabilities in Reinforcement Learning.
- Their assistance is anticipated to positively impact the development of the kto as well.
Mozilla AI ▷ #announcements (1 message):
Developer Hub, Blueprints Initiative
- Developer Hub Update Released: A significant update regarding the Developer Hub has been announced, detailing improvements and new features. You can view the full announcement here.
- Feedback from the community is appreciated as they strive to enhance user experience.
- Blueprints for Open-Source AI Solutions: A thread discussing the Blueprints initiative aims to assist developers in creating open-source AI solutions has been shared. More details can be found in the thread.
- This initiative is positioned as a resource for developers to kickstart their projects effectively.
{% else %}
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!
If you enjoyed AInews, please share with a friend! Thanks in advance!
{% endif %}