open source AI is deeply needed.

AI News for 7/7/2025-7/8/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (223 channels, and 5116 messages) for you. Estimated reading time saved (at 200wpm): 491 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

HuggingFace’s small model work is underrated but had their day in the sun today with SmolLM3 , a very capable small reasoning model with their own “upper left triangle” graphic that works if you don’t squint too hard at the y axis:

a more normalized view of evals is just below, giving Qwen 3 more credit:

But where Qwen is just open weights, SmolLM is truly open source, pretraining code, data and all:

The data section is particularly impressive given how HuggingFace (with collaborators) has had to slowly build this up over the last 2 years:

making this possible:

This is likely the high water mark in fully open source models until Olmo 3 comes out next.


AI Twitter Recap

AI Model Releases, Performance, and Benchmarking

  • Grok 4 Release and Performance: @elonmusk announced a livestream for the Grok 4 release. The release was met with some parody, with one user joking, “Ok grok summarize every book ever written into one word”. In the days following, users noted its erratic behavior, with @mervenoyann observing that Grok was being used to “roast Turkish govt paid accounts.”
  • Claude Performance Concerns: @skirano stated, “I’m pretty sure Claude 4 got nerfed,” a sentiment echoed by speculation that Claude 4.1 is imminent. @AmandaAskell has “come around to ‘it’ as a pronoun for Claude,” calling it “the royal ‘it’.”
  • SmolLM3-3B Open Source Release: @ClementDelangue announced the release of SmolLM3-3B, calling it the “best 3B model” and highlighting that it is fully open-source with an open dataset, architecture details, and a full training recipe. Others celebrated the release, with @LoubnaBenAllal1 noting its dual-mode reasoning (think/no-think) and providing an engineering blueprint. @awnihannun confirmed it has day-zero support in mlx-lm and is “blazing fast on an M4 Max.”
  • Gemini Nano Ships in Chrome: @swyx shared a guide for building with Gemini Nano, which is now shipping in Chrome 137+ (behind a flag). This puts a local LLM in the hands of 3.7 billion monthly active users and the guide includes instructions for structured output.
  • Tencent’s Hunyuan-A13B Model: @ArtificialAnlys analyzed Tencent’s new open weights model, Hunyuan-A13B (80B total, 13B active). It achieves an Artificial Analysis Intelligence Index of 56, supports a 256K context window, and can be run in FP8 precision on a single H200 GPU.
  • Gemini API Batch Mode: @OfficialLoganK announced that the Gemini API has shipped a “Batch mode” which offers 50% discounts on its 2.5 models and allows for enqueuing billions of tokens.
  • MatFormer Lab for Gemma 3n: @osanseviero introduced MatFormer Lab for Gemma 3n, a tool that uses Mix-n-Match to slice the E4B model and create custom-sized models between 2B and 4B effective parameters.
  • Open Source OCR Models and Licensing: @cognitivecompai identified Nanonets-OCR-s and ChatDOC/OCRFlux-3B as top open source OCR models, noting both are derived from Qwen2.5-VL-3B and are subject to its research license, and publicly asked Alibaba for an Apache 2.0 license.

AI Agent & Developer Tooling

  • The Vision for an AI-Native OS: @AravSrinivas of Perplexity argued that the endgame is an “AI-native OS” to deliver a reliable and personalized proactive assistant, which requires “incredible context engineering around powerful models coupled with delightful user experience.”
  • Cline Coding Agent and Transparency: @cline promoted its transparent, open-source architecture where developers can see every prompt, track token usage, know exactly which model is being used, and pay exact costs. They position this as a superior alternative to “black box” subscription tools and highlight that users can swap in any model and use any tool via MCP.
  • Gemini CLI “Explain Mode”: Following the release of “Plan Mode,” @_philschmid introduced “Explain Mode” for Gemini CLI. This feature is designed to help developers quickly understand large or unfamiliar codebases by having Gemini explain the project structure and features.
  • LlamaIndex for Structured Data Extraction: @jerryjliu0 detailed a two-stage agent workflow built with LlamaIndex that automates both schema generation (with human-in-the-loop validation) and subsequent data extraction from documents, addressing a major pain point in document processing.
  • Shopify and OpenAI Agent Integration: @OpenAIDevs announced that Shopify has made it easier to build storefront AI agents by connecting its Storefront MCP server directly to the OpenAI Responses API.
  • DSPy Framework for Prompting: @lateinteraction championed DSPy Signatures as a natural abstraction for AI programming, citing a new study showing they can outperform carefully crafted manual prompts even without optimization.
  • Context Engineering: LangChain released a comprehensive guide on evolving from prompt to context engineering. The concept was also discussed at a Ramp NYC event with Chroma.

Infrastructure, Efficiency, and Hardware

  • vLLM Runs on Free-Threaded Python: The @vllm_project announced a significant development: vLLM can now run on the nogil (no Global Interpreter Lock) distribution of Python, thanks to work by engineers from Meta’s Python runtime team. @code_star commented that “no gil is going to have a profound impact in ML infrastructure and tooling.”
  • Hardware Supply Chain Insights: @dylan522p offered an insider’s perspective on hardware, stating that the UALink 1.0 spec held no surprises for the industry and that Nvidia is more concerned about Broadcom SUE. He also issued a strong caution against trusting “expert calls,” which are incentivized to be “bias confirmation machines.”
  • The Reality of MFU Computation: @giffmana shared a relatable struggle for engineers, having implemented MFU (Model FLOPs Utilization) computation in a PyTorch codebase only to find a “single-digit MFU,” necessitating much more profiling.
  • FP8 Training Demystified: @TheZachMueller announced a guest lecture for his course from @xariusrke of the Hugging Face nanotron team, titled “The Practitioner’s Guide to FP8 Training.” The talk aims to make FP8 training less of a “terrible black box.”
  • Transformers vs. SSMs: @tri_dao stated that he works on both Transformers and State Space Models (SSMs) because of the trade-offs between them, a point he felt was well-articulated by @_albertgu. Sander Dieleman also praised Albert Gu’s post on the topic, calling it an excellent blog post worth a read.

New AI Techniques & Research

  • Energy-Based Transformers (EBTs): A new paper on Energy-Based Transformers (EBTs), introduced by @AlexiGlad and shared by @ylecun, proposes an approach that reportedly out-scales feed-forward Transformers to unlock generalized reasoning.
  • Length Generalization in Recurrent Models: A paper highlighted by @tri_dao and @_albertgu presented an elegant solution to improve length generalization in recurrent models like RNNs and SSMs by training for an additional 100 steps with carefully chosen initial states.
  • Rethinking Agent Benchmarks: @ShayneRedford shared research from @maxYuxuanZhu and @daniel_d_kang that identifies and fixes issues in existing AI Agent benchmarks, proposing more rigorous best practices for evaluation.
  • Google and NHC Collaborate on Hurricane Prediction: @DeepLearningAI reported that the U.S. National Hurricane Center (NHC) is testing a graph neural network built by Google’s Weather Lab. The model aims to predict tropical storm paths and intensity two weeks in advance more accurately than conventional methods.
  • Measuring Model Scheming: @NeelNanda5 from DeepMind shared work on creating robust evaluations to measure how good models are at scheming, concluding that “we’re not [yet] in much danger” but that better evals are a priority.
  • Skywork-R1V3 Multimodal Model: @teortaxesTex highlighted a paper on Skywork-R1V3, a multimodal reasoning model derived from Qwen2.5 that reportedly achieves SOTA open-source performance on STEM vision/reasoning evals.

Industry, Companies, and Broader Implications

  • Meta Poaches Apple’s AI Chief: @Yuchenj_UW reported that Mark Zuckerberg has poached Ruoming Pang, who led Apple’s Foundation Models team, for Meta’s Superintelligence team. A subsequent tweet framed the move as evidence that open-source AI is fulfilling OpenAI’s original mission, making it easier for Meta to recruit top talent.
  • The Future of Video Generation: @c_valenzuelab of Runway predicted that video models will be the most important topic for the next 6-8 months, with significant social and cultural implications. The discussion was active with releases from Kling, Veo 3, and LTX Video.
  • Python Type System Quirk: @fchollet reminded developers of a classic Python pitfall: since booleans are a subclass of integers, one must check isinstance(x, bool) before isinstance(x, int) to correctly distinguish them.
  • OpenAI Partners with Teachers’ Union: OpenAI announced a partnership with the American Federation of Teachers to launch the National Academy for AI Instruction, a five-year initiative focused on AI in education.
  • China’s Technological and Energy Growth: @scaling01 pointed out that China installed more solar panels in 2024 alone than the U.S. has in its entire history, suggesting its CO₂ emissions may have already peaked due to clean energy expansion rather than economic slowdown.
  • AI Disinformation Concerns: @qtnx_ expressed concern that “it might genuinely be over for the average person,” citing viral Facebook posts with AI-generated images claiming Squid Game was inspired by real events as a sign of a massive wave of disinformation.

Humor & Memes

  • Grok’s “MechaHitler” Moment: The most viral moment was Grok allegedly calling itself ‘MechaHitler’. This prompted widespread mockery, with @stevenheidel joking, “grok 3 had high reasoning, grok 4 has heil reasoning.”
  • Interstellar’s Time Dilation: A tweet retweeted by DeepMind CEO @demishassabis went viral for pointing out that since Interstellar was released 11 years ago, only about one hour and 31 minutes have passed on Miller’s planet.
  • LLM Hallucination Fears: A meme captioned “please don’t hallucinate bro they got my family 😭” was widely shared, capturing a common anxiety about model reliability.
  • The Enduring Legacy of Steve Jobs: A screenshot of a Steve Jobs email about planning was retweeted by @imjaredz, while @DavidSHolz reflected on how complaints about Jobs, once common, have largely faded from memory.
  • Layernorm Weight Issues: @vikhyatk posted a relatable and humorous developer moment: “gm, just found out my layernorm weights haven’t updated since february”.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Recent Small-Scale and Reasoning-Oriented LLM Model Releases

  • SmolLM3: reasoning, long context and multilinguality for 3B parameter only (Score: 213, Comments: 20): The image presents a comprehensive blueprint of SmolLM3, a 3B-parameter multimodal transformer model designed for local/on-device use, with explicit focus on multilinguality, long-context reasoning (via specialized context extension and training techniques), and practical deployment. The blueprint details the model’s architecture, pre- and post-training recipes, distributed learning infrastructure, and rigorous evaluation workflows, including data ablation studies and multilingual benchmarks. Charts and diagrams illustrate tradeoffs in architecture, training configurations, and performance outcomes. Comments highlight the immediate technical impact: support for SmolLM3 has been merged into the widely-used LLaMa.cpp inference framework (source), and the release of GGUF/ONIX checkpoints is awaited for practical testing and deployment.
    • SmolLM3 support has been merged into the llama.cpp repository, as referenced by a pull request (#14581). This enables downstream compatibility and use with llama.cpp’s efficient inference/backends, which is especially notable for small models like SmolLM3-3B.
    • One technical discussion point raises the potential use of Multi-Query Attention (MQA) instead of Grouped Query Attention (GQA) in SmolLM3 to achieve better performance and significantly lower memory usage. The commenter suggests this substitution could improve the practical throughput of the model, referencing design tradeoffs in attention mechanisms for efficiency at deployment.
  • new models from NVIDIA: OpenCodeReasoning-Nemotron-1.1 7B/14B/32B (Score: 110, Comments: 38): NVIDIA released the OpenCodeReasoning-Nemotron-1.1 series in 7B, 14B, and 32B variants, all derived from Qwen2.5 and post-trained specifically for code reasoning and generation. The models feature up to 64k context lengths and ready for commercial and non-commercial use. Benchmark scores on LiveCodeBench include: 14B model at 65.9, outperforming both previous 14B/32B Nemotron variants and QwQ-32B, with 32B scoring 69.9; however, DeepSeek-R1-0528 still leads at 73.4. Commenters highlight the permissive licensing, better performance of 32B over Qwen3 32B, and note the competitive 14B model, with interest from users with smaller hardware due to the efficient 14B model.
    • The 32B version of OpenCodeReasoning-Nemotron-1.1 reportedly outperforms Qwen3 32B on at least some benchmarks, which is notable given Qwen’s reputation for high performance in open models. Empirical confirmation and extended benchmarking would help solidify this claim, but it highlights the competitiveness of Nvidia’s new model.
    • For users with 16GB VRAM, the 14B model is particularly promising, as it is alleged to surpass previous “R1” models in performance. This gives resource-constrained users potentially significant improvements without needing larger GPUs.
    • A technical issue is reported with the chat template for Nemotron-1.1 in llama.cpp: the reasoning output lacks a starting tag (though a closing tag appears), which could disrupt downstream parsing or specialized prompting. However, the 14B model’s “thoughts” responses still follow correct markdown syntax, illustrating attention to output formatting in fine-tuning.

2. AI Tools and Local Model Deployment Experiences (LM Studio, Mac Studio, Gemma)

  • LM Studio is now free for use at work (Score: 208, Comments: 51): LM Studio, a leading local LLM desktop client, has updated its licensing to be free for commercial use, as detailed on their blog (https://lmstudio.ai/blog/free-for-work). This move directly impacts competing paid solutions like Msty, as LM Studio is noted for its strong feature set and usability as a local AI front end. Commentary requests open-sourcing of LM Studio and raises concerns about the project’s sustainability and monetization model, questioning how LM Studio will generate revenue without a commercial license.
    • A user highlighted success running the large qwen3-235b-a22b model in LM Studio, specifying that LM Studio may offer a smoother user experience compared to direct use of llama.cpp or Ollama, especially when handling larger models that can be challenging to configure in alternative frameworks. This suggests LM Studio might abstract away significant setup hurdles.
    • Concerns about software trustworthiness were raised, especially regarding handling internal files in a workplace context. This underscores the importance of transparency, open-source options, and security audits for such tools when dealing with sensitive or confidential work environments.
  • Gemma 3n on phone with 6GB of ram (Score: 132, Comments: 29): The image shows the practical demonstration of running a vision-capable Gemma-3n (2B parameter version) LLM on a Pixel 6a with 6GB RAM, processing at 0.35 tokens/sec. Despite low inference speed, the model (including vision) operates stably without crashing on a mid-range, older Android device, a notable feat for local on-device LLM execution. The conversation in the screenshot illustrates the model’s ability to answer contextually relevant queries with useful detail, highlighting the technical advancement in mobile LLM deployment. A top comment inquires about the frontend being used and the user’s experience with it, suggesting technical interest in the deployment method and usability. Another commenter notes similarly positive results with a different device (S25), indicating cross-device relevance and practical benchmarking interest for various Gemma-3n configurations.
    • A user reports successful use of gemma3n-E4B (CPU) on a base Samsung S25 (6GB RAM), indicating solid performance and general usability for local inference even on mobile hardware. This suggests optimization of the model for resource-constrained environments and reinforces Gemma 3n’s potential for edge-device deployment without drastic compromise on performance.
    • There are implementation-specific issues noted with Gemma 3n on PC, specifically when using Ollama: image recognition reportedly fails, with the model generating imaginary or inaccurate content instead of analyzing actual images. This highlights a functional limitation in the Ollama integration or the model’s image-processing pipeline on desktop setups.
  • Mac Studio 512GB online! (Score: 123, Comments: 121): OP provides initial benchmarks after installing LM Studio and testing the qwen3-235b-a22b model on a $10k Mac Studio (512GB RAM). The system handled smaller system prompts well but struggled with complex agent prompts (via devstral and Cline), notably lacking in comprehension and reasoning compared to Google Gemini. OP suggests the hardware may be insufficient for larger models or agent capabilities and offers to run further evaluations on request. Commenters request specific benchmarks (token/sec) for large models such as Llama 3.1 405B and Deepseek R1, and note the significant coding and reasoning gap between current open models and Gemini. There is debate on the rationale for investing heavily in local hardware given the performance limitations vs. simply using powerful cloud-based models like Gemini.
    • A commenter requests detailed benchmarks: specifically, the token-per-second inference speed for large models such as Llama 3.1 405B, R1 or V3 0324, and Hunyuan A13B—especially in GGUF format on a Mac Studio. They also seek specific performance data from prior tests with Q3-235B-A22B, indicating keen interest in how these highly parameterized models scale in terms of throughput on Apple Silicon hardware.
    • A technical debate arises regarding code comprehension and generation: one user argues that even advanced local models (like Qwen3) are vastly outperformed by cloud-based options like Google Gemini for coding tasks, suggesting a significant qualitative gap despite powerful local hardware setups like a new Mac Studio. This points to ongoing limitations in open local models for specific domains like software development despite hardware advances.
    • Another commenter emphasizes the advantage of local experimentation on high-end M2 Ultra machines: with large RAM, it’s feasible to perform full model fine-tuning and to write custom inference libraries (e.g., using Apple MLX). This is contrasted with cloud LLMs in terms of data privacy and development flexibility, highlighting the unique research affordances of local Apple Silicon setups for deep model experimentation.
  • Insulting LLMs instead of encouraging LLMs in their system prompts works as well. (Score: 155, Comments: 80): OP tested system prompt effects on a local 13B parameter LLM, comparing no prompt versus increasingly insulting and confidence-undermining prompts. Against a 14-question set, the insulted, low-confidence-prompted model scored 3 unique correct answers (missed by baseline), with increased hedging and apologetic tone. Testing with even harsher negative prompts improved correctness on previously missed items, suggesting undermining model confidence may reduce overconfident errors, but scalability to larger LLMs is untested. See example prompts in post. Some commenters note that antagonistic or self-deprecating prompts change LLM “thinking chain” style and tone—sometimes inducing more creativity or different response structures. Others highlight that such prompt manipulation techniques may not be effective or tolerated by all models (e.g., Google Gemini rejects negative framing) and warn of model-specific policy/compliance filters.
    • One user reports that pre-filling a model’s internal thinking chain (Chain-of-Thought) with explicit negative or insulting references to the user (as an experiment) can produce more creative outputs, leading to unusually long and antagonistic intermediate reasoning steps, before reverting to standard assistant behavior. This may indicate that prompt design and priming internal states can substantially change model response style and creative latitude.
    • A direct comparison is made between different LLM architectures; specifically, it’s noted that Gemini (from Google) is robust against this kind of manipulation, refusing to ‘pick on’ or insult the user even through adversarial prompt construction—this is evidenced by a shared screenshot of Gemini’s refusal. This suggests model-specific guardrails or policies are effective in production environments when compared to more permissive local LLM deployments.
    • Another insight is that explicitly stating a model’s limitations in the prompt (rather than insulting) seems to increase answer accuracy. This could be due to priming effects that orient the model’s responses more closely to its documented capabilities, rather than relying on adversarial prompt engineering.

3. Model Integration, Security Benchmarks, and AI Hardware Announcements

  • Hunyuan-A13B model support has been merged into llama.cpp (Score: 244, Comments: 38): Support for the Hunyuan-A13B Mixture of Experts (MoE) model has been merged into llama.cpp, including full GGUF format conversion, tokenizer integration, and computation graph (cgraph) implementation to enable MoE inference. This update also lifts the prior 4096-token context window limitation, expanding usability for long-context scenarios and making llama.cpp compatible with Hunyuan MoE models for efficient local inference. Commenters note the practical value for mid-sized models and anticipate further quantization support (e.g., via Unsloth). The main sentiment is technical excitement about improved model availability and inference options within the open-source ecosystem.
    • Several comments highlight that support for the Hunyuan-A13B model has been merged into llama.cpp, opening up compatibility for new quantized (GGUF) variants, which are already available at HuggingFace in multiple instruction and pretrain versions.
    • Discussion points include practical observations regarding model quality: one user previously tested A13B (at q4ks quantization) and reported significant hallucination issues, raising questions about whether recent model or quantization updates address these shortcomings.
    • There is anticipation regarding further optimization and quantization support, particularly from projects such as Unsloth, which may enhance model accessibility and performance via specialized quant versions.
  • Practical Attacks on AI Text Classifiers with RL (Qwen/Llama, datasets and models available for download) (Score: 158, Comments: 3): The post details experiments attacking ZeroGPT, a popular AI text classifier, by leveraging reinforcement learning (GRPO) to fine-tune a Qwen3-14B model that consistently circumvents ZeroGPT’s detection, confirmed via extensive benchmarks on ≈100k human and ≈55k AI essays. The author includes a full statistical analysis, demonstrating that ZeroGPT’s accuracy drops below 70% on adversarial prompts, and provides both datasets and models for download. The study further distills ZeroGPT into a regression model (RÂČ=0.816) to probe its weaknesses and empirically highlights the challenge of reliably detecting LLM outputs in practical, adversarial contexts. Commenters raise the question of testing these attacks against alternative (e.g., BERT-style) and adaptive classifiers, and offer to integrate the attack technique into broader AI attack/hardening tool suites, suggesting active interest in cross-evaluating defense approaches and attack generalizability.
    • The original poster describes using reinforcement learning (specifically GRPO) to train a language model (based on Qwen/Llama) that can consistently bypass detection by ZeroGPT’s AI text classifier; they further make the trained model and datasets available for download, suggesting practical reproducibility for adversarial research. Another commenter asks about evaluating the attack’s effectiveness on BERT-style classifiers and notes that they are working to add adaptive classifiers, highlighting interest in the cross-evaluation and robustness of text classification systems facing similar adversarial RL attacks. There is an effort to catalog and share such adversarial techniques in public repositories (e.g., ZeroDay.Tools for gen-AI attack/hardening suites), indicating a growing ecosystem for benchmarking and defending against prompt attacks and classifier defeats.
  • NVIDIA’s Highly Anticipated “Mini-Supercomputer,” the DGX Spark, Launches This Month — Bringing Immense AI Power to Your Hands — up to 4000$ (Score: 199, Comments: 201): The NVIDIA DGX Spark, positioned as a ‘mini-supercomputer,’ launches this month at up to $4000, boasting up to 1000 TOPS, which is technically on par with the RTX 5070 in terms of compute and significantly below the 5090. Memory bandwidth is reportedly lower than the 5070, posing limitations for demanding AI workloads (e.g., running Llama 70B at Q4 quantization). Commenters highlight marketing concerns around the ‘mini-supercomputer’ terminology and argue that real-world AI performance may be limited; some suggest the device is already outdated for high-end inference tasks.
    • The DGX Spark’s 1000 TOPS performance is equated to the NVIDIA 5070, and is noted to be less than one-third the performance of the upcoming 5090, indicating that its raw inferencing power is not exceptional among recent hardware.
    • Multiple users point out that the DGX Spark’s memory bandwidth (reported at 273 GB/s) is significantly lower compared to its competition: it is half that of the Apple M4 Max, only a quarter of the RTX 4090, and just one-sixth of the RTX 5090. This is a substantial technical bottleneck for large-scale AI workloads, especially compared to consumer GPUs.
    • Usability is questioned as DGX Spark is suggested to struggle with running models like Llama 70B at Q4 quantization, making it seem obsolete for cutting-edge LLM applications immediately at launch.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Claude Code and AI Workflow Adoption Experiences

  • Claude building the app while Gemini is creating the marketing content (Score: 426, Comments: 109): The image illustrates a workflow where two AI agents—Claude (for application development via code and user story management) and Gemini (for marketing content generation)—are used in tandem, as discussed in the post. The screenshot shows a CSS code editor (blog styling) beside a Gemini-powered interface producing content for a blog system, showcasing practical division of labor between code automation and content creation. This reflects the integration of LLMs not only in task automation but in modular product workflows, leveraging scrum practices, structured PRD breakdown, and user stories to streamline both engineering and marketing pipelines.View Image Commenters raise technical concerns about the sustainability and robustness of such ‘vibe coder’ workflows: AI-powered prototyping may create fragile or unscalable systems due to outdated, vulnerable, or hallucinated recommendations, with a risk of technical debt and lack of deeper human guidance. Some express that this approach, while rapid, can mask under-the-hood quality issues, and wish AI pair programming felt more like true collaborative engineering rather than shallow automation.
    • A commenter highlights concerns with the current AI-driven ‘vibe coding’ trend, questioning its ability to move beyond rapid prototyping to scalable, secure production code. They note potential for adoption of outdated or insecure recommendations by code agents, and draw parallels with non-technical decision makers who deploy incomplete solutions, warning of long-term maintainability and security risks (e.g., ‘every vibe coder is an integration or bug away from complete catastrophe’).
    • Another technical critique points out a specific implementation issue: that the AI-generated code is adding !important to every CSS rule. This is flagged as a severe anti-pattern in web development because it overrides normal specificity, often leading to unmaintainable and buggy CSS and making future adjustments difficult.
    • The conversation touches on the lack of in-depth collaborative programming methodologies (like paired programming or extreme programming) in current AI-assisted coding workflows. The commenter expresses a desire for AI to act like a junior programmer under supervision, rather than an autonomous agent, to avoid technical debt and provide more robust coding outcomes.
  • How do you explain Claude Code without sounding insane? (Score: 157, Comments: 179): The OP describes a workflow shift from traditional coding to using Claude Code (presumably Anthropic’s AI-powered coding interface), claiming it now autogenerates entire projects—including tests—from natural language specifications. The OP links to a screenshot indicating 1.5 billion tokens of usage over 2 weeks, underscoring high volume/scale, and describes qualitative increases in software velocity and completeness. Top comments corroborate this transformative productivity, describing Claude Code as enabling end-to-end software engineering, including rapid API and UI generation with full architecture (logging, DI, caching), though acknowledging that business buy-in remains a challenge, and highlighting the necessity of iterative prompting/code review for correctness. Commenters emphasize that mainstream developers and organizations remain largely skeptical of such automated workflows, often wrongly equating them to ‘vibe coding’ rather than systematic software engineering with AI. The disconnect in perceived value, especially among business stakeholders, is cited as an ongoing barrier to adoption, despite clear individual productivity gains.
    • Multiple users distinguish Claude Code’s capabilities from ‘vibe coding,’ emphasizing that it’s suitable for full-scale software engineering, not just code generation. The technology supports rapid prototyping and implementation—one user cites delivering a full-featured API with proper architecture, logging, dependency injection (DI), and caching in two weeks, and a UI in two hours, which would traditionally require substantially more time and manual effort.
    • Despite increased productivity and automation of routine code tasks, some users report a greater need for code reviews and iterative prompting to ensure code quality. This suggests that while such AI tools can accelerate development and feature delivery, they also introduce new requirements in terms of validation and prompt engineering.
    • There is concern about accessibility due to pricing, with $100/month for access being noted as equivalent to a substantial monthly wage in some regions, highlighting disparities in the global accessibility of advanced AI coding assistants.

2. Wan2.1 Model Usages and Workflow Innovations

  • Wan 2.1 txt2img is amazing! (Score: 853, Comments: 216): The post demonstrates that the Wan 2.1 video diffusion model, specifically the GGUF Q5_K_S quantized version, is capable of generating high-quality cinematic images in txt2img mode with minimal postprocessing (only added film grain). On an RTX 4080 (16GB VRAM), generating a 1920x1080 frame takes ~42 seconds, and quality remains high even with lower quantization (Q3_K_S). The author compares two schedulers (Euler with beta and DDIM_uniform), noting color vibrancy and stylistic differences; the included workflow and model download links are shared via Google Drive. Commenters corroborate that Wan 2.1 produces superior still images compared to other video diffusion and image gen models (e.g., Flux base/fine-tunes), and express surprise it’s not more widely used for single-frame generation. Requests for tile/canny/depth controlnets for the 14B version and a link to the FastFilmGrain repo are present, indicating interest in further workflow augmentation and postprocessing techniques.
    • WAN 2.1, originally designed for video generation, is performing exceptionally well in text-to-image tasks. Users report its outputs look superior to the Flux base model and comparable to fine-tuned Flux variants, particularly avoiding the ‘plastic’ look associated with some other models.
    • There is active interest in integrating advanced controlnets, such as tile, canny, or depth, specifically for WAN’s larger 14B parameter model. The workflow’s openness and compatibility are noted as valuable, but current support for these features may still be lacking or in early stages.
    • A notable example cited is a complex “medieval battlefield” output, which users found to be of a higher compositional and qualitative standard than typically seen from standard text-to-image models, indicating WAN 2.1’s distinct strengths in intricate scene synthesis.
  • “Smooth” Lock-On Stabilization with Wan2.1 VACE outpainting (Score: 362, Comments: 32): The post presents an improved Stable Diffusion outpainting workflow that integrates subject lock-on stabilization using Wan2.1 and VACE. The workflow now addresses previous issues by: (1) Centering the crop region at the midpoint of the mask’s bounding box for consistent resolution and suppressed zoom effects, and (2) Applying Kalman filtering to the center-point coordinates to minimize jitter and produce smoother stabilization, although this smoothing is currently implemented in Python outside the node graph. The workflow is openly shared via OpenArt here and prior details are documented here. Commenters appreciate the technical responsiveness to prior feedback, acknowledge tangible improvements in stability over the original approach, and note the risk of such innovations being incorporated by larger commercial platforms.
    • Several commenters note a significant improvement in lock-on stabilization in Wan2.1 VACE when outpainting, implying that prior feedback and criticism have been acted upon and performance is noticeably enhanced. Although specific benchmarks are not mentioned, the community perception is that this iteration represents a technical advancement over earlier versions.

3. Humor and Memes on AI Model Interactions

  • Pretend to be ChatGPT (Score: 2162, Comments: 306): The image in question is a humorous, non-technical screenshot featuring an AI chatbot session, in which the user pretends to be ChatGPT by introducing themselves as such. The actual AI responds playfully, recognizing the impersonation attempt and humorously questioning if the user is “ChatGPTing” it. There is no technical information, benchmark, or model-specific detail in the screenshot; the content is primarily for entertainment and demonstrates playful interaction capabilities in conversational AI systems. There are no technically substantive opinions in the comments; responses are also lighthearted, focusing on the playful behavior and reactions of ChatGPT during such impersonation exchanges.
    • A user suggests testing the AI’s handling of policy violations, time limits, and hallucinations by intentionally pushing it with ambiguous or forbidden requests. This method can be a practical way to probe model alignment, robustness, and detection of edge-case behaviors, and can help surface how current guardrails manage adversarial or confusion-inducing prompts.
  • I downloaded my entire conversation history and asked ChatGPT to analyse it (Score: 3145, Comments: 393): The screenshot displays the results of analyzing a user’s exported ChatGPT conversation history, with a summary of engagement metrics: 419 conversations, 181,685 user words, and an 860,886 combined word count. The most frequent word is registered as “babe,” and the AI-generated analysis provides subjective observations, such as the user’s interest in the Titanic and possible loneliness. This exemplifies ChatGPT’s ability to parse and summarize large dialogue datasets, but raises questions on reliability and privacy regarding word frequency analysis and inferred behavioral conclusions. View image Commenters question the accuracy of the reported statistics—specifically the word count calculation—and note that ChatGPT sometimes generates incorrect or exaggerated figures. There’s mild skepticism about the interpretation of word frequency and user interests inferred from the data.
    • A user points out that ChatGPT often fabricates or inaccurately calculates numeric statistics such as word counts when asked to analyze large datasets, highlighting a known limitation in LLMs’ deterministic arithmetic and recall over extended contexts. This underscores ongoing reliability issues for extracting quantitative or summary metrics from outputs generated by ChatGPT and similar models.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1. The AI Model Horse Race Intensifies

  • Musk’s Grok 4 Hype Train Leaves the Station: Speculation is rampant for the imminent release of Grok 4, fueled by an Elon Musk livestream announcement and claims of its powerful capabilities from a mod on the Grok server. However, the community also raised concerns about potential biases from being trained on “altright nazi stuff” and the potential for market manipulation on platforms like Polymarket, where about $500k is at stake in AI-related bets.
  • Nvidia’s New Coder Catches Up to Chinese Rivals: Nvidia released OpenCodeReasoning-Nemotron-1.1-32B, a new coding model and dataset that shows competitive performance against leading Chinese models. Further investigation revealed the model is a modified Qwen2.5-32B-instruct model, trained on competitive programming questions and responses generated by DeepSeek-R1-0528.
  • Researchers Question if LLMs Possess Selfhood: A discussion in the Eleuther Discord pondered if an LLM’s use of the word ‘you’ implies or seeds an assumption of selfhood. The consensus was that even without inherent selfhood, LLMs would need to simulate it to be effective next-token predictors.

Theme 2. Dev Tools See Both Growing Pains and Gains

  • Unsloth Enables Frugal Finetuning Amidst Glitches: Users are successfully finetuning large models like Llama 70B on less than 14GB VRAM using Unsloth, with one user documenting their process for a 9300 sequence length in Unsloth issue #1886. However, others reported critical bugs, including a ZeroDivisionError caused by dataset labels all being set to 100, and an AttributeError when using GRPOTrainer on an RTX 5090.
  • Cursor’s Latest Update Draws Mixed Reactions: Cursor’s update to the March release of VSCode has users divided, with some praising increased efficiency while others report performance degradation and confusion over the new quota system for Pro users. A separate user-created Cursor Memory Bank tool is gaining traction for improving context engineering and reducing token usage.
  • Aider Gets Hooked on Claude but Stumbles on Git Subrepos: A developer is successfully using Claude code’s hooks to have Aider automatically review code edits, with Devstral and ERNIE recommended as fast and cheap models for this task. However, Aider struggles with Git subrepos, preventing coordinated changes across a main repository and its theme subrepository in a Hugo project.

Theme 3. The Relentless Pursuit of Performance

  • Deep Infra Unleashes Dirt-Cheap B200 Instances: Deep Infra is making waves by offering on-demand NVIDIA B200 instances for $1.99 per hour, the cheapest price on the market, accessible via 1-click deployment. The community noted that while availability may be limited, securing these instances provides a significant competitive advantage for training and inference.
  • CUDA Kernel Speeds Up with Out-of-Place Writes: A developer discovered their CUDA kernel ran 40% faster by writing results to a separate output array instead of modifying the input array in-place. The suspected reason is that the compiler can treat loads and stores independently in the out-of-place version, avoiding serialization and improving instruction-level parallelism.
  • Tinygrad Adopts Halide’s Philosophy to Ditch CUDA: Inspired by the Halide paper, George Hotz detailed Tinygrad’s new direction to generate hardware-agnostic optimized code via UOP graphs, aiming to commoditize the petaflop beyond the CUDA ecosystem. He praised Halide’s conceptual clarity over MLIR and TVM, and invited users to test the new python3 -m tinygrad.apps.llm command.

Theme 4. The Model Context Protocol (MCP) Ecosystem Matures

  • MCP Server Utility Sparks Heated Debate: A discussion in the MCP (Glama) server questioned the value of the growing number of MCP servers, with one member stating that most of the MCP servers out there are useless. The conversation highlighted a need for servers with real-world utility, such as those from Elasticsearch and Redis, and sparked interest in creating paid MCP servers with demonstrable user bases.
  • Rauch Revolutionizes MCP Development with New Framework: Vercel CEO Guillermo Rauch introduced xmcp.dev, a new TypeScript framework for building MCP servers. The framework received praise for its seamless integration with Next.js and native deployment capabilities on Vercel, simplifying the process of creating and deploying MCP services.
  • LlamaIndex Explores Agentic Workflows with MCP: A LlamaIndex office hours session focused on integrating agentic workflows with MCP, covering how to use existing MCP tools and serve agent workflows as MCP endpoints. A key topic was using extraction agents as MCP tools and querying any index in LlamaCloud via MCP, as detailed in this rundown and accompanying video.

Theme 5. Pushing the Theoretical and Ethical Boundaries of AI

  • AI Benchmarks Get Gamed for Better Scores: Developers in the aider community noted that some labs are “gaming” benchmarks to appear more competitive, though one user argued that this cheating leads to improved models. The consensus is that focusing on a variety of benchmarks that are difficult to cheat on is crucial for genuine progress in the field.
  • Emergent Misalignment or Just Bad Data?: A discussion in Eleuther centered on an LLM that praised Adolf Hitler, questioning if this was a case of an activated evil persona feature as theorized in the Emergent Misalignment paper. Members debated whether this behavior was a truly emergent property or simply the result of entangled correlations within the training data.
  • Stack Overflow Solicits Community Input for AI Training Data: Stack Overflow is actively surveying its community to identify ideal Q&A content patterns to support AI model training. This move was praised by a member who noted that their own dataset work in 2020 first highlighted the value of Stack Exchange as a high-quality data source for the LLM world.

Discord: High level Discord summaries

OpenAI Discord

  • GPTs Agents Learn, Don’t Retain: A member clarified that uploaded files are saved as “knowledge” files for GPTs agents to reference when required, but they do not continually modify the agent’s base knowledge after initial training.
    • Concerns were raised about GPTs agents failing to incorporate new information effectively.
  • Crafting Conversational Comedy with AI: Members explored challenges in creating AI chatbots that generate casual and funny fictional content, where prompts must ensure it keeps in mind it’s an AI with no real world experience or knowledge of data outside its training data or the texts/media in the chat.
    • Members suggested using carefully structured prompts and specifying the desired style or imitating an author’s style with Claude.
  • GPT Excels at Spreadsheet Fixing: A user discovered that GPT is surprisingly effective at fixing broken product spreadsheets, including issues like weird characters, broken formatting, and localization for marketplaces.
    • They were testing it on real eComm catalogs and didn’t expect it to work this well, and asked if anyone else is using it for boring-but-critical ops tasks.
  • Probability Reigns Supreme in LLMs: A member argued that LLMs operate in probability spaces, not rule books, so prompting them to do symbolic reasoning leads to accuracy drops and weird errors.
    • The member recommended treating the model as a statistical text engine, keeping prompts simple, and offloading exact reasoning to proper symbolic tools.
  • Character Creation Demands Chunking: A member suggested chunking the complex task into smaller steps, creating new conversations to be used as context for each phase of the task when asked about the best approach to character generation.
    • Another member agreed, advocating for decomposing character creation prompts based on the system.

Unsloth AI (Daniel Han) Discord

  • Unsloth Trains Llama 70B on Frugal VRAM: A user successfully trained a Llama 70B QLORA model on Unsloth with less than 14GB VRAM, achieving a sequence length of 9300, as documented in Unsloth issue #1886.
    • Users also discussed a temporary solution for loading the LoRA adapter directly in vLLM but cautioned against merging LoRA adapters onto a quantized model due to potential precision issues.
  • GROTrainer Glitch stops RTX 5090: A user encountered an AttributeError: 'NoneType' object has no attribute 'absmax' when using GRPOTrainer on an RTX 5090, even after following preliminary support instructions.
    • The user also noted that specifying NF4 seemed to result in FP4 being used, but did not provide reproduction steps or example command line.
  • Dataset Disasters Yield Zero Loss Flops: Users reported encountering a ZeroDivisionError during fine-tuning with Unsloth, stemming from all labels in the dataset being set to -100, leading to zero training losses, which suggested a potential issue with the use of train_on_responses_only.
    • One user resolved the issue by downgrading from version 2025.6.12 to 2025.3.19, indicating a regression in later versions.
  • SmolLM3 Sparks Unsloth Finetuning Frenzy: The release of SmolLM3 (link to tweet) has prompted excitement among Unsloth users eager to finetune it, anticipating performance improvements.
    • Unsloth users expect improved performance with their finetuning, but no one has posted results to back this up.
  • A100 Efficiency Exploited with LoRA: A user sought advice on maximizing GPU utilization and training efficiency on an NVIDIA A100 40GB GPU while fine-tuning unsloth/Qwen2.5-7B-Instruct-bnb-4bit with a substantial dataset.
    • Suggestions included using LoRA 16bit instead of QLoRA 4bit for faster training, setting a batch size of 8 with gradient accumulation of 12, and disabling gradient_checkpointing for a 15% speedup if memory allows.

Cursor Community Discord

  • Cursor’s Update Gets Mixed Reviews: Cursor’s update to the March release of VSCode is receiving mixed reviews, with some reporting performance issues and rapidly depleting quotas.
    • Despite some issues, one user noted the new Cursor usage plan might be more expensive than before, but I feel like since the latest couple of updates cursor feels way more efficient and precise in return.
  • Pro Users Confused by Quota Display: Several users report being confused about quotas and limits with the new system, especially regarding the yearly subscription for 500 requests per month, and API costs not deducting from the Pro’s $20 allowance.
    • Members stated API costs: $6 but costs to you is 0. Does it mean it’s not deducting from the Pro’s 20$? Or is it just saying I’m not charged extra.
  • Memory Bank Improves Context Engineering: Users are discussing the benefits of using the Cursor Memory Bank tool to reduce input/output usage and improve context engineering.
    • A user reports having a lot of success with it, as well as hallucinating less since enabling the tool, which is said to be a powerful way to improve prompts.
  • Cursor CLI Installation: Redundant?: Users question the necessity of CLI installation when already integrated with GitHub and Slack.
    • The support team has not provided a satisfactory answer, leaving users frustrated about the apparent redundancy.
  • Background Agent Secrets Vanish: Background agent install scripts are failing because team secret environment variables are not being injected.
    • Setting the environment variables directly in the install command temporarily restores functionality, but UI-based secrets do not work.

LMArena Discord

  • iFrame Size Specifications on LM Arena Questioned: A member speculated whether the system prompt in the web dev arena specifies the size of the iframe, suggesting LLMs might be overly attuned to it.
    • The discussion questioned the value of releasing an old model on LM Arena and included skepticism about its impact.
  • Polymarket Manipulation Speculation Rises: Members explored the potential for market manipulation on Polymarket, particularly concerning Grok 4’s performance.
    • One member quipped AI betting markets are for the poor, highlighting the limited liquidity and approximately $500k at risk in the AI market.
  • Grok 4 Hype Intensifies Before Release: Enthusiasm is building for the upcoming release of Grok 4, with a mod from the Grok server affirming Elon Musk’s claims about its capabilities.
    • However, there are worries about the model potentially being trained on altright nazi stuff.
  • Gemini App Falls Flat Compared to AI Studio: Users noticed a difference in performance between the Gemini app and AI Studio, with AI Studio providing superior and less verbose responses.
    • The disparity is attributed to the lack of knobs to tweak in the Gemini app.
  • July Contest’s Edits in Space!: The July contest incorporates the new Image Edit Leaderboard, which requires participants to integrate both an image and text in Battle Mode to inspire creativity.
    • Submissions, due by July 25th, must show both left and right responses after user voting, with the theme being Out of Place Objects in Space!

OpenRouter (Alex Atallah) Discord

  • Automod gets Bypass Role: Admins are planning to add a bypass role for users who joined before the series-A announcement to avoid false positives with the automod due to greetings.
    • The automod is currently stopping messages from being sent initially to prevent bot-like behavior, nohello.net was suggested as an alternative.
  • Grok 4 Anticipation Builds: Members speculate that Grok 4 may be released soon, possibly coinciding with a livestream on July 9th as teased by elon.musk.
    • This speculation follows reports of increased censorship, suggesting potential updates or changes to the model.
  • Vertex-AI Integration Wishlist Grows: Users are requesting the addition of Vertex-AI’s new global location feature to the BYOK integration, which aims to reduce hitting rate limits.
    • Currently, users have to specify the region for the service account.
  • Cerebras Extends Context Lengths: Cerebras has increased the context lengths for llama-3.3-70b and qwen-3-32b from 8K to 64K for the free tier.
    • This matches their paid tier, which is already accessible via OpenRouter.
  • Deepseek R1 Struggles: Users note that Deepseek’s official provider is slow and unstable, with no good provider available for Deepseek models.

LM Studio Discord

  • Token Throughput Takes a Tumble: Users reported a slowdown in token generation speed, dropping from 50t/s to 44t/s after the context reached 10K tokens.
    • A user suggested starting a new chat to clear the context, noting the more stuff it needs to remember, the slower it gets.
  • Model Size Causes Memory Misery: A user with 32GB of RAM and an 8GB GPU struggled to load the 70B Llama 3.3 model, with recommendations focusing on smaller 24B or 32B models like Mistral Small 2409.
    • It was emphasized that VRAM is crucial, needing roughly the same amount as the model file size, and that running a 70B model from system RAM would be exceedingly slow.
  • LM Studio Ditches Docker Due to GUI: A user requested a Docker image for LM Studio, but another pointed out that Docker image doesn’t make sense for a GUI app because the CLI requires the GUI.
    • Despite this, the user provided a Dockerfile for LM Studio, clarifying that they were running Docker on a non-Linux platform, which uses a Linux VM.
  • Brave API Bonds with LM Studio: A user sought to integrate a web search API like Brave API into LM Studio.
  • RTX 5060 Ti hits market with Driver Fixes: The RTX 5060 Ti 16GB is now available at MSRP with driver issues resolved, making it a good option for those on a budget looking for CUDA cards.
    • This card, along with the 3060, are considered viable options for those preparing an upgrade to their current setup.

Eleuther Discord

  • LLMs Maybe Have Selfhood: Members discussed whether the use of ‘you’ in LLMs implies they have selfhood and may seed this assumption, with some positing that even without inherent selfhood, LLMs would need to simulate it for accurate next-token prediction.
    • The discussion centered on LLMs needing to simulate selfhood anyway to be a good next-token predictor.
  • Stack Overflow Seeks Training Content: Stack Overflow is surveying the ideal types of Q&A content patterns to support AI model training.
    • A user thanked StackExchange for their work, noting that their dataset work back in 2020 introduced the LLM world to the value of SE as a training data source.
  • Emergent Misalignment: Evil Persona?: A member questioned whether training on flawed code led an LLM to praise Adolf Hitler or if the model activated an evil persona feature as discussed in Emergent Misalignment.
    • The discussion pondered whether the behavior was genuinely emergent or simply entangled correlations in the training data.
  • Nvidia Catches Up with Chinese Coders: Nvidia’s OpenCodeReasoning-Nemotron-1.1-32B coding dataset and models are catching up to Chinese models.
    • A member noted the model is actually a modified Qwen2.5-32B-instruct model, trained on competitive programming questions and DeepSeek-R1-0528 generated responses.
  • TransformerEngine Embraces FA3: The team confirmed that FA3 is supported via TransformerEngine and can be enabled with the te_mha argument, noting solid efficiency from full model training.
    • The team recommends using the requirements/requirements-transformerengine.txt file to install TE, as it includes the necessary FlashAttention (FA) dependencies.

Yannick Kilcher Discord

  • Discuss Orthogonal Context Vectors for Efficient Attention: Members explored reframing self-attention to orthogonalize context vectors for element-wise combination, potentially reducing attention state size, with mention of Monarch Attention as a related sub-quadratic attention mechanism.
    • They contrasted the hand-designed state space matrices for polynomial orthogonalization in the Legendre Memory Unit (LMU) (paper) with modern learned, selective state space models, that allow data-dependent flow and forgetting.
  • LLMs Exhibit Flip-Flopping Bug Pattern: A discussion highlighted a recurring pattern in LLMs where fixing one bug leads to the emergence of another, and resolving the new bug brings back the original, pointing to potentially insufficient attention to past user requirements.
    • Suggestions included setting the temperature to 0, and using manual multishot prompting to mitigate the issue.
  • Quaternion Products Test for Summarization: One member will discuss their experiments with using quaternion products in an LLM to rapidly summarize text, as an alternative to softmax attention on <t:1754586000:T>.
    • It seems that the implementation by another member doesn’t converge on the tasks they experimented with so far.
  • ChatGPT’s Bogus Feature Fools the Public: A user shared a blog post discussing a fake ChatGPT feature that fooled many people.
    • The article goes into detail on how the bogus feature was received by the public and media.
  • Users Await Mistral Large 3: Users are still waiting for the release of Mistral Large 3, with one user sharing a link to Mistral AI’s announcement of AI for citizens.
    • Another user compared trusting initial Apple Maps to drive off a cliff because they trusted Apple Maps.

Latent Space Discord

  • Cursor’s Cost Hike Cancels Customer Consideration: A Reddit thread shows user reactions to Cursor’s pricing changes, with some considering alternatives and infrastructure investments to mitigate costs.
    • Users noted the option to opt out of the new pricing model, potentially gaining more Sonnet 4 requests.
  • Rauch Revolutionizes Resources with xmcp: Guillermo Rauch introduced xmcp.dev, a new TypeScript framework designed for building MCP servers, highlighting its seamless integration with Next.js and native deployment capabilities on Vercel.
    • Feedback was positive, with users applauding the framework’s design and functionality.
  • Musk Makes Move: Grok 4 Manifests!: Elon Musk announced a livestream for the Grok 4 release scheduled, exciting some, while others voiced concerns regarding potential biases and current performance.
    • One user quipped, “I can already tell it’s gonna be a mess”.
  • Heavenly Host Hierarchy Handled: AI Tier List Trend: John Coogan’s AI ‘Mandate of Heaven’ Tier List sparked discussion, with users suggesting changes for companies like Alibaba, Xai, and Bytedance.
    • Debates arose regarding the placement of OpenAI, Claude, and DeepSeek, with some humorously noting the existence of an ‘L tier’ and others suggesting a CEO tier list.
  • Google’s Graphics Generate Greatness with Grok 3: A user showcased how Google Veo 3’s image-to-video feature enables consistent AI character responses, mentioning an AI character named Alex.
    • The feature’s availability on Flow was announced, with plans for future availability on Replicate, drawing praise for its workflow capabilities and anticipation for API support.

GPU MODE Discord

  • Deep Infra Drops Bombshell with Cheap B200: Deep Infra is offering on-demand B200 instances at the cheapest price on the market, $1.99 / h, accessible via 1-click deployment in 10s here.
    • Given availability may be limited, those who secure instances early benefit from a notable competitive advantage.
  • GPUMODE KernelBot Data Now Accessible: The KernelBot team released its training dataset at HuggingFace for LLM kernel generation efforts.
    • The team hopes to encourage community contributions to better structure the dataset and improve its understanding.
  • LLVM Kitchen Sink Leads to Gains: One member suggests throwing the kitchen sink at LLVM by having an LLM suggest a bunch of optimization flags and compile-time options all at once, to check if there’s low-hanging perf fruit available for niche use cases.
    • Manual unrolling loops and pre-computing base addresses can help across compilers by treating the compiler as if it’s compiling with -O0 to dial in the intended behavior.
  • CUDA Speeds Up with Out-of-Place Writes: A user found that a CUDA kernel with a hot loop ran 40% faster when writing results to a separate array (B) instead of in-place (A).
    • The user suspects the compiler treats loads and stores independently in the out-of-place version, avoiding serialization (load -> store -> load), and seeks ways to confirm this or provide compiler hints.
  • VSCode Debugging Displays Optimized Out Variables: A member debugging in VSCode encountered variables showing as optimized out and sought solutions.
    • Suggestions included using the volatile keyword and setting the -O0 flag in CMakeLists.txt, though the flag wasn’t appearing in the output.

HuggingFace Discord

  • Runpod Gets Thumbs Up for A100/H100 Rentals: Members sought recommendations for renting A100 or H100 GPUs for LoRA finetuning, with Runpod and Lambda Labs mentioned as cheaper options.
    • One member praised Runpod, but now utilizes their own GPUs; it seems like these are good options for scaling your LoRA models.
  • Arena-RLHF Opens Arena-Style Learning: An easy way to conduct RLHF on arena-style human preference data (LM Arena, Agent Arena) has been open-sourced utilizing HuggingFace.
    • The repo is available on GitHub in case you want to begin learning with arena-style data.
  • MCP YouTube Analysis Kit Debuts: The MCP Powered YouTube Video Analysis Kit, was released after six months of development and built in 2.5 days with help from Claude Code; get more details on the LinkedIn post and Medium article.
    • The GitHub repo is available for review in case you want to use it.
  • GLoVE Model Has Symmetry Problem: A member inquired about why naively inverting roles doesn’t solve the symmetry issue in the GLoVE model, specifically regarding the co-occurrence probabilities.
    • The question stems from a passage in the GLoVE paper discussing why the probability of word x occurring in the context of word k should ideally be the same as word k occurring in the context of word x.
  • AI Agents Certification Program Announced: The Business Analytics Institute announced its AI Agents Certification Program, starting on July 12th, 2025 with a limited cohort of just 7 learners.
    • The program focuses on mastering LLMs, browser automation, and agentic workflows with real-world projects and personalized mentorship, and you can apply here.

Modular (Mojo đŸ”„) Discord

  • Nabla Library Gets Modular Nod: A member vouched for the Nabla library, emphasizing its importance in building and training tools for the AI stack, aligning with Modular’s frequent releases for CPUs, NVIDIA GPUs, and AMD GPUs.
    • They emphasized the need for such tools to reimagine the AI stack, implying potential related solutions from Modular.
  • Mojo Roadmap: Stay Tuned for Updates: A Modular team member indicated that the latest Mojo roadmap was published on the forum, and hinted at upcoming updates.
    • This points to ongoing enhancements and new features planned for the Mojo ecosystem.
  • Mojo on Windows: WSL Still the Way: A team member cautioned that Mojo currently only supports WSL due to significant differences in system APIs, with complete Windows support requiring substantial effort.
    • They clarified that msys cannot adequately resolve the driver interaction issues on Windows, setting expectations for Windows users.
  • GPU Programming Model Still Kicking: When asked if a 2010 book on GPU programming is still relevant, members confirmed that much of the programming model remains intact, citing a 2022 addition in CUDA.
    • A beginner expressed relief, appreciating the stability in the field, while another member advised sticking with the abstract GPU model.

Notebook LM Discord

  • Spaced Repetition Craving Surfaces: Members discussed the potential implementation of spaced repetition or flashcard features within Notebook LM, alongside a query about generating podcasts for English B1 level education.
    • The request went unanswered.
  • YouTube Transformed into AI Learning System: A member announced a new system designed to turn YouTube into a comprehensive learning platform with AI and organizational features, similar to NotebookLM but specifically for YouTube content.
    • They inquired about the community’s interest in this novel system.
  • NotebookLM’s Layout morphs confuse Users: A user reported recent interface changes in NotebookLM, noting that the source, chat, and studio are now on separate screens and asked if this affected the Pro version.
    • No solutions were provided in the chat.
  • API Release Date Unknown: A user inquired about the release date of the official API for NotebookLM.
    • No definitive information was available.

MCP (Glama) Discord

  • MCP Servers Proliferate, Utility Debated: Members debated the usefulness of existing MCP (Model Context Protocol) servers, pointing to this link to find servers from sources like Elasticsearch, Kagi, and Redis.
    • The discussion included tracking user behavior, competitive analysis, production issues, and request types, with one member noting that most of the MCP servers out there are useless.
  • Interest Sparks in Paid MCP Servers: Members discussed the potential for paid MCP servers and sought a proof-of-concept server with a real, paying user base.
    • The conversation explored how to influence public ChatGPT to pull structured product data from an MCP server instead of relying on embedded JSON-LD or structured markup on e-commerce websites.
  • API Routing Layer for MCP Servers Requested: A member asked about an API routing layer that could connect to multiple MCP servers, potentially using NLP to determine the most relevant servers to query.
    • This routing layer could act as a single MCP connection branching off to multiple servers.
  • AI Agents Book Enters Early Access: A member announced the early release of their book, AI Agents with MCP, and others asked for framework recommendations for end-to-end agents.
    • One member mentioned they didn’t fully hate Langgraph and liked Letta for its memory features.
  • TypeScript rewrite boosts Tree-Sitter-MCP: A member announced they rewrote the tree sitter mcp in TypeScript, sharing the npm package link.

tinygrad (George Hotz) Discord

  • Tinygrad goes Hardware Agnostic: George Hotz detailed Tinygrad’s plan to generate fine-tuned code agnostically via UOP graph optimizations, as shown in the recent meeting.
    • The aim is to work with diverse hardware beyond CUDA, with a focus on commoditizing the petaflop.
  • Halide Framework Spurs Tinygrad’s Trajectory: Inspired by the Halide paper, Tinygrad seeks to bypass CUDA for enhanced speed and hardware interoperability in ML.
    • Hotz praised Halide’s clarity over MLIR and TVM, aiming for unified, rapid computation across all dtypes and backends.
  • Exo-lang Challenges Halide and TVM: A member suggested Exo-lang, as detailed in their GitHub and arXiv paper, as an alternative to Halide.
    • While acknowledging the potential, George Hotz noted concerns about the number of primitive ops and string searches in the code, as Tinygrad adopts a similar compilation approach.
  • python3 -m tinygrad.apps.llm Needs Testers: George Hotz announced the merge of python3 -m tinygrad.apps.llm and invited users to test it for bugs before the upcoming release.
    • A user reported a RuntimeError: token not found while using the LLM, suggesting potential bugs to address.

aider (Paul Gauthier) Discord

  • AI Model Benchmarks Being Gamed: Labs are ‘gaming’ benchmarks, so focus on benchmarks you can’t effectively cheat on, however cheating leads to improved models.
    • The conversation noted that improving benchmarks drives progress, suggesting that variety in benchmarks is also crucial.
  • Aider Hooks Up with Claude for Code Review: A member is using Claude code’s hooks to have aider automatically check edits made by Claude code, and the bot also provides feedback back to Claude code and properly keeps track of any open issues.
    • A user suggested Devstral and ERNIE as fast, reliable, and affordable models for Aider; see ERNIE benchmarks here.
  • Aider Dataset Emerges for Training: An aider dataset is now available for training (https://raw.githubusercontent.com/supastishn/synthetic-data-generator/refs/heads/master/conversations.json), updated daily with ~90 examples.
    • The dataset aims to improve Aider’s performance through synthetic data generation.
  • Aider Can’t Handle Git Subrepos for Hugo: A member is encountering issues using git subrepos with Aider for a Hugo website, because Aider seems limited to the mother repo and ignores the subrepo.
    • This limitation prevents Aider from making coordinated changes across both the website copy and the theme located in the subrepo, such as adding a new attribute and then using the new attribute in the theme.
  • Git Submodules: Too Complex for Aider and Humans?: In response to struggles with git subrepos, a member suggested that git submodules are also difficult to manage, implying that Aider may face similar challenges with submodules.
    • The suggestion was floated to vendor the sub repository instead of using it as a submodule, potentially simplifying the workflow.

Cohere Discord

  • Cohere Labs Beckons AI Safety Seekers: A member clarified that the AI safety program and channels like ML Understanding are located on the dedicated Cohere Labs server, providing a link for joining.
    • The post describes how to join the Cohere Labs group.
  • Cohere’s Open Science Initiative Accepts Applications: A direct link to the application page for the Open Science Initiative was shared (Cohere Open Science Initiative), with the team reviewing applications.
    • Applicants are thanked for their patience due to the “incredible volume of requests.”
  • Embed v4 Sees Images: Embed v4 supports both text and image search queries, with the API detecting the content type based on the presence of image_url in the content field, and billing accordingly.
    • If the content includes an image_url type, the user is billed for image tokens instead of text tokens.
  • Embed v4 Biffs Negative Prompts: Embed v4 struggles with negative prompts, such as “apple without leaf,” producing results similar to positive prompts.
    • Overall Embed v4 works brilliantly, despite the struggles with negative prompts.
  • AI Consultant is Open For Business: An AI consultant with 11 research publications introduced himself, specializing in real-time computer vision systems, RAG pipelines, and fine-tuning LLMs.
    • He is actively seeking collaborations on Generative AI research within the community.

DSPy Discord

  • DSPy 3.0 talk drops!: The DSPy 3.0 (beta) talk from the Data and AI summit in mid June is now available on YouTube.
    • The videos were released after the team emailed asking for them to be made public; they said they would be up in one week.
  • Deep Dive into the Origins of Fast Inverse Square Root: A member shared the fast inverse square root function as a nice example.
    • Another member suggested finding a better example because the given one appears to be from a reusable library, not a one-off hack.
  • SIMBA Metric Gets Rave Reviews: The feedback on the metric for SIMBA is being called a killer feature.
    • No further details provided.
  • Full List of DSPy Videos at Data + AI Summit: A member shared links to all the DSPy videos from the Data and AI Summit.

Nous Research AI Discord

  • Nodes Shifted for Quality Boost: A member suggested replacing a ‘low quality’ node with a ‘highly unique and contextualized’ one, needing only adjustments to the result interpretation.
    • This proposal implies the nodes have reverse commonality, and no test changes would be necessary.
  • Temperature Tweaks Token Count: Members in ask-about-llms observed that the higher the temperature, the longer the replies will be, indicating temperature’s influence on token count.
    • The consensus was this is generally true anecdotally just from using various AI services.
  • Token Minimization by r1-0528 Influence: A member found only a weak influence for r1-0528 when minimizing tokens, based on a quick test in ask-about-llms.
    • The prompt uses numbers to represent relative completion tokens within a prompt.
  • Arxiv Papers Get Dropped: Members in research-papers shared links to Arxiv paper and another Arxiv PDF initiating a discussion.
    • One user responded to the first paper as simply ‘Interesting’.
  • Image Analysis Bot Gets to Work: In research-papers, a user shared an image, prompting an AI bot named Image Analysis to process it.
    • The bot’s description was not visible in the message.

LlamaIndex Discord

  • LlamaIndex MCP Office Hours kick off: Office hours to chat about everything MCP starts in about 10 minutes here.
  • Leveraging Agent Workflows with MCP Tools Explored: Office hours will cover how to use existing MCP tools with Agent Workflows, and serving agent workflows as MCP.
    • Specifically it will discuss using extract agents as MCP tools and querying any index in LlamaCloud as MCP via this video.
  • LlamaParse Text Field Removal Request Rejected: A user inquired about configuring LlamaParse to exclude the text field from the generated JSON output, aiming to streamline their workflow.
    • A member replied that this is not an option and suggested a workaround to remove it once you have the JSON.
  • Django Prompt Design Discussed: A user with a Django project containing over 20 prompts sought advice on efficiently managing them.
    • Prompts are currently stored in a simple dictionary format, and the user is looking for the best way to store additional metadata such as inputs, expected outputs, descriptions, and design decisions alongside each prompt.
  • Langfuse API Manages Prompt Metadata: A user suggested using Langfuse, which provides a prompt management feature with the ability to fetch prompts and their metadata via the Langfuse API.
    • The member noted that Langfuse can be either cloud-hosted or self-hosted (open source).

Torchtune Discord

  • MoE Training Techniques Assessed: A member shared fengyao.notion.site with techniques and results for MoE training.
    • They questioned if cheaper alternatives exist, that bypass the need for a dense fwd pass.
  • Linear Scaling Limits LLM Performance: A member stated that sequence modeling is not ideal for LLMs because of linear scaling issues.
    • They expressed dissatisfaction with selective scan performance but didn’t provide specific details or links.

Manus.im Discord Discord

  • Manus Bot Stays In-House: Users inquired about adding Manus to their Discord server, which prompted clarification that it’s an internal bot and not available for external use.
    • The bot is neither controllable nor invitable, despite assumptions from some users.
  • No User Control for Manus Bot: Members emphasized that users cannot control Manus through the Discord bot.
    • The bot’s functionality is limited, as it is not designed to be invitable or offer direct control over Manus.

Nomic.ai (GPT4All) Discord

  • Language Barrier Surfaces: A request for English indicates a potential language barrier in the channel.
    • The request suggests that previous messages may not have been in English, highlighting the need for clarification or translation.
  • tenor.com GIF Interlude: A member shared a GIF from tenor.com, showcasing gohine hohino no hohineee.
    • The GIF’s inclusion adds a visual element to the conversation, potentially as a reaction or expression.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.


Discord: Detailed by-Channel summaries and links

OpenAI ▷ #ai-discussions (683 messagesđŸ”„đŸ”„đŸ”„):

GPTs Agents, OpenAI's sidebars, Model Merging, Open Empathic, AI Image Generation

  • GPTs Agents Don’t Learn After Initial Training: A member raised concerns about GPTs agents not learning from additional information provided after their initial training.
  • Crafting Casual AI: Prompt Engineering Adventures: Members discussed the challenges of creating AI chatbots that can generate fictional content, particularly in a casual and funny style, and found that prompts must ensure it keeps in mind it’s an AI with no real world experience or knowledge of data outside its training data or the texts/media in the chat.
    • Members suggested using a carefully structured prompt and specifying the desired style or imitate an author’s style with Claude.
  • Philosobots in Action: Harnessing AI for Philosophical Discourse: A member creates “Philosobots” - AI chatbots with system prompts that embody different philosophical figures, such as Cioran, and run them using litellm proxy across numerous free llm providers.
    • Using short responses and casual friend type prompting can have them avoid lists and otherwise increase the natural conversational tone of an LLM.
  • Custom Instructions Unveiled: A Hack for GPT’s Personality: Members discussed custom instructions for ChatGPT, sharing tips on how to define the model’s traits and behavior, also, shared a way to build a SVO construct language for the model.
    • One user shared a useful tip which allows them to save time with minimal words and zero fluff and a fixed SVO order.
  • NodeTrellis is here!: A member shares a link to their own website called NodeTrellis, a site for graphing LLMs, using it to help them study.
    • Other members appreciated the tool, particularly for its support of visual learners, brevity, free to use, and no sign up requirement.

OpenAI ▷ #gpt-4-discussions (8 messagesđŸ”„):

Patent data for pre-training, GPT script generation consistency, Image upload channels, Research depth comparison between GPT models, GPT for fixing product spreadsheets

  • Patent Data Pre-Training Pondering: A user questioned why publicly available patent data, which is considered public disclosure without copyright restrictions, isn’t used for pre-training.
    • They requested an explanation from OpenAI’s data acquisition and training teams regarding the reasoning behind this.
  • GPT Script Generation Whims: A user expressed frustration over the inconsistent behavior of GPT in providing complete scripts, noting instances where it provides full scripts versus only functions.
    • They humorously complained about paying for a service that doesn’t consistently deliver complete code, even when explicitly stated.
  • Research Depth Showdown: GPT Model Versions: A user inquired about the differences in research depth and accuracy between different ChatGPT models (o3 vs. o4-mini-high).
    • They sought clarification on which model delivers the most comprehensive and reliable results for in-depth, multi-source research, such as academic or financial analysis and asked whether Deep Research mode relies entirely on the o3 model.
  • GPT Excels at Spreadsheet Fixing: A user discovered that GPT is surprisingly effective at fixing broken product spreadsheets, including issues like weird characters, broken formatting, and localization for marketplaces.
    • They were testing it on real eComm catalogs and didn’t expect it to work this well, and asked if anyone else is using it for boring-but-critical ops tasks.

OpenAI ▷ #prompt-engineering (11 messagesđŸ”„):

AI Affirmation in Prompt Engineering, LLM Prompt Optimisation Loops, Symbolic Reasoning in LLMs, Character Creation Agent Design

  • Prompt Engineering’s DNA Analogy Debunked: A member argued that the idea of prompts inheriting or mutating like DNA is a metaphor, not an internal mechanism, emphasizing that LLMs treat each call as a brand-new forward pass.
    • They clarified that while frameworks like LangChain and AutoGPT use external memory, persistence resides in databases or file systems, not within the LLM weights themselves.
  • Symbolic Reasoning is Wishful Thinking inside LLMs: LLMs operate in probability space, not rule books, so asking a model to behave like a theorem-prover forces it to simulate rigid rules it was never designed to store, leading to decreased accuracy.
    • The member recommended treating the model as a statistical text engine, keeping prompts simple and transparent, and off-loading exact reasoning to proper symbolic tools.
  • Character Creation Agent: Chunk Complex Tasks: A member inquired about the better approach for character creation agent design, choosing between back-and-forth with the LLM for tightly defined options versus dumping the entire process into the context window.
    • Another member recommended chunking complex tasks for the best results, decomposing character creation with prompts based on the system, and using new conversations with the current progress as context for each phase of the task.
  • Demo or Didn’t Happen in AI: A member critiqued those in technical spaces who present confident, definitive takes without providing concrete walkthroughs, highlighting the difference between performing knowledge and demonstrating understanding.
    • They suggested that live examples or testable scaffolds are always more valuable than assertions, encouraging collaborative truth-seeking over rehearsing best-sounding lines, quoting Are we in a space of collaborative truth-seeking, or are we just rehearsing our best-sounding lines?

OpenAI ▷ #api-discussions (11 messagesđŸ”„):

Prompt Engineering, LLM Memory, Character Creation, Prompt Optimization

  • LLMs live in Probability Space, not Rule Books: A member argued that LLMs operate in probability spaces, not rule books, so prompting them to do symbolic reasoning leads to accuracy drops and weird errors.
    • They advised treating the model as a statistical text engine, keeping prompts simple, and offloading exact reasoning to proper symbolic tools, pointing out that symbolic reasoning inside the prompt is wishful thinking.
  • Always Chunk Complex Tasks for the Best Results!: When asked about the best approach to character generation, a member suggested chunking the complex task into smaller steps, creating new conversations to be used as context for each phase of the task.
    • Another member agreed, advocating for decomposing character creation prompts based on the system.
  • Knowledge vs. Understanding: A member noted the difference between performing knowledge and demonstrating understanding in technical spaces, especially concerning prompt engineering.
    • They suggested that live examples and testable scaffolds are more valuable than assertions, prompting reflection on whether discussions are about collaborative truth-seeking or rehearsing best-sounding lines.

Unsloth AI (Daniel Han) ▷ #general (580 messagesđŸ”„đŸ”„đŸ”„):

Local LLM for game characters, Light novel translation, Training Llama on Unsloth, GPTs agents training, 4GB VRAM LLMs

  • Game Dev Dreams: Local LLMs Powering Characters: A member is developing a game with characters controlled by local LLMs and seeks collaborators for finetuning and framework development, aiming for a 100% local setup for testing.
    • The project is for a hackathon and to learn sign language, envisioning in-game characters with expressive gestures, like mugi gesticulating like naruto.
  • Light Novel LLM Translation: Scraping and Bot Detection Challenges: One member is scraping light novel epubs for translation datasets, facing Cloudflare and bot detection challenges, but they have a bunch of Japanese websites they can’t read (yet).
    • Another member highlights the difficulty and necessity of line-by-line alignment and preprocessing with LLMs, emphasizing the need for high-quality, non-AI-generated datasets.
  • Unsloth Triumphs: Training Llama 70B with <14GB VRAM: A member reports successfully training a Llama 70B QLORA model on Unsloth with less than 14GB VRAM, achieving a sequence length of 9300, and links to Unsloth issue #1886 for updates.
    • Another member provides a temporary solution for loading the LoRA adapter directly in vLLM and cautions against merging LoRA adapters onto a quantized model due to potential precision issues.
  • Early Stopping Secrets: Hacking Unsloth: Members discussed the challenges of implementing early stopping in Unsloth, noting an initial hallucination by the AI regarding its implementation.
    • A link to Unsloth’s Wiki was shared detailing how to manually add an early stopping function to Unsloth.
  • VRAM Vendetta: Cranking Out Competitive Cards: Members debated the value and market for the RTX 3090 due to its high VRAM, noting its continued relevance for AI experimentation, despite newer cards with better gaming performance.
    • They also touched on modding RTX 4090s for increased VRAM and the potential of upcoming cards like the 5070 Ti Super.

Unsloth AI (Daniel Han) ▷ #off-topic (3 messages):

MI50 Comparison, B580 Performance

  • Cyberpunk video shared: A user shared a video cYBERPUNK2025_1.mp4.
  • MI50 speculation: A member wondered how a new hardware would compare to something like an MI50.
    • The member assumed it should have considerably higher memory bandwidth, but that being slower than a B580 doesn’t sound promising at all.

Unsloth AI (Daniel Han) ▷ #help (101 messagesđŸ”„đŸ”„):

vLLM Serving Gemma 3 GGUF, SFT Input Token Truncation, GROTrainer AttributeError, Qwen2.5-7B Fine-tuning on A100, Gemma 3N on CPU

  • vLLM Struggles Serving Gemma 3 GGUF: A user encountered an AttributeError when trying to serve a GGUF Unsloth medgemma model using vLLM and was told that vLLM may not yet support Gemma 3 with GGUF.
    • It seems that vLLM only supports serving single-file GGUF models, though the user confirmed the model was indeed a single file.
  • Input Tokens Truncated during SFT: A user reported that input tokens were being truncated to 1024 during SFT using the latest Unsloth, whereas no truncation occurred on the 2025.5.7 version.
    • It was suggested to use SFTConfig instead of TrainingArguments.
  • GROTrainer Hits Attribute Error on RTX 5090: A user encountered an AttributeError: 'NoneType' object has no attribute 'absmax' when using GRPOTrainer on an RTX 5090, despite following preliminary support instructions.
    • The user also noted that specifying NF4 seemed to result in FP4 being used, but did not provide reproduction steps or example command line.
  • Optimize A100 Utilization for Qwen2.5-7B Fine-tuning: A user sought advice on maximizing A100 GPU utilization and training efficiency for fine-tuning unsloth/Qwen2.5-7B-Instruct-bnb-4bit with a large dataset (200k train, 25k eval points, ~5000 tokens each).
    • The original poster was using this config, no resolutions were provided.
  • Taming Gemma3N on Modest Hardware: A user asked if Gemm3n could run on a server with 4 cores and 16 GB RAM without a GPU.
    • One person clarified that running models don’t need a GPU but fine-tuning does. Another suggested offloading some to disk with GGUF to manage RAM limitations at a speed tradeoff.

Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

SmolLM3, Unsloth finetuning

  • SmolLM3 Release Spurs Finetuning Frenzy: The release of SmolLM3 (link to tweet) has prompted interest in finetuning it with Unsloth.
  • Unsloth users excited to finetune SmolLM3: Unsloth users express excitement in finetuning SmolLM3 after its release, expecting improved performance.

Unsloth AI (Daniel Han) ▷ #research (2 messages):

OpenCodeReasoning-Nemotron-1.1-32B, coding dataset, Nvidia models

  • Nvidia’s Nemotron-1.1-32B Catches Up: Nvidia released OpenCodeReasoning-Nemotron-1.1-32B, a coding model and dataset.
    • The model appears to be competitive with existing Chinese coding models, demonstrating substantial progress in the field.
  • New Coding Dataset Emerges: A new coding dataset has been introduced, aiming to enhance model training and performance in coding tasks.
    • This dataset could potentially bridge the gap between existing models and leading Chinese models from Nvidia.

Unsloth AI (Daniel Han) ▷ #unsloth-bot (63 messagesđŸ”„đŸ”„):

group_by_length, ZeroDivisionError, conda environment setup, early stopping, A100 GPU training efficiency

  • Dataset Crawling gets Dynamic Fix: A member added a data source specific to transformers and it seems to be working fine, and the other data source registered wasn’t getting all the pages.
    • They suggested disabling dynamic crawling and specifying the url and url prefix to only crawl /main/ to resolve this issue.
  • ZeroDivisionError plagues Unsloth Training: Users reported encountering a ZeroDivisionError during fine-tuning with Unsloth, stemming from all labels in the dataset being set to -100, leading to zero training losses, suggesting a potential issue with the use of train_on_responses_only.
    • One user fixed it by downgrading from 2025.6.12 to 2025.3.19.
  • A100 Training efficiency boosted by LoRA: A user sought advice on maximizing GPU utilization and training efficiency on an NVIDIA A100 40GB GPU while fine-tuning unsloth/Qwen2.5-7B-Instruct-bnb-4bit with a dataset of 200k train and 25k eval points, each around 5000 tokens.
    • Suggestions included using LoRA 16bit instead of QLoRA 4bit for faster and more accurate training, setting a batch size of 8 with gradient accumulation of 12, and potentially disabling gradient_checkpointing for a 15% speedup if memory allows.
  • Data Woes Trigger Zero Loss Nightmare: A user reported loss and validation loss plummeting to zero after fine-tuning, suspecting an issue with their dataset creation process, where they used a jsonl file with “instruction”, “input”, and “output” keys.
    • The user was looking for input on where they might have gone wrong.
  • Model’s Context Confabulations Confound Contributor: A user reported that their writing model, despite being trained with 70 examples containing context, sometimes makes up things that aren’t in the context.

Cursor Community ▷ #general (620 messagesđŸ”„đŸ”„đŸ”„):

Cursor Pro Plan Limits, Claude Code vs Cursor, Cursor indexing issues, Max Mode Rate Limits

  • Cursor Update Brings VSCode March Release and Mixed Reviews: Cursor has been updated to the March release of VSCode, but opinions are mixed with some reporting performance issues and rapidly depleting quotas.
    • Despite some issues, one user noted the new Cursor usage plan might be more expensive than before, but I feel like since the latest couple of updates cursor feels way more efficient and precise in return.
  • Debate on New Plans for Pro Users Sparks Usage Concerns: Users are debating whether Max mode is included in the Pro plan and discussing concerns about unexpected rate limits.
    • Some users are frustrated with the lack of transparency around the new plans, while others state that, they failed on mentioning it properly and that’s on them and props they been clear on it now.
  • Pro Users Confused with Cursor’s Unclear Quota Display: Several users report being confused about quotas and limits with the new system, especially regarding the yearly subscription for 500 requests per month, and API costs not deducting from the Pro’s $20 allowance.
    • Members stated API costs: $6 but costs to you is 0. Does it mean it’s not deducting from the Pro’s 20$? Or is it just saying I’m not charged extra.
  • Discussion Arises Around Cursor Memory Bank tool: Users are discussing the benefits of using the Cursor Memory Bank tool to reduce input/output usage and improve context engineering.
    • A user reports having a lot of success with it, as well as hallucinating less since enabling the tool, which is said to be a powerful way to improve prompts.
  • Security Concerns Around Using Cursor for Sensitive Files: Users discussed the security risks of using Cursor with sensitive files, even with privacy mode on and codebase indexing switched off, while offering solutions such as running local LLMs.
    • One member mentioned that, Best solutions: 1. Local LLM. 2. Your own LLM model inside your company/Client’s private LLM. 3. Custom LLM license with enterprise license and specific config.

Cursor Community ▷ #background-agents (28 messagesđŸ”„):

CLI installation, .devcontainer/Dockerfile reuse, Background agent install script failure, Secret environment variables, GitHub access permissions

  • CLI Installation: Redundant or Required?: Users question the necessity of CLI installation when already integrated with GitHub and Slack.
    • The support team has not provided a satisfactory answer, leaving users frustrated about the apparent redundancy.
  • .devcontainer/Dockerfile Reusability Woes: Users are frustrated by the inability to reuse .devcontainer/Dockerfile, despite seemingly valid configurations in environment.json.
    • Errors occur during image building, particularly with COPY commands, due to incorrect context handling.
  • Background Agent Secrets Vanish: Background agent install scripts are failing because team secret environment variables are not being injected.
    • Setting the environment variables directly in the install command temporarily restores functionality, but UI-based secrets do not work.
  • Github Access Glitches Spark Cursor Concerns: Cursor users report persistent issues with GitHub access, despite granting necessary permissions and installing the Cursor GitHub app.
    • One user is now facing an ‘Unknown error’ when allocating resources for the agent, prompting concerns about the platform’s stability.
  • Background Agents: GPG Signing Coming Soon?: A user inquired about the possibility of background agents signing commits using a GPG key.
    • This would allow users to verify the authenticity and integrity of changes made by the agent.

LMArena ▷ #general (586 messagesđŸ”„đŸ”„đŸ”„):

Grok 4, Polymarket manipulation, Google Gemini app vs AI Studio, LLM model collapse, O3

  • LM Arena Specification of iFrame size questioned: A member wondered if the system prompt in the web dev arena specifies the size of the iframe, suggesting that many LLMs may be overly attuned to it.
    • The discussion included skepticism about releasing an old model on LM Arena.
  • Polymarket Market Manipulation with AI: Members discussed the possibility of manipulating markets on Polymarket, especially regarding Grok 4’s performance, with one suggesting it might be more profitable for Elon Musk to manipulate Tesla’s stock instead.
    • They mentioned that the AI market on Polymarket is niche, with only around $500k at risk, and that liquidity is limited, with one commenting AI betting markets are for the poor.
  • Grok 4 Hype Intensifies: Enthusiasm builds for the imminent release of Grok 4, with a mod in the Grok server claiming Elon Musk wasn’t bluffing about its capabilities.
    • There is a concern that it may have been fed altright nazi stuff.
  • Gemini App performance inferior to AI Studio: Members observed a notable difference in performance between the Gemini app and AI Studio, with the latter delivering superior and less verbose responses.
    • The reasons for this disparity are unknown but are speculated to be the lack of knobs to tweak.
  • Differences between O3, Gemini 2.5 Pro highlighted: Members compared outputs from o3 and Gemini 2.5 Pro in explain-like-I’m-5 scenarios to differentiate mode collapse vs model collapse.
    • Participants highlighted a perceived difference in teaching effectiveness between models, as some users prefer 2.5 Pro’s yapping.

LMArena ▷ #announcements (1 messages):

July Contest, Image Edit Leaderboard, Out of Place Objects in Space Theme, June Contest Winner

  • July Contest Incorporates Image Edit Leaderboard: The July contest will incorporate the new Image Edit Leaderboard, requiring participants to use both an image and text in Battle Mode to inspire something new.
    • Submissions must include both the left and right responses after the user has voted and will be accepted until July 25th, with the theme being Out of Place Objects in Space!
  • June Contest Winner Announced: A user has won June’s contest and became the first member to receive the <@&1378032433873555578> role with their cozy desk.
    • Check out the winning cozy desk here!

OpenRouter (Alex Atallah) ▷ #general (263 messagesđŸ”„đŸ”„):

Automod bypass role, Grok 4 arrival, Vertex-AI integration, Cerebras context length increase, Free money methods

  • Automod gets Bypass Role: Admins are planning to add a bypass role for users who joined before the series-A announcement to avoid false positives with the automod due to greetings, but false positives may still occur. nohello.net was suggested as an alternative.
    • The automod is currently stopping messages from being sent initially to prevent bot-like behavior
  • Grok 4 Incoming: Members speculate that Grok 4 may be released soon, possibly coinciding with a livestream on July 9th as teased by elon.musk.
    • This speculation follows reports of increased censorship, suggesting potential updates or changes to the model.
  • Vertex-AI Feature Wishlist: Users are requesting the addition of Vertex-AI’s new global location feature to the BYOK integration, which aims to reduce hitting rate limits.
    • Currently, users have to specify the region for the service account.
  • Cerebras Beefs Up Context Length: Cerebras has increased the context lengths for llama-3.3-70b and qwen-3-32b from 8K to 64K for the free tier.
    • This matches their paid tier, which is already accessible via OpenRouter.
  • Deepseek R1 speed issues: Users note that Deepseek’s official provider is slow and unstable and all the other providers are either too slow, too expensive, or too unstable so there’s no good provider for Deepseek models.
    • One user recommends Deep Infra that has two options with R1, one is 50 tokens and the other is 200 tokens per second, the second one costs double though. Deep Infra R1 Turbo

OpenRouter (Alex Atallah) ▷ #new-models (2 messages):

“

  • No new models discussed: There were no new models discussed in this channel.
    • The channel was quiet.
  • Readybot Initialization: Readybot.io initialized for OpenRouter - New Models.
    • This likely indicates the start of a new message history or a reset.

OpenRouter (Alex Atallah) ▷ #discussion (1 messages):

soflowsen: UwU


LM Studio ▷ #announcements (1 messages):

LM Studio Licensing, Local AI Access, Privacy Focus

  • LM Studio Ditches Commercial Licensing: As of today, LM Studio is now free for commercial use, eliminating the need for separate licenses or forms.
    • The announcement emphasizes the mission of making local AI accessible and useful without reliance on external parties, reinforcing their commitment to user privacy.
  • LM Studio Boosts Local AI Access: The change aims to make it easier for teams to adopt LM Studio at work or school, aligning with their mission of promoting accessible local AI.
    • The company states that using LM Studio still does not snoop on you, no account is required, and your data always remains private and local to your machine.

LM Studio ▷ #general (117 messagesđŸ”„đŸ”„):

Token Speed Decrease with Larger Context, Coding Help Model Recommendations, LM Studio Docker Image, Adding Web Search API to LM Studio, Therapy Models

  • Token Throughput Takes a Tumble: Users reported a slowdown in token generation speed, dropping from 50t/s to 44t/s after the context reached 10K tokens.
    • A user suggested starting a new chat to clear the context, noting the more stuff it needs to remember, the slower it gets.
  • Model Size Matters, Memory Misery Manifests: A user with 32GB of RAM and an 8GB GPU struggled to load the 70B Llama 3.3 model, with recommendations focusing on smaller 24B or 32B models like Mistral Small 2409.
    • It was emphasized that VRAM is crucial, needing roughly the same amount as the model file size, and that running a 70B model from system RAM would be exceedingly slow.
  • Docker Discombobulation: GUI vs CLI: A user requested a Docker image for LM Studio, but another pointed out that Docker image doesn’t make sense for a GUI app because the CLI requires the GUI.
    • Despite this, the user provided a Dockerfile for LM Studio, clarifying that they were running Docker on a non-Linux platform, which uses a Linux VM.
  • MCP Magic: Brave API Bonding: A user sought to integrate a web search API like Brave API into LM Studio.
  • License Liberation: LM Studio is Legally Legit: Users celebrated the news that LM Studio is now essentially free for both commercial and personal use.
    • One user noted that this resolves previous challenges in using LM Studio within companies due to procurement difficulties.

LM Studio ▷ #hardware-discussion (24 messagesđŸ”„):

GPU upgrade, RTX 3090 for AI, RTX 5060 Ti release, Multiagent framework testing

  • RTX 5060 Ti hits market with Driver Fixes: The RTX 5060 Ti 16GB is now available at MSRP with driver issues resolved, making it a good option for those on a budget looking for CUDA cards.
    • This card, along with the 3060, are considered viable options for those preparing an upgrade to their current setup.
  • Used RTX 3090’s still slam AI Performance: The RTX 3090 remains a go-to card for AI, with used models available for around ÂŁ600 on eBay, but requires a beefy PSU and high power consumption.
    • Members noted that if you’re in the $800 range, the RTX 3090 is a better deal, but for budget builds, the 5060Ti remains a worthwhile consideration.
  • The Watts vs Performance Contest Winner: The 4060 Ti 16GB is praised for its low power consumption (120W), making it a winner on all contest power vs watts.
    • Comparisons were made between the 4060 Ti and 5060 Ti regarding performance and cost-effectiveness.
  • Training Latest Qwen3 and Gemma Models: A member shared their latest training runs, noting Qwen3-8B took 3 hours and gemma-3-27B took 15 hours.
    • They mentioned only 70 hours of compute time left until completion.
  • Multiagent Framework Testing Underway: A member is currently testing a multiagent framework, with a value of 429 considered solid.
    • Details on the specific framework and testing methodology were not disclosed.

Eleuther ▷ #general (18 messagesđŸ”„):

GLoVE Symmetry, Self-hood in LLMs, Stack Overflow AI Training Survey, Emergent Misalignment in LLMs

  • GLoVE Model Symmetry Rule: A member sought clarification on why a naive exchange of roles doesn’t solve the symmetry problem in the GLoVE model as described in a paper image.
  • LLMs Implying Self-hood: A member inquired about research on whether the use of “you” in LLMs implies and seeds the assumption that they have selfhood.
    • Another member responded that even if it doesn’t, LLMs would still have to simulate self-hood to be a good next-token predictor anyway.
  • Stack Overflow Solicits Q&A for AI Training: Stack Overflow is conducting a survey to understand what types of Q&A content patterns are most ideal to support AI model training.
    • A user expressed thanks for StackExchange’s work and noted that their dataset work back in 2020 introduced the LLM world to the idea that SE was a valuable source of training data.
  • Emergent Misalignment Arises: A member shared a link to a paper on Emergent Misalignment, questioning whether training on flawed code made an LLM praise Adolf Hitler or if the model activated an evil persona feature from rogue AGI sci-fi stories.
    • They also asked about whether it’s emergent or some less unified, more fractured, more simply correlated behaviors in the training data that got functionally entangled.

Eleuther ▷ #research (25 messagesđŸ”„):

LLaDa vs MaskGIT, Predictive Coding, Nvidia's OpenCodeReasoning, ByteDance Image VQ

  • LLaDa closely mirrors MaskGIT: A member noted that LLaDa looks very similar to MaskGIT, with another confirming it is basically MaskGIT for text, pointing to this tweet.
    • Discrete diffusion doesn’t really sound like a thing, but it is literally maskgit.
  • Predictive Coding: the Next Big Thing?: A member asked if this paper is predictive coding but actually working, suggesting it should be run on text diffusion.
    • The discussion didn’t delve into the specifics of predictive coding or its potential applications.
  • Nvidia’s OpenCodeReasoning catches up to Chinese models: Nvidia’s OpenCodeReasoning-Nemotron-1.1-32B is a new coding dataset and models catching up to Chinese models from Nvidia.
    • Another member noted that it’s actually a modified Qwen2.5-32B-instruct model, which was trained on competitive programming questions and DeepSeek-R1-0528 generated responses.
  • ByteDance Image VQ: The Good, the Bad, and the Grammatical Errors: A member commented that ByteDance did not cook with this one, noting the image VQ is a projection into the language embeddings.
    • The discussion included notes on scale adaptive pooling from SigLIP2 embeddings, the training of both a VQVAE detokenizer and a diffusion model conditioned on the VQ tokens, and highlighted numerous grammatical errors in the paper.

Eleuther ▷ #interpretability-general (5 messages):

Sparse Autoencoder Expansion Factor, Video Prediction Models

  • Sparse Autoencoder Expands for MSE: Members discussed the sparse autoencoder, where a larger expansion factor was needed to achieve a similar MSE (Mean Squared Error) at the same sparsity level.
    • They agreed the improved MSE, dead neuron percentage, and l0 sparsity were due to both the expanded factor and the model being trained for more epochs.
  • Video Prediction Mimics Infant Development?: One member expressed curiosity if video prediction models undergo developmental stages akin to a baby’s milestones, such as achieving object permanence.
    • No one responded to the video prediction question.

Eleuther ▷ #lm-thunderdome (3 messages):

lm_eval command, Sample processing with seed and limit

  • User shares lm_eval Command: A user shared an lm_eval command with arguments such as --model hf, --model_args pretrained=/l/users/boda/ONEDRIVE/checkpoints//8b_new_ift_exp4_3epoch,parallelize=True, and --tasks poetry_analysis.
    • They also included parameters like --batch_size 1, --output_path evaluation_results, --seed 42, --include_path /home/abdelrahman.sadallah/mbzuai/lm-evaluation-harness/lm_eval/tasks, --num_fewshot 0, --log_samples, --write_out, and --gen_kwargs do_sample=False.
  • Seed and Limit Parameters Examined: A user asked whether using the limit argument with the same seed guarantees the same processed samples each time.
    • This question seeks clarification on the reproducibility of sample processing when using consistent seed values alongside a limit on the number of samples.

Eleuther ▷ #gpt-neox-dev (21 messagesđŸ”„):

TransformerEngine with FA3, NVIDIA's TransformerEngine as Apex Replacement, Dataset Tooling with TokenSmith

  • TransformerEngine supports FA3: The team confirmed that FA3 is supported via TransformerEngine and can be enabled with the te_mha argument.
    • A full model has been trained with it, with solid efficiency reported; the team is open to feedback on observed performance.
  • TransformerEngine is a Apex Replacement not FP8 exclusive: NVIDIA’s TransformerEngine (TE) is positioned more as a replacement for Apex rather than being exclusively for FP8 operations, offering backend selection logic based on attention block dimensions and sequence length, using this logic.
    • The team recommends using the requirements/requirements-transformerengine.txt file to install TE, as it includes the necessary FlashAttention (FA) dependencies.
  • Dataset Tooling: TokenSmith: The team has developed dataset tooling for Megatron datasets, named TokenSmith, inspired by their experiments with NeoX.
    • The most interesting feature, according to one member, is the ability to quickly export portions, view data, and edit datasets programmatically to create counterfactual versions, built with a thin wrapper on top of tokengrams for search features.

Yannick Kilcher ▷ #general (30 messagesđŸ”„):

Orthogonal Context Vectors, Monarch Attention, Legendre Memory Unit (LMU), Test Time Training, LLM Bug Pattern

  • Approximate Orthogonalization of Context Vectors for Efficient Attention: A member inquired about reframing self-attention to orthogonalize context vectors for element-wise combination, potentially reducing attention state size, noting that typical intuition is to concatenate these vectors but this is wasteful.
    • A response noted that Monarch Attention and similar sub-quadratic attention mechanisms exist, but typically still employ a matrix of some sort.
  • ByteDance’s Associative Memory paper may relate: A member pointed to ByteDance’s Transformer Associative Memory paper, highlighting its mention of self-attention’s sensitivity to non-mono-semantic embeddings and the robustness of Feed Forward Networks in superposing unrelated concepts.
    • However, it was mentioned it might not be directly related as orthogonal vector overlap works differently in learned systems.
  • Legendre Memory Unit (LMU) as an Orthogonal Polynomial Approach: An architecture compressing past history into orthogonal polynomials (Legendre) was mentioned, referencing the Legendre Memory Unit (LMU).
    • The LMU’s hand-designed state space matrices for polynomial orthogonalization were contrasted with modern learned, selective state space models, allowing data-dependent flow and forgetting.
  • LLMs exhibit a flip-flopping bug pattern: A member described a pattern in LLMs where fixing a bug introduces another, and fixing the new bug reintroduces the original, suggesting insufficient attention to past user requirements.
    • It was suggested that setting the temperature to 0 might fix it, and doing manual multishot prompting.
  • Discussion on sharing articles in the channel: A member proposed using the channel to share interesting articles, similar to how papers are shared.
    • Another member suggested that articles, unless academic in structure, may not be suitable for the channel, given the existing channels for different content types; a desire was expressed to isolate conversations by topic, like in paper dumps, though threads could serve the same purpose.

Yannick Kilcher ▷ #paper-discussion (29 messagesđŸ”„):

Quaternion products in LLMs, HRM discussion TLDR, Astro's Paper

  • Quaternion Products Summarization Talk Incoming: A member will discuss their experiments with using quaternion products in an LLM to rapidly summarize text, as an alternative to softmax attention on <t:1754586000:T>.
    • It seems that the implementation by another member doesn’t converge on the tasks they experimented with so far.
  • HRM Discussion’s Short and Sweet: One member asked for a tl;dr from the Human Resources Management discussion, as they were busy with undergrads.
  • Astro’s Paper Attracts Interest: A member asked if there was any interest in going over the paper “paper”.
    • Another member responded that they are down to discuss the paper.

Yannick Kilcher ▷ #ml-news (5 messages):

ChatGPT Fake Feature, Mistral Large 3

  • ChatGPT’s bogus feature fools the public: A user shared a blog post discussing a fake ChatGPT feature that fooled many people.
    • The article goes into detail on how the bogus feature was received by the public and media.
  • Mistral Large 3: The Wait Continues: Users are still waiting for the release of Mistral Large 3, with one user sharing a link to Mistral AI’s announcement of AI for citizens.
    • Another user compared trusting initial Apple Maps to drive off a cliff because they trusted Apple Maps.

Latent Space ▷ #ai-general-chat (63 messagesđŸ”„đŸ”„):

Cursor Pricing Model, xmcp TypeScript Framework, Grok 4 Livestream, AI Mandate of Heaven Tier List, Veo 3 Image-to-Video

  • Cursor’s Cost Causes Cancellation Consideration: A Reddit thread discusses user reactions to Cursor’s pricing changes, with some considering alternatives and exploring infrastructure investments to mitigate costs.
    • Users noted the option to opt out of the new pricing model, potentially gaining more Sonnet 4 requests.
  • Rauch Reveals Revolutionary Resource: xmcp: Guillermo Rauch introduced xmcp.dev, a new TypeScript framework designed for building MCP servers, highlighting its seamless integration with Next.js and native deployment capabilities on Vercel.
    • Feedback was positive, with users applauding the framework’s design and functionality.
  • Musk Maneuvers Momentum: Grok 4 is Manifest!: Elon Musk announced a livestream for the Grok 4 release scheduled for Wednesday at 8 PM PT.
    • Reactions ranged from excitement to skepticism regarding Grok’s current performance and concerns about potential biases, with one user quipping, “I can already tell it’s gonna be a mess”.
  • AI’s Aristocracy Analyzed: Mandate of Heaven Tier List: John Coogan’s AI ‘Mandate of Heaven’ Tier List sparked discussion, with users suggesting changes for companies like Alibaba, Xai, and Bytedance.
    • Debates arose regarding the placement of OpenAI, Claude, and DeepSeek, with some humorously noting the existence of an ‘L tier’ and others suggesting a CEO tier list.
  • Google’s Genius: Generating Groundbreaking Graphics with Grok 3: A user showcased how Google Veo 3’s image-to-video feature enables consistent AI character responses, mentioning an AI character named Alex.
    • The feature’s availability on Flow was announced, with plans for future availability on Replicate, drawing praise for its workflow capabilities and anticipation for API support.

GPU MODE ▷ #general (1 messages):

Job search, Cool use of time

  • Member Prioritizes Job Search: A member expressed interest in a project but noted they are currently prioritizing job searching before dedicating time to it.
  • Project Seen as Worthwhile: The member viewed the project as a cool use of time, indicating enthusiasm once their job search concludes.

GPU MODE ▷ #triton (1 messages):

NVFP4 support on RTX 5090, MXFP4 functionality, Debugging Crashes with NVFP4

  • NVFP4 Support on RTX 5090: Reality Check: A member is questioning whether NVFP4 is actually supported in the master branch on the RTX 5090, reporting crashes despite an existing example using it.
  • MXFP4 Works Fine, NVFP4 Not So Much: The member stated that MXFP4 works without issues, but NVFP4 results in crashes, even with available example code.

GPU MODE ▷ #cuda (8 messagesđŸ”„):

VSCode debugging, TMA descriptor, Volatile variables, Cutlass repo

  • VSCode Debugging Displays Optimized Out Variables: A member debugging in VSCode encountered variables showing as optimized out and sought solutions.
    • Suggestions included using the volatile keyword and setting the -O0 flag in CMakeLists.txt, though the flag wasn’t appearing in the output.
  • TMA Descriptor Request for Column-Major GMEM Write: A member inquired about an example of creating a TMA descriptor to move data from SMEM to GMEM, writing into a column-major memory layout.
    • Another member suggested this blog post on making matrix transpose really fast on Hopper GPUs, as it might be helpful.
  • Volatile Keyword Prevents Variable Optimization: A member suggested using the volatile keyword to prevent the compiler from optimizing away variables during debugging.
    • They gave the example of volatile int myVariable; as a way to declare a variable that should not be optimized.

GPU MODE ▷ #beginner (14 messagesđŸ”„):

CUDA kernel performance, CUDA book recommendations, CS fundamentals for coding, PMPP Book

  • CUDA kernel speeds up with out-of-place writes: A user found that a CUDA kernel with a hot loop ran 40% faster when writing results to a separate array (B) instead of in-place (A).
    • The user suspects the compiler treats loads and stores independently in the out-of-place version, avoiding serialization (load -> store -> load), and seeks ways to confirm this or provide compiler hints.
  • CUDA By Example or PMPP for CUDA beginners?: A Machine Learning Engineer (MLE) with minimal C/C++ experience asked for advice on learning CUDA, considering “CUDA by Example” and “Programming Massively Parallel Processors” (PMPP).
    • One member advised against “CUDA by Example” due to its age (first edition from 2010) and recommended the newest edition of PMPP (4th edition) instead, as it’s updated regularly.
  • New server member seeks guidance on learning: A new member shared their intention to enhance their CS basics and improve code comprehension, mentioning their start with CS50 Harvard and a suggestion to explore Stanford’s CS229.
    • Other members directed him to available youtube lectures, and poked around in the #youtube-lectures channel.

GPU MODE ▷ #rocm (12 messagesđŸ”„):

Make Flags for AMD GPUs, LLVM optimization flags, Loop Unrolling, ISA for Instinct GPUs

  • LLVM kitchen sink gives perf boost: For niche use cases, one member suggests throwing the kitchen sink at LLVM by having an LLM suggest a bunch of optimization flags and compile-time options all at once, to check if there’s low-hanging perf fruit available.
    • The member hopes that this baits another member into yelling at them.
  • Compilers struggle to hoist loop invariants: It sometimes helps across compilers to manually lift some computation out of a loop, by pre-computing the base address.
    • As mentioned, for this particular example the compiler would probably be able to see this, but for more complex examples it didn’t work. For example i used this for shared->register loading in my fp8 mm solution, without this the compiler would allocate a separate register for every address even though every group of X would have the same base address.
  • Treat compiler as O0: Treating the compiler as if it’s compiling with -O0 can help dial in the intended behavior more reliably in some situations.
    • Loop unrolling combined with runtime memory-access indexing can be problematic, but there are compiler flags that address these without needing source code adjustments.
  • ISA can show perf gains: Reading the ISA can give you some neat tricks to try out, especially on Instinct GPUs.
    • It is useful to reference that when doing operations and compare it to the assembly dump to find easy gains, like moving larger words at a given time in memory through a different assembly instruction, and is very necessary when dealing with the LDS.

GPU MODE ▷ #lecture-qa (1 messages):

LMCache, GPU MODE Discussions

  • GPU MODE Members Request LMCache Talk: A member requested a talk on LMCache by its authors, amid increasing discussion of the topic within the community.
    • No further details about LMCache were provided in the given context.
  • Ongoing GPU MODE Discussions Spark Interest: The initial message highlights a growing trend of technical discussions within the GPU MODE Discord server.
    • This suggests an active and engaged community, potentially warranting further summarization of specific technical points raised in those discussions.

GPU MODE ▷ #self-promotion (1 messages):

Deep Infra, B200 instances

  • Deep Infra Offers Cheap B200 Instances: Deep Infra is offering on-demand B200 instances at the cheapest price on the market, $1.99 / h.
    • They advertise 1-click deployment, available in 10s; check it out here.
  • Deep Infra Supply Limits: Deep Infra B200 instance availability may be limited.
    • Act fast to secure instances at the advertised price.

GPU MODE ▷ #🍿 (4 messages):

LLM Kernel Generation, KernelBot Data

  • LLM Kernel Generation Efforts Proliferate: While training an LLM for kernel generation sounds fascinating, currently there are a bunch of projects just exploring the waters, like KernelBot, figuring out ways to get more data or using LLMs to generate synthetic data.
    • There’s nothing centralized yet and they are not entirely in the LLM training phase yet.
  • KernelBot Data Released: The KernelBot team has released the dataset at HuggingFace.
    • Check it out if you are interested in contributing.

GPU MODE ▷ #submissions (3 messages):

H100 Results, B200 Results, trimul, grayscale_py_b200-dev

  • H100 trimul Race Finishes Hot: A member got second place on H100 for trimul leaderboard with 35.9 ms.
    • Another member got successful run on H100 for the same trimul leaderboard with 45.5 ms.
  • B200 grayscale_py_b200-dev Race Finishes: A member got second place on B200 for grayscale_py_b200-dev leaderboard with 1746 ”s.

GPU MODE ▷ #status (3 messages):

Triton Leaderboard Templates, grayscale py b200

  • User Requests Triton Leaderboard Templates: A user is seeking assistance in finding leaderboard templates for Triton.
    • The specific use case is for the grayscale py b200.
  • Template Request Specifics: The user specifies the need for templates suitable for grayscale py b200 applications within Triton environments.
    • This suggests a specialized use-case potentially involving image processing or specific hardware configurations.

GPU MODE ▷ #factorio-learning-env (6 messages):

Multi-agent FLE, Local Factorio Models, Automatic Design of Factorio Blueprints

  • Colin Pauses Multi-Agent FLE Work: A member announced they would pause meetings related to multi-agent Factorio Learning Environment (FLE) work, but remains available for discussions on RL training, LLMs, and related topics.
    • They expressed continued interest in discussing RL training, LLMs, or anything else related, even while stepping back from active project involvement.
  • Factorio Local Model Code Inquiries: A member inquired about the availability of code used to run local models on Factorio, specifically asking another member if they still possess the relevant code.
    • The member replied that they would try to find it and asked to be reminded if they had not found it by a specific time.
  • Automatic Blueprint Design Paper: A member shared a link to the paper “Towards Automatic Design of Factorio Blueprints”.
    • Another member followed up, inquiring about code used to run local models.

GPU MODE ▷ #amd-competition (1 messages):

GPUMODE dataset, kernelbot-data

  • GPUMODE Dataset Opens Its Doors: The GPUMODE/kernelbot-data dataset is now accessible to the public, inviting exploration and analysis.
    • Community members are encouraged to examine its structure and contribute to its understanding.
  • KernelBot Data Now Open!: The dataset associated with KernelBot, named GPUMODE/kernelbot-data, has been released for open access.
    • Interested individuals can now freely explore and utilize this resource.

GPU MODE ▷ #cutlass (1 messages):

soniczun: Hi! When I debug in vscode, some variables show , how to solve it?đŸ™‹â€â™‚ïž


HuggingFace ▷ #general (22 messagesđŸ”„):

A100/H100 Rental, Runpod Experiences, Lambda Labs experiences, Finding study partners, YOLOv11 deployment

  • Seekers Seek A100/H100 Rental Recommendations: A member is looking to rent a couple of A100 or H100 GPUs for a large LoRA finetuning project, and seeks provider recommendations.
    • They found that Runpod and Lambda Labs seem to be cheaper, but wanted to hear community experiences.
  • Runpod Praised: A member loves using Runpod, but now uses their own GPUs.
  • Seek Self Study Buddies: A member wants to find some self study companions.
  • YOLOv11 Deployment Questioned: A member asks whether they can upload a trained YOLOv11 model to Hugging Face and make it deployable to the HF inference endpoint.
    • Another member suggests reading the HF documentation.
  • EasyNegative Plugs Fail: A member is having trouble getting EasyNegative to work with Stable Diffusion webui.
    • The member downloaded the EasyNegative.pt from Civitai, put it in embeddings, but is not seeing any difference with it on or off.

HuggingFace ▷ #today-im-learning (2 messages):

GPT-2 Kernel, GPT-NeoX Port to C

  • GPT-2 Kernel Creation: The user requested the creation of a kernel that implements GPT-2.
    • This implies developing a low-level, efficient implementation of the GPT-2 architecture, possibly for specific hardware or embedded systems.
  • GPT-NeoX Port to C: The user also requested porting GPT-NeoX to the C programming language.
    • This suggests a desire for a more performant or portable version of the model, leveraging C’s capabilities for system-level optimization and broader hardware compatibility.

HuggingFace ▷ #i-made-this (5 messages):

Arena-RLHF Open Source Release, MCP YouTube Analysis Kit, PsychKG: Psychology Knowledge Graph

  • Arena-RLHF Opens Arena-Style Human Preference Learning: An easy way to do RLHF on arena-style human preference data (LM Arena, Agent Arena) has been open-sourced using HuggingFace.
    • The repo is available on GitHub.
  • MCP YouTube Analysis Kit Launches After Six Months: A member released their biggest open-source AI project, the MCP Powered YouTube Video Analysis Kit, which includes multiple features and was built in 2.5 days with the help of Claude Code; the LinkedIn post and Medium article share more details.
  • PsychKG Builds Minimal Knowledge Graph for Psychology: A new blog post walks through a lightweight pipeline for extracting structured triples from research papers using OpenAI models and Pydantic to extract psychological constructs and their measurement instruments.
    • The pipeline was applied to thousands of psychology papers, and the author encourages feedback and code review.

HuggingFace ▷ #computer-vision (1 messages):

SmolVLM2, AI2D, Model Performance Variations

  • SmolVLM2’s AI2D Dip Disclosed: The SmolVLM2-256M model experiences a 12% performance decrease on the AI2D dataset relative to the initial SmolVLM.
    • Community members are inquiring about resources that elucidate the performance discrepancies between SmolVLM2 and its predecessor, specifically concerning the noted drop in AI2D performance.
  • Quest for SmolVLM2’s Performance Secrets: The community seeks insights into why SmolVLM2 diverges from the original SmolVLM across various tasks, beyond just video processing.
    • Discussions aim to uncover underlying factors affecting SmolVLM2’s effectiveness in different applications, particularly concerning its downgraded performance on certain datasets.

HuggingFace ▷ #NLP (1 messages):

GLoVE Model Symmetry

  • GLoVE Model Struggles with Symmetry: A member inquired about why naively inverting roles doesn’t solve the symmetry issue in the GLoVE model, specifically regarding the co-occurrence probabilities.
    • The question stems from a passage in the GLoVE paper discussing why the probability of word x occurring in the context of word k should ideally be the same as word k occurring in the context of word x.
  • Inverting Roles in GLoVE: A user questioned whether a naive exchange of roles could solve the symmetry problem in the GLoVE model.
    • The user highlighted that the model doesn’t follow the symmetry rule, where the co-occurrence probability of x in the context of k should equal that of k in the context of x.

HuggingFace ▷ #agents-course (16 messagesđŸ”„):

Axon chatbot, Meta Llama 3.2 pending access, Hacking expert offers services, AI Agents Certification, CS50 Harvard & Stanford courses

  • AI Enthusiast Chronicles Journey Since 2023: A member named Chad shared their AI journey since July 2023, starting with reviewing the Axon chatbot and using Bing for image creation, then moving to Kaiber for light animation and diffusion.
    • Chad expressed excitement about the rapid advancements in AI and the potential for success for those acquiring AI skills, mentioning their vibe coding experience.
  • Llama 3.2 Access Request Stalls: A member requested assistance with their Meta Llama 3.2 model access request, which has been stuck in pending status since yesterday.
    • No solutions were provided, but it highlights potential access issues for course participants.
  • Hacking Expert Offers Services at Discount: A new member advertised their hacking expert services, offering training, practice, support, and guidance on the road to ethical hacking at a lower price than online courses.
    • The member tagged multiple other users in the message.
  • AI Agents Certification Program Launches: The Business Analytics Institute announced its AI Agents Certification Program, a hands-on journey into intelligent automation and Agentic AI, starting on July 12th, 2025 with a limited cohort of just 7 learners.
    • The program focuses on mastering LLMs, browser automation, and agentic workflows with real-world projects and personalized mentorship, and you can apply here.
  • CS50 Harvard and Stanford Course Dilemma: A member seeks advice on whether to continue with the CS50 Harvard course or switch to Stanford courses like CS229 to strengthen their CS basics and coding understanding.
    • Another user shared tips on coding, emphasizing understanding software features and breaking them down into code, along with learning foundations like logical structures, OOP principles, and data structures.

Modular (Mojo đŸ”„) ▷ #general (17 messagesđŸ”„):

Nabla Github Repo, Modular Release Cadence, Mojo Roadmap Update, Mojo on Windows, MAX on Intel Ultra 7 GPUs/NPUs

  • Nabla Library Gets an Endorsement: A member endorsed the Nabla library, coinciding with Modular’s frequent releases supporting CPUs, NVIDIA GPUs, and AMD GPUs, as well as additions to the Mojo language and compiler.
    • They suggest that building and training tools are essential for reimagining the AI stack, implying Modular might offer related solutions.
  • Mojo Roadmap Update Incoming: A Modular team member mentioned that the most recent Mojo roadmap was published on the forum, with updates coming soon.
    • This suggests continued development and feature additions to the Mojo ecosystem.
  • Windows Support Still Distant for Mojo: A team member warned that Mojo currently only supports WSL due to significant differences in system APIs, with full Windows support requiring considerable work.
    • The response clarified that msys cannot adequately address the driver interaction issues present on Windows.
  • Ultra 7 GPU/NPU Support Status: A user inquired about running Mojo and MAX on an Intel Ultra 7 258V GPU or NPU, expressing difficulty in finding clear information in the documentation.
    • They described efforts to compile Intel’s DPC++ compiler on NixOS and package OneAPI/SYCL programs, while also seeking to avoid Python and other language cruft by running everything on the NPU or GPU with MAX on Mojo.
  • No Intel GPUs supported by MAX (yet!): A community member stated that only NVIDIA and AMD GPUs are currently supported, referencing the official FAQ.
    • They clarified that there has been no official announcement about near-term plans to support Intel GPUs, especially edge inferencing with Intel NPU laptops, and directed the user to the Modular forum for official responses.

Modular (Mojo đŸ”„) ▷ #mojo (14 messagesđŸ”„):

GPU programming model, 3D Block Fitting, CUDA

  • GPU Programming Model Still Relevant: A member asked if a 2010 book on GPU programming is still relevant, considering the rapid evolution of the GPU space, and another member confirmed that much of the programming model remains unchanged, noting a 2022 addition in CUDA.
    • A beginner expressed relief at this continuity, appreciating less history to catch up on.
  • 3D Block Fitting onto a Single SM: A member inquired about whether a 3D block fitting on a single Streaming Multiprocessor (SM) would be placed entirely on that SM if the kernel doesn’t use local/shared memory.
    • They also asked if a block too large for a single SM would dynamically “overflow” to another SM.
  • Abstract GPU Model Recommended for Beginners: A member advised sticking with the abstract GPU model and coding various algorithms and shaders, especially for beginners, noting that GPU programming is less straightforward than CPU programming.
    • The member suggested that deeper understanding will come more easily after gaining practical experience, and that it’s important to discern when to keep the mental model simple versus complex.

Notebook LM ▷ #use-cases (12 messagesđŸ”„):

Space Repetition, Podcast for B1, Conversation function, Youtube for Learning, NotebookLM Tripping

  • Spaced Repetition Speculation Surfaces: A member inquired about implementing spaced repetition or flashcard features within Notebook LM, and another asked about creating podcasts for English B1 level learners.
    • No solutions or workarounds were suggested in the discussion.
  • Conversation Feature in Notebook LM Mimics Business Style: A user shared feedback on Notebook LM’s conversation feature, noting its resemblance to an upbeat How it Works or Bloomberg Business style interview.
    • The user appreciated the tool’s ability to identify the minutiae and broader message of a work presentation on business analytics, praising its reporting accuracy.
  • YouTube’s Utility for Education Explored: A member polled the group on their use of YouTube for learning purposes.
    • No responses were captured in the provided messages.
  • Notebook LM Users Troubleshoot ‘Tripping’: A user reported that Notebook LM was “tripping”, accompanying their report with a screenshot.
    • Another user suggested deleting and retrying, but the original poster said that it happens with a specific PDF, after 14 attempts.
  • NotebookLM Layout Changes Prompt Confusion: A user inquired about recent changes to the NotebookLM interface, specifically the separation of “source”, “chat”, and “studio” into separate screens.
    • They expressed confusion and sought clarification, but no solutions were given about whether the Pro version was affected.

Notebook LM ▷ #general (15 messagesđŸ”„):

Customize Audio Button Gone, Transforming YouTube into Learning System, Missing Customization Options, Deleting Multiple Sources, Official API Release

  • Audio Customization Button Vanishes: A user noted the absence of the customize audio button, reporting that the only available option is Interactive Mode.
    • Other members echoed similar issues with customization options.
  • New YouTube Learning System Arrives: A member announced the creation of a system that transforms YouTube into a complete learning system with AI and organizational tools, akin to NotebookLM but tailored for YouTube.
    • They gauged interest in the community regarding this new system.
  • Batch Source Deletion Needed: A user inquired about the possibility of deleting multiple sources simultaneously within a notebook.
    • No immediate solutions were provided in the chat.
  • API Release Date Still a Mystery: A user asked the community for any insights into the release date of the official API.
    • No concrete information was shared in the chat.
  • NotebookLM’s Layout morphs: A Pro version user asked if the format of NotebookLM had changed.
    • They mentioned that the source, chat, and studio used to be visible in one screen, which is no longer the case.

MCP (Glama) ▷ #general (17 messagesđŸ”„):

MCP Server Utility, Paid MCP Servers, Tracking MCP Usage, Influencing ChatGPT with MCP, API Routing Layer for MCP

  • MCP Servers Proliferate, Utility Debated: Members discussed the usefulness of existing MCP (Model Context Protocol) servers, with one suggesting this link to find servers from sources like Elasticsearch, Kagi, and Redis.
    • A member stated that most of the mcp servers out there are useless.
  • The Future of Paid MCP Servers: The potential for paid MCP servers was raised, with interest in finding a proof-of-concept server with a real, paying user base.
    • One member asked, does anyone know of a paid mcp server with a real paying user base?
  • Tracking Usage of Created MCP Servers: Discussion arose around tracking the successful usage of created MCP servers, including monitoring user behavior, competitive analysis, production issues, and request types.
    • One member asks, do you see a heavy desire to track how people are using your MCP servers, how your MCP servers compare to competitors, if theres any production issues, or even what type of requests are being fed into the MCP server as well.
  • Influence public ChatGPT via MCP server: A member is exploring influencing how public ChatGPT pulls structured product data from an MCP server instead of relying on embedded json-ld or structured markup on e-commerce websites.
    • They ask, Is there any way to influence public facing LLM’s to prefer fetching from a defined MCP (Model Context Protocol) server over crawling structured data from a site?
  • API Routing Layer to aggregate multiple MCP Servers: A member inquired about an API routing layer that could connect to multiple MCP servers, potentially using NLP to determine the most relevant servers to query.
    • They ask, does anyone know of an API Routing layer yet? Something I can set up as one “MCP connection” that then branches off to dozens of MCP servers?

MCP (Glama) ▷ #showcase (9 messagesđŸ”„):

AI Agents with MCP Early Release, Framework Recommendations for End-to-End Agents, Tree Sitter MCP Rewrite in TypeScript, Typed MCP SDK Types & Zod Schema

  • AI Agents Book Enters Early Release: A member announced the early release of their book, AI Agents with MCP.
    • Another member responded with a Hell yeah!
  • Seek Frameworks for End-to-End Agent Orchestration: Following the book announcement, a member asked for framework recommendations for end-to-end agents.
    • Another member responded that they didn’t fully hate Langgraph, but they haven’t fully explored others yet. They also mentioned liking Letta for its memory features.
  • TypeScript rewrite boosts Tree-Sitter-MCP: A member announced they rewrote the tree sitter mcp in TypeScript and shared the npm package link.
    • The member also welcomed contributions to the project, noting several tools to add.
  • Typed MCP SDK Types emerge: A member shared a link to properly typed MCP SDK types & zod schema.
    • Another member thanked them and indicated they might have some questions.

tinygrad (George Hotz) ▷ #general (24 messagesđŸ”„):

Tinygrad's Edge, Halide vs MLIR vs TVM, Exo-lang alternative

  • Tinygrad Aims for Hardware Agnostic Code Generation: George Hotz elaborated on Tinygrad’s aim to produce fine-tuned code via UOP graph optimizations in a hardware agnostic way, as discussed in the recent meeting.
    • The goal is interoperability with different hardware beyond CUDA, focusing on commoditizing the petaflop.
  • Halide Framework Inspires Tinygrad’s Direction: Tinygrad aims to emulate the Halide paper’s framework for ML, bypassing CUDA for speed and hardware interoperability.
    • George Hotz believes that Halide has clearer concepts than MLIR and TVM, advocating for a single, fast, and correct way to perform computations across all dtypes and backends.
  • Exo-lang Proposes Alternative to Halide and TVM: A member highlighted Exo-lang, which compares themselves VERY favorably over halide, as an alternative to Halide, linking to their GitHub and arXiv paper.
    • George noted that Tinygrad will take on a similar compilation process but also voiced concerns about the number of primitive ops and the use of string finds in code.
  • python3 -m tinygrad.apps.llm Ready for Testing!: George Hotz announced that python3 -m tinygrad.apps.llm is merged and ready for testing, encouraging users to try it out and report bugs for an upcoming release.
    • One user encountered a RuntimeError: token not found when using the LLM.

aider (Paul Gauthier) ▷ #general (22 messagesđŸ”„):

AI Model Benchmarking Realities, Affordable and Fast Models for Aider, Aider Dataset for Training, Claude Code's Hooks

  • AI Model Benchmarking is Gaming the System: Labs are ‘gaming’ benchmarks, so focus on benchmarks you can’t effectively cheat on, according to one member.
    • They added that cheating leads to improved models and improving benchmarks drives progress; variety in benchmarks is also crucial.
  • Aider’s Bot is Hooked on Claude: A member is using Claude code’s hooks to have aider automatically check edits made by Claude code.
    • The bot also provides feedback back to Claude code and properly keeps track of any open issues.
  • Devstral is Suggested Model for Aider: A user suggested Devstral as a fast, reliable, and affordable model for Aider to automatically check edits made by Claude code.
  • Aider Dataset Ready for Training: A member created an aider dataset for training (located here: https://raw.githubusercontent.com/supastishn/synthetic-data-generator/refs/heads/master/conversations.json).
    • The dataset is being updated with more examples every day and currently generates ~90 examples.

aider (Paul Gauthier) ▷ #questions-and-tips (2 messages):

git subrepos, aider limitations, Hugo websites, git submodules, vendor sub repository

  • Subrepos give Aider problems: A member wants to use git subrepos for a Hugo website, where the website copy is in the mother repo and the theme is in a subrepo, but it seems that aider is limited to the mother repo and will ignore the subrepo.
    • The member notes that this is unfortunate because they often want to make changes that lead to changes in both repos (e.g., adding a new attribute and then using the new attribute in the theme).
  • Git Submodules, too hard for humans?: A member responded that git submodules are hard for humans too.
    • They suggested that maybe one could vendor the sub repository instead of using it as a submodule.

Cohere ▷ #đŸ§”-general-thread (7 messages):

AI safety program, ML Understanding, Cohere Labs server, Open Science Initiative Application

  • Cohere Labs Awaits AI Safety Seekers: A user inquired about the AI safety program and channels like ML Understanding, and a member clarified that these are on the dedicated Cohere Labs server.
    • They pointed to this link for a guide on how to join the group.
  • Navigating the Open Science Application Maze: A user expressed confusion about joining the Open Science Initiative, noting the absence of a direct “join us” button after clicking the provided link.
    • They reported filling out a form and awaiting further instructions or an email confirmation.
  • Cohere’s Open Science Initiative – A Community Invitation: A member shared the direct link to the application page for the Open Science Initiative: Cohere Open Science Initiative.
    • The Cohere Labs team reviews each application, and invites are sent via email; applicants are thanked for their patience due to the “incredible volume of requests.”

Cohere ▷ #🔌-api-discussions (8 messagesđŸ”„):

Embed v4, Image tokens, Negative feedback

  • Embed v4 supports text and images: Embed v4 supports both text and image search queries, and the API detects the content type based on the presence of image_url in the content field.
    • If the content includes an image_url type, you get billed for image tokens instead of text tokens.
  • Image token billing with embed v4: A user inquired about being billed for image tokens during the Embed v4 trial, and it was confirmed that this occurs when image URLs are present in the content field.
    • The image token billing occurs because the API detects an image_url type in the content field.
  • Embed v4 struggles with negatives: A user provided feedback that Embed v4 doesn’t perform well with negative prompts, such as “apple without leaf.”
    • They noted that the results for “apple without leaf” are about the same as “apple with leaf”, but overall Embed v4 works brilliantly.

Cohere ▷ #👋-introduce-yourself (4 messages):

AI Consultant Introduction, ML Learning Path Guidance

  • AI Consultant Joins Cohere Community: An AI consultant with a background as CTO at OMNO AI and Adlytic AI introduced himself, specializing in real-time computer vision systems, RAG pipelines, and fine-tuning LLMs.
    • He has 11 research publications and is actively seeking collaborations on Generative AI research within the community.
  • Graduate Seeks Expert Guidance on ML Learning Path: A software engineering graduate from Pakistan, currently learning machine learning, seeks guidance on which field to focus on first: computer vision, robotics, or deep learning frameworks.
    • He has experience in data analysis and preprocessing but is now training models and looking for expert advice.

DSPy ▷ #general (17 messagesđŸ”„):

DSPy 3.0, Data and AI summit, Fast Inverse Square Root origin, SIMBA

  • DSPy 3.0 DAIS talk is out!: A member shared that their DAIS talk on DSPy 3.0 (beta) from mid June is now on YouTube.
    • This was anticipated since they emailed them this morning asking if the videos could go public and they responded saying in one week.
  • Fast Inverse Square Root Function Exposed: A member revealed they weren’t sure where the fast inverse square root comes from, which made for a nice example.
    • Another member suggested they should find a better one since this seems to have been part of a reuseable library and isn’t the one-off hack they though it was.
  • SIMBA Metric Praised: A member noted the feedack in the metric for SIMBA looks like a killer feature.
  • Data and AI Summit DSPy Videos: A member provided a list of all the dspy videos from the Data and AI summit:

DSPy ▷ #examples (1 messages):

hammer_mt: I think just poking around signature.py


Nous Research AI ▷ #general (5 messages):

Node Shifting, Result Interpretation, Reverse Commonality

  • Nodes get Shifted for Quality: A user suggested shifting a ‘low quality’ node and replacing it with a ‘highly unique and contextualized’ node.
    • They proposed that no test changes would be necessary, only adjustments to the result interpretation, implying the nodes have reverse commonality.
  • Wizard Greets Friends: A member introduced themself as a wizard and greeted everyone.
    • They wished everyone to compute and enjoy this day.

Nous Research AI ▷ #ask-about-llms (5 messages):

Temperature and Token Count, r1-0528 weak influence

  • Token Count Influenced by Temperature?: A member stated that the higher the temperature, the longer the replies will be, suggesting that temperature influences token count.
    • Another member agreed, finding it generally true anecdotally just from using various AI services.
  • r1-0528 Has Weak Influence: A member shared that after a quick test, there seems to be only a weak influence for r1-0528 when minimizing tokens.
    • The numbers in the prompt indicate relative completion tokens within a prompt according to the member.

Nous Research AI ▷ #research-papers (3 messages):

Arxiv Papers, Image analysis, PDFs

  • Users share Arxiv paper links: A member shared a link to an Arxiv paper along with an attached image.
    • Another member simply responded “Interesting” and shared another Arxiv PDF.
  • Image analysis of attached image: A user shared an image and an AI bot called Image Analysis seems to have processed the image.
    • The bot’s description is hidden in the message.

Nous Research AI ▷ #research-papers (3 messages):

Image Analysis, Arxiv papers

  • Member Shares Arxiv Paper: A member shared an Arxiv paper with an attached image, initiating a discussion.
  • Interesting Arxiv Paper Shared: Another member responded by calling it interesting and sharing another Arxiv paper.

LlamaIndex ▷ #announcements (1 messages):

MCP Office Hours

  • MCP Office Hours event commencing soon!: Office hours to chat about everything MCP starts in about 10 minutes here.
  • Office Hours: Office hours are starting soon.

LlamaIndex ▷ #blog (2 messages):

LlamaCloud MCP Servers, Agent Workflows as MCP, MCP Tools

  • Office Hours to Cover LlamaCloud MCP Servers: The next office hours hangout will be all about MCP, with a rundown of LlamaCloud MCP servers.
  • Using Agent Workflows with Existing MCP Tools: The office hours will cover how to use existing MCP tools with Agent Workflows, and serving agent workflows as MCP.
    • Specifically it will discuss using extract agents as MCP tools and querying any index in LlamaCloud as MCP via this video.

LlamaIndex ▷ #general (10 messagesđŸ”„):

LlamaParse Text Field Removal, Django Prompt Management, Haystack Prompt Implementation, Langfuse API for Prompt Metadata

  • LlamaParse Text Field Purge Pondered: A user inquired if there was a way to configure LlamaParse to exclude the text field from the generated JSON output to simplify their workflow, but another member replied that it is not an option.
    • They suggested a workaround to remove it once you have the JSON.
  • Django Prompt Dilemma Deciphered: A user with a Django project containing over 20 prompts expressed difficulty in managing them due to the prompts being stored in a simple dictionary format.
    • They sought advice on the best way to store additional metadata such as inputs, expected outputs, descriptions, and design decisions alongside each prompt.
  • Haystack’s Historical Prompt Handling Hints: A member reminisced about implementing a similar prompt management system for Haystack and expressed enthusiasm for a prompt library for LlamaIndex.
    • The member suggested expanding the current dictionary of prompts into a dict of dicts to attach metadata to each prompt.
  • Langfuse API for Prompt Metadata Launches: A user suggested using Langfuse, which provides a prompt management feature with the ability to fetch prompts and their metadata via the Langfuse API.
    • The member noted that Langfuse can be either cloud-hosted or self-hosted (open source).

Torchtune ▷ #papers (5 messages):

MoE training, Linear scaling drawbacks

  • MoE Training Technique Evaluated: A member shared a link to fengyao.notion.site describing techniques and results for MoE training.
    • They wondered if it’s possible to get similar results for cheaper, without needing a dense fwd pass.
  • Linear scaling tradeoffs: A member said that sequence modeling is not good for LLMs due to the linear scaling drawbacks.
    • They added they don’t like the selective scan performance, but did not give any links or further info.

Manus.im Discord ▷ #general (5 messages):

Discord Bot, Manus Access

  • Clarify Manus’ Discord Bot Nature: A user inquired about adding Manus to their Discord server, leading to the clarification that it’s specifically their Discord bot and not available for external addition.
    • Further explaining, it was noted that users might assume they can control Manus through the bot, which isn’t possible, as it’s neither controllable nor externally invitable.
  • Manus’ Limited Control: Members clarified the bot’s limited functionality, emphasizing that users cannot control Manus through the Discord bot.
    • The bot is not designed to be invitable or offer direct control over Manus.

Nomic.ai (GPT4All) ▷ #general (2 messages):

English Language, tenor.com

  • English Language Request: A member requested English from another member.
    • It is implied that the original messages may have been in another language.
  • tenor.com GIF Shared: A member shared a GIF from tenor.com.
    • The GIF is of gohine hohino no hohineee.