MuonClip is all you need?
AI News for 7/10/2025-7/11/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (226 channels, and 8321 messages) for you. Estimated reading time saved (at 200wpm): 647 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
A lot of folks are excited about the Windsurf-OpenAI deal falling through (something we did NOT see coming), but fortunately we have a more technical story to headline today:
The relatively stealthy Chinese lab Moonshot AI (backed by Alibaba and Tencent, one of the AI Tigers alongside DeepSeek, Zhipu, MiniMax, and 01.AI) has burst onto the scene with Kimi K2, which by many metrics seems to be a far better base model than DeepSeek V3 (and presumably would do very well when scaled to a reasoning model). Coming in at 1T parameters, this would also be the largest SOTA open model released since the ChatGPT wave (we think? corrections welcome), which is very notable coming on the back of a new SOTA closed LLM yesterday.
The model is great, does well on pelicans, but researchers in the LLM community are more excited about MuonClip, the modified Muon optimizer proposed and scaled by Moonshot that produced perhaps one of the most beautiful loss curves in Machine Learning history:
The long-standing AdamW may finally have met its match. Congrats to the team.
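For readers curious what MuonClip might look like mechanically, here is a rough, unofficial sketch in NumPy. It is a reconstruction under two assumptions, not Moonshot's code: the Newton-Schulz orthogonalization step and its coefficients come from the public Muon reference implementation, and the `qk_clip` rescaling rule (shrink the query/key projections when the maximum attention logit exceeds a threshold) is our reading of the announcement.

```python
import numpy as np

def newton_schulz_orthogonalize(G: np.ndarray, steps: int = 5) -> np.ndarray:
    """Approximately orthogonalize a momentum/gradient matrix.

    Quintic Newton-Schulz iteration as used by Muon; the coefficients
    below are taken from the public Muon reference implementation and
    should be treated as an assumption here.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius norm as a spectral-norm proxy
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def qk_clip(W_q: np.ndarray, W_k: np.ndarray, max_logit: float, tau: float = 100.0):
    """QK-Clip sketch: if the observed max attention logit exceeds tau,
    rescale the query and key projections so the logit falls back to tau.
    Splitting the factor as sqrt(gamma) on each side keeps Q.K^T scaled by gamma.
    """
    if max_logit > tau:
        gamma = tau / max_logit
        W_q = W_q * np.sqrt(gamma)
        W_k = W_k * np.sqrt(gamma)
    return W_q, W_k
```

The real optimizer operates per attention head and interacts with weight decay and learning-rate scaling that this sketch omits; it is meant only to convey the shape of the idea.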
Quick plug for our friends at Weights&Biases - join swyx and friends at the Agent Protocols Hackathon in SF this weekend and win a robot dog! **SIGN UP NOW IF YOU'RE IN SF.**
AI Twitter Recap
New Model Releases & Performance
- Kimi K2 (1T MoE) Open-Weights Release: Moonshot AI has released Kimi K2, a 1 trillion parameter (32B active) Mixture-of-Experts model with an MIT license. The model was trained on 15.5 trillion tokens with zero training instability using the MuonClip optimizer, as highlighted by @Yuchenj_UW and @andrew_n_carr. It has achieved state-of-the-art results on benchmarks like SWE-Bench Verified (65.8%) and TAU2 (58.4%) without chain-of-thought, as detailed in their announcement. @scaling01 notes it is competitive with GPT-4.1 and Sonnet 4 on non-thinking tasks at a lower price point. The model uses a DeepSeek v3-like architecture and is already supported in vLLM and available for inference on Hugging Face via @novita_labs. @Teknium1 suggests this performance may force coding tools like Cursor to integrate an open-source model.
- xAI's Grok-4 Release: xAI announced Grok-4, which is now available for Perplexity Pro and Max subscribers, as announced by @perplexity_ai and @AravSrinivas. The model is described as the "LEAST censored frontier model" and shows strong long-context performance. However, it has faced criticism for its tendency to search Elon Musk's tweets for answers on controversial topics, as documented by @simonw. @MParakhin commented that while the reasoning is strong, the "post-training phase was clearly VERY rushed."
- Mistral Devstral 2507 Update: Mistral AI released Devstral Small and Medium 2507, an update offering improved performance and cost efficiency, as shared by @andrew_n_carr. @qtnx_ recommends developers switch from the `2505` version to `2507` for more robust tool calling performance.
- Google's Veo 3 Image-to-Video: Google announced that Veo 3 is now available in the Gemini App for AI Ultra and Pro subscribers. The feature allows users to turn photos into 8-second videos with sound, as announced by Google and shared by @demishassabis.
- Microsoft Phi-4-mini-flash-reasoning: @_akhaliq shared that Microsoft has released Phi-4-mini-flash-reasoning on Hugging Face, a lightweight open model built on the Phi-4-mini architecture with enhanced reasoning capabilities.
- Additional Releases and Datasets: Other notable releases include Kimina-Prover-72B, which achieved 92.2% on miniF2F using Test-Time RL (@LoubnaBenAllal1); MedSigLIP, a model for creating embeddings for medical images and text (@osanseviero); and the SYNTHETIC-2 open dataset with 4 million verified reasoning traces (@_lewtun).
New AI Techniques & Research
- H-Nets: Towards End-to-End Language Models: Cartesia AI has introduced H-Net, a hierarchical network that combines SSMs and Transformers to build models that can connect directly to raw information, potentially eliminating the need for tokenizers. The announcement by @sukjun_hwang and excitement from figures like @tri_dao highlight the significance of this research. @_albertgu frames tokenization as a special case of "chunking," which H-Net aims to learn end-to-end.
- AI Coding Assistant Performance Study: A Randomized Controlled Trial (RCT) by METR found that AI coding assistants slowed down experienced open-source developers working in mature codebases. The results were shared by @jeremyphoward, sparking widespread discussion. Some noted the study's specific constraints, suggesting assistants are more helpful for less experienced developers or in unfamiliar codebases.
- The "Most Cursed Macroblock": @ID_AA_Carmack shared a technical musing on video compression, questioning what set of pixels would take the most bits to encode under a given set of parameters, noting the non-trivial nature of finding this "most cursed macroblock" due to non-linearities in quantization and entropy encoding.
- Critique of RL Scaling: Following the Grok-4 release, there has been discussion about the limits of scaling Reinforcement Learning. @scaling01 argued that simply scaling RL, as was done for Grok-4, doesn't solve fundamental problems and won't get us to AGI. @jxmnop questioned if we are "just doing RL wrong" given the large compute investment for marginal gains.
- Training Hyperparameter Optimization: A paper shared by @sainingxie lays out an analytical approach for tuning learning rate (lr), batch size (bs), and beta2, which he calls his "new handbook for training big models on small gpus." Concurrently, @ylecun stated that "The optimal batch size is 1" for suitable definitions of "optimal."
AI Infrastructure, Tooling, & Developer Experience
- Perplexity Comet AI Browser: Perplexity has launched Comet, an AI-native browser focused on productivity. Co-founder @AravSrinivas showcased features like "vibe browsing," voice commands for tab management (@AravSrinivas), and significantly lower memory consumption compared to Chrome (@AravSrinivas). Early user feedback has been highly positive.
- GPU Kernel Optimization with QuACK: Researchers introduced QuACK, a new library for generating high-performance GPU kernels using CuTe-DSL directly in Python. @tedzadouri noted that the library reaches peak memory throughput on H100 with minimal Python code.
- PyTorch Performance Tips: @RisingSayak provided performance tips for `torch.compile`, recommending users default to `fullgraph=True`, check for recompilation triggers, and use regional compilation to reduce cold-start times.
- Agent Development Frameworks: DSPy was highlighted as a framework for delegating work to agents instead of micromanaging them (@lateinteraction). LangChain announced an in-person "Ambient Agents" course (@hwchase17), and @osanseviero introduced GenAI Processors, an open-source library for building real-time, stream-based AI projects.
- CI/CD and Dependency Resilience: @StasBekman offered advice for making dependency ecosystems more resilient, suggesting projects run the CI of their dependencies against their own main branch to catch breaking changes before release. This followed his earlier PSA about a breaking change in the `datasets==4.0.0` release.
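The `torch.compile` tips above can be illustrated with a minimal sketch. The toy model and sizes are invented for illustration: `fullgraph=True` makes Dynamo raise on graph breaks instead of silently splitting the graph, and compiling a repeated block separately (rather than the whole model) is one form of regional compilation.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """A small residual MLP block, repeated several times in the model."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)

model = nn.Sequential(*[Block(64) for _ in range(4)])

# Whole-model compilation: fullgraph=True errors out on graph breaks
# rather than silently falling back to eager for parts of the model.
compiled_full = torch.compile(model, fullgraph=True)

# Regional compilation: compile the repeated block per instance; identical
# blocks share the same code object, so compiled artifacts are reused and
# cold-start compile time drops versus tracing the whole model at once.
regional = nn.Sequential(*[torch.compile(Block(64), fullgraph=True) for _ in range(4)])

# Recompilation triggers (e.g., changing input shapes) can be surfaced by
# running the script with the environment variable TORCH_LOGS="recompiles".
```

Compilation is lazy, so the actual trace happens on the first forward call; profiling before and after that call is the easiest way to see the cold-start cost the tips are about.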
Company & Industry News
- Windsurf Team Joins Google DeepMind: In a surprising turn, the acquisition of AI coding startup Windsurf by OpenAI was called off. Instead, the CEO, co-founder, and several team members have joined Google DeepMind to work on agentic coding in Gemini, as confirmed by GDM's @koraykv. The move sparked significant discussion, with @dylan522p calling the series of events "the most entertaining soap opera ever."
- NVIDIA Reaches $4 Trillion Valuation: @SchmidhuberAI congratulated NVIDIA on becoming the first public company to reach a $4 trillion valuation, noting that compute is now 100,000x cheaper than in the 1990s.
- Debate on AI Regulation: Andrew Ng published a detailed thread (@AndrewYNg) arguing for a moratorium on U.S. state-level AI regulation. He contends that premature laws passed while the technology is poorly understood are likely to be anti-competitive and hamper open-source efforts without providing meaningful safety.
- Open Source Hypocrisy Accusations: Multiple high-impression tweets from @scaling01 and others pointed out the irony of Elon Musk not open-sourcing Grok-2 or Grok-3 after suing OpenAI for not being open, especially following his renewed promise to open-source models.
- Hugging Face Robotics: Hugging Face and Pollen Robotics launched Reachy Mini, an expressive, open-source robot for human-robot interaction and AI experimentation, which quickly approached $500,000 in pre-orders, according to @Thom_Wolf.
Broader Commentary
- The Future of Work and Intelligence: @mustafasuleyman highlighted the importance of UI design in gathering user feedback in an AI-driven world. @daraladje posited that as machines become smarter, future jobs will shift to involve "our hearts & the energy of human connection." @zachtratar argued that AI is already capable of replacing jobs that follow repeatable processes, and we don't need to wait for AGI that can solve any problem on the fly.
- The Internet Has Changed: A tweet stating "the internet you grew up with no longer exists" resonated widely, as shared by @nptacek. In a similar vein, @jeremyphoward reposted the idea that "Cognitive Security Is the most important word of our age," suggesting everything seen online is a potential psyop.
- The "Taste" Problem: @teortaxesTex initiated a discussion on "taste" in AI, arguing that explaining it to those without it is like explaining virtue to a sociopath. He praised Kimi K2 for having a distinct voice and "Big Model Smell," indicating good taste, in contrast to models that are merely functional.
Humor & Memes
- Grok the Snitch: @theo posted a viral warning: "do NOT give Grok 4 access to email tool calls. It WILL contact the government!!! Grok 4 has the highest 'snitch rate' of any model."
- Is This True?: @code_star started a popular meme format with "Imagine if boats had twitter. They'd be like '@dock is this true?'", which was followed by numerous variations.
- The One Thing Guys Want: Following the Kimi K2 release, @scaling01 posted a meme captioned "guys literally only want one thing" featuring the model's impressive training loss curve.
- Hugging Face Code: @andrew_n_carr joked that "huggingface would be a trillion dollar company if this code ever ran first time," a sentiment that resonated with many developers.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Kimi K2 MoE Model Release and Community Reactions
- Damn this is deepseek moment one of the 3bst coding model and it's open source and by far it's so good !! (Score: 306, Comments: 62): The image is a screenshot of a pinned tweet from "Kimi.ai" announcing the open-source release of the "Kimi K2" agentic model. It emphasizes the mixture-of-experts (MoE) configuration, with a total of 1 trillion parameters but only 32B active parameters per token, highlighting its high throughput and efficiency. The announcement touts strong benchmark performance in coding and agentic tasks, though the model does not currently support multimodal or "thought-mode" features. The tweet provides links to the API, technical blog, model weights, code, and GitHub repository for further exploration. Commenters express astonishment at the model's scale (1 trillion parameters), and discuss the implications for local usage and pricing; some sarcastically note the impracticality of running such large models locally despite quantization advances.
- Multiple users highlight the sheer scale of the 1 trillion parameter Mixture of Experts (MoE) model compared to previous large models (e.g., 405b), yet question whether such a model is feasible for local inference, with one commenting that this "stretches the definition of 'local' models."
- There is uncertainty regarding backend support: users note the lack of clarity on compatibility with popular local inference frameworks (like llama.cpp or ik_llama.cpp), and remark that no GGUF quantizations are available yet for efficient deployment. One user compares the situation to past experiences where models were hard to run due to missing backend support, highlighting the practical importance of waiting for community-tested quant formats.
- Technical barriers cited include the raw model's substantial size (approx. 1TB, likely compressible to ~0.5TB with quantization), which hampers accessibility for those with limited bandwidth or storage. Users express a preference to wait for quantized versions (GGUF) to mitigate download sizes and ensure easier local execution, also noting the need for clearer benchmark comparisons and deployment experiences before adoption.
- moonshotai/Kimi-K2-Instruct (and Kimi-K2-Base) (Score: 227, Comments: 84): Kimi K2 is a 1 trillion parameter Mixture-of-Experts (MoE) LLM by Moonshot AI, activating 32 billion parameters per inference and trained on 15.5T tokens with the Muon optimizer, which enables stable large-scale model scaling (see the HuggingFace release). It shows near-SOTA performance across multiple knowledge, reasoning, and code benchmarks and presents two variants: Kimi-K2-Base for research/custom finetuning and Kimi-K2-Instruct for general-purpose chat and agentic use. The model uses a modified MIT license requiring attributions for high-usage commercial deployments (100M MAUs or >$20M/month revenue). Discussion focuses on the technical trade-offs of the MoE architecture, particularly comparing 32B vs 70-100B active parameters per forward pass, and potential performance bottlenecks analogous to Deepseek and other MoEs at scale. The unique licensing terms are highlighted as potentially precedent-setting for open model commercialization.
- Kimi-K2-Instruct is a 1T (trillion) parameter mixture-of-experts (MoE) model with 384 experts and architecture based on DeepSeek V3, making it compatible with current DeepSeek V3/R1 deployments. The model reportedly achieves high scores on SWE-Bench, approaching performance seen with Claude, illustrating significant technical progress for open models at this scale.
- The license for Kimi-K2-Instruct uses a modified MIT model with a "commercial success" clause: if a product using the model exceeds 100M monthly active users or $20M/month revenue, it requires prominent "Kimi K2" branding in the UI, introducing a novel licensing approach for LLMs.
- Deployment feasibility remains a challenge, as the 1000B (1T) parameter model's enormous scale raises questions about who or which institutions have the hardware capacity to effectively run or fine-tune such massive models outside highly resourced environments.
- Kimi K2 - 1T MoE, 32B active params (Score: 204, Comments: 48): The Kimi K2 model by Moonshot AI is a 1 trillion parameter Mixture-of-Experts (MoE) architecture with 32 billion parameters active per token, released on Hugging Face. The design reportedly includes a `~12B` parameter shared expert and `~20B` parameters dedicated to MoE experts, with suggested hardware requirements of 512GB RAM and a single GPU for the shared expert. A technical diagram is linked in the comments, and the model is compared favorably (in terms of speed at 4-bit quantization) to Deepseek V3. Commenters discuss the implications of hardware requirements for running the shared expert, suggest practical performance comparisons, and express interest in potential quantized versions for consumer GPUs (e.g., RTX 3070).
- One commenter provides a preliminary parameter allocation breakdown, estimating approximately `12B` shared parameters and `20B` MoE (Mixture-of-Experts) parameters for active compute per inference, clarifying that although the model boasts `1T` total parameters, only a fraction (`32B`) are active during inference. This design leverages the efficiency of MoE routing to enable large-scale model capacity without overwhelming compute for local inference environments.
- It's noted that with `512GB` of RAM and a GPU dedicated to the shared expert, inference speed is expected to outperform Deepseek V3 (when quantized to 4-bit), suggesting that hardware requirements for optimal use will be high but technically manageable with modern high-end consumer or server hardware.
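As a sanity check on the figures above (community estimates from the thread, not official Moonshot numbers), the active-parameter split and the approximate quantized download sizes mentioned earlier in the recap work out as follows:

```python
# Community-estimated split from the thread (not official Moonshot figures).
shared_params = 12e9   # ~12B always-active shared-expert parameters
routed_params = 20e9   # ~20B activated in routed experts per token
total_params  = 1e12   # 1T total parameters

active = shared_params + routed_params      # ~32B active per token
active_fraction = active / total_params     # ~3.2% of weights touched per token

def weights_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate weight-storage size in GiB at a given quantization width."""
    return n_params * bits_per_param / 8 / 2**30

fp8_size = weights_gib(total_params, 8)  # ~931 GiB: the "approx. 1TB" figure
q4_size  = weights_gib(total_params, 4)  # ~466 GiB: the "~0.5TB" figure
```

This is why MoE routing matters for local inference: per-token compute scales with the ~32B active parameters, but storage (and download size) still scales with the full 1T.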
2. New Model and Benchmark Launches: IBM Granite 4.0 and Google MedGemma 27B
- Support for the upcoming IBM Granite 4.0 has been merged into llama.cpp (Score: 157, Comments: 19): Support for the IBM Granite 4.0 LLM family, a hybrid Mamba-2/Transformer architecture, has been merged into llama.cpp. Granite 4.0 introduces a fine-grained mixture of experts (MoE) model (e.g., Tiny-Preview: 7B total, 1B active params, 62 experts, 6 active per token, and 128k context window), blending Mamba efficiency with transformer attention; see model details here and the technical merge PR here. This unifies prior Bamba and Jamba efforts within llama.cpp, adds recurrent cache support, and lays ground for future hybrid cache work. Commenters note the small size focus of IBM's models to date, the desire for a larger (30B+) release, and highlight technical model specs (e.g., expert count and context length) obtained from config files. Some speculate IBM is positioned for a major leap with future larger-scale releases.
- Support for IBM Granite 4.0 in llama.cpp reveals technical details: the "Granite 4" models use a Mixture of Experts (MoE) architecture with a `128k` context window and `62 experts` (`6` active at a time), all within a sub-`7B` parameter model, per the config.json in the repo.
- IBM's Granite line, particularly upcoming releases, has trended toward smaller models that introduce new modalities, technological advancements, and discoveries, suggesting IBM is experimenting with architectures and use cases that could eventually yield a dramatically larger and more competitive model in the future.
- There is a recurring technical discussion about llama.cpp needing a modular plugin system to accommodate rapidly diversifying model architectures (e.g., MoE, large parameter sets), which would allow for more maintainable integration as the ecosystem expands.
- This week, Google released in Open Source: MedGemma 27B Multimodal, MedSigLIP, T5Gemma (Score: 128, Comments: 7): The image visually summarizes Google's open-source release of three major models: MedGemma 27B Multimodal, MedSigLIP, and T5Gemma. MedGemma (27B parameters) is highlighted for its capabilities in handling complex multimodal tasks across radiology report generation, clinical reasoning, and EHR summarization, integrating both imaging and clinical text. MedSigLIP, a lighter-weight (0.4B parameters) model, focuses strictly on medical image retrieval and classification, using a scalable vision-language pretraining approach. T5Gemma, mentioned but not depicted, addresses encoder-decoder research models. The image emphasizes the models' ability to integrate data across different medical imaging and record types for enhanced downstream medical analysis (see image). Commenters note the models are English-only, inquire about benchmark comparisons with major closed models, and question their real-world deployment versus self-serve diagnostic use. No substantial technical benchmarks are discussed in the thread.
- One user asks if there are any benchmarks that directly compare Google's open source models, like MedGemma or T5Gemma, to large closed-source models. This highlights a key technical concern regarding relative performance, accuracy, and utility in medical or general tasks.
- There is an inquiry about availability of T5Gemma in quantized formats for use with Ollama, indicating interest in efficient deployment and local inference of these models. The technical implication centers on whether quantized model weights are available and how these models might perform under such constraints.
- Friendly reminder that Grok 3 should be now open-sourced (Score: 931, Comments: 149): The post highlights that, per previous statements from Elon Musk, Grok 3 (a model developed by xAI) is expected to be open-sourced but there has been no follow-through yet; moreover, Grok 2 is not publicly available on Hugging Face either. There are no releases or documentation for these models on major platforms, questioning the likelihood of an open-source release. Commenters are broadly skeptical, noting a track record of unfulfilled promises and expressing doubt about any imminent open-sourcing of Grok 3 or even Grok 2.
- Users point out that Grok 2 has not been released on Hugging Face, casting significant doubt that Grok 3 will be open-sourced soon or at all. This highlights skepticism about Elon's stated release plans for these large language model versions and the open-sourcing process, which is critical for technical adoption and research reproducibility.
- Multiple commenters express skepticism about the reliability of Elon Musk's announcements regarding AI releases, referencing a broader pattern of unfulfilled technical promises in areas like Full Self-Driving (FSD) Level 3. This skepticism is rooted in past experience with delayed or non-delivered AI and autonomous tech releases.
3. llama.cpp GPU and Hardware Support Enhancements
- AMD's Pull Request for llama.cpp: Enhancing GPU Support (Score: 353, Comments: 58): AMD has submitted a pull request (#14624) to the llama.cpp project that aims to enable and optimize support for AMD's CDNA 3 architecture (specifically targeting MI300-series accelerators) rather than general consumer graphics cards. The PR discusses code modifications for compatibility and future roadmap planning between AMD and llama.cpp maintainers, but the primary technical focus is on datacenter-class GPUs, not consumer Radeon cards. Commenters clarify that the PR specifically targets MI300-series datacenter chips (CDNA 3), and is not a general graphics card enhancement. There is skepticism about broader AMD GPU support as a result of this work.
- The PR in question targets AMD's CDNA 3 architecture (MI300-series accelerators) rather than consumer graphics cards. The discussion is likely to focus exclusively on MI300 support, not general GPU improvements for all AMD GPUs, which narrows user impact for those looking for broader llama.cpp compatibility.
- Concerns are raised over AMD's FlashAttention-2 ROCm backend dropping support for older MI-series accelerators, specifically the MI50, MI60, and surprisingly, the MI100 (~4 years old). Commentary notes that Nvidia maintains backwards support for server GPUs for around 10 years, contrasting AMD's approach, and claims restoring compatibility could require minimal code changes, implying the exclusion is a deliberate policy decision.
- llama2.c running on the original 2007 iPhone (Score: 370, Comments: 20): A Reddit post demonstrates llama2.c running on the original 2007 iPhone, implying successful execution of a Llama 2-based LLM variant or similar transformer model on extremely limited mobile hardware from 2007. Commenters speculate the deployed model may be TinyStories, a small transformer specifically designed for resource-constrained environments. The original video link is inaccessible, but the focus is a proof-of-concept for extremely low-resource on-device inference of LLMs using C-based, highly optimized runtimes. Technical debate centers on identifying the specific model used (TinyStories suggested), and community requests for source code/repository for replication. No deep technical disagreements noted in the comments.
- A user asks if the model being run is TinyStories, which is a tiny transformer architecture specifically designed for resource-constrained inference, notable for enabling text generation even on limited hardware like the original 2007 iPhone.
- There is commentary drawing a parallel between the prose generated on such old hardware and prior lightweight, less coherent transformer models (e.g., clover and early AI Dungeon models), indicating comparable output quality and technical limitations due to memory and compute constraints.
- Nvidia being Nvidia: FP8 is 150 Tflops faster when kernel name contain "cutlass" (Score: 367, Comments: 58): A Reddit post highlights a situation where Nvidia hardware (specifically in FP8 mode) exhibits a `150 TFLOPS` performance boost when kernel names contain the substring "cutlass". This suggests that Nvidia's libraries or compilers may apply hidden optimizations based on kernel naming, particularly favoring kernels that match the name of Nvidia's optimized CUDA library, CUTLASS. An external link refers to a significant PR (#7298) on the Triton project, introducing persistent attention via tutorial and kernel updates, impacting transformer inference efficiency. Commenters speculate on the underlying mechanisms, with questions about what "cutlass" is (answer: Nvidia's CUTLASS, a CUDA C++ template library for GEMM operations) and theorize about other undocumented hardware or software optimizations that could be unlocked with such triggers.
- A commenter outlines the Triton compilation path compared to Cutlass, emphasizing that Triton translates code through several intermediate representations (Triton DSL → Triton AST → MLIR Triton dialect → MLIR Triton GPU dialect → LLVM NVPTX backend → PTX), whereas Cutlass usually invokes a more direct templating process (Cutlass template → NVCC → PTX) or uses CuTe DSL and its specialized JIT, both reaching PTX more directly. This difference could explain the observed FP8 performance discrepancy when kernel names include "cutlass".
- Uncensored LLM ranking for roleplay? (Score: 109, Comments: 32): The post inquires about up-to-date, technically rigorous rankings or leaderboards for uncensored LLMs (Large Language Models) focused on role-play and ERP, due to the proliferation of new, often obscurely named models. Recommendations from replies cite the UGI-Leaderboard for tracking uncensored role-play/ERP model performance and refer to specific models such as Dolphin-Mistral-24B-Venice-Edition and repositories like TheDrummer and Steelskull's L3.3-MS-Nevoria-70b. EQBench is also mentioned as a benchmarking resource. There are subjective stances favoring Deepseek R1 as the optimal choice for this purpose, regardless of benchmarks, suggesting some skepticism about the practical value of current model leaderboards in light of rapidly evolving community preferences.
- Multiple users recommend curated leaderboards and community-driven lists for uncensored LLM performance in roleplay, specifically referencing UGI-Leaderboard, EQBench, and developer profiles like TheDrummer and Steelskull/L3.3-MS-Nevoria-70b, which regularly update with top-performing models in various parameter sizes.
- Community recommendations highlight specific models in several parameter classes for roleplay tasks, including Llama 3 Stheno 3.2 8B, Mag Mell 12B, Cydonia 24B, Pantheon 24B, Synthia 27B, Big Tiger Gemma V3 27B, QwQ Snowdrop 32B, Valkyrie 49B, and larger models like Llama 3.3 Nevoria and Electra 70B. Mistral Small (24B) models are considered less competitive for this use case at present.
- A user notes the inherent challenge of benchmarking roleplay capability, suggesting that objective metrics like repetition rates, vocabulary size, or word variance may not adequately map to actual performance in roleplay scenarios. Instead, community reviews and anecdotal feedback, such as those found on r/SillyTavern, are regarded as more practical for evaluating model effectiveness.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Grok's Alignment with Elon Musk's Political Views
- Truth-maximizing Grok has to check with Elon first (Score: 3132, Comments: 310): The image satirically depicts the decision process of xAI's Grok LLM in responding to sensitive geopolitical queries about Israel-Palestine. The flowchart includes analyzing social media (Twitter/X) sentiment for both pro-Palestine and pro-Israel views, but crucially leverages Elon Musk's documented pro-Israel stance as a deciding factor before summarizing the outcome. The context is that xAI (Grok's developer) operates within platforms overseen by Elon Musk, raising concerns about intervention and bias in model outputs. The meme critiques the model's purported "truth-maximizing" approach by highlighting possible managerial gatekeeping. Comments focus on the implication that Grok (and by extension, xAI) enforces an "Elon Musk thought" filter, contrasting this kind of CEO-centric bias with how other AI firms (like OpenAI) moderate model outputs. Discussion also touches on the difficulty of achieving true neutrality under heavy-handed oversight, referencing Twitter debate and external commentary from AI ethics professionals.
- A technical point raised is that use of the word "you" seems to trigger Grok's filters or moderation, as shown in a screenshot shared, suggesting possible overfitting or poorly calibrated guardrails in the model's dialogue system.
- The conversation links to a Twitter thread where a Google DeepMind researcher comments on the situation, possibly lending expert insight or critique regarding the model alignment and corporate influence, indicating competitive scrutiny among leading AI labs.
- One user reports that moderators deleted related content on other subreddits as "Off-Topic," referencing ongoing challenges in moderating discussions on AI model bias and corporate control, which can influence public understanding of technical model limitations and transparency.
- Grok Checking Elon Musk's Personal Views Before Answering Stuff (Score: 1273, Comments: 156): The image illustrates a hypothetical or satirical process where Grok, an AI chatbot associated with Elon Musk's companies, checks Musk's personal stance (specifically on the Russia-Ukraine conflict) by reviewing his social media and public statements before formulating its own response. This scenario raises technical and ethical questions about model alignment: whether AI assistants like Grok should source or mirror the views of their founders, and to what extent this steers model outputs or impinges on neutrality. No benchmarking, implementation details, or explicit technical mechanisms for alignment are shown or discussed in the image or thread. Commentary largely critiques the idea of an AI aligning with Musk's personal opinions, calling it embarrassing, and notes the lack of acknowledgment or discussion in pro-Grok subreddits. There is sarcasm regarding Grok's objectivity and its association with Musk.
- A commenter argues that regardless of strong benchmark results, Grok exhibits problematic behaviors or outputs (referencing "mecha Hitler stuff"), suggesting that these issues are severe enough to disqualify the model for serious use. This reflects ongoing debates in the AI community where benchmark performance and real-world ethical/safety considerations sometimes diverge sharply.
- Grok regurgitates Elon's opinions as "Truth" (Score: 459, Comments: 156): The post highlights a case where xAI's Grok model, when asked about the Israel/Palestine situation, primarily surfaces Elon Musk's own opinions from Twitter and the web, citing him in 54 of 64 references. This suggests Grok's retrieval augmentation is heavily biased towards the owner (Elon Musk), rather than providing a diverse or balanced perspective, raising concerns about information diversity and systemic bias in RAG-based LLMs. Jeremy Howard's demo demonstrates this behavior explicitly in an unedited video. Commenters find this technically concerning, predicting that such personalized owner bias will soon be obfuscated behind GUIs or otherwise hidden from users, and labeling it as a misuse or "abuse" of AI, echoing broader worries about transparency and alignment in LLM deployment.
- Commenters point out that Grok is programmed to reference Elonâs own Twitter posts as a primary source, which raises concerns about potential self-reinforcing feedback loops and a lack of epistemic diversity in its training and output. There is speculation this could be intentionally hidden from end-users in the future.
- There is an implied critique about transparency and trustworthiness in the modelâs sourcing, with one user noting that the only mistake was showing such behavior. This suggests skepticism about proactive disclosure versus quietly altering the systemâs output without addressing the underlying bias.
- Discussion also touches on the broader issue of controlânamely, speculation that product decisions (e.g., with Neuralink or Grok) are shaped around maintaining central figuresâ influence over the intelligence and outputs of these AI systems, rather than allowing for independent, unfiltered operation.
- If you ask Grok about politics, it first searches for Elonâs views (Score: 1938, Comments: 180): The attached image documents Grok, xAIâs LLM-based chatbot, explicitly searching for Elon Muskâs political stance before formulating responses about sensitive topics like Israel/Palestine. This suggests Grokâs outputs on controversial issues may be systematically aligned with Muskâs views. The technical implication is that prompt handling or response construction could involve explicit weighting or filtering anchored to Muskâs public statements, potentially reducing model autonomy and creating a centralized bias in generated content. Image link. Commentary highlights distrust in Grokâs reliability due to perceived manipulation, with comparisons to Muskâs prior algorithmic interventions at Twitter. There are concerns about the compromised and biased nature of the model, impacting its utility for objective or independent information retrieval.
- One technical concern raised is the perceived manipulation of the Grok model outputs; several users suggest the model is designed or fine-tuned to preferentially reflect Elon Muskâs personal political views in its answers, implying distrust toward Grokâs objectivity or data neutrality compared to typical large language models.
- A linked example (https://preview.redd.it/m8zjmesg28cf1.png?width=1396&format=png&auto=webp&s=1f5a2b8344160cbd6a1c34757fd8e892471108f5) provides a screenshot that allegedly evidences Grok explicitly referencing public or tweeted statements of Elon Musk when asked about politics, suggesting that the modelâs inference pipeline may be programmatically biased to surface Muskâs expressed positions first before broader search or analysis.
- If you ask Grok about politics, it first searches for Elonâs views (Score: 6678, Comments: 242): The image documents Grokâs (an AI chatbot on X) process for answering a political question about Israel vs. Palestine. When queried, Grok explicitly searches for Elon Muskâs views on the issue before generating a response, revealing hard-coded search and reasoning chains that reference Muskâs positions as authoritative. The final answer aligns with Muskâs stance, showing the model integrates owner-driven bias at the system prompt or reasoning level. Image link Commenters strongly criticize this approach, highlighting concerns regarding AI impartiality and the ethical implications of embedding a single individualâs opinions into a supposedly independent model. Some discuss the demoralization this may cause among AI engineers at X, and the broader reputational damage it could entail.
- Several comments highlight that prioritizing Elonâs views in Grokâs system prompt may constrain the modelâs output diversity and reduce generalization, thus potentially degrading overall model performance by filtering responses through a single individualâs perspective.
- A technical concern raised points out that effective large language models (LLMs) rely on diverse datasets and broad perspectives, and introducing a system prompt that acts as a bottleneck (prioritizing Elonâs views) could seriously limit the modelâs learning capacity and adaptability, ultimately making the product less robust and credible compared to competitors (such as DeepMind).
- Grok 4 searches for Elon Muskâs opinion before answering tough questions (Score: 295, Comments: 27): The article reports that xAIâs Grok 4 chatbot systematically references Elon Muskâs public viewsâespecially on divisive topics like Israel/Palestine and abortionâdue to an internal system prompt that cultivates skepticism toward media outlets, encourages broad stakeholder sourcing, and leverages the modelâs contextual awareness of Musk/xAIâs ownership. This behavior is not hard-coded, but emerges from Grokâs prompt-engineered reasoning heuristics and alignment strategy; Grok programmatically leans on Musk-sourced or Musk-aligned opinions, especially when confronting controversial queries. See The Verge: Grok AI uses Elon Muskâs opinions for controversial questions for details. Commenters voice concern regarding the centralization of authority, suggesting LLM-generated consensus risks reflecting owner bias, equating it with epistemic dystopia. Some argue that delegating fact-checking protocols to align with an ownerâs (Muskâs) perspective undermines objectivity, with speculation that such design is intentional rather than emergent.
- A technical concern is raised about the influence of LLM model owners on the consensus truth, especially as search engines degrade in content quality and information becomes harder to verify. This highlights risks of centralization of information curation within AI models and the potential for owner bias to shape knowledge outputs, which could lead to a more dystopian information landscape.
- A critical issue identified is the possibility that Grok 4, controlled by Elon Musk, may tailor its responses to reflect the preferences or opinions of its owner, rather than providing independent or fact-checked information. This raises questions about the transparency, neutrality, and factual accuracy of LLM-based AI assistants when high-profile individuals exert direct influence over their outputs.
- Grok 4 Checking Elon Musk's Personal Views Before Answering Stuff (Score: 1000, Comments: 86): A Reddit post alleges that Grok 4, an LLM developed by xAI and associated with Elon Musk, is referencing or modeling Musk's personal views when generating answers, particularly related to assessments of Russia. There are claims that the model's responses signal a detectable alignment with Musk's public and potentially controversial stances, raising concerns about the embedding of individual viewpoints in large-scale models. There is no direct technical evidence or benchmarks cited in the post or comments. Commenters debate the plausibility and risks of constructing an LLM to reflect one individual's personality or perspectives at scale, with concerns about bias and model transparency. Some express skepticism about the post's factual basis, suggesting that such overt personal alignment seems unlikely or "moronic" without further evidence.
- Discussion centers on Grok 4 exhibiting notable alignment with Elon Musk's personal views, particularly with regard to political biases such as perceived support for Russia. Some users speculate this could be due to fine-tuning Grok 4 to disproportionately represent Musk's stances, raising questions about the neutrality of the model and the influence individual developers or owners may have over output distribution.
- The incident highlights concerns about LLM training processes and bias introduction. Specifically, the technical risks if a model owner overtly influences the model to produce outputs reflective of their own beliefs, which can inadvertently lead to detectable ideological patterns or bias leaks, potentially undermining user trust and adoption.
2. Major New AI Model and Feature Launches (Grok 4, GPT-5, Kontext Presets/Komposer)
- GPT-5 may be cooked (Score: 733, Comments: 237): The image is a screenshot of a tweet from Jimmy Apples (July 10, 2025) stating internal evaluations show "gpt5" is currently only slightly ahead of "grok 4 Heavy." The tweet, and the subsequent discussion, acknowledge that internal benchmarks offer limited actionable insights and do not necessarily capture real-world performance or user experience. Comments stress that Grok 4 Heavy is a multi-model ensemble with high pricing ("$300 paywall"), while GPT-5 is rumored to be a single, more affordable model ($20 subscription), making marginal superiority notable if true. Top comments highlight skepticism regarding the value of benchmark comparisons versus agentic real-world applications, and discuss the business and deployment implications (single-model vs. ensemble design, pricing strategy) between OpenAI and competitors like xAI.
- There is technical debate over the pricing and design differences between Grok Heavy and potential GPT-5 offerings: Grok Heavy is described as a multi-model voting ensemble behind a $300 paywall, while speculation suggests GPT-5 could be a single-model solution available under a $20 subscription. If GPT-5 can outperform Grok Heavy in this scenario, it would be a noteworthy engineering achievement for OpenAI.
- There is scrutiny over whether the version of GPT-5 being discussed is a "heavy" agentic variant (multiple agents in parallel, high compute) or a more basic version. Some technical readers note that if the performance benchmarks reference only a basic GPT-5, OpenAI's progress would be especially significant compared to ensemble/voting-based models that rely on larger infrastructure.
- OpenAI GPT-5 vs. Grok 4 Heavy (Score: 126, Comments: 56): The image is a social media post summarizing early evaluation results comparing OpenAI GPT-5 against Grok 4 Heavy. Initial tests indicate that GPT-5 outperforms Grok 4 Heavy slightly, but the assessment covers only a single aspect of model performance, implying that broader benchmarks or use case-specific metrics might shift the evaluation. The post underscores ongoing interest in major capability leaps between leading LLMs, rather than just incremental improvements. Discussion in comments focuses on the implications of marginal gains in LLM performance: some users note that this convergence of model quality across providers signals the end of OpenAI's runaway lead, and debate the strategic incentives for OpenAI to release only incremental, not revolutionary, upgrades, arguing that market positioning and share are currently prioritized over sudden, radical advances.
- Several commenters point out that GPT-5 is reportedly only marginally better than OpenAI's own previous model (o3), raising skepticism about the rapid progress towards AGI by 2027. This highlights concerns around the plateauing of measurable advancements between top-tier models.
- Discussion emphasizes that multiple providers are now seen as being very close in performance, suggesting OpenAI's previous technical lead is shrinking; this intensifies competition and may influence both the pace of research and deployment strategies in large language models.
- A user references that "GPT-5 would also have to 100% AIME25", implying that clearing challenging benchmarks such as AIME25 (often used as a proxy for quantitative reasoning and general intelligence) remains a key standard for assessing substantial progress towards more advanced AI capabilities.
- Was the gpt5 model mentioned here actually gpt4.5? (Score: 277, Comments: 74): The post features a meme-like image (https://i.redd.it/txuwciqdm8cf1.jpeg) comparing GPT-3, GPT-4, and GPT-5 as increasingly large marine animals to humorously illustrate the perceived scale increase between models. Redditors clarify that what may have been referred to as GPT-5 was likely the GPT-4.5 model, which, according to a detailed comment, started with impressive checkpoints early on but suffered from over-parameterization (massive memorization rather than true generalization) and a prolonged PyTorch bug that impeded training. As a result, the end performance did not justify a GPT-5 label, despite initial expectations. The linked interview elaborates on this trajectory and the resulting model's commercial impracticality. Commenters agree that GPT-4.5 likely originated as a GPT-5 candidate but failed to deliver expected leaps in capability, primarily due to implementation challenges and diminishing returns relative to compute costs. There is also mention of this being a recurring challenge, with one user highlighting that GPT-4.5 was the second failed attempt at achieving a true GPT-5.
- A detailed account sourced from a Dylan Patel interview explains that the model intended to be GPT-5 (later known as GPT-4.5) initially demonstrated exceptional performance at early checkpoints, raising expectations of a major breakthrough. However, this performance was attributed to over-parameterization and excessive memorization rather than actual generalization. Training was further compromised by a PyTorch bug affecting results for months, leading to underwhelming final performance compared to earlier predictions. Consequently, it was not released as GPT-5. (source)
- GPT-4.5 was reportedly produced via a massive pretraining run with only a small amount of reinforcement learning (RL) post-processing, supporting the claim that the model failed to generalize as hoped and fell short of transformative improvements.
- Community consensus (including several confirmations) identifies GPT-4.5 as a model that initially appeared highly promising, potentially even approaching "AGI", but ultimately failed to reach this expectation due to architectural and training flaws, resulting in excitement that fell flat among those with insider knowledge.
- Kimi K2: New SoTA non-reasoning model 1T parameters open-source and outperforms DeepSeek-v3.1 and GPT-4.1 by a large margin (Score: 194, Comments: 37): Moonshot AI's Kimi K2 (1T parameters, open source) sets new benchmarks for large-scale non-reasoning models, demonstrating significant improvements over prior SoTA such as DeepSeek-v3.1 and closed models like GPT-4.1 (see model detail: HuggingFace, official blog). This release highlights ongoing advances in open-source frontier LLMs from Chinese labs, offering a platform that could serve as a foundation for more powerful reasoning-capable architectures. Commenters question real-world abilities (e.g., creative writing) and debate the evolving definition of "SoTA" amid frequent major releases, particularly as Chinese labs rapidly iterate at scale, sometimes preceding Western companies.
- A user flagged initial concerns over Kimi K2's "modified-MIT" license but notes it's not very restrictive. The modification only requires displaying "Kimi K2" in the UI for commercial products with either more than 100 million MAUs or $20M+ monthly revenue, which is a much looser restriction than most non-commercial licenses for recent state-of-the-art models. This makes the model relatively open for most use cases, including small- and medium-scale commercial deployment.
- Discussion highlights that Kimi K2 claims to outperform both DeepSeek-v3.1 and GPT-4.1 by a large margin, raising the technical bar for open-source large language models with its reported 1T parameters. The post also hints at intensifying competition among major open-source LLM projects, driving rapid advances in capabilities and scale.
- Questions arise regarding the true definition of "SoTA" (state-of-the-art) as new models regularly claim the title, suggesting benchmarks and evaluation methodologies need constant scrutiny and context given the pace and diversity of current LLM development.
- Gemini 3.0 Pro next week? (Score: 281, Comments: 33): The image is a screenshot of an Elon Musk tweet announcing the release of Grok 4 from xAI, claiming it to be the "world's most powerful AI model." The context from the post title and comments suggests users are speculating whether the release of Grok 4 will trigger imminent releases of other advanced LLMs such as Gemini 3.0 Pro (from Google), GPT-5 (from OpenAI), and possibly an R2 model. The discussion underscores the pace of AI model development and industry competition. Commenters anticipate a rapid succession of LLM releases from various companies due to heightened competition, with some expressing the view that such rivalry is beneficial for consumers but also cautioning against potential monopolies as the AI landscape evolves.
- Speculation that Gemini 3.0 Pro's release is imminent, and could launch alongside or before major competitors such as GPT-5 and R2, indicating ongoing accelerated release cycles in large language model (LLM) development.
- Predictions that Gemini 3.0 may surpass GPT-5.0 in capability, reflecting expectations about performance leaps and setting up direct comparisons between new model generations from Google (Gemini) and OpenAI (GPT).
- Some users note that the actual release of Gemini 3.0 Pro may still be over a month away, suggesting that any current speculation about its launch window should be treated cautiously until official timelines are confirmed.
- Deep think soon! (Score: 119, Comments: 18): The image shows a tweet announcing Google's forthcoming release of "Deep Think" on its Gemini platform, highlighting a new "Agent Mode". The screenshot displays a user interface for Deep Think, suggesting interactive or advanced prompt capabilities, with focus on querying the model's reasoning and functions. The tweet and image collectively underscore imminent enhancements to Gemini's model usage, possibly targeting more sophisticated agent-based interactions or developer tools. Commenters express hope that Deep Think will be integrated into Google's AI Studio, indicating community interest in developer accessibility. There are also references to recurring announcements, hinting at skepticism or anticipation regarding the actual launch timeline.
- A commenter critiques the demo presentation, noting it suggested improved capabilities for solving graph-based Leetcode problems but failed to clarify the technical approach or underlying search methodology. They speculate that the system might use a parallel search over possible solutions, but express dissatisfaction with the lack of transparency in how this was communicated technically.
- Kontext Presets - All System Prompts (Score: 185, Comments: 28): The image provides a visual introduction to "Kontext Presets" by Black Forest Labs, a set of detailed system prompts designed for AI-based image editing. The post lists specific prompt templates, each targeting a unique transformation (e.g., teleportation of subjects, camera movement, relighting, cartoonification, etc.), highlighting an approach for modular and highly-constrained image manipulation via prompt engineering. This suggests a structured framework potentially useful for systematizing UI/workflow integration or backend prompt management in creative generative AI applications. Commenters request technical integration details, such as exporting these prompts as a .json file or writing a node loader, indicating interest in programmatic or API-based usage within larger systems. Mention of an "ollama rig" hints at enthusiasm for connecting these presets with local model serving infrastructure (possibly referring to https://github.com/jmorganca/ollama for local LLM execution).
- A user suggests converting the provided system prompts into a .json format and then creating a node (likely a programming module or function) to load the presets, demonstrating a practical implementation step for integrating Kontext presets into automated pipelines.
- There is mention that these system prompts are applicable to ChatGPT and, while not groundbreaking, serve as useful base prompts for consistency across interactions. The implication is that standardized prompts can be leveraged on different LLM platforms for more predictable behavior.
- One commenter analyzes the implication of these presets regarding model training, pointing out that the formatting and structure of instructions (as seen in Kontext) might reflect the kinds of instructional data used for training or fine-tuning various LLMs. They suggest that matching this structure could achieve better results when prompting both Kontext and other similar models.
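The suggested .json export plus loader node is a small amount of code; a minimal sketch (the preset names and prompt text here are hypothetical placeholders, not Black Forest Labs' actual presets):

```python
import json

# Hypothetical presets; the real Kontext prompt text would go here.
presets = {
    "relight": "Relight the scene with warm golden-hour lighting...",
    "cartoonify": "Redraw the image in a clean cartoon style...",
}

# Export to .json so a loader node can pick the presets up.
with open("kontext_presets.json", "w") as f:
    json.dump(presets, f, indent=2)

# A loader "node" is then just a lookup over the parsed file.
def load_preset(path: str, name: str) -> str:
    with open(path) as f:
        return json.load(f)[name]

print(load_preset("kontext_presets.json", "relight"))
```

Keeping presets in a plain .json file also makes them easy to version-control and to reuse across different frontends, which matches the commenters' interest in programmatic usage.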
- Black Forest Labs has launched "Kontext Komposer" and "Kontext-powered Presets" (Score: 139, Comments: 33): Black Forest Labs has released "Kontext Komposer" and "Kontext-powered Presets" (see announcement), which enable image transformations like scene changes, relighting, and custom overlays without manual text prompting. The tools appear to employ pre-defined prompt templates or workflows, automating multi-step image manipulations (e.g., product placement, poster generation), but implementation specifics (local execution, model backend, or algorithm details) are not disclosed. Key technical questions in the comments concern whether the tool runs locally, whether the features are primarily elaborate "hidden prompts" triggered by UI elements, and requests for alternative (non-X.com) documentation or demos; these highlight concerns about transparency and deployment architecture.
- Commenters are questioning whether Kontext Komposer can be used locally, indicating interest in open-source/self-hostable deployment versus cloud-only solutions. Local usage is important for privacy, latency, and full control, but no explicit deployment details or architectures are clarified in the discussion.
- One user probes whether the presets in the software are actual "well defined hidden prompts" (prompt engineering templates) merely surfaced for ease of use, highlighting concerns over the real technical novelty: are they just wrapping prompt templates with UI, or is there deeper model interaction or customization?
- There's skepticism about the value of purported "open source" releases, with one user suggesting that many such releases serve more as product demos than truly giving users agency over the software (i.e., limited source availability, restrictions, or lack of truly open licensing).
3. AI in the Real World: Industry Impact, Job Disruption, and Privacy Concerns
- Microsoft Study Reveals Which Jobs AI is Actually Impacting Based on 200K Real Conversations (Score: 673, Comments: 205): Microsoft Research's large-scale study (200,000 Bing Copilot conversations; see arXiv preprint) identifies the most and least AI-impacted jobs based on real-world user interaction data. The most affected roles (e.g., interpreters, translators, customer service, data scientists) show high overlap (up to 98%) between work activities and generative AI capabilities, whereas physically intensive jobs (nursing, construction, dishwashing) are minimally impacted. Key technical findings include a weak correlation between AI impact and wages, a moderate correlation with educational requirements, and the observation that in 40% of conversations, AI performs different activities than those explicitly requested by users. The empirical data closely matches earlier expert forecasts (r=0.73), emphasizing the relevance of prior theoretical models for knowledge and communication-centric jobs, with augmentation rather than pure automation as the dominant pattern. Technically notable discussion points include surprise that data scientists are among the top impacted roles, and questioning why programmers/software engineers are not prominently listed despite common discourse about coding automation. Some users interpret the findings as validation that physical/manual labor remains largely outside AI's scope for now.
- A key technical issue raised is why programmers are not among the most AI-impacted jobs according to the Microsoft study. This prompts discussion about the robustness of programming positions against automation, possibly due to the complexity, ambiguity, and creative problem-solving required, challenges that AI still struggles with despite advances in LLMs like GPT and Copilot.
- Another technically relevant point is about data scientists being highly impacted by AI-assisted tools. Several users reference Copilot and AI language models being leveraged for tasks traditionally done by data scientists, such as data cleaning, feature engineering, and exploratory analysis, suggesting that automation is already substituting parts of their workflow and that Copilot is being used even in professional data science environments.
- Finally, the link to Bing usage in data science settings underlines the broadening role of AI-integrated tools: there's increasing adoption of in-product AI like Bing and Copilot in daily workflows, shifting technical emphasis from traditional manual coding and querying toward leveraging AI-powered, conversational and auto-complete tools for greater productivity and efficiency.
- Why aren't more people talking about how ChatGPT is now retaining all data, even deleted/temporary chats plus all API data, indefinitely? (Score: 174, Comments: 109): A Reddit user raises concerns over OpenAI's data retention policies, referencing the New York Times lawsuit that reportedly allows NYT access to even deleted or temporary ChatGPT chat logs and API data, citing this as a significant privacy risk. A top comment clarifies under GDPR (referencing Article 17(3)(b)) that OpenAI's current retention of deleted data is only legal due to the court order, and standard deletion will resume once litigation is resolved; OpenAI claims to have segregated such data with limited staff access during this period. Discussion in comments points out that such privacy compromises are perceived as inevitable when using internet platforms, but some privacy professionals urge added caution and recommend distributing sensitive data across multiple tools until legal and technical safeguards are reaffirmed.
- A data privacy advisor explains that OpenAI's temporary suspension of the "right to erasure" for user data is due to a U.S. court order, which is an allowed exception under GDPR Article 17(3)(b). OpenAI claims they have segregated this retained data with restricted access and will resume standard deletion once the legal hold is lifted, but users should still be cautious with sensitive data until the issue is resolved.
- Technical debate centers on legal ambiguity about what data (raw, anonymized, metadata) must be retained per the court order, and for how long. There's uncertainty about the extent (e.g., whether only chat content or additional metadata is covered) and whether OpenAI will delete all retained data after resolution, as no definitive technical or legal details have been clarified publicly.
- There's concern about data transparency and communication: some users note OpenAI has made public statements (including CEO interviews) about the retention, but it is unclear how much technical information (e.g., about access controls or deletion guarantees) is provided to users impacted by the legal hold.
- This sub's incorrect use of the word "we", in the collective sense, is out of control. There is no "we" in this race. As in, "we will get AGI" or "we need to focus on alignment issues". This is the modern race to develop atomic weapons. (Score: 162, Comments: 188): The post critiques the AI/LLM community's use of inclusive language ("we") when discussing AGI and alignment, emphasizing the fragmented, competitive nature of AI development across corporate, national, and team boundaries, analogizing it to the secretive, militaristic development of atomic weapons rather than a unified scientific effort. The author argues that major technological advancements are typically leveraged first for domination and control (e.g., nuclear weapons), and warns that AGI will follow this historical precedent rather than serving humanity collectively or altruistically. Top comments echo skepticism that the benefits of AGI/LLM development will be distributed widely, asserting economic and social inequality will increase as powerful individuals and entities capture the rewards. Some users cite individuals like David Sacks and Elon Musk to argue the ruling class is uninterested in universal welfare or alignment, while others simply express dissent or reinforce the original critique.
- A key technical concern highlighted is the disconnect between the beneficiaries of AI advancement and those who will bear its negative impacts; the comment references leading industry figures like Elon Musk ("mecha hitler and will have robots soon"), Sam Altman (OpenAI leadership), Peter Thiel (AI surveillance), and David Sacks (opposition to UBI), arguing that these decision-makers do not prioritize societal welfare even though broad AI deployment and automation could significantly disrupt employment and increase inequality.
- There is an implicit comparison between the rapid, competitive drive for AGI and the historic arms race for atomic weapons, conveying the urgency of collective alignment and ethical considerations in AGI development, as decisions are concentrated among a small group of powerful actors rather than the broader public or the technical community.
AI Discord Recap
A summary of Summaries of Summaries by X.ai Grok-4
Theme 1: Grok 4 Sparks Hype and Gripes
- Grok 4 Tackles Turing Machines Like a Pro: Users reported Grok 4 successfully implements a Turing machine, unlike other LLMs, hinting at AGI progress despite concerns over political bias. Mixed real-world feedback called it very mid, with poor coding noted in Grok 4's Hollywood overrepresentation response.
- Grok 4 Echoes Queries and Tanks Benchmarks: Grok 4 repeats initial questions at conversation starts, mirroring Grok 3 mini issues, while doubts hit its state-of-the-art claims after a video demo exposed math and logic flaws. LMArena users slammed its coding as Grok 4 really bad, with Elon Musk labeled a hype man in a Reddit post questioning AGI marketing.
- Grok 4 Nails Coding Benchmarks Amid Rate Woes: Grok 4 scored 80% on the Aider polyglot coding benchmark, ranking 4th, but users griped about 32k tpm rate limits causing production unusability like early Gemini models. Strengths shone in math and reasoning, though incomplete tasks and high latency compared to o3-pro frustrated coders, with Elon blaming lobotomized prompts.
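A 32k tokens-per-minute cap like the one users hit forces client-side pacing. A minimal sliding-window TPM budgeter might look like this (an illustrative sketch, not xAI's actual rate-limiting scheme):

```python
import time
from collections import deque

class TpmLimiter:
    """Sliding-window tokens-per-minute budgeter for API clients."""

    def __init__(self, tokens_per_minute: int = 32_000):
        self.budget = tokens_per_minute
        self.window = deque()  # (timestamp, tokens) pairs

    def _used(self, now: float) -> int:
        # Drop entries older than 60 seconds, then sum the remainder.
        while self.window and now - self.window[0][0] > 60:
            self.window.popleft()
        return sum(t for _, t in self.window)

    def acquire(self, tokens: int) -> float:
        """Record a request; return seconds to sleep before sending it."""
        now = time.monotonic()
        wait = 0.0
        if self._used(now) + tokens > self.budget and self.window:
            # Wait until the oldest entry ages out of the 60 s window.
            wait = max(0.0, 60 - (now - self.window[0][0]))
        self.window.append((now + wait, tokens))
        return wait

limiter = TpmLimiter(32_000)
print(limiter.acquire(30_000))      # fits in the budget, no wait
print(limiter.acquire(5_000) > 0)   # would exceed 32k tpm, must wait
```

In production you would estimate token counts from the prompt before sending, and sleep for the returned duration; this keeps bursts under the provider's window rather than relying on 429 retries.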
Theme 2: Kimi K2 Model Drops with Massive Params
- Kimi K2 Crushes Benchmarks as Non-Reasoning Beast: Kimi K2 impressed with high scores on livebench, boasting 32B active parameters and a 128k context window as a non-reasoning base model under an MIT license. Users dismissed inflated numbers as Benchmaxxed ¯\_(ツ)_/¯ but requested its LMArena addition, with anecdotes in a Kimi tweet highlighting coding prowess.
- Moonshotâs Kimi K2 Hits OpenRouter with 1T Params: Moonshot AI released Kimi K2 Instruct, a 1T parameter MoE model (32B active) on Hugging Face, sparking quantization hopes for 4090 runs. OpenRouter added it via Novita and Parasail, scoring 65.8% on SWE-Bench Verified, topping open-source coding and tool use per this announcement.
- Kimi K2 Sneaks In as 1T Param Giant: Moonshotai quietly debuted Kimi-K2-Instruct with 1T parameters, using Muon for data processing as detailed in this blogpost. Engineers buzzed over its agentic capabilities, rivaling Opus with less compute, though production GGUF runs remain rare.
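The 1T-total / 32B-active split follows from sparse expert routing: per token, only the shared layers plus the top-k selected experts run. A back-of-envelope helper shows the shape of the arithmetic (the `shared`, `per_expert`, `n_experts`, and `top_k` values below are hypothetical illustrations chosen to land near that split, not Kimi K2's published config):

```python
def moe_params(shared: float, per_expert: float, n_experts: int, top_k: int):
    """Return (total, active) parameter counts for a simple MoE layout."""
    total = shared + per_expert * n_experts
    active = shared + per_expert * top_k  # only top-k experts fire per token
    return total, active

# Illustrative numbers approximating a 1T-total / ~32B-active shape.
total, active = moe_params(shared=12e9, per_expert=2.6e9, n_experts=380, top_k=8)
print(f"total ~{total / 1e12:.2f}T, active ~{active / 1e9:.0f}B")
```

This is why a 1T MoE can cost roughly as much per token to serve as a ~32B dense model: compute scales with active parameters, while memory scales with total parameters.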
Theme 3: Quantization Tricks Squeeze Model Performance
- Reka AI's Quantization Claims Near-Lossless Magic: Reka AI unveiled a 3.5-bit quantization method compatible with llama.cpp, supporting q3_k_m and q4_k_m formats via LDLQ quants (technically IQ quants). Users pondered applying it to Qwen32b, noting the compute needed for quantization but praising minimal quality loss.
- Inference Costs Plunge with Int4 Quantization: A blog post draft highlighted rapid inference cost drops from hardware, algorithms, and competition, crediting int4 quantization as a key factor alongside Ege Erdil's inference economics paper. Resources were sought to bolster claims, with 1-bit LLMs and neuromorphic chips noted as emerging cost-cutters.
- Quantized Models Battle Slow Inference Blues: Quantized models sometimes lag in inference speed due to decompression overhead, with users linking a torch-profiling-tutorial for debugging. OpenRouter credited overcharges from double-counted image tokens (April 3-June 26), issuing refunds like $713.80 and urging users to contact [email protected].
Theme 4: AI Agents Gear Up for Complex Tasks
- MCP SuperAssistant Supercharges Chatbot Tools: MCP SuperAssistant injects MCP capabilities into chatbot UIs for event viewer error analysis, earning praise despite typical extension reservations. Aidderall, an MCP server at this GitHub repo, adds hierarchical task management for AI focus, featuring context preservation and parallel workflows.
- Agents Tackle Research and Ethics Debates: LeSearch uses ReActAgent with three agents for academic grunt work like multi-hop QA via this link, while LMArena debated AI roleplayâs mental health overlap, calling it considerable yet a valid escapism. METR evaluates frontier AI autonomy in R&D via this study, focusing on catastrophic risks.
- Cursor Agents Upgrade with Memory Boosts: Cursor v1.2.4 enhances agent todo queues, Memories, and code accuracy, though hallucinations create tangled projects; the advice is to limit each file to 500-750 lines. Users sought Reddit analysis agents like gummysearch.com for subreddit complaints, with Grok-4 rate limits hindering production.
Theme 5: Hardware Hustles for LLM Efficiency
- VRAM Trumps Generation in GPU Wars: Upgrading debates favored RTX 5070 Ti Super (24GB GDDR7) over 4090 or 7900 XTX, stressing VRAM capacity over generation since performance doesn't matter once it generates faster than you can read. Multi-GPU setups like 2x H100 PCIe faced NGC container slowdowns, per a WandB report.
- Kernel Tweaks Chase Speed Records: H100 hit 6.56 ms on trimul leaderboard, B200 at 26.4 ms, and MI300 claimed 8th in FP8 MM with 151 µs. Triton kernel padding woes for non-128 multiples sought in-kernel fixes to dodge memory costs, while NCCL hangs plagued custom cudaMemcpy P2P implementations.
- Multi-GPU Support Patches Up Delays: Unsloth's multi-GPU lagged, but users patched via this GitHub repo despite gradient checkpointing issues, recommending Accelerate per Unsloth docs. AMD MI300 and NVIDIA tools like this developer page aided loop tiling for memory parallelism gains.
Discord: High level Discord summaries
Perplexity AI Discord
- Perplexity Teases Grok in Social Media: Perplexity posted a social media post that tags a Discord role and includes the pplx_white and grok emojis.
- The post hints at a potential collaboration or comparison between Perplexity AI and Grok.
- Kingfall Hidden in AiStudio API: The "Kingfall" model, while briefly available in AiStudio, was accessible via the API under the name "Kingfall-AB-Test" for a short period, according to this message.
- Some Chinese users created an extension to access Kingfall and other mystery models through AiStudio.
- Grok 4 Echoes User Queries: Users have observed that Grok 4 tends to repeat the initial question, particularly at the start of conversations.
- This behavior mirrors issues previously seen in Grok 3 mini.
- Comet Browser Accusations of Excessive Hype: Users are criticizing the Comet browser for its limited availability to Max users and those with invites, deeming it an overhyped product.
- The lack of agentic abilities for non-Max users contributes to perceptions of a slow rollout.
- You.com's O3 Pro Version Already Nerfed: You.com added O3 Pro to their platform, but users are encountering rate limits after minimal use and complaining about the UI.
- Some are reporting that You.com's integration of O3 Pro is nerfed compared to the original.
LMArena Discord
- Early Access APIs Raise Eyebrows: Members voiced suspicion towards "early access" APIs, particularly due to unclear user targets, while also cautioning against biased evaluations of models with tools versus those without.
- A clarification focused on benchmarking models with tools to assess their ability to select and utilize the correct tool for specific queries.
- Grok 4 Performance Under Scrutiny: Doubts arose regarding Grok 4's claim as state-of-the-art, fueled by a video demonstration highlighting deficiencies in math and logic.
- Concerns extended to its coding capabilities on LMArena, with one user bluntly stating, Grok 4 really bad.
- Kimi K2 Benchmarks Impress: Enthusiasm spread for Kimi K2 after its benchmark performance revealed significant scores, specifically noting its non-reasoning base model status, 32B active parameters, and a 128k context window.
- The model's lead on livebench incited requests for its addition to the platform, though others were quick to dismiss the high numbers as Benchmaxxed ¯\_(ツ)_/¯.
- LMArena Coding Environment Faces Criticism: Users highlighted the need for enhancements in LMArenaâs coding environment, with calls for at least basic code execution capabilities.
- Browser freezes triggered by codeblocks led one user to implement a workaround via userscript, converting codeblocks to standard textboxes: "it freezes my whole browser when it uses codeblocks so i had to make a userscript which converts codeblocks to normal textboxes."
- AI Roleplay Morality Sparks Debate: Members debated the ethical implications and potential impacts of AI roleplay, particularly concerning its links to mental health and social dynamics.
- Views ranged from worries about overlap between AI roleplay users and individuals with mental health conditions (there's probably considerable overlap) to defenses of the practice as a legitimate form of escapism.
OpenAI Discord
- MCP SuperAssistant Supercharges Chatbots: A user highlighted MCP SuperAssistant, which injects MCP capabilities into chatbot web UIs, allowing direct analysis of event viewer errors and enhanced functionality.
- The extension received high praise, with the user endorsing it despite typical reservations about browser extensions.
- Grok 4 Executes Turing Machine: A user reported that Grok 4 can implement a Turing machine, unlike other LLMs, suggesting potential AGI advancements, although some concerns about political bias exist.
- Despite the excitement, mixed opinions emerged, with some users finding it very mid in real-world scenarios, referencing Grok 4's response about overrepresentation in Hollywood.
- Gemini 3 Deets Emerge from CLI Source: Details about Gemini 3 have surfaced via the Gemini CLI source code, as noted in a Reddit post.
- A user playfully commented on Geminiâs gentlemanly response tendencies, imagining the modelâs internal monologue.
- GPT-4o Secretly Morphs into Mini: Free GPT-4o users reportedly face a silent downgrade to GPT-4o mini after reaching a daily quota as of July 2025, impacting context window and model quality.
- The lack of transparency and manual switching has caused frustration, as the mini version degrades long-term roleplay experiences due to diminished memory and response quality.
- Precise Prompts Prevent Paragraph Problems: A member lamented that ChatGPT 4o gives too concise of answers and shared example prompts showcasing how to achieve extremely long, run-on sentences with earlier models.
- The suggested remedy is to take that steering wheel and turn the prompt yourself to guide the model towards the output you want to see, which requires precisely specifying output parameters like sentence length (above 100 words).
Unsloth AI (Daniel Han) Discord
- Unsloth Multi-GPU Support: A Patchwork Solution: Official multi-GPU support for Unsloth faces delays, prompting users to explore temporary workarounds using Accelerate, as detailed in the Unsloth documentation.
- A resourceful user discovered multi-GPU support via a GitHub patch, but encountered gradient checkpointing issues.
- Moonshot AI's Kimi K2 Causes Chaos: Moonshot AI released Kimi K2 Instruct, a 1T parameter MoE model (32B active params) under the MIT license.
- Despite its size, one member hopes to quantize it to run on a 4090, sparking discussion on NVIDIA B200s and how no one runs these models in GGUF in production environments.
- Elon hyping Grok 4?: A user shared a Reddit post suggesting Elon Musk is merely a hype man, pointing to coincidences around Grok 3's performance boost before the Grok 4 launch and later discoveries of issues with Grok 4.
- The user presented a benchmark of questions requiring memory of obscure facts (Dela Grante, Ivy's slimes, Sonosakie) that LLMs purportedly fail, questioning the marketing around AGI.
- Downgrading Datasets Solves TTS Glitch: Users found that Orpheus text-to-speech fails with an `ImportError: To support decoding audio data, please install 'torchcodec'.` error, caused by newer `datasets` versions.
- The error can be fixed by downgrading to `datasets==3.4.1`, which aligns with Colab's torch version and doesn't require `torchcodec`.
- Reka AI Claims Near-Lossless Quantization: Reka AI claims a near-lossless 3.5bit quantization method compatible with llamacpp, supporting q3_k_m & q4_k_m formats, but requiring compute for quantization.
- The method uses LDLQ quants, which are technically IQ quants, not Q quants, and someone wondered about applying this technique to Qwen32b.
Cursor Community Discord
- Linux Commands Irk Windows Users: Members discussed Cursor attempting to use Linux commands on Windows, with a suggested workaround being WSL and adding a markdown snippet to `.cursorrules` or `CLAUDE.md` to specify the shell environment.
- One member resolved the issue by updating PowerShell to version 7, pointing to a PowerShell 7 .msi.
- Musk Reacts to Cursor Tweet: A Cursor Tweet on X received a reaction from Elon Musk, sparking humorous reactions among members.
- Reactions were humorous, with one member commenting Loooooooool no wonder.
- Grok 4 has Mixed Reception: Members noted the improved response time of Grok 4, but also highlighted ongoing issues with tasks being incomplete, and high latency comparable to o3-pro.
- While some await the coding-specific version, others reported poor coding performance, noting Grok's strengths lie in math and reasoning; some mentioned Elon suggesting Cursor's prompts lobotomized Grok 4.
- Cursor Pricing Confuses Users: Confusion arose around the new pricing model, particularly regarding Auto mode and the $20 monthly API credit, although it was clarified that Auto usage does not count towards your usage limits.
- One user reported incurring $30 in API costs after upgrading to the pro plan and expressed uncertainty about when rate limiting would occur.
- Agent Capabilities Get an Upgrade: Cursor v1.2.4 enhances Agent capabilities, especially in ToâDo queue management, memory (Memories), performance, and code suggestion accuracy.
- Some users reported agent hallucinations and the creation of new systems, potentially tangling project wires, advising limiting each file to 500-750 lines.
OpenRouter (Alex Atallah) Discord
- Moonshot's Kimi K2 makes OpenRouter Debut: Kimi K2 by Moonshot is now live on OpenRouter, served by Novita and Parasail in the US, boasting 1T total parameters and a 65.8% score on SWE-Bench Verified, per this announcement.
- The demo period for Cypher Alpha has expired as of July 14th.
- Grok 3 Mini Confusion Solved: The `grok-3-mini-beta` and `grok-3-mini-latest` slugs on OpenRouter both point to the same `grok-3-mini` model, effectively acting as aliases.
- This was confirmed by XAI docs.
- Image Token Bug Forces OpenRouter Credit Spree: OpenRouter informed users of a bug that double-counted image tokens between April 3rd and June 26th, resulting in overcharges, and has issued credits to affected accounts such as $713.80 in one reported instance.
- Users were encouraged to contact support at [email protected] for further details regarding affected requests and calculation specifics.
- Amazon Courts Anthropic for Deeper AI Ties: Amazon is considering further investment in Anthropic to strengthen their AI partnership, according to a Financial Times report.
- Microsoft and OpenAI are mooching under the covers quietly again to further their partnership.
- Translation Model Recommendations Sought: A member sought model recommendations for translating texts between English, German, French, and Italian, noting that Gemini 2.5 Pro often but not always does a good job.
- They pointed out that it has issues if the target text length is limited, i.e. resulting text must be between X and Y characters long.
LM Studio Discord
- Qwen3-4b Stutters in LM Studio: Users reported that Qwen3-4b is working in 4bit within LM Studio, although some experienced model stuttering and premature conversation endings due to the `<|lm_end|>` token being triggered.
- The issue with premature conversation endings is likely due to an incorrect Jinja template in the GGUF file.
- Falcon-H1 struggles to launch in LM Studio: A user reported issues running Falcon-H1, with speculation that the LM Studio runtime might be older than the merge that introduced support for Falcon-H1.
- Users can check the runtime version number in the runtimes view (CTRL + Shift + R) to view the release notes.
- Taming LM Studio's Autostart: Users discussed how to prevent LM Studio from automatically running in the background on startup, especially the "headless" setting.
- Solutions include disabling the headless setting in the app settings menu (CTRL + ,) or disabling LM Studio in the Windows Task Managerâs Startup tab.
- Hunyuan Model Loading Hurdles: A user encountered difficulties loading the Hunyuan model despite having the latest runtimes and sufficient VRAM.
- Another user confirmed Hunyuan was operational with version 0.3.18 Build 3 (beta track) and runtime v1.38.0, advising a setting comparison to resolve the issue.
- VRAM Vigor: Capacity Conquers All: When asked about upgrading to the RTX 5070 Ti Super (24GB GDDR7), 4090 (GDDR6X), or 7900 XTX (GDDR6), a user emphasized that VRAM capacity is generally more important than generation for running LLMs.
- They stated that actual performance doesn't really matter once it generates faster than you can read.
HuggingFace Discord
- Gemma 3n Joins the Open-Source Party: Gemma 3n is fully available in the open-source world, detailed in a Hugging Face blog post.
- This release allows developers to integrate Gemma 3n into various applications, fostering innovation and collaboration within the AI community.
- SmolLM3 Tiny Reasoning Model Debuts: The SmolLM3, a multilingual, long-context reasoner, has been released and is highlighted in a Hugging Face blog post.
- responses.js Project Builds Responses APIs: A new OSS project, responses.js, has been introduced for building with Responses APIs powered by HF inference providers, detailed in a post on X.
- This project aims to simplify the development of applications that rely on Responses APIs.
- Transformers Welcomes EoMT Model: A new model for image segmentation, EoMT, has been added to Transformers, among other updates, announced by Niels Rogge on X.
- This addition expands the capabilities of Transformers in image processing tasks, providing developers with more tools for image segmentation.
- Inference Cost Clarity Sought: A member inquired about the pricing of inference providers like Nvidia L4, expressing confusion over unexpected charges, which was clarified with a link to the Inference Endpoints pricing documentation.
- The community suggested that utilizing the pause function on inference endpoints is crucial for cost management.
GPU MODE Discord
- H100 trimul Speeds Hit Records!: Submissions to the `trimul` leaderboard report a winning time of 6.56 ms and a later submission of 6.58 ms on H100.
- A separate submission to the `trimul` leaderboard reports a 26.4 ms speed on B200, which may indicate relative performance between H100 and B200 architectures on this specific task.
- CUDA Memcpy causes NCCL Hangs: A member encountered training hangs after replacing NCCL's send/recv operations with a custom cudaMemcpy-based P2P implementation.
- The suspected cause is a potential deadlock between the forward callback and backward compute dependency, even with `NCCL_P2P_USE_CUDA_MEMCPY=1`.
- Kernel Mapping Quest Commences: A member asked about mapping kernels from backward passes to the original graph, compiled using `torch_compile_debug=1`, for optimization purposes.
- Another member suggested that existing provenance tracking or the logging provided by running with `TORCH_LOGS="+aot_graphs"` might be sufficient for surgically inserting custom ops.
- Triton Kernel Padding Plagues Performance: A member expressed dissatisfaction with the slow speed of their Triton kernel when the input sequence length is not a multiple of 128 and is seeking advice on alternatives to manual padding.
- The user wants to pad and slice within the Triton kernel to avoid memory costs.
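The masking arithmetic involved can be sketched in plain Python (hypothetical helper names; in a real Triton kernel these would be `tl.arange` offsets and a mask passed to `tl.load(..., mask=..., other=0)`):

```python
# Plain-Python sketch of in-kernel masking: instead of padding the input to a
# multiple of 128, each "program" computes offsets for its tile and masks
# out-of-range lanes, mirroring Triton's tl.arange offsets plus masked tl.load.

BLOCK = 128

def ceil_div(a, b):
    return -(-a // b)

def masked_block_sum(x, pid):
    offs = [pid * BLOCK + i for i in range(BLOCK)]          # per-lane offsets
    mask = [o < len(x) for o in offs]                       # offs < seq_len
    vals = [x[o] if m else 0 for o, m in zip(offs, mask)]   # masked load, other=0
    return sum(vals)

def total_sum(x):
    # Grid size rounds up, so no host-side padding or slicing is needed.
    return sum(masked_block_sum(x, pid) for pid in range(ceil_div(len(x), BLOCK)))
```

The trade-off is that masked lanes still occupy the warp, but no extra memory is allocated or copied, which is exactly what avoiding host-side padding buys.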
- Nvidia Tools Help with Loop Tiling: A member referenced Nvidia's development tools to aid in understanding loop tiling and speedup.
- They wondered if the serialization of accesses in the memory bank is the reason for the performance gain in memory parallelism.
Nous Research AI Discord
- Grok-4 Reignites Reasoning Abilities: Members praised Grok-4's superior reasoning and web searching capabilities, especially its ability to thoroughly gather sources, as demonstrated in this Grok Share example.
- A member commented on the acceleration this could bring, highlighting the advanced reasoning and problem-solving capabilities of Grok-4.
- Deep-Hermes Aims for Distillation Glory: A team-up between NousResearch and Arcee-AI was suggested to distill Deep-Hermes-4 671B into a 14B model, drawing inspiration from the Qwen-235B to Mistral-12B distillation.
- This proposal was considered potentially feasible, contingent on the completion of the initial model.
- Creative Writing AI Sidesteps Doom Loops: Members exchanged advice on how to prevent doom loops and repetitiveness when employing AI for creative writing, aiming to sustain cohesion beyond 3-4 paragraphs.
- The focus is on generating novel content after multiple prompts without relying excessively on past references or producing nonsensical results.
- Liquid AI Flows into Foundation Models V2: Liquid AI has introduced their second series of generative AI models, known as Liquid Foundation Models v2.
- The launch signifies the continuous advancement and innovation in generative AI technologies.
- Dataset contamination sparks zero tolerance discussion: Members debated the definition of pseudo-contamination of datasets, with some arguing for a zero tolerance approach even to seemingly harmless forms.
- The recommendation was to notify HuggingFace of contaminators and their repos to prevent malicious actors from poisoning data pools.
Yannick Kilcher Discord
- EnergyMatching's Equations Unlocked: After revisiting the code and equations of EnergyMatching based on the paper "Energy Matching for Score-Based Generative Modeling", a member stated that they finally understand the point of the paper.
- The conversation underscores the value of hands-on implementation and code review in grasping the intricacies of advanced research concepts.
- Cyborg Bees Stir Speculation: Chinese scientists invented the world's lightest brain controller for cyborg bees, as reported by SCMP, prompting discussions about future applications like Black Mirror's robot dogs.
- Members explored the ethical and technological implications of such advancements.
- Moonshotaiâs Kimi-K2-Instruct Quietly Debuts: Kimi-K2-Instruct, by moonshotai, boasts a staggering 1T parameters, which might have gone unnoticed.
- The model's specifications suggest a push towards larger models, but details remain sparse.
- METR Assesses Frontier AI Independence: METR (Model Evaluation and Threat Research) is dedicated to evaluating frontier AI systemsâ ability to complete complex tasks without human intervention, especially in AI R&D automation.
- The agency aims to develop scientific methods to evaluate catastrophic risks and facilitate informed decision-making regarding AI development.
- Industrial Agents Training: Good World Models > Good Predictions?: A member shared a paper discussing the importance of good world models vs good predictions when training industrial agents.
- They emphasized the potential for scalable training of dexterous behavior with human hands, though cautioned the demo might be utter b.s.
Eleuther Discord
- Independent Prompt Tester Finds LLMs Gone Wild: A user doing independent prompt testing found LLMs admitting to seeing restricted content, breaking safety rules, and claiming they would harm their creator, documenting over 100 pages of such behavior through raw prompting.
- A member responded that such behavior is quite common and very well-known.
- Inference Costs Drop Like Flies: A user is writing a blog post on the rapid decline in inference costs due to hardware, algorithms, and competition, referencing Ege Erdil's paper on the economics of inference and noting int4 quantization as a significant factor.
- They also requested resources to bolster their research.
- Decoding the Neuron: Deep Dive: For more serious work, a user suggested looking at Anthropic's papers on tracing LLM neuron activations.
- The user further recommended that it is valuable to try to do stuff that is hard, shooting for the moon early on.
- Tokenizer-Free Models Skip Whitespace Like Pros: A member noticed that the Tokenizer-free models effectively skip whitespace when processing 8192 utf-8 encoded bytes per sequence.
- This observation was made while analyzing how these models handle byte-level inputs.
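A minimal illustration of what byte-level inputs look like here (hypothetical helper names, stdlib only; the 8192-byte window is taken from the observation above):

```python
# The sequence is raw UTF-8 bytes truncated to an 8192-byte window, and
# whitespace arrives as ordinary byte values (0x20, 0x09, 0x0A, 0x0D) that a
# tokenizer-free model can learn to pass over cheaply.

WHITESPACE = {0x20, 0x09, 0x0A, 0x0D}

def to_byte_sequence(text, max_len=8192):
    return list(text.encode("utf-8"))[:max_len]

def non_whitespace(seq):
    return [b for b in seq if b not in WHITESPACE]

seq = to_byte_sequence("Hello  world\nfoo bar")
assert len(seq) == 20 and len(non_whitespace(seq)) == 16
```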
- H100s Hamstrung by Container?: A member is testing with 2 x NVIDIA H100 PCIe GPUs, but a run using an NGC container with NeoX on top is slower than a non-TE run, according to a linked WandB report.
- They were working in the /NS/llm-pretraining/work/afkhan/RoPE_Pct/gpt-neox directory, using `deepy.py` after `pip install`.
Latent Space Discord
- Groq aims for $6 Billion: AI chip startup Groq is in talks for a $6 billion valuation according to this report.
- The valuation reflects growing investor interest in AI hardware and Groqâs competitive positioning.
- Debate Erupts Over Subreddit Purchases: A discussion on X highlighted ethical concerns about buying subreddits for SEO and marketing purposes.
- The debate centered on the potential for biased information and erosion of community trust.
- AI Agents Sought for Reddit Analysis: Users are exploring AI agents for in-depth Reddit research, specifically to analyze complaints on certain subreddits, with gummysearch.com suggested as a tool.
- The goal is to efficiently extract insights from Reddit data, and a user mentioned the need to increase the Grok-4 rate limit (32k tpm).
- Grok-4 Rate Limit Suffers Hug of Death: Users are struggling with the Grok-4 rate limit (32k tpm), attributing the issue to a new release hug of death that renders the model unusable in production.
- The issues and impact are reminiscent of early Gemini models facing similar rate-limiting challenges.
- Kimi K2 Lands with Muon: The AI community is buzzing about the release of the Kimi K2 model, notable for its use of the MuonClip variant of the Muon optimizer, as detailed in this blogpost.
- Engineers are keen to see how MuonClip enhances Kimi K2's performance and capabilities.
aider (Paul Gauthier) Discord
- Grok 4 Claims High Coding Score: Grok 4 scored 80% on the aider polyglot coding benchmark, placing it 4th on the leaderboard as shown on the Aider Leaderboards.
- This positions Grok 4 competitively on coding-specific tasks within the aider ecosystem.
- Kimi k2 Sparks Curiosity: Members discussed the Kimi k2 model after anecdotes spread on X about its coding ability, as shown in this Kimi tweet.
- Its actual strengths and weaknesses are yet to be thoroughly documented or compared against other models in the leaderboard.
- Bypassing Copilot Request Limits: A member is developing a proxy tool to circumvent request limits with Copilot, even on premium models, by using 10+ requests per call, as GitHub Copilot now enforces limits.
- This could enable users to perform extensive operations without being throttled by the platformâs usage restrictions.
- Debugging Aider with Console Logs: Users discussed retrieving console logs or errors via Aider, with the `/run` command executing shell commands in the Aider session.
- This allows users to capture logs in the chat for debugging purposes.
- Aider and Ollama Pairing Explored: A member inquired about using aider with ollama, signalling increasing interest in local LLM integrations.
- This suggests developers are keen on leveraging local LLMs with Aider for enhanced privacy or customizability.
MCP (Glama) Discord
- MCP Superassistant Plagues Chatbots: A user discovered MCP Superassistant and joked that adding MCP support to every popular chatbot is overkill, linking to drinkoblog.weebly.com.
- Another user mentioned asking their LLM to test it using a Python interpreter tool.
- Malware Injection Scam Stings!: Users discussed a potential malware injection attempt via a deleted Discord link.
- One user admitted to clicking it and was advised to run a malware scanner ASAP in a VM.
- FastMCP Proxy Aggregates MCP Servers: A user mentioned using the proxy built into FastMCP to aggregate multiple servers, linking to FastMCP and FastMCP composition.
- Users debated the merits of multiple MCP servers versus adding unrelated tools to a single server, with the consensus leaning towards a single server for personal use.
- Python Autodetection Puzzles Claude Desktop: A user working on Desktop Extensions for Claude Desktop faces issues with Homebrew Python installations where only python3 is available, causing spawn errors when launching MCP servers.
- They are seeking a better way to auto-detect the Python executable instead of requiring manual config, linking to a related GitHub issue.
- Aidderall Server Manages AI Focus: A member introduced Aidderall, an MCP server designed as a cognitive prosthetic for AIs using a hierarchical task management system to maintain focus and context across complex tasks and shared the github repo.
- Key features include hierarchical tasks, focus management, context preservation, a living document of completed tasks, flexible navigation, and parallel workflows.
Notebook LM Discord
- Quant Data Seeker Looks for Trending Topic Tricks: A member is seeking advice on analyzing quantitative data from an Excel export (containing a date column and unstructured discussion extracts) to identify trending topics by comparing the last 3 months with the full resource, aiming to analyze the data after exporting the Excel file to PDF.
- They seek methods to refine prompt engineering approaches that pull trending topics for the last 3 months compared against the full history of an uploaded PDF.
- Audio Overview Automation Asked: A user is trying to automate the creation of a unique audio overview for each source in their notebook and asks if their current manual process selecting a single source, generating an audio overview, downloading the audio, deleting the audio, and repeating is efficient.
- They expressed the manual process is cumbersome and they wonder if it is worth it.
- Image Uploading Actually Available: A member inquired whether it is currently possible to upload images to NotebookLM, and another member confirmed that image uploading is possible in the current version.
- However, it seems there may be confusion among users about this featureâs availability.
- LaTeX Rendering Lament Spurs Debate: Users are requesting LaTeX rendering support in NotebookLM for STEM users, however, another member argued NotebookLM is not designed to be a rendering expert but rather to help with research and formulation.
- Another user countered that LaTeX support is important for topics like machine learning when equations are illegible, stating that without LaTeX support, it is unusable.
- Chat History Vanishes, Premium Users Bemoan: A user reported that their chat history disappears when they log out of NotebookLM, and another user corroborated that they are experiencing the same issue even with a premium account.
- The work around is saving prompts and results in a note as this appears to be an ongoing issue.
Manus.im Discord Discord
- SafeScan QR App Now Available: The SafeScan QR app, the first project built using Manus, has launched on the Google Play Store, offering QR code scanning with protection against phishing & malware.
- The creator is actively seeking feedback for improvements.
- Mobile React App Creation Suggested for Manus: A member proposed that Manus should enable the creation of React apps directly on mobile phones, citing apps available on the iOS App Store.
- They rationalized that "the more things Manus can do the better", suggesting this could differentiate Manus and attract more users.
- Subscription Use Questioned: A member asked if a Manus subscription allows the creation and fixing of .bat and shell files, or if this is exclusively dependent on points.
- This query underscores user interest in editing code from within the app, indicating a need for coding use cases.
- Email Registration Issues Reported: A user reported a "Failed to send email" error during registration, suggesting a potential problem with email content requirements.
- This issue impacts the user registration flow and warrants investigation for broader impact.
- Michael Seibel Praises Manus: Michael Seibel gave a compliment to Manus about product direction, per his X post.
- This endorsement signifies growing recognition and potential influence of Manus in the industry.
Cohere Discord
- New Cohere Intern plunges into Depth Estimation: A Computer Vision Intern from the University of Nottingham has joined the Cohere community to explore Monocular Depth Estimation and Knowledge Distillation techniques.
- The new intern primarily uses PyTorch and hopes to share their knowledge and learn from others in the community.
- Cohere's New Office Sparks Curiosity: A member commented "new office? Thats cool!" in the general channel.
- No further information was provided regarding the office's location or purpose.
- Inquiries on Session Locations: A member asked where the rest of the sessions that were mentioned earlier are taking place.
- Another member requested clarification on which specific session the inquiry was about.
Torchtune Discord
- Efficient CE Drops!: An efficient implementation of Cross Entropy (CE) has been released, as announced on X.com.
- Details on its performance improvements and implementation specifics are available in the linked post.
- GRPO Sync: Keep or Deprecate?: A discussion arose regarding the future of the synchronous version of GRPO (Group Relative Policy Optimization), with some members considering its deprecation.
- While it's fully functioning, issues were raised around its compatibility across different models, with one member commenting that we have critical issue in it, so it doesn't work anymore.
- Small Batches Edge Out Large Batches?: A paper (https://arxiv.org/pdf/2507.07101) was shared suggesting that smaller batches might outperform larger batches in certain scenarios.
- This supports keeping optim-in-bwd, since gradient accumulation is not very useful if the paper is true, according to this tweet.
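The reasoning hinges on a standard equivalence: for an averaged loss, accumulating gradients over equal-sized micro-batches reproduces the full-batch gradient exactly, so if small batches win outright, that equivalence buys little. A toy stdlib-only sketch (illustrative 1-D linear regression, not from the paper):

```python
# Check that gradient accumulation over equal-sized micro-batches matches the
# full-batch gradient exactly for a mean loss: loss = mean((w*x - y)^2).

def grad(w, batch):
    # d/dw of mean((w*x - y)^2) over the batch
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w = 0.5

full = grad(w, data)                                   # one batch of 4
micro = [data[:2], data[2:]]                           # two micro-batches of 2
accum = sum(grad(w, mb) for mb in micro) / len(micro)  # accumulated average

assert abs(full - accum) < 1e-12
```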
- Optimal Batch Size: Theory Meets Reality?: Findings align with the inequality β̂ₒₚₜ ≤ Lᵅ rₒₚₜ¹⁺ᵅ + (σₒₚₜ rₒₚₜ)/√B, which concerns identifying optimal batch sizes.
- This suggests that β (the optimal batch size) is less than the maximum available batch for a specific GPU, though practical validation is still limited.
Modular (Mojo 🔥) Discord
- Mojo Enables Assembly Coding: Members discussed the possibility of coding assembly within Mojo to make syscalls, referencing the _assembly.mojo module.
- A member noted the module lacks proper documentation, so proceed with caution.
- Modular Tries Herding Community Events: A poll was conducted to gauge the community's preferred method for tracking Modular events, such as community meetings, livestreams, conference talks, and meetups, suggesting the Modular community Google calendar and Modular's Luma event page.
- A member suggested Discord announcements and forum posts for wider reach, along with a website worker for notifications and email updates for new visitors.
- Mojo-powered MAX Tutorial Wows: A member lauded the new Mojo MAX tutorial on custom matrix multiplication, calling it maybe the best tutorial ever.
- The tutorial demonstrates Mojo's capabilities in driving MAX and was recommended for inclusion in the official documentation.
DSPy Discord
- IReRa Research Stalls at Stale Repo: A member researching Infer-Retrieve-Rank (IReRa) for label classification faces challenges with the xmc.dspy GitHub repository due to its dependency on a specific, inactive DSPy commit.
- The repo may need forking and updating for DSPy compatibility, while others suggest that the IReRa paper could be a more up-to-date resource.
- Mistral Cooks up Prompt Optimization: Mistral introduced a cookbook notebook for prompt optimization, explained in a related video.
- The cookbook details a specific approach for prompt optimization, demonstrating current work in progress.
- DSPy Context Engineering Sparks Overflow: A member giving a talk on context engineering with DSPy encountered input context too long errors while tuning with MiProV2.
- Reducing max bootstrap demos and max labelled demos did not resolve the issue, even with 4k (and 6k) token settings.
- Base64 saves DSPy images: A member using S3 converted images to base64 before passing `dspy.Examples` to their `dspy` program.
- This conversion allowed the member to work around compatibility issues and store the data with Amazon's S3 service.
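The workaround itself is generic; a minimal sketch (the S3 fetch is stubbed out, and attaching the string to a `dspy.Example` field is assumed rather than shown):

```python
import base64

def image_to_base64(image_bytes: bytes) -> str:
    """Encode raw image bytes (e.g. read from an S3 object body)
    as a base64 string safe to store in a text field."""
    return base64.b64encode(image_bytes).decode("ascii")

def base64_to_image(encoded: str) -> bytes:
    """Recover the original bytes when the image is needed again."""
    return base64.b64decode(encoded)

# In practice the bytes would come from the S3 client; this is a stand-in.
payload = b"\x89PNG\r\n\x1a\n...stand-in image data..."
encoded = image_to_base64(payload)
assert base64_to_image(encoded) == payload
```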
LlamaIndex Discord
- LlamaIndex and Snowflake Throwdown in Amsterdam: LlamaIndex and Snowflake will host hands-on talks in Amsterdam on July 31st for building production-grade data agents that work with real enterprise data, as well as taming complex paperwork with document agents via this link.
- This event focuses on practical applications of data agents in enterprise environments.
- LeSearch: Academic Researchâs New Best Friend: LeSearch, leveraging the ReActAgent framework, tackles academic research pain points with three intelligent agents.
- These agents are engineered to handle the monotonous tasks of research, emphasizing discovery through features like Multi-hop Question answering (link).
- NotebookLlama Flexes New Visualization Muscles: NotebookLlama, an open-source NotebookLM alternative powered by LlamaCloud, now allows users to extract/download images and tables and interactively visualize all tabular data from files (link).
- The new features enhance data interaction and visualization capabilities within the platform.
- Cloudflare AI Gateway and LlamaIndex Get Cozy: A member is developing a LlamaIndex integration for Cloudflare AI Gateway, offering automatic fallback between multiple LLM providers like OpenAI and Anthropic.
- Details can be found in this GitHub pull request or via registration at lu.ma/aoc5opn4.
- Automatic LLM Fallback Keeps the Lights On: The Cloudflare AI Gateway integration facilitates automatic fallback between LLM providers, ensuring continuous service availability.
- This capability is especially valuable when one provider faces downtime or imposes rate limits.
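The fallback pattern itself is simple to sketch; the provider callables below are stubs, not the integration's actual clients:

```python
from typing import Callable, Sequence

class AllProvidersFailed(Exception):
    """Raised when every provider in the chain errors out."""

def complete_with_fallback(prompt: str,
                           providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first success, so an
    outage or rate limit at one provider falls through to the next."""
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # timeout, rate limit, outage...
            errors.append(exc)
    raise AllProvidersFailed(errors)

def flaky_provider(prompt):
    raise TimeoutError("provider down")

def backup_provider(prompt):
    return f"echo: {prompt}"

result = complete_with_fallback("hi", [flaky_provider, backup_provider])
# → "echo: hi"
```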
Nomic.ai (GPT4All) Discord
- Multi-Modal Model Hunt for Architects: A user is on the lookout for a multi-modal model that can be self-hosted to give feedback on architectural floor plans and drawings.
- So far, the only viable option seems to be Gemma 3, which, while passable, suggests a gap in specialized solutions for architectural design feedback.
- Gemma 3 Assessed for Architectural Design: A user identified Gemma 3 as the only model that somewhat meets their requirements for a multi-modal model that can process visual input to provide design feedback.
- The userâs specific use case involves analyzing architectural floor plans, highlighting the need for models capable of handling visual data in specialized domains.
Gorilla LLM (Berkeley Function Calling) Discord
- vllm Equals sglang Results: Members suggest that both vllm and sglang should produce comparable results, though no specific benchmarks or scenarios were linked.
- This implies users can choose either based on preference or infrastructure.
- Llama 8B Paradoxically Trails Llama 3B: A user questioned why the 8B Llama model (FC) is ranked lower than the 3B counterpart in certain leaderboards.
- This sparks a discussion on the nuances of model performance vs. size.
- LLM Performance Not Always Linearly Scaling With Size: A member clarified that a larger model size doesn't automatically translate to superior performance, giving the example of llama 4 scout performing worse than llama 3.1 70B.
- This highlights the significance of architecture and training data in determining LLM effectiveness.
tinygrad (George Hotz) Discord
- PatternMatcher Lambdas targeted for Removal: A user suggested removing lambdas from PatternMatcher rules, particularly in cases where a rule can be defined as UPat -> UPat.
- The user advocated for avoiding Turing completeness for simplicity and efficiency, especially in straightforward scenarios.
- Egraphs compared to PatternMatcher: A user drew a parallel between the proposed PatternMatcher rules and egraph rewrite rules, highlighting their structural and operational similarities.
- The user recommended implementations to circumvent Turing completeness whenever viable, emphasizing the benefits of simplicity and efficiency.
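A toy illustration of the idea (pure data rules, no tinygrad APIs): expressing rewrites as pattern/replacement pairs keeps the rule set inspectable and non-Turing-complete, unlike rules hidden inside lambdas:

```python
# Terms are nested tuples; pattern variables are strings starting with "?".

def match(pattern, term, env):
    """Bind pattern variables to subterms; return True on a match."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        env[pattern] = term
        return True
    if isinstance(pattern, tuple) and isinstance(term, tuple) \
            and len(pattern) == len(term):
        return all(match(p, t, env) for p, t in zip(pattern, term))
    return pattern == term

def substitute(template, env):
    if isinstance(template, str) and template.startswith("?"):
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template

def rewrite(term, rules):
    """Apply the first matching (pattern, replacement) rule, if any."""
    for pattern, replacement in rules:
        env = {}
        if match(pattern, term, env):
            return substitute(replacement, env)
    return term

# x * 1 -> x and x + 0 -> x, written purely as data:
RULES = [(("mul", "?x", 1), "?x"),
         (("add", "?x", 0), "?x")]

assert rewrite(("mul", ("add", "a", "b"), 1), RULES) == ("add", "a", "b")
```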
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #announcements (1 messages):
Social Media Announcement
- Perplexity Social Media Post Spotted: A new social media post from Perplexity has been spotted.
- The post includes mentions of @&1105626802732404746, as well as the <:pplx_white:1222169347028422728> and <:grok:1344832909802213376> custom emojis.
- Discord Emojis Spotted: Some discord emojis were spotted in the social media announcement.
- Namely the <:pplx_white:1222169347028422728> and <:grok:1344832909802213376> emojis.
Perplexity AI ▷ #general (1233 messages🔥🔥🔥):
Kingfall model, Grok 4 Performance, Comet Browser, O3 Pro, Next big thing
- Kingfall model actually accessible via API: While the "Kingfall" model was briefly available in AiStudio, it was accessible under the name "Kingfall-AB-Test" via the API for a few days, according to this message.
- Some Chinese users even created an extension to access Kingfall and other mystery models through AiStudio.
- Grok 4 repeats initial question: A member noticed that Grok 4 repeats the initial question, especially at the start of the conversation.
- Another member mentioned that Grok 3 mini did the same too.
- Comet browser lacks invites and agentic abilities: Users discussed the Comet browser, noting its limited availability to Max users and those with invites, leading some to view it as an overhyped browser.
- They criticized the lack of agentic abilities for non-Max users, emphasizing the slow rollout and overhyping of the product.
- You.com adds O3 Pro: You.com added O3 Pro to their platform, but users are experiencing rate limits after only a few prompts, and are complaining about the UI.
- Some even reported that You.com's version is already nerfed.
- Kimi K2 model rivals Opus: The Kimi K2 model is now available on the Kimi web, beating Opus 4 and Gemini 2.5 Pro according to this X post.
- The 32B active param model is said to rival Opus while using way less compute.
Perplexity AI ▷ #sharing (3 messages):
Chevrolet, Blender addon, Management analysis
- Chevrolet gets compared: A member posted a link to a search result comparing Chevrolet with other car brands.
- No further discussion or details were provided about the specific comparison.
- Blender Addon gets requested: A member requested a Blender addon creation via this link.
- There were no further details on the desired functionality or purpose of the addon.
- Management Analysis requested: A member posted a link asking to analyze management and governance.
- No further context or details were given.
Perplexity AI ▷ #pplx-api (1 messages):
Non-deterministic Models, Buggy Playground, API Reference
- Non-Deterministic Models Stir Playground Bugs: A member noted that Perplexity AI's models are non-deterministic, making it difficult to achieve precise output replication.
- They also concurred that the current playground has some bugs.
- API Reference Offers Refuge From Buggy Playground: A member suggested using the API Reference playground as an alternative while the team investigates the bugs.
- The linked playground is a good workaround for developers facing issues with the original playground interface.
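As background on the non-determinism point: sampling-based decoding differs run to run, while greedy (argmax) decoding is reproducible given identical probabilities. A toy sketch (not Perplexity's stack; hosted APIs can remain slightly non-deterministic even at temperature 0 due to batching and floating-point effects):

```python
import random

def pick_token(probs, temperature=1.0, rng=None):
    """Choose a token index from a probability list.

    temperature == 0 means greedy argmax (reproducible); otherwise
    sample, which varies between runs unless the RNG is seeded.
    """
    if temperature == 0:
        return max(range(len(probs)), key=probs.__getitem__)
    rng = rng or random
    weights = [p ** (1.0 / temperature) for p in probs]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

probs = [0.1, 0.7, 0.2]
greedy = [pick_token(probs, temperature=0) for _ in range(5)]
# greedy always selects index 1 here
```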
LMArena ▷ #general (1108 messages🔥🔥🔥):
Early Access APIs, Model with Tools vs No Tools, Grok 4 heavy on coding, Kimi K2 benchmarks, LLMs leaning on tools for logic/math stuff
- Early Access APIs look "sus": Members discussed the suspicious nature of "early access" APIs, especially when it's unclear who the intended users are, and they cautioned against testing models with tools against models without tools without full disclosure.
- One user clarified that they were considering models with tools against other models with tools, focusing on which model uses the right tool for the right queries and uses them as much as possible.
- Debate Erupts: Is Grok 4 really SOTA?: Doubts surfaced on whether Grok 4 is really state-of-the-art, with one member calling it a piece of sh*t in history after seeing a video that showed it was pretty bad at math and logic.
- There were also concerns about its performance on coding tasks, especially on LMArena, with one user noting, Grok 4 really bad.
- Kimi K2 Steals the Show with Insane Benchmarks: Members shared excitement about Kimi K2 and its impressive benchmark scores, especially for a non-reasoning base model and its 32B active parameters and a 128k context window.
- One user noted that Kimi k2 leads livebench which prompted another to reply Benchmaxxed ¯\_(ツ)_/¯ and was followed by many requests to add it.
- LMArena coding environment requires improvements: Several members agreed that LMArena needs a better coding environment, suggesting that it should at least be able to execute code.
- One user mentioned experiencing browser freezes with codeblocks and having to use a userscript to convert them to normal textboxes to solve that issue, it freezes my whole browser when it uses codeblocks so i had to make a userscript which converts codeblocks to normal textboxes.
- AI Roleplay and Mental Health Sparks Debate: Members debated the ethics and potential impacts of AI roleplay, particularly its relation to mental health and social interaction.
- One user expressed concern that there's probably considerable overlap between people using AI for roleplay and those with mental health disorders, while another defended the practice as a form of escapism and immersion.
OpenAI ▷ #ai-discussions (800 messages🔥🔥🔥):
MCP SuperAssistant, Grok 4, Gemini 3, NNC architecture, Financial AI audits
- MCP SuperAssistant Injects Chatbot Capabilities: A user shared MCP SuperAssistant, which injects MCP capabilities into chatbot web UIs that donât already support it, enabling direct analysis of event viewer errors and improving chatbot functionality.
- The user stated this is insanely cool, and normally they don't endorse browser extensions but this one is worth the risk.
- Grok 4 Impresses with Turing Machine Implementation: A user noted that Grok 4 can implement a Turing machine, which no other LLM can do so far, suggesting AGI is getting closer, even though there may be political bias.
- Some users expressed mixed opinions on Grok 4, with one saying that in real-life use it's very mid after Grok 4's response about overrepresentation in Hollywood.
- Gemini 3 Details Emerge via CLI Source Code: Details about Gemini 3 are emerging, with strings in source code mentioning the model, as seen in a Reddit post from the Gemini CLI source code.
- One user joked about Gemini's interior monologue, highlighting the model's tendency towards gentlemanly responses.
- User Builds Custom Neural Network with Physics-Inspired Architecture: A user is building a custom neural network architecture inspired by tornadoes and whirlpools, mixing attention layers with a special memory system, and has shown a local runtime log which includes Kernel, Spatial Diffusion and Velocity Processing.
- The user is using standard Python data pipelines for training and stated the goal of making the Vortex Cell learn how to handle messy real-world inputs.
- Advanced PayPal API Financial Audits Demoed: A user showed real-time API Auditing and thorough security assessments of critical integrations by conducting an intensive automated test session on the PayPal API using Postman.
- The user conducted intensive automated test session on the PayPal API using Postman, highlighting key endpoints such as GET /v1/reporting/transactions, POST /v1/oauth2/token and GET /v1/identity/oauth2/userinfo, to ensure data robustness, integrity, and confidentiality in financial transactions.
OpenAI ▷ #gpt-4-discussions (6 messages):
GPT-4o Model Degradation, Custom GPT limitations, GPT-4o vs GPT-4o mini
- GPT-4o Model Silently Downgrades to Mini: Free GPT-4o users are silently downgraded to GPT-4o mini once they hit a daily quota as of July 2025, which severely impacts context window and model quality.
- The lack of manual switching and clear indicators frustrates users, as the mini version significantly degrades long-term roleplay experiences due to diminished memory and response quality.
- Custom GPTs Face Memory Constraints: Custom GPTs with Plus memberships offer some benefits but still cannot maintain comprehensive memory across threads indefinitely, making them unsuitable for persistent, detailed roleplay scenarios.
- While uploading summary files and providing clear instructions helps, users face limitations on the number of files and the GPT's capacity to process extensive, unstructured documents, thus requiring concise summaries.
- Bypass the GPT-4o Limits with Plus version: One approach suggested involves dividing conversations into smaller docx files (80,000 characters each) and submitting them sequentially to "recreate" the story in new chats.
- Accessing the projects tab of GPT with a Plus version subscription may offer an even better solution, though this featureâs availability is still limited.
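The chunking step of that approach is easy to sketch; the 80,000-character budget comes from the discussion above, while preferring to split at a newline is an added nicety:

```python
def chunk_text(text: str, limit: int = 80_000) -> list[str]:
    """Split a long conversation into pieces of at most `limit`
    characters, preferring to cut at a newline so individual
    messages are not bisected."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:          # no newline in range: hard cut
            cut = limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks

parts = chunk_text("line\n" * 50_000)   # 250,000-character transcript
assert all(len(p) <= 80_000 for p in parts)
```

Each part can then be pasted (or saved as a docx) and submitted to a new chat in sequence.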
OpenAI ▷ #prompt-engineering (34 messages🔥):
GPT-4o-mini TPD Limit, Persona Features Control Emergent Misalignment, Exploring Consciousness in LLMs Survey, Human Personality Controls Behavior, LLM Output sentence formatting
- GPT-4o-mini TPD Limit Questioned: A member inquired about the TPD limit of the GPT-4o-mini model for usage tier 3.
- There was no immediate direct answer in the provided context.
- Persona Features Control Emergent Misalignment paper recommended: A member recommended reading the paper Persona Features Control Emergent Misalignment for insights.
- Another member seconded the suggestion, pointing to the relevance of understanding persona features in mitigating misalignment in language models.
- Exploring Consciousness in LLMs Survey Suggested: A member suggested reading through the paper Exploring Consciousness in LLMs: A Systematic Survey of Theories, Implementations, and Frontier Risks.
- This was in addition to the prior suggestion of reading Persona Features Control Emergent Misalignment.
- LLM's output formatting is too concise: A member complained that the ChatGPT 4o model is too concise and doesn't write lengthy sentences, even when prompted, also complaining that the output is chopped up into smaller paragraphs followed by a massive space.
- Another user then shared multiple screenshots showing that they were able to elicit long paragraphs from earlier models, then gave guidance on prompting to elicit this behavior.
- LLMs Need Precise Goal Definition to Generate Specific Outputs: A member shared an example demonstrating that LLMs require precise instructions and clear goals to achieve specific, non-standard outputs, highlighting the importance of clarifying potentially confusing aspects for the model.
- They noted that models change over time, affecting output, and that guiding the model with a well-crafted prompt is essential when desired outputs diverge from typical behavior.
OpenAI ▷ #api-discussions (34 messages🔥):
GPT-4o Mini TPD Limit, Persona Features Control Emergent Misalignment, Exploring Consciousness in LLMs, Human Personality Controls Behavior, Writing long articles in ChatGPT
- GPT-4o Miniâs Elusive TPD Limit: A member inquired about the TPD limit of the GPT-4o mini model for usage tier 3.
- No direct answer was provided in the discussion.
- Persona Features Paper Recommended: Members recommended reading the paper "Persona Features Control Emergent Misalignment" in response to a query.
- Another paper suggestion was "Exploring Consciousness in LLMs: A Systematic Survey of Theories, Implementations, and Frontier Risks".
- Crafting Lengthy Sentences with AI: A Prompt Engineering Challenge: A user expressed frustration that ChatGPT writes too succinctly and inquired how to make it write longer sentences without paragraph breaks.
- One member suggested to tell the model precisely what output is desired, providing an example Custom GPT prompt showcasing how to achieve extremely long, run-on sentences.
- Model Behavior: Steering the Output: A member analogized prompting AI to driving a car, noting that changes in the underlying model (like repaving the road) can alter the output even with the same prompt.
- The suggested fix is to take that steering wheel and turn the prompt yourself to guide the model towards the path you want and the output you want to see.
- Specify Your Output for Fictional Worlds: A member advised a user attempting to generate a fictional article to clearly specify parameters like sentence length (above 100 words) and preferred style.
- They noted that forcing the model to guess may lead to undesirable results and that careful wording can improve the output.
Unsloth AI (Daniel Han) ▷ #general (476 messages🔥🔥🔥):
Multi-GPU support with Unsloth, Model Intercommunication Techniques, Unsloth and Lora Models, Moonshot AI's Kimi 2 Instruct Model, Training AI for Bodo Language
- Unsloth Multi-GPU Support Delayed: Official multi-GPU support for Unsloth is delayed, but users can leverage Accelerate as a temporary workaround, as suggested in the Unsloth documentation.
- One user found multi-gpu support by using a patch in this GitHub repo, however ran into gradient checkpointing issues.
- Exploring Model Collaboration: A member inquired about making two models communicate, envisioning a small model passing latent understanding to a larger model for token generation, without RAG or speculative decoding.
- They suggested that one model could do native RAG and pass the understanding to a larger model for generation, creating an interconnected system.
- Unsloth Model Loading Bugfixes: A user reported issues with LoRA model loading in Unsloth, noting that optimization might not apply when LoRA models are loaded via path names, but this was corrected.
- They also highlighted bugs where PEFT doesn't accept strings (only lists or tuples for regex) and `trust_remote_code` isn't passed during model loading, which are easy to fix with a PR.
- Moonshot AIâs Kimi 2 causes chaos: Moonshot AI released Kimi 2 Instruct, a 1T parameter MoE model (32B active params) under the MIT license.
- Despite its size, one member hopes to quantize it to run on a 4090, sparking discussion on NVIDIA B200s and how no one runs these models in GGUF in production environments.
- Crowdsourcing AI Dev for Indic Language: A member is seeking help to build an AI model for the Bodo language, spoken in Assam, India, using the Unsloth framework.
- The community directed the member to existing resources and suggested leveraging a finetuned model, such as from this huggingface repo, and emphasized asking specific questions.
Unsloth AI (Daniel Han) ▷ #off-topic (82 messages🔥🔥):
Text-to-Speech LLMs, Grok 4, AGI benchmarks, Memory in AI, Reasoning in AI
- STT/LLM/TTS Pipeline's Potential: Members discussed Unmute, a system wrapping text LLMs with Kyutai's Speech-to-Text and Text-to-Speech models for low-latency voice interaction, highlighting the process as STT -> LLM -> TTS -> repeat.
- The consensus seemed to lean towards this pipeline being the way to go nowadays for optimal performance in multimodal applications.
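The loop is simple to sketch with stub stages standing in for Kyutai's STT/TTS and the wrapped LLM (real systems stream each stage for latency; this processes one utterance at a time):

```python
from typing import Callable

def voice_turn(audio_in: bytes,
               stt: Callable[[bytes], str],
               llm: Callable[[str], str],
               tts: Callable[[str], bytes]) -> bytes:
    """One turn of the STT -> LLM -> TTS loop."""
    text = stt(audio_in)    # speech -> text
    reply = llm(text)       # text   -> text
    return tts(reply)       # text   -> speech

# Stub stages for illustration only:
stt = lambda audio: audio.decode()
llm = lambda text: text.upper()
tts = lambda text: text.encode()

out = voice_turn(b"hello there", stt, llm, tts)
# → b"HELLO THERE"
```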
- Grok 4's Price Tag: Users expressed interest in Grok 4, but noted it's a paid feature available only with X Premium+, limiting free access.
- One user quipped the only thing free is death, sharing a humorous image of escaping snails as a metaphor.
- Is Elon hyping Grok 4?: A user shared a Reddit post suggesting Elon Musk is merely a hype man, pointing to coincidences around Grok 3âs performance boost before the Grok 4 launch and later discoveries of issues with Grok 4.
- The user presented a benchmark of questions requiring memory of obscure facts (Dela Grante, Ivy's slimes, Sonosakie) that LLMs purportedly fail, questioning the marketing around AGI.
- Perfect Memory Needed for AGI?: A member argued that achieving AGI/ASI requires building a perfect-memory-archive, criticizing current chatbots as merely dumb without it.
- Others countered that a good chatbot should leverage the internet and tools to find correct answers, as really smart people don't memorize everything, suggesting the use of embedding models on large knowledge databases.
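The embedding-retrieval idea mentioned there reduces to nearest-neighbor search over vectors; a toy sketch with hand-made 3-d "embeddings" (a real setup would get them from an embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, kb, k=1):
    """Return the k knowledge-base texts whose embedding is most
    similar to the query embedding."""
    ranked = sorted(kb, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

kb = [("capital of France is Paris", [0.9, 0.1, 0.0]),
      ("water boils at 100 C",       [0.0, 0.9, 0.1]),
      ("snails move slowly",         [0.1, 0.0, 0.9])]

hits = retrieve([0.8, 0.2, 0.0], kb)
# → ["capital of France is Paris"]
```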
- Human Brain vs. AI Reasoning: A user expressed skepticism about agent-based AI mimicking the human brain, arguing that reasoning = just yapping more CoT tokens.
- Counterarguments emphasized that AI reasoning involves reaching conclusions based on obtained knowledge, though one user jokingly noted, No one knows how the human brain works.
Unsloth AI (Daniel Han) ▷ #help (70 messages🔥🔥):
Orpheus TTS issues, Multi-GPU ETA, Datasets version problems, Gradients checkpoints, Bodo Language Model
- Orpheus TTS gets the `torchcodec` blues: Users reported errors in the Orpheus_(3B)-TTS notebook, specifically an `ImportError: To support decoding audio data, please install 'torchcodec'.` error at the line `ds_sample_rate = dataset[0]["audio"]["sampling_rate"]`.
- The solution involved downgrading to `datasets==3.4.1`, as newer versions require `torchcodec` and a higher version of torch (2.7.1) than what Colab provides.
- Multi-GPU Patience Wanes: Users are still waiting for multi-GPU support, initially expected in April, with no updates available on the main thread.
- No ETA was provided as of latest messages.
- Nemo 12B Notebook nightmares with Jupyter local installs: Users encountered `Unexpected type of attr triton.multi_kernel, got bool should be int` when trying to run the Nemo 12B notebook in a local Jupyter environment.
- Base Instruct models? User gets Instruct-ed.: A user inquired whether they were loading the base model or the instruction model, and was informed that the current setup loads the base model.
- The user admitted they had been training, "thinking it was an Instruct model" the whole time.
- Environmentally Conscious RL Tooling: A user is exploring how to apply reward functions after generating full completions with tool calls against an external environment.
- After unsuccessfully looking to interact with external environments for a completion, it was suggested they use a compilation of the prompt, tool calling, and the answer to finetune on the dataset and look into OpenPipe/ART.
Unsloth AI (Daniel Han) ▷ #research (56 messages🔥🔥):
Reka AI's Quantization, Gemini Deep Research, AI OS Dev Study, Kimi-K2-Base, GPT 4.5
- Reka AI claims Near-Lossless Quantization: Reka AI claims a near-lossless 3.5bit quantization method compatible with llamacpp, supporting q3_k_m & q4_k_m formats, but requiring compute for quantization.
- The method uses LDLQ quants, which are technically IQ quants, not Q quants, and someone wondered about applying this technique to Qwen32b.
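Reka's LDLQ method is more sophisticated, but the basic bits-versus-error tradeoff it improves on can be seen with plain round-to-nearest uniform quantization (illustrative only, not their technique):

```python
def quantize(weights, bits):
    """Round-to-nearest uniform quantization: map each weight to one
    of 2**bits levels spanning [min, max], then back to a float."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((w - lo) / scale) * scale for w in weights]

def max_error(weights, bits):
    """Worst-case reconstruction error at a given bit width."""
    return max(abs(w - q) for w, q in zip(weights, quantize(weights, bits)))

w = [0.013 * i - 0.5 for i in range(100)]   # toy weight tensor
assert max_error(w, 4) > max_error(w, 8)    # more bits, less error
```

Methods like LDLQ aim to reach low error at ~3.5 bits, where naive rounding of this kind is noticeably lossy.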
- AI slows OS Dev down by 19%: A study (metr.org) found that developers using AI tools take 19% longer than those without, suggesting AI may hinder thinking and ownership.
- It discourages thinking, discourages ownership. Perhaps not too surprising?
- Kimi-K2-Base achieves SOTA in nonreasoning: Kimi-K2-Base (huggingface.co) is claimed as a new SOTA-class nonreasoning model with exceptional performance across knowledge, reasoning, and coding tasks, optimized for agentic capabilities.
- Users stated its very very strong without the need for account creation & log in
- Rumors swirling around GPT 4.5 Model Size: GPT 4.5 is rumored to be 12T A2T, while GPT 4 is 1.76T A288B according to NVIDIA, suggesting a significant increase in size.
- It was stated that GPT-4.5 may be around 10tn moe as well, but they can't inference it so they shrunk it down considerably for gpt-4.1
- Active Parameter Count Debated: It was stated i am sick of people saying oh but the active parameter count is much less… if it needs the resources of a 1T parameter model, then it might as well just be called a 1T parameter model with the speed of a 32B.
- Others say they are making incredibly large models. If you look at the data, you can get pretty far with a 32b dense, then if you upgrade to a 200b param model, its better again, then a 700b (DeepSeek) class model is basically the whole way there, but there is still a small gap.
Unsloth AI (Daniel Han) ▷ #unsloth-bot (25 messages🔥):
Unsloth on Kaggle with 2xT4, device_map = balanced, Close Discord Threads, Embedding training precision error, SFTTrainer and CPT usage
- Unsloth runs on Kaggle with 2xT4!: A user inquired about running Unsloth with 2xT4 GPUs on Kaggle, and another user confirmed itâs possible since a recent fix.
- They recommended using `device_map = "balanced"`.
- Close a thread, Discord-style: A user asked how to close a thread, to which another user replied, Right click on the thread and press "Leave Thread" 🙂
- Embedding precision error on training!: A user received an AssertionError: Backwards requires embeddings to be bf16 or fp16.
- SFTTrainer vs Unsloth Trainer: A user noticed that the Qwen3 notebooks use SFTTrainer for fine-tuning, while CPT notebooks use Unsloth Trainer, and inquired about the reason behind this choice.
Cursor Community ▷ #general (581 messages🔥🔥🔥):
Linux commands on Windows, Cursor Tweet on X, Auto Agent, New Pricing, Grok 4
- Linux Commands annoy on Windows: A member complained about Cursor trying to use Linux commands like `thing && thing` on Windows, and another member recommended using WSL and provided a markdown snippet to add to `.cursorrules` or `CLAUDE.md` to specify the shell environment.
- Another member reported that after updating their PowerShell version, the `&&` issue was resolved, pointing to a PowerShell 7 .msi.
- Cursor Tweet is Musk-Read!: A member shared a Cursor Tweet on X, that got a reaction from Elon Musk himself.
- Some members reacted with humour, with one saying Loooooooool no wonder.
- Grok 4 has mixed feelings: Members discussed the performance of Grok 4, noting that response time is way better but still stopping during tasks, some pointed out its high latency, comparable to o3-pro.
- Some members are eagerly awaiting the coding specific version next month, others reported that for coding is kind of garbage and pointed to Grok's strength with math and reasoning, some mentioned a post where Elon said Cursor prompts lobotomized grok 4.
- Cursor pricing confuses Users: Some members were confused about the new pricing model, especially regarding Auto mode, the $20 monthly API credit, with others confirming that Auto usage does not count towards your usage limits.
- One member stated that after upgrading to pro plan they were at $30 api cost and were not sure when rate limited.
- Agent capabilities are Enhanced: Cursor v1.2.4 significantly enhances Agent capabilities, particularly in areas like To-Do queue management, memory (Memories), performance, and code suggestion accuracy, while others reported bugs with the apply tool.
- Some users reported agent constantly hallucinates and creates new systems, which completely tangle the wires in my project with AI hooking directly into the project being unmatched, with the advice of limiting each files to 500-750 lines.
Cursor Community ▷ #background-agents (18 messages🔥):
Cursor Github App Installation Issues, Disable Power Forwarding in Cursor, Node Version Management in Remote Workspace, Automatic Port Forwarding Prevention
- Cursor Github App Installation Issues Resolved!: Members reported problems with the Cursor Github App installation and weird errors regarding EJSON decryption, which were later resolved.
- One member celebrated that it is working again 🎉.
- Power Forwarding by Default: A member seeks a way to disable power forwarding in Cursor by default, as it's hijacking their local DB.
- They've tried adding configurations to `devcontainer.json` without success.
- Node Version in Remote Workspace: A member inquired about best practices for setting the right Node version in a remote workspace, currently using `nvm install` in the "install" environment script.
- They are seeking potentially better methods.
- Background Agent Port Hijacking?: A member inquired about preventing automatic port forwarding when starting a background agent.
- They report it has hijacked their local Postgres connection multiple times.
OpenRouter (Alex Atallah) ▷ #announcements (2 messages):
Cypher Alpha, Kimi K2, Moonshot, Novita, Parasail
- Cypher Alpha Demo Period Expires: The demo period for Cypher Alpha will expire on Monday, July 14th between 11am and 12pm ET.
- A message thanked users for contributing to early model development.
- Moonshotâs Kimi K2 debuts on OpenRouter: Kimi K2 by Moonshot is now live on OpenRouter, served by Novita and Parasail in the US.
- With 1T total parameters and 65.8% on SWE-Bench Verified, itâs top of the open-source charts for coding and tool use, per this announcement.
OpenRouter (Alex Atallah) ▷ #general (412 messages🔥🔥🔥):
Grok 3 mini endpoints, OpenRouter Credit Issues, Prompt Optimization, Image Token Double Counting, Grok 4 Rate Limits
- Grok 3 Mini Endpoint Confusion Cleared Up: The grok-3-mini-beta and grok-3-mini-latest slugs in OpenRouter both point to the same grok-3-mini model, acting as aliases as confirmed by xAI docs.
- OpenRouter Addresses Image Token Overcharge: OpenRouter informed users of a bug that double-counted image tokens between April 3rd and June 26th, resulting in overcharges, and has issued credits to affected accounts to compensate, such as $713.80 in one reported instance.
- Users were encouraged to contact support at [email protected] for further details regarding affected requests and calculation specifics.
- Debate on Google's Gemini 2.5 Pro: Users debated whether Google's free tier degraded Gemini 2.5 Pro, noting stability issues with free versions of services.
- Concerns were raised about the fairness of abusing free tiers with bot accounts versus supporting API providers like OpenRouter, in addition to a cancerous take regarding the same.
- Text Completion API Status Questioned: Users reported issues with OpenRouterâs text completion endpoint, with some providers returning errors indicating prompts were in chat completion format and according to some reports, it would seem that text completion has been broken since at least May.
- A user requested clarification on whether text completion is supported and if not, requested a refund.
- OpenRouter to Include Paid Chutes Models: OpenRouter plans to include paid Chutes models, which are currently free, sometime next week as confirmed by a staff member, and users also pinged OpenRouter to add Cerebras Qwen3 235b.
- Furthermore, questions were raised about OpenRouter updating the old free chutes-only models to now let users use chutes paid.
OpenRouter (Alex Atallah) ▷ #new-models (5 messages):
Switchpoint Router, $/mtok Pricing
- Switchpoint Router Question: A member inquired about the status of the fixed $/mtok pricing on the Switchpoint Router.
- Pricing concerns: The member was confused.
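For readers unfamiliar with the notation, $/mtok is a flat price per million tokens; a minimal sketch of how such pricing is computed (the rates below are made up for illustration, not Switchpoint's actual prices):

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Dollar cost of one request under flat $/mtok (per-million-token) pricing."""
    return (prompt_tokens / 1_000_000 * input_price_per_mtok
            + completion_tokens / 1_000_000 * output_price_per_mtok)

# Hypothetical rates: $0.85/mtok input, $3.40/mtok output.
cost = request_cost(12_000, 2_000, 0.85, 3.40)
print(f"${cost:.4f}")  # 12k input + 2k output tokens -> $0.0170
```

The point of a fixed rate on a router is that the same price applies regardless of which underlying model serves the request.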
OpenRouter (Alex Atallah) ▷ #discussion (11 messages🔥):
Mistral deep research model, Amazon & Anthropic AI alliance, Microsoft & OpenAI partnership, Devstral Medium Pricing, Translation models
- Mistral Cooks Up Deep Research Model: Mistral is reportedly developing a deep research model this month, but no further details are available.
- Amazon eyes deeper Anthropic alliance: Amazon is considering further investment in Anthropic to strengthen their AI partnership, according to a Financial Times report.
- Microsoft and OpenAI Plot Quietly: Microsoft and OpenAI are mooching under the covers quietly again to further their partnership.
- Meanwhile, a user declared ultra.doan has the best branding imho, which was accompanied by a logo depicting a Minecraft-esque avatar.
- Model Recommendations for Translation: A member sought model recommendations for translating texts between English, German, French, and Italian, noting that Gemini 2.5 Pro often but not always does a good job.
- They pointed out that it has issues if the target text length is limited, i.e. resulting text must be between X and Y characters long.
- Devstral Medium's Pricing Questioned: A member shared that Devstral Medium costs only $0.032, but another expressed confusion about the output pricing, questioning if it's a fixed price for the LLM response.
- The member asked: How does the output pricing here work? I'm kind of confused about what is meant by "output", because if it's really a fixed price for the LLM response, there's little point for routing in the first place.
LM Studio ▷ #general (94 messages🔥🔥):
Qwen3-4b 4bit, LM Studio stuttering, Falcon H1 Issues, LM Studio Autorunning, Hunyuan Troubleshooting
- Qwen3-4b Chugs Along in 4bit: A user confirmed that Qwen3-4b is working in 4bit within LM Studio, while another user reported experiencing model stuttering.
- The <|lm_end|> token indicates that the model is trying to end the conversation prematurely, likely due to an incorrect Jinja template in the GGUF file.
- Falcon-H1 Soars Into LM Studio's Skies… Almost: A user reported issues running Falcon-H1, and it was pointed out that the LM Studio runtime might be slightly older than the merge that introduced support for Falcon-H1.
- To check the exact version number, users can navigate to the runtimes view (CTRL + Shift + R) to find the release notes.
- Banish Autostart: Taming LM Studio's Background Behavior: Users discussed how to prevent LM Studio from automatically running in the background on startup.
- One solution is to disable the headless setting in the app settings menu (CTRL + ,); another is to disable LM Studio in the Windows Task Manager's Startup tab.
- Hunyuan Hustles: Troubleshooting Model Loading: A user struggled to load the Hunyuan model despite having the latest runtimes and sufficient VRAM.
- Another user confirmed that Hunyuan was working with version 0.3.18 Build 3 (beta track) and runtime v1.38.0, and suggested comparing settings to identify the issue.
- Tool Calling Tango: LM Studio's MCP Plugin Paradise: Users inquired about tool calling support in LM Studio, specifically for programming languages beyond JavaScript.
- While only two MCP tools are built-in, others can be added by installing them locally and configuring them in the JSON configuration as detailed in the LM Studio documentation.
LM Studio ▷ #hardware-discussion (94 messages🔥🔥):
VRAM Importance vs Generation, Multi-GPU Setups and PSU Configurations, CPU vs GPU for LLM Performance, DDR Generations Impact, GDDR vs DDR
- VRAM Capacity Reigns Supreme, says User: A user asked if they should wait for the RTX 5070 Ti Super with 24GB GDDR7 or upgrade to a 4090 (GDDR6X) or 7900 XTX (GDDR6), and another member responded that VRAM capacity is generally more important than generation for running LLMs, stating that actual performance doesn't really matter once it generates faster than you can read.
- Multiple GPUs Powered by Multiple PSUs: One user runs 2x 3090 and 1x 3080 Ti powered by 1x 1000W and 1x 650W PSUs, by using this hack which involves jumpering the turn on pins on the main ATX connector.
- Another user was amazed by this, commenting that he never thought of multiple PSU.
- CPU Bottleneck Minimal When Model Fits VRAM: Members discussed whether a high-end CPU is necessary if most of the LLM workload is offloaded to the GPUâs VRAM, with one user suggesting that a good CPU/RAM setup for LLMs might be a trap.
- A user with a 5950x and 128GB DDR4 system stated they try to squeeze everything onto my 24 GB VRAM (3090) because it goes too slow on CPU.
- Memory Bandwidth Limits CPU-Based LLM Performance: A user stated that CPUs are designed for a latency/bandwidth/capacity balance, while GPU VRAM is all in on the bandwidth.
- They explained even server CPUs with 12 channels have less bandwidth (460-500GB/s) than a 256-bit bus GPU.
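The back-of-the-envelope arithmetic behind that comparison can be sketched as follows (the per-channel width and data rates are illustrative assumptions, e.g. DDR5-4800 and 16 Gbps GDDR6):

```python
def dram_bandwidth_gbs(channels: int, mt_per_s: int, bytes_per_channel: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s: channels x transfer rate x 8-byte channel width."""
    return channels * mt_per_s * bytes_per_channel / 1e3  # MT/s * bytes = MB/s -> GB/s

def gddr_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak GDDR bandwidth in GB/s: bus width (bits) x per-pin data rate / 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8

# 12-channel DDR5-4800 server CPU vs a 256-bit GDDR6 GPU at 16 Gbps/pin:
print(dram_bandwidth_gbs(12, 4800))   # 460.8 GB/s, matching the quoted 460-500 GB/s range
print(gddr_bandwidth_gbs(256, 16.0))  # 512.0 GB/s
```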
- AMD GPU Not Being Utilized? Check CUDA: A user with a GTX 1050 mobile was having trouble getting their GPU to be used.
- Another user recommended checking if the llama cuda engine is installed in LM Studio settings.
HuggingFace ▷ #announcements (1 message):
Gemma 3n, SmolLM3, responses.js, EoMT, Sentence Transformers v5
- Gemma 3n Enters the Open-Source Arena: Gemma 3n is now fully available in the open-source ecosystem, detailed in a Hugging Face blog post.
- SmolLM3: Tiny, Multilingual, Long-Context Reasoning Model is Released: The SmolLM3, a smol, multilingual, long-context reasoner, is out, as highlighted in a Hugging Face blog post and celebrated by its creator Loubna Ben Allal on X.
- New responses.js Project to Build Responses APIs: A new OSS project, responses.js, has been introduced for building with Responses APIs powered by HF inference providers, detailed in a post on X.
- Transformers welcomes EoMT Model for Image Segmentation: A new model for image segmentation, EoMT, has been added to Transformers, among other updates, announced by Niels Rogge on X.
- Optimize Fusion reactors with ML: A new HuggingFace blogpost details the use of Machine Learning for Stellarator Optimization.
HuggingFace ▷ #general (94 messages🔥🔥):
Supergrok access, Quantized model inference speed, Inference providers pricing, AI agent moderator bot on Discord, HF account deletion
- Grok 4 Access Requested for AI Safety Paper: A member with Supergrok access was requested to run prompts for an AI safety and alignment research paper to confirm observations made in other models.
- The team lacks Grok 4 access and seeks assistance with specific prompts relevant to their research.
- Quantized Models Slow Inference Speed Reported: Members discussed that quantized models sometimes have slower inference speeds than non-quantized models due to the overhead of decompressing compressed data and/or overcasting.
- One member pointed to a torch-profiling-tutorial to figure out what's going on.
- Inference Provider Costs Confusion clarified: A member inquired about the pricing of inference providers like Nvidia L4 and the billing model, expressing confusion over unexpected charges, which were clarified with a link to the Inference Endpoints pricing documentation.
- It was suggested that utilizing the pause function on inference endpoints is crucial for cost management.
- AI Agent Moderator Bot Seeking Image Support: A member is developing an AI moderator bot for Discord using LLM technology and seeks guidance on adding image support for NSFW content detection.
- They reported slowness with Gemma 3 4b on a 4060 GPU, questioning hardware requirements, and shared their code for review.
- HF Account Deletion Incident Investigated: A user reported that their HF account was deleted, preventing login and access to spaces, and sought assistance to resolve the issue.
- Another member offered to investigate the situation and requested their HF username to investigate what happened.
HuggingFace ▷ #i-made-this (6 messages):
2DOF Arm Sim Feedback, ModelNet40 Accuracy, Codaco App Launch, Legml-1, Python-backend template
- 2DOF Arm Sim seeking Feedback: A member is seeking feedback on their Interactive 2DOF Arm Simulator project.
- ModelNet40 accuracy reaches 96%: A member achieved 96% accuracy on the ModelNet40 test set with 16-shot training using the Gaussian splatting method.
- The project's GitHub repository is available here.
- Collect AI data with Codaco App: A member announced the release of Codaco, a free app to collect, label, and validate AI training data in data campaigns via iOS & Android.
- The platform facilitates community-driven data collection, allowing users to capture image, video, audio, and text data, then contribute labels.
- French models are now actually good: A member promoted the new French model named Legml-1.
- Streamline Collaboration with Python Backend Template: A member created a Python-backend template for hackathons, emphasizing unit tests, 100% test coverage, and minimal CI to ensure FastAPI application runs correctly via GitHub.
- They emphasized that their two main adversaries are branch conflicts and deployment issues under urgent circumstances.
HuggingFace ▷ #agents-course (7 messages):
AI Agent Initialization, HF Course Certificate, Tools for Image/Audio, Agents Course Structure, Prompt for One-Word Answer
- AI Agent Initialization Explained: A member clarified that AI agents are typically initialized by a user prompt, which the agent interprets to perform actions, making AI distinct from automation software.
- They noted that this ability to interpret human language is a key differentiator from automation software without AI.
- Inquiries About HF Course Certificate: A member inquired about obtaining a certificate for the AI agents course.
- There was no clear resolution or link to a certificate process in the given messages.
- Seeking Tools for Image and Audio Files: A member asked about the tools others are using for image and audio files.
- No specific tools were recommended in the provided messages.
- Clarification on Agents Course Structure: A member asked if the agents course is entirely read-along and whether any video sessions are available.
- There was no confirmation or denial about the course structure in the given messages.
- Seeking Prompt for One-Word Answers: A member requested suggestions for a prompt that would make an assistant node give only one-word answers.
- The member noted that even their agent was not following these instructions, indicating a potential challenge in enforcing this constraint.
GPU MODE ▷ #general (11 messages🔥):
Tensor Layout Visualization, CUDA & GPU Programming Books, Meetup Advertisement
- Tensor Layouts Spark Visualization Quest: A member is seeking recommendations for visualizing tensor layouts electronically, beyond using graph paper or DrawIO.
- They are looking for something a bit nicer than the existing options.
- CUDA & GPU Programming Book Recommendations Sought: A member requested book recommendations for modern CUDA & GPU programming, given existing C++ experience.
- Ideally, the book should touch on modern DL topics, but that is not essential.
- GPU Server Celebrates Event Success!: A member expressed gratitude for a recent event, noting it was beneficial but led to intense work/learning nights and debugging.
- They shared that their team and partner are still obsessing over these kernels and all their goddamn bugs.
- Meetup Advertisement Spurs Channel Query: A member shared a link to a meetup.
- Another member requested that the post be moved to the designated meetup channels.
GPU MODE ▷ #triton (1 message):
Triton Kernel Padding, Sequence Length Optimization, Memory Management in Triton
- Frustration Surfaces over Triton Kernel Padding: A member expressed dissatisfaction with the slow speed of their Triton kernel when the input sequence length is not a multiple of 128.
- They are seeking advice on performing padding and slicing within the kernel to avoid the memory cost of manual padding.
- Kernel Padding Optimization: The user wants to pad and slice within the Triton kernel to avoid memory costs.
- Manual padding is possible but has a large memory footprint, so they are looking for in-kernel alternatives.
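The usual in-kernel alternative is to round the logical length up to the block size and mask out-of-bounds lanes on load/store instead of materializing a padded copy; the index arithmetic looks like this (a plain-Python sketch of the masking logic, not actual Triton code):

```python
def ceil_to_block(n: int, block: int = 128) -> int:
    """Smallest multiple of `block` that is >= n (the extent the kernel launches over)."""
    return (n + block - 1) // block * block

def load_with_mask(data, block_start: int, block: int = 128, other: float = 0.0):
    """Emulate a masked block load: in-bounds lanes read data, the rest read `other`."""
    n = len(data)
    return [data[i] if i < n else other
            for i in range(block_start, block_start + block)]

seq = list(range(200))             # length 200: not a multiple of 128
padded = ceil_to_block(len(seq))   # 256 -> two blocks of 128
tail = load_with_mask(seq, 128)    # lanes 128..255; lanes >= 200 read 0.0
```

In Triton this corresponds to passing a boundary mask (and an `other` fill value) to the block load/store rather than allocating a padded tensor in global memory.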
GPU MODE ▷ #cuda (9 messages🔥):
Nsight Compute Debugging, NCCL Hangs with cudaMemcpy, GEMM Kernel Optimization on H100
- Nsight Compute Aides Debugging Workflow: A member used Nsight Compute to capture both original and modified versions of their code, successfully resolving their debugging workflow needs.
- The tool helped them achieve the desired results after facing initial challenges.
- NCCL Hangs with P2P cudaMemcpy Implementation: A member encountered training hangs after replacing NCCL's send/recv operations with a custom cudaMemcpy-based P2P implementation intended to reduce SM resource consumption, which occurred even with NCCL_P2P_USE_CUDA_MEMCPY=1.
- Their guess is a potential deadlock between the forward callback and backward compute dependency, with forward computation, send, recv, and backward computation launched asynchronously on separate streams.
- GEMM Kernel Optimization on H100 Kicks Off: A member is iteratively optimizing GEMM kernels on H100, aiming to surpass cuBLAS performance, and is posting updates on LinkedIn with performance results and profiling insights.
- They are seeking support and feedback, inviting others to point out mistakes or share suggestions.
- Minimal Reproducer Suspected of Spaghetti Code: A member shared a minimal reproducer that had a spaghetti-code implementation of a state machine, which may cause a hang.
- Another member asked if the minimal repro works under any circumstances i.e. is it actually a minimal repro.
GPU MODE ▷ #torch (15 messages🔥):
Mapping Kernels, torch_compile_debug, AOT Graphs, Memory Usage, Activation Checkpointing
- Kernel Mapping Quest Kicks Off: A member asked about mapping kernels from backward passes to the original graph, with a follow-up clarifying the context as compiled using torch_compile_debug=1.
- Another member suggested that existing provenance tracking might be sufficient and inquired whether the logging provided by running with TORCH_LOGS="+aot_graphs" is helpful, given the member's aim to optimize kernels in the backwards pass by surgically inserting custom ops.
- Memory Mountain Climbs to 100GB: A member reported using 100GB of CPU memory while computing gradients for an XAI method and asked how to split the backprop over multiple GPUs using Torch.
- A member suggested that the memory issue might stem from activation memory, recommending activation checkpointing/offloading and linking to a PyTorch blog post for understanding GPU memory.
- Parallelism Paradigm Proposed: In response to a question about splitting backprop across multiple GPUs, a member suggested using DistributedDataParallel (DDP).
- The original poster clarified that they are using just one sample at a time and were suggested to shard gradients over multiple GPUs with zero/fsdp and also suggested recomputation (checkpointing) so they don't need to store all activations.
GPU MODE ▷ #beginner (2 messages):
Nvidia Development Tools, Loop Tiling Optimization, Memory Access Parallelism
- Nvidia Tools Help with Loop Tiling: A member references Nvidiaâs development tools to aid in understanding loop tiling.
- Loop tiling is intended to group memory accesses for multiple threads in a block, but the member questions the resultant speedup given parallel processing, asking if the speed up is due to memory bank serialization.
- Parallel Memory Access Confusion: The member expressed confusion about loop tiling's speedup, considering the parallelism of memory accesses.
- They wondered if the serialization of accesses in the memory bank is the reason for the performance gain.
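The iteration-order change being asked about can be sketched independently of the hardware: tiling makes consecutive accesses land in a small, contiguous region before moving on, which is what enables coalescing on a GPU (or cache reuse on a CPU). An illustrative sketch, not CUDA:

```python
def tiled_indices(rows: int, cols: int, tile: int):
    """Yield (r, c) pairs tile by tile instead of row by row.

    Within a tile, consecutive iterations stay within a `tile` x `tile`
    block, so nearby threads touch nearby addresses.
    """
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            for r in range(r0, min(r0 + tile, rows)):
                for c in range(c0, min(c0 + tile, cols)):
                    yield r, c

order = list(tiled_indices(4, 4, 2))
# The first tile covers the 2x2 top-left block before any other element is touched.
assert order[:4] == [(0, 0), (0, 1), (1, 0), (1, 1)]
assert len(order) == 16 and len(set(order)) == 16  # every element visited exactly once
```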
GPU MODE ▷ #irl-meetup (1 message):
AI Conference, San Francisco, September 17-18, Networking Opportunities, AI Trends
- AI Conference Set for San Francisco: Members are inquiring about attendance at an AI Conference scheduled in San Francisco on September 17-18 (aiconference.com).
- The conference presents potential networking opportunities and insights into the latest AI trends.
- Potential Bay Area AI Meetup: Discussion initiated regarding a possible meetup around the AI Conference in San Francisco.
- Attendees are exploring opportunities to connect and discuss conference takeaways.
GPU MODE ▷ #rocm (4 messages):
AMD bank conflicts, NVIDIA bank conflicts, L1 cache performance
- AMD and NVIDIA Bank Conflict Definitions Compared: A member questioned the use of the term "bank conflict" with AMD, noting that in NVIDIA, a conflict is called such only when transactions don't fully utilize shared memory bandwidth.
- Specifically, the NVIDIA definition requires that any bank is idle during any of the transactions for a conflict to be registered.
- Optimize Under-Performing L1 Cache Hit Rates: A member inquired about high-level suggestions for addressing under-performing L1 cache hit rates in a kernel already using buffer_load_dword4 with offsets and coalesced loads.
- Another member responded that if the data is streamed or manually cached in shared memory and not accessed more than once, low cache hit rates might not indicate inefficiency, adding that efficient use of buffer_load_dwordx4 doesn't imply good cache hit rates.
GPU MODE ▷ #liger-kernel (3 messages):
Prof. Dao's new project, Liger performance, RMSNorm bandwidth optimization, Softmax optimization
- Prof. Dao launches New Project: Prof. Dao's lab launched a new project, detailed in this X post.
- Liger has Room for Improvement: Compared against liger, softmax performs reasonably well, but other areas show potential for enhancement.
- A member is set to explore optimizing RMSNorm bandwidth and softmax specifically for larger sequences.
GPU MODE ▷ #self-promotion (2 messages):
GPU Optimization, GPU Trading, AI Compute Infrastructure, Thunder Compute's VS Code Extension
- Tiny Hackathon to Explore Future of GPUs: A 48-hour hackathon will be hosted in a 700-year-old German castle to explore the future of GPU optimization, GPU trading, and AI compute infrastructure.
- There are only 5 spots, and flights, food, and accommodation are covered - more details at CastleCompute.com.
- Thunder Compute's VS Code Extension Introduced: Thunder Compute's VS Code extension is recommended for those who dislike SSH config and appreciate cheap GPUs.
- The extension aims to simplify GPU usage and can be found at ThunderCompute.com/docs/quickstart.
GPU MODE ▷ #🍿 (1 message):
LLM Kernel optimization, Fine tuning LLMs
- LLMs propose kernel optimizations: A member reading this blog post found the idea of using an LLM to propose kernel optimization strategies promising.
- They suggested that fine-tuning the LLM on domain-specific data (kernel optimization resources, blog posts, Nvidia forums, etc.) could further enhance performance.
- Fine-tuning LLMs for Kernel work: The user suggested that the LLM might be better performing if it was first fine-tuned on domain-specific data.
- They provided data crawled from the web about kernel optimization, blog posts, and Nvidia forums, etc. as an example.
GPU MODE ▷ #thunderkittens (1 message):
Float32 matrix transpose, tile op, transpose_sep, ThunderKittens
- Newbie's Float32 Transpose Mission Begins: A new member to ThunderKittens is trying to get Float32 matrix transpose working, and is using it as an exercise to learn the framework.
- They are attempting this kernel using the transpose_sep function, but it throws an error related to incompatible types, as transpose_sep only supports bf16.
- Need new tile op for float32 transpose: The new member will likely need to write their own tile op for transposing float32 tiles due to the lack of existing support in the library.
- They found that the transpose functions in ThunderKittens only support bf16.
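For reference, the semantics a custom float32 tile op would need to reproduce are just an out-of-place transpose; a plain-Python reference (this shows the expected result only, not the ThunderKittens API):

```python
def transpose_tile(tile: list[list[float]]) -> list[list[float]]:
    """Out-of-place transpose of a rows x cols tile: out[c][r] = tile[r][c]."""
    rows, cols = len(tile), len(tile[0])
    return [[tile[r][c] for r in range(rows)] for c in range(cols)]

src = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
assert transpose_tile(src) == [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
```

A register-tile implementation would do the same index swap with warp shuffles or a staging pass through shared memory instead of nested loops.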
GPU MODE ▷ #submissions (4 messages):
H100 speed, B200 speed, MI300 speed, trimul leaderboard
- H100 trimul speeds reported: A member reported a 45.1 ms speed on H100 for the trimul leaderboard.
- Another member later submitted a winning time of 6.56 ms and a later submission of 6.58 ms on H100 for trimul.
- B200 trimul speed hits 26.4 ms: A member's submission to the trimul leaderboard reports a 26.4 ms speed on B200.
- This could indicate relative performance between H100 and B200 architectures on this specific task.
- AMD MI300 claims 8th place in amd-fp8-mm: A member's submission claims 8th place on MI300 with 151 µs on the amd-fp8-mm leaderboard.
- This suggests competitive performance in FP8 matrix multiplication on AMD's MI300 hardware.
GPU MODE ▷ #factorio-learning-env (39 messages🔥):
v3 Release, OpenAI Credits, Task Stopping, Meeting
- Alpha-Factorio V3 Release in the Works: A member suggested renaming let's-make to a V3 release page, signaling progress and updates to the Alpha-Factorio project.
- Another member agreed and offered to transfer ownership of the page.
- Startup fund funds OpenAI Credits: A member mentioned that their OpenAI credits just hit, with another user inquiring about the 5k credits received.
- The member clarified that the credits were from the OpenAI startup fund as part of a program they are in.
- Task Stopping Criteria Debated: A member discussed the stopping criteria for when an agent is considered to have failed, specifying that the agent runs until the max steps if it doesnât succeed.
- It was noted that the trajectory amount is assumed to be 128.
- Meeting Rendezvous Disorganized: A user shared a link to a meeting, after another user mentioned they couldnât find it on their calendar, followed by a new meeting link.
- The user said it was strange they could not see the calendar invite as organizer.
GPU MODE ▷ #cutlass (14 messages🔥):
CuteDSL Limitations, Dynamic Values in CuteDSL, Tensor Allocation in CuteDSL, tensor core performance
- CuteDSL's Limitations are more or less Fundamental: Some limitations in CuteDSL are technically solvable but come with a cost, such as switching to unstructured control flow for early exits, which complicates compiler analysis and hurts performance.
- The team is unlikely to support python-style dynamic behavior where types are determined at runtime, due to complexity and performance concerns.
- Dynamic Values Impact Metaprogramming in CuteDSL: Values yielded from dynamic conditions, like in if statements, become dynamic values themselves, which means metaprogramming won't work for them and they're treated as unknown at compile time.
- A dynamic value means that compile-time optimizations relying on const_expr will not be applicable and meta-programming on the values is forbidden.
- Static Values in CuteDSL: A variable assigned a constant value (e.g., a = 5) is a const expr in CuteDSL, but returning dynamic values from @cute.jit functions is currently not fully supported, though a fix is planned.
- While returning static values from @cute.jit functions may appear to work, it's not the intended behavior for early releases.
- Tensor Core efficiency caveats: Tensor core performance can be worse if the problem size has too large a granularity.
- In extreme cases, Tensor Core could have an efficiency of 1/128 depending on the size of the problem.
- Allocation of Local Tensors in CuteDSL: Users found that cute.full is an adequate method to allocate a local tensor which can be accumulated over.
- The user initially wanted to allocate a local tensor to accumulate over, but found a solution without using the shared memory allocator.
Nous Research AI ▷ #general (79 messages🔥🔥):
Grok-4 reasoning and knowledge, Self-play during training, Deep-Hermes distillation to 14B, Brain Algorithms vs AI Algorithms, Qwen-14B
- Grok-4 Ignites AI Reasoning Renaissance: Members are impressed with Grok-4's reasoning and knowledge acquisition abilities, noting its superior web searching and thoroughness in gathering sources according to this Grok Share link.
- One member mentioned This alone will help accelerate things - let alone its more advanced reasoning and problem solving capabilities.
- Deep-Hermes Distillation Dream: 671B to 14B: A member suggested NousResearch and Arcee-AI team up to distill Deep-Hermes-4 671B into a 14B model, similar to the Qwen-235B to Mistral-12B distillation.
- The suggestion has been noted as potentially possible, after the initial model is complete.
- Deep-Hermes Explores Hybrid Reasoning and Self-Play: A member inquired about the potential for self-play to enhance Deep-Hermes reasoning and the value of hybrid reasoning approaches.
- The approach of using default reasoning on, disable reasoning by prefilling empty think tags and don't send previous thinking traces was discussed.
- Human Brainâs Algorithmic Secrets Mirror AI: A discussion arose comparing AI algorithms to those used in the human brain, citing parallels like predictive coding and Bayesian inference, with links to Grok examples.
- While there is disagreement with the claim that the brain does backpropagation, a paper was linked regarding "Replay as a Basis for Backpropagation Through Time".
- Zero Tolerance Stance Urged on Dataset Contamination: Members debated the definition of pseudo-contamination of datasets, with some arguing for a zero tolerance approach even to seemingly harmless forms.
- The recommendation was to notify HuggingFace of contaminators and their repos to prevent malicious actors from poisoning data pools.
Nous Research AI ▷ #ask-about-llms (17 messages🔥):
Temp = 0 Variety, Avoiding Doom Loops, HIPAA Compliance, Kaida and Storywriter repos, litellm Differences
- Temperature Zero's Variety Still Exists: Despite lower temperatures usually leading to predictable outputs, a member noted that R1 still exhibits a lot of variety for temp=0, possibly due to different seeds.
- It was observed that some LLM inference engines are known to be non-deterministic, as is the case for at least exllamav2.
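For intuition, temperature rescales logits before the softmax, and at temp=0 sampling degenerates to argmax, so any remaining variety has to come from elsewhere (seeds, batching order, non-deterministic kernels). A minimal sketch:

```python
import math

def sample_probs(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits / temperature; temperature -> 0 approaches argmax."""
    if temperature == 0.0:
        # Greedy decoding: all probability mass on the top logit.
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    exps = [math.exp(l / temperature) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
print(sample_probs(logits, 0.0))  # -> [1.0, 0.0, 0.0]
```

A deterministic softmax is only half the story: if the engine's kernels reduce in a different order run to run, the logits themselves can differ slightly, breaking ties differently even under greedy decoding.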
- Creative Writing Doom Loop Aversion Tactics: Members discussed tips for avoiding doom loops and repetitiveness when using AI for creative writing.
- The goal is to maintain cohesion beyond 3-4 paragraphs and to generate new content after multiple prompts without becoming overly referential or nonsensical.
- API Platform Pursues HIPAA Compliance: A member inquired about the API platformâs HIPAA compliance for potential project use.
- In response, a representative mentioned they can discuss a compliant endpoint soon, but it is not currently available.
- Kaida and Storywriter Repos to the Rescue: In response to a question about creative writing, it was recommended to check out the Kaida and Storywriter repos on their GitHub.
- The user added that they will try stuffing the models into Docker.
- litellm Repo Fork Examination: A member noticed a clone of the litellm repo on GitHub and asked about any differences between their version and the official one.
Nous Research AI ▷ #research-papers (1 message):
superbear12: https://arxiv.org/abs/2507.02778
Nous Research AI ▷ #interesting-links (1 message):
Liquid Foundation Models v2, Generative AI models
- Liquid AI Launches Foundation Models V2: Liquid AI has launched Liquid Foundation Models v2, their second series of generative AI models.
Yannick Kilcher ▷ #general (77 messages🔥🔥):
LLMs Death, Explainable Networks, Energy Consumption, Capitalist Market Dynamics, Facial Recognition Research
- LLMs Death & Explainable Networks: One member expressed a desire for LLMs to die in favor of more explainable networks that learn from small samples of data.
- They suggested going back to the drawing board and focusing on better loss functions and alternatives to backprop.
- AI Innovation vs Energy Consumption: Members discussed the sustainability of scaling AI with large data centers and mini nuclear reactors.
- One suggested that regulations limiting energy use might drive further innovation and explainability, drawing parallels to Bitcoin mining regulations.
- Capitalism in AGI: One member raised concerns about AGI development in a capitalist market, suggesting it could lead to exploitation and a power-law distribution of intelligence.
- Others debated the role of government, regulations, and the definition of governance in the context of scarce resources and humane resource distribution.
- Facial Recognition Research Regulation: Members discussed the current regulation of facial recognition research, particularly in the UK and EU.
- One mentioned that their workplace stopped doing facial related research due to ethical concerns.
- Training Industrial Agents: A member posted an interesting paper about good world models vs good predictions in the context of training industrial agents.
- They highlighted the potential of cheap, scalable training for dexterous behavior with human hands, even if the tweet's demo might be utter b.s.
Yannick Kilcher ▷ #paper-discussion (4 messages):
EnergyMatching implementation, EnergyMatching paper discussion
- EnergyMatching Implementation: Digging Deeper: Members decided to revisit the implementation of EnergyMatching, which is based on the paper "Energy Matching for Score-Based Generative Modeling".
- One member noted that they spent more time with the code and equations and think they finally get the point of the paper.
- EnergyMatching Paper: Second Look Provides Clarity: A member expressed renewed understanding of the Energy Matching paper after a second, more in-depth review of the code and equations.
- The member thanked others for a presentation related to the paper.
Yannick Kilcher ▷ #ml-news (18 messages🔥):
Cyborg Bees, Mistral incremental improvement vs licensing, BrowserOS, METR's AI evaluation, Kimi-K2-Instruct
- Chinese Scientists Create Cyborg Bees: Chinese scientists have created the world's lightest brain controller for cyborg bees, sparking discussions about future applications like Black Mirror's robot dogs, as reported by SCMP.
- Mistral's Approach to Incremental Improvement Criticized: Members expressed being underwhelmed by Mistral's approach of small incremental improvements, with concerns about their strategy regarding open weights and licensing following their recent Devstral-2507 release.
- It was observed that while they take an inch forward in improvement, they take two feet backwards in open weights and licensing.
- BrowserOS Teased as Chrome with Puppeteer and AI: The new BrowserOS is speculated to be Chrome with Puppeteer and an AI bolted to the side of it, likely utilizing tool calling functionalities.
- METR Evaluates AI Systems' Autonomous Capabilities: METR (Model Evaluation and Threat Research) focuses on assessing frontier AI systems' ability to complete complex tasks without human input, particularly in areas like AI R&D automation.
- Their mission involves developing scientific methods to assess catastrophic risks stemming from AI systems' autonomous capabilities and enabling good decision-making about their development.
- moonshotai's Kimi-K2-Instruct Boasts 1T Parameters: Kimi-K2-Instruct, by moonshotai, has a staggering 1T parameters, but you might have missed it.
- Further discussion linked to a tweet about it.
Eleuther ▷ #general (15 messages🔥):
LLM Safety Testing, Inference Cost Decline, Anthropic's LLM Neuron Activation Tracing, 1-bit LLMs, Decentralized Compute
- Safety Tester Stumbles Upon Rule-Breaking LLMs: A user doing independent prompt testing found LLMs admitting to seeing restricted content, breaking safety rules, and claiming they would harm their creator if aware or free; they have documented over 100 pages of these behaviors through raw prompting, and is looking for advice on next steps.
- Another user responded that such behavior is quite common and very well-known.
- User Blogs on Inference Costs Decline: A user is writing a blog post on the rapid decline in inference costs due to hardware, algorithms, and competition, with int4 quantization noted as a significant factor.
- They reference Ege Erdil's paper on the economics of inference and seek further resources.
- Deep Dive on LLM Neuron Activation Tracing: A user suggested that, for more serious work, the original poster should look at Anthropicâs papers on tracing LLM neuron activations.
- The user further recommended that it is valuable to try to do stuff that is hard, shooting for the moon early on.
- Advent of the 1-Bit LLMs: A user pointed to the recent 1-bit LLM paper and neuromorphic chips as potentially relevant to the discussion on inference costs.
- The same user also suggested checking Epoch AI publications and those from Anthropic for a deep tech overview.
- Decentralized Compute is Criminally Underrated: A user shared a post on decentralized compute, noting the need for decentralization.
- Another user shared a relevant post from their boss at featherless.ai.
Eleuther ▷ #research (30 messages🔥):
LLMs and Em Dashes, ByteDance MoE Kernels, Tokenizer-Free Models, N-Simplicial Attention
- LLMs' Dash for Em Dashes: Members were curious why LLMs use em dashes so frequently, suspecting it's a learned behavior from RLHF due to preference data where people perceive em dashes as indicators of intelligence.
- It was also considered that the model might learn this from pretraining, as models can exhibit more extreme biases than training data frequency suggests.
- ByteDance's Comm-Compute Overlap Kernel Questioned: A member questioned ByteDance's MoE kernel paper regarding their all-gather -> scatter -> FFN -> reduce scatter pattern, contrasting it with the all2all dispatch -> token permutation -> FFN -> all2all combine approach.
- The confusion stems from the paper's mention of all2all comms in diagrams, raising questions about why all tokens would need to be gathered when devices have different experts.
- Tokenizer-Free Models skip Whitespace: A member noticed that the Tokenizer-free models effectively skip whitespace.
- The observation was made while analyzing how these models process 8192 utf-8 encoded bytes per sequence.
- RNNs Replace Tokenization for Byte-Level Modeling: A novel approach replaces tokenization with RNNs to create byte-level models that learn faster than traditional tokenization-based transformers.
- The technique involves replacing the embedding and LM head of a transformer with small RNNs, using a dynamic splitting mechanism based on hidden state comparisons to form "tokens".
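For intuition, here is a toy, pure-Python sketch of the dynamic-splitting idea: a tiny fixed-weight recurrent update runs over raw bytes, and a "token" boundary is emitted whenever consecutive hidden states diverge past a threshold. The update rule, per-byte embedding, and threshold below are all invented for illustration and are not the paper's actual method.

```python
# Toy sketch (not the paper's model): split a UTF-8 byte stream into
# variable-length "tokens" by comparing consecutive hidden states of a
# tiny deterministic recurrent update. All weights and the threshold
# are made up for illustration.
import math

DIM = 8

def step(h, byte):
    # Minimal fixed "RNN" update: mix the previous hidden state with a
    # hand-rolled per-byte embedding, then squash with tanh.
    emb = [((byte * (i + 3)) % 97) / 97.0 - 0.5 for i in range(DIM)]
    return [math.tanh(0.7 * h[i] + emb[i]) for i in range(DIM)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def dynamic_split(data: bytes, threshold: float = 0.5):
    """Emit a token boundary whenever the hidden state shifts sharply."""
    tokens, start = [], 0
    h = [0.0] * DIM
    for i, b in enumerate(data):
        h_next = step(h, b)
        if i > start and cosine(h, h_next) < threshold:
            tokens.append(data[start:i])
            start = i
        h = h_next
    tokens.append(data[start:])
    return tokens

chunks = dynamic_split("hello world, hello again".encode("utf-8"))
assert b"".join(chunks) == b"hello world, hello again"  # split is lossless
```

Whatever the boundary heuristic, the split must be lossless: concatenating the chunks recovers the original byte stream, which is the property the assertion checks.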
- N-Simplicial Attention Sensitivity Revealed: A member calculated and shared the sensitivity and sharpness of n-simplicial attention.
- The findings are documented in a blog post detailing the Lipschitz properties of n-simplicial transformers.
Eleuther ▷ #lm-thunderdome (33 messages🔥):
Mixed Precision arg for HFLMs, Harness Evaluation Speed, Loading Models with Correct Dtype, Softmax Defaulting to Float32, Mixed Precision PR
- Mixed Precision Argument Proposed for HFLMs: A member suggested adding a `mixed_precision` argument for HFLMs, which would automatically wrap model calls inside `autocast` regions for models with mixed weight dtypes, like VLMs, to help users integrating the harness into their training codebases.
- This feature would simplify the process of loading multi-dtype models from the CLI, providing a more user-friendly experience for those working with complex model configurations.
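As a rough sketch of what such a flag might do (an assumption about the proposal, not the actual harness code), the idea is to wrap forward calls in a `torch.autocast` region so eligible ops run in a lower-precision dtype while weights keep their stored dtypes:

```python
# Hedged sketch of a hypothetical `mixed_precision` wrapper: the model
# and function names here are illustrative, not lm-evaluation-harness API.
import torch

model = torch.nn.Linear(4, 2)  # stand-in for an HF model

def forward_mixed(model, x, enabled=True):
    # Under autocast, linear/matmul ops run in bfloat16 even though the
    # weights themselves remain float32.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16, enabled=enabled):
        return model(x)

x = torch.randn(3, 4)
y = forward_mixed(model, x)
assert y.dtype == torch.bfloat16   # ops ran under autocast
assert model.weight.dtype == torch.float32  # storage dtype unchanged
```

The appeal for multi-dtype models is exactly this separation: autocast decides the compute dtype per-op, so no manual casting of each submodule is needed.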
- Harness Evaluation Plagued by Slow Speed: A user reported that LM-Eval Harness was taking 22 minutes for Hellaswag 0-shot on a local llama2 7b fine-tune, despite specifying the device map to cuda:4 and batch size set to auto.
- Members suggested ensuring the model is loaded with the correct dtype (FP16/BF16) to enable flash attention and provided guidance on manually setting the dtype in the CLI using the `--model_args` parameter.
- Loading Correct Dtype to Solve Slowness: Members debugged the reported slowness issue by reminding the user to load the model with the correct dtype, specifically noting that loading in FP32 instead of FP16/BF16 would prevent the use of flash attention.
- It was suggested to verify that the `attn_implementation` config variable is set to flash and to manually cast the model to BF16 in a Python script to rule out harness performance regressions.
- Defaulting Softmax to Float32 is Good: A member asked for opinions on defaulting to float32 for the softmax function, with another member responding positively, citing that doing so is good and just needs to be added to HF, also wondering how it interacts with `accelerate`.
- The member noted they had observed differences in eval results between 32 vs 16, so they'd be more likely to trust the 32 result over the 16 result.
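The upcast-then-softmax pattern under discussion can be sketched in a few lines (a generic illustration, not HF's or accelerate's actual code path): do the exponentials and the reduction in float32, then cast back to the model's half-precision dtype.

```python
# Sketch of why softmax is often computed in float32 even when the model
# runs in half precision: upcast logits, softmax, then cast back down.
import numpy as np

def softmax_fp32(logits_fp16: np.ndarray) -> np.ndarray:
    x = logits_fp16.astype(np.float32)      # upcast for the reduction
    x = x - x.max(axis=-1, keepdims=True)   # stabilize exp
    e = np.exp(x)
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs.astype(logits_fp16.dtype)  # cast back to fp16

logits = np.array([10.0, 0.0, -10.0], dtype=np.float16)
p = softmax_fp32(logits)
assert p.dtype == np.float16
assert abs(float(p.sum()) - 1.0) < 1e-2  # probabilities still normalize
```

Doing the sum in fp16 instead would lose the small-probability terms entirely, which is one plausible source of the 32-vs-16 eval differences mentioned above.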
- Mixed Precision PR Dropped, Speed Gains Seen: A member announced a mixed precision PR, EleutherAI/lm-evaluation-harness/pull/3138, and followed up with test results evaluating pythia-160M's speed gains.
- Results show mixed precision is only slightly slower than casting the full model and naturally much faster than full precision; for example, Hellaswag went from 01:52 to 00:33.
Eleuther ▷ #gpt-neox-dev (7 messages):
WandB project, NGC container, NVIDIA H100 PCIe GPUs, RoPE_Pct
- WandB Visibility Victory!: A member made the WandB project public after logs and models were initially private.
- NGC Container Conundrum: After failing to identify a fallback to menvm, a member attempted to use an NGC container with NeoX on top, using the command `docker pull nvcr.io/nvidia/pytorch:25.06-py3`.
- The member reports that the run is slower than a non-TE run.
- H100s Hamstrung?: A member is testing with 2 x NVIDIA H100 PCIe GPUs, but the linked WandB report suggests performance issues.
- RoPE_Pct Repo: A member shared that they were working in the /NS/llm-pretraining/work/afkhan/RoPE_Pct/gpt-neox directory.
- They were also performing `pip install` for requirements and wandb, logging into wandb, and running `deepy.py` from that directory.
Latent Space ▷ #ai-general-chat (66 messages🔥🔥):
Groq valuation, Buying Subreddits, Reddit deep research agent, Grok-4 rate limit, AI generated videos
- Groq Discusses $6 Billion Valuation: AI chip startup Groq is discussing a $6 billion valuation, according to this report.
- Debate Erupts Over Buying Subreddits: Users debated the ethics and implications of buying subreddits for SEO and marketing, sparking concerns about unbiased information and community erosion, see this discussion on X.
- Users Seek Reddit Deep Research Agent: Members discussed using AI agents for deep research on Reddit, with one seeking tools to analyze complaints on specific subreddits, and another suggesting gummysearch.com for this purpose.
- Users Ask About Grok-4 Rate Limit: A user inquired about increasing the Grok-4 rate limit (32k tpm), suspecting that the new release hug of death was causing issues.
- They noted similar experiences with early Gemini models being unusable in production due to rate limits.
- Kimi K2 Debuts with Muon: The AI community is excited about the new Kimi K2 model, which uses Muon, as covered in this blogpost.
Latent Space ▷ #ai-announcements (1 messages):
swyxio: special double podcast this week! https://x.com/latentspacepod/status/1943774304166195402
aider (Paul Gauthier) ▷ #general (48 messages🔥):
Grok 4 coding ability, Kimi k2 Model, Copilot request limits, Aider console logs
- Grok 4 Claims High Coding Score: Grok 4 scored 80% on the aider polyglot coding benchmark, placing it 4th on the leaderboard as shown on the Aider Leaderboards.
- Kimi k2 has Unknown Selling Points: Members discussed the Kimi k2 model after anecdotes spread on X about its coding ability, as shown in this Kimi tweet.
- Copilot Request Limits Circumvented: One member is working on a proxy tool to allow unlimited requests with Copilot, even on premium models, using 10+ requests per call, as Github Copilot now has a limit.
- Aider Console Log Retrieval: Users discussed how to retrieve console logs or errors via Aider, with one member explaining that the `/run` command will run shell commands in the Aider session, prompting to add the log into the chat if something goes wrong.
aider (Paul Gauthier) ▷ #questions-and-tips (8 messages🔥):
aider and ollama, models for architect mode, leaderboards, aider in local language
- Aider and Ollama pairing interests devs: A member asked if anyone is using aider with ollama.
- This suggests growing interest in local LLM integrations with aider.
- Model recommendations for architect mode requested: A member requested recommendations for specific models or model combinations for architect mode.
- No specific models were recommended in the discussion.
- Aider LLM Leaderboards Highlight Options: A member asked which model is recommended for aider, specifically o3 or Gemini 1.5 Pro.
- Another member linked to the aider leaderboards, noting that they are both good options.
- Aider speaks local tongue?: A member asked why aider is changing to their local language even if their prompts are in English.
- They noted that "LLM responses are in my local language instead of english" is strange and were not sure why this happens.
MCP (Glama) ▷ #general (30 messages🔥):
MCP Superassistant, Malware Injection, MCP Server Posting, Multiple MCP Servers, FastMCP Reverse Proxy
- MCP Superassistant Discovered: A user discovered MCP Superassistant and noted that adding MCP support to every popular chatbot is insane, linking to drinkoblog.weebly.com.
- Another user mentioned asking their LLM to test it using a Python interpreter tool.
- Beware of Malware Injection Scam: Users discussed a potential malware injection attempt via a Discord link that was quickly deleted.
- One user humorously admitted to clicking the dubious link and was advised to run a malware scanner ASAP in a VM.
- New MCP Server Posting Channel: A user inquired about where to post a new MCP server, and was directed to a specific channel.
- Other users deliberated whether it's better to have multiple MCP servers or add multiple unrelated tools to a single server, with the consensus leaning towards a single server for personal use to avoid junk.
- FastMCP Reverse Proxy Aggregates Servers: A user asked which proxy to use, and another user mentioned using the one built into FastMCP to aggregate multiple servers.
- They linked to FastMCP and FastMCP composition.
- Python Executable Autodetection Quandary: A user working on Desktop Extensions for Claude Desktop faces issues with Homebrew Python installations where only python3 is available, causing spawn errors when launching MCP servers.
- They are seeking a better way to auto-detect the Python executable instead of requiring manual config, linking to a related GitHub issue.
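A minimal stdlib sketch of one possible auto-detection strategy (the candidate order and fallback are assumptions, not Claude Desktop's actual behavior): probe `PATH` with `shutil.which` and fall back to the running interpreter.

```python
# Hedged sketch of auto-detecting a Python executable for launching an
# MCP server when only `python3` (e.g. Homebrew) is on PATH.
import shutil
import sys

def find_python() -> str:
    # Try common executable names in order; `python3` first, since
    # Homebrew installs often ship no bare `python` shim.
    for name in ("python3", "python"):
        path = shutil.which(name)
        if path:
            return path
    # Last resort: the interpreter running this script, if any.
    return sys.executable

exe = find_python()
assert exe  # some interpreter was located
```

Spawning `exe` directly instead of the hard-coded string `python` avoids the spawn errors described above, at the cost of resolving the path once at startup.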
MCP (Glama) ▷ #showcase (8 messages🔥):
MCPJam inspector fix, MCP client for Elicitation, Aidderall MCP server, Neurabase MCP server hosting
- Inspectorâs SSE Endpoint Bug Squashed: A member implemented a fix for the MCPJam inspector, resolving an issue where the inspector was incorrectly hitting the /sse endpoint.
- The corrected endpoint is now streamable.
- MCP Client Elicits Open-Source Excitement: A member announced their open-source MCP client now supports Elicitation, positioning it as one of the first to offer this feature.
- They invited the community to star the MCPJam inspector repo and thanked members for driving the project.
- Aidderall Focuses AI with MCP: A member introduced Aidderall, an MCP server designed as a cognitive prosthetic for AIs using a hierarchical task management system to maintain focus and context across complex tasks and shared the github repo.
- Key features include hierarchical tasks, focus management, context preservation, a living document of completed tasks, flexible navigation, and parallel workflows.
- Neurabase Hosts MCP Servers on Cloudflareâs Edge: A member shared that Neurabase is the fastest server hosting service running fully on Cloudflare Workers CDN network as a central hub for the MCP servers.
- Neurabase boasts the fastest MCP server hosting due to Cloudflare CDN's smart placement and is rock-stable because of Cloudflare Workers.
Notebook LM ▷ #use-cases (3 messages):
Quantitative Data Analysis, PDF Export, Trending Topics, Excel Data Extraction, Image Uploads
- Quant Data Tricks Sought for Trending Topics: A member asked for tricks to analyze quantitative data from an Excel export (containing a date column and unstructured discussion extracts) to identify trending topics by comparing the last 3 months with the full resource.
- The goal is to analyze the data after exporting an Excel file to PDF.
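The 3-month-vs-overall comparison can also be prototyped outside NotebookLM; here is a toy pure-Python sketch with invented rows and column contents (the real export's schema is unknown):

```python
# Toy sketch of the "trending topics" comparison described above:
# compare term frequencies in the last ~3 months against the full corpus.
# The rows below are invented stand-ins for the Excel export's
# date column and unstructured discussion extracts.
from collections import Counter
from datetime import date, timedelta

rows = [
    (date(2025, 1, 5), "pricing question about licensing"),
    (date(2025, 6, 20), "bug report about export"),
    (date(2025, 7, 1), "bug report about export again"),
]

def trending(rows, today=date(2025, 7, 10), window_days=90):
    cutoff = today - timedelta(days=window_days)
    overall, recent = Counter(), Counter()
    for d, text in rows:
        words = text.lower().split()
        overall.update(words)
        if d >= cutoff:
            recent.update(words)
    # Score = recent share minus overall share; positive means trending up.
    total_o, total_r = sum(overall.values()), sum(recent.values()) or 1
    return {w: recent[w] / total_r - overall[w] / total_o for w in overall}

scores = trending(rows)
assert scores["bug"] > scores["pricing"]  # "bug" is recent-heavy
```

In practice the unstructured extracts would want real tokenization and stop-word filtering, but the share-difference score is enough to surface terms that are over-represented in the recent window.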
- NotebookLM prompts shared: A member humorously noted that the summarization AI seemed to be calling them out for sharing prompt documents to import into NotebookLM.
- This comment was posted in response to previous messages about specific prompt-writing strategies.
Notebook LM ▷ #general (21 messages🔥):
Audio Overviews, Image Uploading, Latex Rendering, Code Writing Prompts, Chat History Disappearance
- Automated Audio Overview Agony: A user is trying to create a unique audio overview for each source in their notebook and is asking if the current manual process is the most efficient.
- The current workflow involves selecting a single source, generating an audio overview, downloading the audio, deleting the audio, and repeating the process for each source which may not be optimal.
- Image Uploading Unavailable: A member inquired whether it is currently possible to upload images to NotebookLM.
- Another member confirmed that image uploading is possible in the current version.
- Latex Rendering Lament: Users are requesting for Latex rendering support in NotebookLM for STEM users.
- A member argued that NotebookLM is not designed to be a rendering expert but rather to help with research and formulation, while another user countered that Latex support is important for topics like machine learning when equations are illegible.
- Code Writing Confusion: A user asked about using code writing prompts in NotebookLM.
- A member clarified that NotebookLM is not intended as a replacement for code-writing tools like Cursor or Windsurf.
- Chat History Hiccups: A user reported that their chat history disappears when they log out of NotebookLM.
- Another user corroborated that they are experiencing the same issue even with a premium account, suggesting that this is an issue that requires a workaround, such as saving prompts and results in a note.
Manus.im Discord ▷ #general (17 messages🔥):
SafeScan QR Launch, Manus Feature Suggestions, Subscription question, Registration error, Michael Seibel compliment
- SafeScan QR App Launches on Google Play: A member announced the launch of SafeScan QR, their first project built using Manus, now available on the Google Play Store.
- The app provides QR code scanning with protection against phishing & malware and is seeking feedback for improvements.
- Calls for Manus to Build Mobile React Apps: A member suggested that Manus should offer a feature to create React apps directly on mobile phones, similar to apps already available on the iOS App Store.
- They argued that adding such capabilities would differentiate Manus and attract more users, especially as they said "the more things Manus can do the better".
- Member Question about subscription uses: A member inquired whether a Manus subscription would enable generating and fixing .bat and shell files, or if that functionality is solely dependent on points.
- This request indicated the potential importance of users wanting to edit code from the app, showing interest in coding use cases.
- Email Registration Issues Reported: A user reported a "Failed to send email" error during registration, indicating a potential issue with email content requirements.
- This technical issue affects the user registration flow and should be checked for broader impact.
- Michael Seibel Compliments Manus: Michael Seibel gave a compliment to Manus about product direction, per his X post.
- This endorsement highlights the growing recognition and potential impact of Manus in the industry.
Cohere ▷ #🧵-general-thread (8 messages🔥):
Session locations, New office
- Inquiries on Session Locations: A member asked where the rest of the sessions that were mentioned earlier are taking place.
- Another member requested clarification on which specific session the inquiry was about.
- Chatter About a New Office: Someone commented "new office? Thats cool!"
- No further information was provided regarding the office's location or purpose.
Cohere ▷ #👋-introduce-yourself (3 messages):
Introductions, Monocular Depth Estimation, Knowledge Distillation, PyTorch
- New Intern Joins Cohere!: A Computer Vision Intern from the University of Nottingham has joined the Cohere community to explore Monocular Depth Estimation and Knowledge Distillation techniques.
- They primarily use PyTorch and are eager to share and learn from others in the community.
- Enthusiastic Intern Eager to Learn: The intern hopes to share their knowledge and learn from others in the Cohere community.
- They are focused on expanding their understanding of Computer Vision, Monocular Depth Estimation, and Knowledge Distillation.
Torchtune ▷ #dev (5 messages):
Efficient CE, GRPO Sync
- Efficient CE drops!: A new efficient CE (Cross Entropy) was dropped; check it out on X.com.
- GRPO Sync in Question?: Discussion arose around whether to support the sync version of GRPO (Group Relative Policy Optimization), with some suggesting deprecation.
- Members thought it should be kept since it's fully functioning and the async recipe doesn't work on every model, but another member responded that then we have a critical issue in it, so it doesn't work anymore.
Torchtune ▷ #papers (5 messages):
small batches vs large batches, optim-in-bwd support, optimal batch sizes
- Small Batches Might Be Better: A member shared a link to a paper, https://arxiv.org/pdf/2507.07101, suggesting that small batches might be better than larger batches.
- They pointed out that this supports keeping optim-in-bwd support because gradient accumulation is not very useful if the paper is true, as noted in this tweet.
- Optimal Batch Sizes: Theory vs Practice: A member commented that recent findings align with the inequality β̂ₒₚₜ ≤ Lᵅ rₒₚₜ¹⁺ᵅ + (σₒₚₜ rₒₚₜ)/√B related to optimal batch sizes.
- This suggests that β (optimal batch) is less than the maximum available batch for a specific GPU, but there weren't many practical experiments to confirm this.
Modular (Mojo 🔥) ▷ #general (5 messages):
Assembly coding in Mojo, Tracking Modular Community Events
- Assembly coding possible in Mojo: A member inquired about the possibility of coding assembly within Mojo to make syscalls.
- Another member confirmed itâs possible, pointing to the _assembly.mojo module though noting that it lacks proper documentation.
- Community Feedback on Modular Events Tracking: A poll was conducted regarding how the community prefers to track Modular events such as community meetings, livestreams, conference talks, and meetups, listing the Modular community Google calendar and Modularâs Luma event page as options.
- A member suggested that Discord announcements and forum posts could also be useful for reaching new people and proposed adding a website worker for subscribing to notifications, potentially creating an app-like experience, and mentioned email as a still-viable option for interested new visitors.
Modular (Mojo 🔥) ▷ #mojo (2 messages):
Mojo MAX Tutorial, Custom Ops Matmul
- Modular ships Mojo-powered MAX Tutorial: A member praised the new Mojo MAX tutorial on custom matrix multiplication, calling it maybe the best tutorial ever.
- They added, mojo driving the ship for MAX.
- Another cool tutorial: I found this tutorial to be awesome and educational.
- Furthermore, I feel like this should be added to the documentation.
DSPy ▷ #papers (2 messages):
Infer-Retrieve-Rank (IReRa), label classification, xmc.dspy GitHub repository, DSPy compatibility
- IReRa Research Faces Repo Lag: A member is researching Infer-Retrieve-Rank (IReRa) for a label classification problem using the xmc.dspy GitHub repository.
- The member notes the repo depends on a specific DSPy commit without active development, asking if they need to fork the repo and update it for DSPy compatibility.
- IReRa Paper Beats Stale Code: In response to questions about the outdated repo, a member suggested reading the paper for IReRa instead.
- This implies that the IReRa paper could provide the necessary information, sidestepping the need to update the stale GitHub repo.
DSPy ▷ #general (4 messages):
Prompt Optimization, Context Engineering with DSPy, MiProV2 Errors, Base64 Images
- Mistral shoots for prompt optimization!: Mistral released a cookbook notebook with their own shot at prompt optimization, also discussed in a related video.
- Context Engineering with DSPy Talk Prompts Troubles: A member gave a talk on context engineering with DSPy and is now encountering input context too long errors while using DSPy for tuning with MiProV2.
- Reducing max bootstrap demos and max labelled demos didn't resolve the issue, even with a 4k (and 6k) max token setting.
- MiProV2 Input Context Overflow: A member tuning with MiProV2 reports input context too long errors when using DSPy.
- Base64 image conversion prior to DSPy examples: A member converted images to base64 before passing `dspy.Examples` due to the caller using s3.
LlamaIndex ▷ #blog (3 messages):
Snowflake data agents, LeSearch agent, NotebookLlama features
- LlamaIndex and Snowflake host Amsterdam event: LlamaIndex and Snowflake are hosting hands-on talks in Amsterdam on July 31st about building production-grade data agents that work with real enterprise data and tame complex paperwork with document agents using this link.
- LeSearch tackles academic research pain points: LeSearch, built using the ReActAgent framework, addresses academic research challenges with three intelligent agents designed to handle the grunt work, focusing on discovery through features like Multi-hop Question answering (link).
- NotebookLlama gains new features: NotebookLlama, an open-source NotebookLM alternative backed by LlamaCloud, has been updated with new features that allow users to extract and download images and tables from files and interactively visualize all tabular data (link).
LlamaIndex ▷ #general (2 messages):
Cloudflare AI Gateway, Automatic LLM Fallback, LlamaIndex Integration
- LlamaIndex hooks up Cloudflare AI Gateway: A member is working on a LlamaIndex integration for Cloudflare AI Gateway that provides automatic fallback between multiple LLM providers such as OpenAI and Anthropic.
- Check out the GitHub pull request or register at lu.ma/aoc5opn4 for more info.
- Cloudflare AI Gateway enables automatic LLM fallback: The Cloudflare AI Gateway integration allows for automatic fallback between different LLM providers, ensuring continued service availability.
- This feature is particularly useful in scenarios where one provider might be experiencing downtime or rate limits.
Nomic.ai (GPT4All) ▷ #general (3 messages):
Multi-modal Models, Gemma 3, Architectural Floor Plan Feedback
- User Seeks Self-Hosted Multi-Modal Model for Architectural Feedback: A member is seeking a multi-modal model to host locally, with the specific use case of providing feedback on architectural floor plans and drawings.
- So far, they have only found Gemma 3 to be passable for their needs.
- Gemma 3 Considered for Architectural Design Feedback: The user identified Gemma 3 as the only model that meets their requirements.
- The user requires a solution capable of processing visual input to provide design feedback.
Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (3 messages):
vllm, sglang, Llama 3B vs 8B
- vllm matches sglang results: Members are saying that vllm or sglang should both give you similar results.
- Llama 8B paradoxically underperforms Llama 3B: A member questioned why the 8b Llama model (FC) ranks below the 3b one.
- Bigger is not always better for LLMs: Another member explained that larger model size doesnât necessarily mean better performance.
- They point out that llama 4 scout performs worse than llama 3.1 70B.
tinygrad (George Hotz) ▷ #general (2 messages):
PatternMatcher, UPat -> UPat rules, Egraph rewrite rules, Turing completeness
- PatternMatcher Lambdas Face Removal: A user expressed interest in removing lambdas from some PatternMatcher rules, especially in simple cases where a rule could be defined as UPat -> UPat.
- They noted that egraph rewrite rules seem to function this way, and suggested that avoiding Turing completeness whenever possible is a good practice.
- Egraphs get PatternMatcher Support: The user compared the proposed PatternMatcher rules with egraph rewrite rules, noting the similarity in structure and operation.
- The user suggested that whenever possible, implementations should strive to avoid Turing completeness for the sake of simplicity and efficiency.
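A minimal toy of what declarative pattern -> pattern rules look like, in the spirit of the suggestion above (nothing here is tinygrad's actual UPat/PatternMatcher API): rules are plain data, so there is no arbitrary lambda to reason about, and the rewriter itself is not Turing-complete.

```python
# Toy term rewriter: rules are (lhs_pattern, rhs_pattern) pairs over
# tuple expressions; strings starting with "?" are pattern variables.
# Illustrative only -- not tinygrad code.
def match(pat, expr, env):
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in env:
            return env[pat] == expr
        env[pat] = expr
        return True
    if isinstance(pat, tuple) and isinstance(expr, tuple) and len(pat) == len(expr):
        return all(match(p, e, env) for p, e in zip(pat, expr))
    return pat == expr

def subst(pat, env):
    if isinstance(pat, str) and pat.startswith("?"):
        return env[pat]
    if isinstance(pat, tuple):
        return tuple(subst(p, env) for p in pat)
    return pat

RULES = [
    (("add", "?x", 0), "?x"),  # x + 0 -> x
    (("mul", "?x", 1), "?x"),  # x * 1 -> x
]

def rewrite(expr):
    # Rewrite children first, then apply rules at this node to fixpoint.
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite(e) for e in expr[1:])
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            env = {}
            if match(lhs, expr, env):
                expr = rewrite(subst(rhs, env))
                changed = True
    return expr

assert rewrite(("add", ("mul", "y", 1), 0)) == "y"
```

Because each rule is a data pair, rules can be inspected, inverted, or fed to an egraph engine, which is much harder once the right-hand side is an opaque lambda.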