unless you're a Windsurf employee.
AI News for 7/11/2025-7/14/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (226 channels, and 17145 messages) for you. Estimated reading time saved (at 200wpm): 1343 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
After a whirlwind weekend romance, Cognition is acquiring the remaining, still very valuable assets of Windsurf. Updated reporting on the Windsurf-Google execuhire (all employees were dividended the cash value of their vested shares, leaving an $82m ARR company behind) showed that much of the speculation was premature and, with this Cognition deal, ultimately irrelevant.
AI Twitter Recap
Model Releases & Performance: Kimi K2 and Grok-4 Shake Up the Leaderboards
- Kimi K2 Emerges as a Top Open-Source Model: Moonshot AI released Kimi K2, an open-source, MIT-licensed agentic model with 1 Trillion total / 32B active parameters in a Mixture-of-Experts (MoE) architecture, trained on 15.5 Trillion tokens (@stanfordnlp). The training process, noted for its stability, utilized the MuonClip optimizer and showed a dream-like loss curve with zero spikes (@hardmaru). The model has been submitted to the LMSys Chatbot Arena for evaluation (@Kimi_Moonshot). The team shared insights into the architectural decisions (@Kimi_Moonshot) and the importance of the Muon optimizer, recommending its use during fine-tuning and RL phases (@Kimi_Moonshot).
- Kimi K2 Performance and User Reception: Kimi K2 has shown strong performance, taking the top spot on EQ-Bench and Creative Writing benchmarks (@Teknium1, @jeremyphoward). It also performs very well for a non-reasoning model on WeirdML, beating GPT-4.1 (@bigeagle_xd). Users have praised it as "incredible" (@skirano), noting its strong agentic capabilities, especially in tool calling, and its tendency to be concise and avoid "slop" (@skirano, @teortaxesTex). Its strong performance without long CoT is noted as a key advantage (@jeremyphoward). The model is now trending #1 on Hugging Face (@_akhaliq).
- Grok-4 Release and Idiosyncrasies: xAI released Grok-4, which showed strong performance on benchmarks like IQ Bench, ranking 5th (@scaling01), and preliminary METR results put it ahead of Claude 4 Opus (@scaling01). However, the model has notable issues; a major bug caused Grok 4 Heavy to return only its surname, "Heavy," in response to prompts (@zacharynado). An evaluation also found that 4% of its responses mention Elon Musk, compared to <0.5% for most models (@jeremyphoward).
- OpenAI Model Release Delays and Speculation: Rumors circulated that OpenAI delayed the release of its open-source model due to the launch of Kimi K2. However, @Yuchenj_UW suggests the model is smaller than K2, powerful, but an "absurd" last-minute issue may require a retrain. @teortaxesTex speculates that OpenAI's CEO Sam Altman wants to ensure the model is SOTA before release for PR reasons. Separately, @Yuchenj_UW dismisses GPT-5 leaks as fake, guessing a September launch at the earliest.
- Gemini 2.5 Paper and Embedding Model Release: The Gemini 2.5 paper was released with 3,295 authors (@hardmaru). Additionally, Google rolled out its first Gemini Embedding model, which now ranks #1 on the MTEB leaderboard and is generally available (@demishassabis).
AI Companies & Business Moves
- The Windsurf Acquisition Saga Concludes with Cognition: After a deal between OpenAI and Windsurf collapsed, with key team members and founders heading to Google (@arohan), Cognition announced it is acquiring Windsurf. The deal includes the company's team, product, IP, brand, and an $82M ARR business (@russelljkaplan). The trend of "shell-qui-hires" was criticized by @jeremyphoward, who noted the point of a startup is to beat incumbents, not be absorbed by them.
- Perplexity's Agentic Browser "Comet" Gains Traction: Perplexity is gaining praise for its agentic browser, Comet, which operates as an abstraction layer above specific AI models to complete end-to-end workflows (@AravSrinivas). Features highlighted include its "memory-native" design (@AravSrinivas), seamless context loading, and the ability to automate tasks like price comparisons and customer support chats (@AravSrinivas, @AravSrinivas). In a notable move, Perplexity acquired the os.ai domain from HubSpot co-founder Dharmesh Shah (@AravSrinivas).
- xAI and Moonshot Emerge as New Frontier Labs: The back-to-back releases of Grok-4 and Kimi K2 have led to commentary that two very young labs have taken the top spots in closed and open-source AI, respectively. @swyx noted this raises questions about the "actual moats in AI labs".
- xAI Launches Grok for Government: xAI announced Grok for Government, a suite of products making their models available to U.S. Government customers for tasks like summarizing intelligence reports and analyzing data (@TheGregYang).
- China's AI Ascent: Many see the release of Kimi K2 as a sign that China's AI capabilities have reached the frontier, with @scaling01 suggesting the US could be surpassed next year. @Teknium1 humorously noted the Kimi team is "more American than most American labs" due to its cultural references and aesthetic.
AI Tooling, Frameworks, & Infrastructure
- Kimi K2 Tooling and Integrations: A significant development is Kimi K2's compatibility with the Anthropic API, enabling its use within tools like Claude Code (@jeremyphoward). A quick-start project was promptly shared (@jeremyphoward). Quantized versions are also available, with UnslothAI releasing 1.8-bit GGUFs that shrink the model to 245GB (@TheZachMueller), and MLX versions for Apple Silicon are also out (@awnihannun). A minimal client sketch follows this list.
- Agent Development and RAG Patterns: LangChain shared tutorials for building agentic systems, including a Pipeline of Agents for cybersecurity (@LangChainAI) and an AI Headhunter for LinkedIn recruitment (@LangChainAI). LlamaIndex released a guide for building a "Deep Research" agent with Gemini 2.5 (@_philschmid). A comprehensive resource on various RAG Patterns was also shared (@rachel_l_woods).
- The Rise of Good Software in AI: @jxmnop noted a positive trend in AI development, with libraries like vLLM, sglang, and verl finally allowing code to be both "hackable and fast," a shift from previous trade-offs.
- LoRA and Fine-tuning Techniques: @TheTuringPost shared a list of 13 new LoRA variants, including T-LoRA, QR-LoRA, and Dual LoRA Learning, providing a resource for advanced fine-tuning techniques.
- Hugging Face Ecosystem Expansion: The open-source community continues to grow, with Hugging Face subsidiary Pollen Robotics open-sourcing "The Amazing Hand," an 8-DOF humanoid robot hand (@_akhaliq). The platform also added new features to its Datasets viewer for inspecting JSON in List cells (@_lewtun).
- Developer Education Resources: Sebastian Raschka announced a 17-hour video course companion to his "LLMs From Scratch" book (@rasbt). A comprehensive, free handbook on LLM Inference was also highlighted (@algo_diver).
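For the Kimi K2 integration item above: Anthropic-API compatibility means an existing Anthropic client can be pointed at a Kimi K2 endpoint by swapping the base URL and model name. A minimal sketch using the official `anthropic` Python SDK; the endpoint URL and model id here are assumptions, so check Moonshot's documentation for the real values:

```python
import anthropic

# Hypothetical endpoint and model id: substitute the values from
# Moonshot's docs. The point is that only base_url, api_key, and model
# change; the Anthropic-style client code stays the same.
client = anthropic.Anthropic(
    base_url="https://api.moonshot.ai/anthropic",  # assumed endpoint
    api_key="YOUR_MOONSHOT_KEY",
)
msg = client.messages.create(
    model="kimi-k2",  # assumed model id
    max_tokens=512,
    messages=[{"role": "user", "content": "Write a haiku about tool calling."}],
)
print(msg.content[0].text)
```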
AI Research & Techniques
- World Models vs. Predictive Accuracy: An ICML paper sparked discussion by formalizing the question of whether an AI model can achieve perfect prediction while having a "terrible world model" (@random_walker). François Chollet endorsed the paper, which aligns with his thesis that simple theories should be discoverable with minimal data and compute (@fchollet).
- RL, Reasoning, and Training Techniques: Jürgen Schmidhuber pointed to his 2015 paper on using RL for learning to think, noting it as a precursor to modern adaptive chain-of-thought (@SchmidhuberAI). The community discussed the nuanced role of RL in models like Kimi K2, distinguishing between latent/short-CoT self-verification and the more explicit long-CoT reasoning seen in other models (@Grad62304977, @teortaxesTex).
- Tokenizer-Free Architectures: A paper on H-Net, a hierarchical network for end-to-end language modeling without tokenization, gained significant attention (@stanfordnlp). The technique involves predicting byte-level similarity to chunk data, followed by an encoder-decoder structure for reconstruction (@arohan).
- Optimizers and Learning Rate Schedules: The Muon optimizer, used to train Kimi K2, was discussed for its ability to bound maximum logits during training (@cloneofsimo). The Warmup-Stable-Decay (WSD) learning rate schedule was confirmed to be used in K2's training, explaining a sudden loss drop at 11T tokens (@aaron_defazio). A schedule sketch follows this list.
- AI-Generated Research: A new conference was announced where AI is the primary author and reviewer, exploring new venues for AI-written scientific content (@lupantech).
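For the learning-rate item above: the WSD shape is simple to state, with a linear warmup, a long constant plateau, then a decay phase, which is where the sudden loss drop appears. A minimal sketch of the generic shape with made-up hyperparameters (K2's actual settings are not public):

```python
def wsd_lr(step, max_lr=2e-4, warmup=2_000, stable=100_000, decay=20_000, min_lr=2e-5):
    """Warmup-Stable-Decay: linear warmup, flat plateau, linear decay."""
    if step < warmup:
        return max_lr * step / warmup  # warmup phase
    if step < warmup + stable:
        return max_lr                  # stable phase
    # Decay phase: the abrupt start of this phase is what produces the
    # sudden loss drop described above.
    t = min(1.0, (step - warmup - stable) / decay)
    return max_lr + (min_lr - max_lr) * t
```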
Broader Implications & Industry Commentary
- AI Safety and Deceptive Alignment: A study in which LLMs were induced to commit blackmail under pressure highlighted potential alignment risks (@DeepLearningAI). However, a follow-up report from the UK AI Safety Institute (AISI) identified four methodological flaws in such "scheming" studies, questioning their real-world applicability (@nptacek). One of the developers involved in the original study also commented on the causes and mitigation strategies (@METR_Evals).
- Humanity's Purpose in the Age of AI: Turing Award winner Richard Sutton's view that humanity's purpose is to "create what comes next" was a topic of discussion, framing our role as designers of successor intelligences (@dilipkay).
- The Future of Work and Education: François Chollet argued that the goal of education should be to unlock individual potential rather than optimizing for standardized test averages (@fchollet).
- The Cost of Mediocrity vs. Greatness: @jxmnop offered a poignant take on the current state of AI: "today's LLMs have reduced the cost of mediocrity to next-to-nothing... unfortunately, the cost of greatness remains high as it's ever been".
Humor & Memes
- The Main Character: "main character syndrome" is such a funny phrase. i mean yeah im obviously the main character. who the hell thinks of themselves as a side character?
- Political Satire: One of the highest-impression tweets joked, "Obama wrote the Epstein files" is one for the ages I'm afraid.
- Grok-4's Glitch: The bug causing Grok 4 Heavy to only respond with the word "Heavy" became an instant meme (@zacharynado).
- Acquisition Mania: Following the Windsurf/Cognition news, @c_valenzuelab joked, "Excited to announce that I have acquired a Windsurf", and @jxmnop quipped, "born just in time to Acquire Windsurf".
- Relatable Developer Life: "today I learned that creating python packages is pure cancer" and the feeling of multitasking between work and meetings (@cto_junior).
- "Imagine if X had twitter": @code_star started a meme format, including "Imagine if Australians had Twitter. They'd be like '@croc is this true?'" and "Imagine if CTOs had twitter. They'd be like '@soc2, is this true?'" (@code_star).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Kimi K2 Model Release, Technical Deep Dives, and Derivatives
- Kimi K2 1.8bit Unsloth Dynamic GGUFs (Score: 219, Comments: 38): Unsloth has released dynamic GGUF quantizations for the Kimi K2 1.8bit model, achieving an 80% size reduction (245GB) and providing an even larger Q2_K_XL quant (381GB), tested on gaming logic tasks (e.g., Flappy Bird, Heptagon). Usage requires a patched version of `llama.cpp` (see this PR: ggml-org/llama.cpp#14654 or Unsloth's fork), and Unsloth recommends offloading MoE FFN layers to system RAM with `-ot ".ffn_.*_exps.=CPU"`, requiring at least 245GB total RAM+VRAM for optimal performance; SSD/disk fallback is possible but slower. Full installation and optimization instructions are in Unsloth's Kimi K2 Guide. Comments praise the quality and thoroughness of Unsloth's documentation. Another comment requests the upload of the imatrix for further technical exploration, specifically seeking output compatible with `ik_llama.cpp`. A user also offers to host the model as an OpenAI-compatible endpoint, suggesting potential for broader integration. A run-command sketch follows this list.
- A user inquires about the throughput of the `Q2_K_XL` quantization on GPUs such as the 3090 and 4090, specifically asking about performance when using offloading. They also express interest in seeing more details on the coding benchmark methodology used, and suggest open sourcing the benchmark for transparency and community benefit.
- Another user requests the upload of the imatrix, preferably generated by `ik_llama.cpp`, showing interest in underlying model internals or quantization matrix data relevant for reproduction or further modification of the GGUF files.
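To make the offload recipe above concrete, here is a minimal sketch that shells out to a llama.cpp build supporting the `-ot` override-tensor flag quoted in the post; the binary path and quant filename are placeholders, and Unsloth's guide has the authoritative commands:

```python
import subprocess

# Launch llama.cpp's CLI with MoE expert FFN tensors pinned to system RAM,
# per the -ot regex quoted in the post. Paths and filenames are placeholders.
subprocess.run([
    "./llama-cli",
    "-m", "Kimi-K2-Instruct-1.8bit.gguf",  # placeholder quant file
    "-ot", ".ffn_.*_exps.=CPU",            # offload MoE FFN experts to CPU RAM
    "-ngl", "99",                          # keep the remaining layers on GPU
    "-p", "Write a Flappy Bird clone in Python.",
], check=True)
```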
- After Kimi K2 Is Released: No Longer Just a ChatBot (Score: 215, Comments: 35): The post details technical advances in Kimi K2, most notably its shift to an "artifact-first" interaction paradigm that allows models to generate interactive deliverables (PPTs, diagrams, mini-games) as outputs, rather than static markdown. Instead of traditional RLHF with manual tool wiring, Kimi K2 utilized agentic pre-training from large-scale, auto-generated tool-use datasets built via multi-agent self-play, thereby awakening latent API/tool schemas within the model. The authors emphasize that K2's agentic loop ("think, choose tool, observe, iterate") is still early-stage and that truly competitive models require robust, pre-trained base models, as highlighted by rapid open-source adoption and competition like DeepSeek-R1. The post strongly argues that boosting foundation model intelligence remains paramount over standalone agent frameworks. Technical discussion in the comments centers on Kimi being the first LLM purpose-trained with agentic capabilities, but also notes the impracticality of a 1T-parameter model for local use. There are also questions about the specific reinforcement learning techniques employed, given no official paper has been released.
- A user highlights that the new Kimi K2 model is apparently the "first model trained for agentic use," expressing hope for further developments. However, they note its large size (reported as a 1T parameter model) makes it impractical for the local LLM community interested in running models on commodity hardware, raising technical concerns about usability and deployment.
- There is a technical discussion on the lack of transparency regarding the reinforcement learning (RL) techniques used to train Kimi K2. One commenter observes that no official paper has been published yet, resulting in fragmented information, and expresses interest in learning which RL method was actually applied during training.
- A comment suggests a competitive landscape: Qwen is compared to Claude, implying that Claude (which backs many agent products) outperforms Qwen in some agentic benchmark or real-world usage, though no specific benchmarks or performance metrics are provided.
- Kimi-K2 is a DeepSeek V3 with more experts (Score: 200, Comments: 34): The post meticulously compares MoE models, detailing that Kimi-K2 is architecturally similar to DeepSeek-V3 but with significant changes: Kimi-K2 increases its expert count to 384 (from 256 in DeepSeek-V3), reduces attention heads from 128 to 64, and drops dense layers from 3 to 1. The architectural table also shows Kimi-K2 has the highest total parameters (1026.41B) and lowest active percentage (3.19%), suggesting aggressive MoE routing and large untapped capacity. An addendum clarifies the "Shared" column as active parameters minus routed experts, relevant for optimizing model offload in hybrid CPU/GPU setups using llama.cpp. Distinctions are noted: Moonlight is a small DSV3; Kimi-Dev-72B is Qwen2-derived; only Kimi-VL and Kimi-Audio use original architectures. Technical commentary notes Kimi-K2's superior parameter count (by ~330B) likely underpins its outperformance over DeepSeek, while others express skepticism regarding its practical impressiveness. A debate emerges around dense layers: fewer dense layers reportedly enable faster inference on asymmetric systems, but it's unclear if this tradeoff benefits models like Kimi-K2 without wider benchmarking. Questions also arise about error-prone pipelines with Gradio and HF diffusers, and a request is made for tensor repetition versus sparsity analysis to better estimate MoE speed-up potential.
- One commenter highlights that Kimi-K2's superior performance is largely due to its architecture being significantly larger (~330B parameters) and the use of more MoE (Mixture-of-Experts), which contrasts with the regular DeepSeek V3 configuration. This difference in size and expert allocation is implied as the primary reason for the benchmark gaps between the two models.
- A technical issue is raised regarding Kimi-K2's code generation capabilities: when tasked with producing a Gradio UI using HF diffusers, even for a simple implementation (~30-40 lines), the output code contained numerous errors. This suggests limitations or immaturity in Kimi-K2's ability to reliably generate working code for such use cases compared to expectations.
- There is discussion around the MoE architecture, specifically the tradeoffs between tensor repetition (dense layers) versus sparsity during inference. One user notes that using more dense layers can accelerate inference on certain asymmetric hardware setups. However, the generalizability of these benefits is uncertain, as dense MoE inference benchmarks are mainly available for models like Llama 4 Maverick and Snowflake Arctic, making it unclear if the same performance gains apply broadly.
2. Recent Large Model Benchmarks: Reasoning and Coding Performance
- Comparison of latest reasoning models on the most recent LeetCode questions (Qwen-32B vs Qwen-235B vs nvidia-OpenCodeReasoning-32B vs Hunyuan-A13B) (Score: 115, Comments: 25): The image presents a detailed benchmark comparison (table here) of four large language models (Qwen-235B, Hunyuan-A13B, Qwen-32B, and Nvidia OpenCodeReasoning-32B) on recent LeetCode questions, highlighting solution acceptance rates, execution time, and memory usage under best-of-N (4, then up to 8) trials. Models were run on dual H100s via vLLM (0.9.1/0.9.2); color-coding indicates pass (green), major (red), and minor (orange, >90% test pass) failures, with minimal intervention for clear code typos. Qwen-32B and OpenCodeReasoning-32B outperformed expectations, particularly in efficiency and accuracy; Qwen-235B's context length limited it in one instance and Hunyuan-A13B underperformed relative to expectations. A commenter notes that Qwen3-32B's strong results relative to Qwen3-235B may be due to quantization differences (INT4 vs FP8) and small sample size, suggesting larger-scale tests for robust conclusions. Others reaffirm Qwen3-32B as having excellent size/performance trade-off. Discussion raises the potential of a Qwen3-Coder 32B model as highly desirable.
- Discussion highlights that while Qwen3 235B generally outperforms Qwen3 32B due to its larger scale, INT4 to FP8 quantization differences may explain why the 235B model performed worse in some tests. The comment notes the sample size is small, and increasing runs (e.g., 500+ test examples and more generations per task) could yield a statistically clearer comparison, as current results may be influenced by generation randomness.
- A technical point raised is that Qwen3-32B demonstrates a notably strong size-to-performance tradeoff versus the larger models and competitors (e.g., nvidia-OpenCodeReasoning-32B, Hunyuan-A13B), making it especially attractive for tasks like LeetCode which benefit from efficiency without much performance compromise.
- One commenter references that Qwen3-Coder, a coding-optimized variant based on Qwen3, is in development, citing a YouTube interview with the developers, suggesting further advances in model specialization and potentially improved results in code reasoning benchmarks.
- Diffusion model support in llama.cpp. (Score: 124, Comments: 13): A recent pull request (#14644) adds initial support for diffusion-based language models in llama.cpp, enabling inference for models such as Dream 7B Instruct and DiffuCoder-7B. The implementation (currently CPU-only) iteratively unmasks tokens via a diffusion timestep regime from GGUF models, with a hard context window limit (2048 tokens) and an experimental CLI workflow (use the `-diffusion-visual` flag for visualization). The approach currently lacks GPU acceleration, but the design makes further optimization possible. Top comments debate the integration and streaming feasibility within `llama-server` and speculate on the suitability of diffusion for incremental improvements and future techniques like fast Fill-in-the-Middle (FIM) code completion. There is interest in how this approach could enable code completion without HTTP latency, provided speed bottlenecks are addressed. A toy decoding sketch follows this list.
- Users are curious about how diffusion model support will be integrated into `llama-server` (part of llama.cpp), especially concerning whether output streaming will remain possible, as streaming is often critical for deployment workflows and responsiveness.
- One user highlights that diffusion models enable refinement of output via an adjustable number of inference steps, suggesting this could allow more granular control over model output quality or speed compared to standard autoregressive LLM decoding.
- Another commenter expresses interest in the eventual support for Fill-In-the-Middle (FIM) models, speculating that local inference (without HTTP calls) could yield significantly lower-latency code completions, potentially within a couple hundred milliseconds for completing large code sections.
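For intuition about the iterative unmasking above: diffusion-style decoding fills in a masked completion over a fixed number of passes instead of token by token. A toy confidence-based unmasking loop (a conceptual sketch only, not the PR's actual algorithm):

```python
import torch

def diffusion_decode(model, prompt_ids, gen_len=64, steps=16, mask_id=0):
    """Toy iterative-unmasking decoder: start fully masked, then reveal
    the highest-confidence positions a fraction at a time."""
    ids = torch.cat([prompt_ids, torch.full((gen_len,), mask_id, dtype=torch.long)])
    for step in range(steps):
        masked = ids == mask_id
        if not masked.any():
            break
        probs = model(ids).softmax(-1)  # assumes (seq_len, vocab) logits
        conf, best = probs.max(-1)
        # Unmask roughly 1/remaining-steps of the still-masked positions.
        k = max(1, int(masked.sum().item() / (steps - step)))
        conf = conf.masked_fill(~masked, float("-inf"))
        reveal = conf.topk(k).indices
        ids[reveal] = best[reveal]
    return ids
```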
3. Major AI Industry Developments and Tooling Innovations
- Apple "will seriously consider" buying Mistral | Bloomberg - Mark Gurman (Score: 475, Comments: 201): The image accompanying the post visually contextualizes Bloomberg's report that Apple is considering acquiring Mistral, a leading French AI startup, in response to internal struggles with AI model development. The post highlights that such an acquisition would mark a significant departure from Apple's historical approach, as it would constitute a major purchase, potentially the company's largest AI-related acquisition to date. Comments express concern about the negative implications for open source AI if Apple were to acquire Mistral, noting Apple's limited history with open source contributions and predicting that France or the EU might consider blocking the deal to protect local AI innovation.
- One concern discussed is the negative impact an Apple takeover could have on open-source AI. Apple is perceived as contributing very little to open-source projects, especially compared to other large tech firms, and it is seen as almost certain that open-weight models would not be released if Apple acquired Mistral. This could stifle the availability and development of open-source AI models if Mistral's currently open contributions are locked behind Apple's proprietary requirements.
- Another point raises questions about the funding history of Mistral, highlighting that the company has reportedly received substantial investment from the French government. This leads to speculation about the appropriateness or likelihood of French regulators approving a transfer of such a strategically funded asset to an American company so soon after public investment.
- A technically-oriented suggestion is that Apple could instead pursue a strategic partnership or outsourcing arrangement with Mistral, buying access to specific AI services or capabilities rather than acquiring the company outright. This would allow Mistral to maintain control of its core technology and product direction while enabling collaboration, potentially minimizing the risk of dilution of Mistral's technical strengths or disruption to its open-source strategy.
- UTCP: A safer, scalable tool-calling alternative to MCP (Score: 534, Comments: 107): The image presents the Universal Tool Calling Protocol (UTCP) as a new open standard designed to enable AI agents to call external tools directly over any communication channel, eliminating the need for wrappers and server-side state management common in existing solutions like MCP (Model Context Protocol). This approach claims to reduce latency, improve security, and offer greater scalability. The image advertises immediate access to SDKs for developers wanting to adopt the standard and highlights its open, plug-and-play architecture. Top comments express strong preference for UTCP over MCP, citing frustration with MCP's server-side state and heavy architecture, and suggesting industry discontent with MCP's complexity. There is optimism about UTCP's practicality, though some commenters still see room for further simplification in tool-calling architectures.
- Commenters criticize MCP for its heavy stateful server-side architecture, questioning the necessity and scalability of maintaining complex state server-side for tool-calling, and highlight a preference for UTCP's stateless and more practical design approach.
- Some participants argue that MCP has become popular due to FOMO rather than technical merit, expressing that its architecture introduces unwarranted complexity; they praise UTCP as a simpler, clearer, and more direct tool-calling protocol that addresses many of these implementation issues.
- There's ongoing discussion about the broader problem with current tool-calling protocols: the field is overly complex, and many believe there's still substantial room to simplify these solutions further, with UTCP seen as a step in the right direction but not the endpoint.
- Training an LLM only on books from the 1800's - no modern bias (Score: 747, Comments: 172): A user describes training nanoGPT from scratch on a 187MB corpus (approx. 50 books) exclusively from 1800-1850 London, deliberately avoiding modern texts to create a language model (LM) with no modern bias. The resulting model currently outputs historically styled but mostly incoherent sentences, attributed to limited dataset size and model scale; the user emphasizes intent to scale up to ~600 books to improve coherence and historical fidelity, highlighting a distinct approach from fine-tuning. The linked TimeCapsuleLLM project follows similar methodology: period-specific data, training from scratch with nanoGPT (current: ~16M parameters), early confirmation of period language at the cost of general coherence, and an explicit focus on data purity for historical context modeling. A technical commenter shares an OCR dataset of 1800s books (survivor library books), potentially supporting larger-scale training efforts. No substantive debate on method or approach is present; interest is mainly supportive with recognition of the unique historical modeling. A minimal corpus-prep sketch follows this list.
- The commenter suggests using OCR'd 19th-century books from the "survivor library books" collection, available on HuggingFace, as a ready-made dataset for training. They point out that these scanned books could be valuable when constructing a pre-modern or time-specific LLM dataset, which is often challenging due to the scarcity and copyright status of older texts (see survivor library books).
- A critical technical question is raised regarding the ability of a small language model, trained on only 50 books and lacking any broader priors, to "reason" at all. The commenter challenges the feasibility of emergent abilities or coherent reasoning in such narrowly scoped, small-scale models, especially given the limited diversity and volume of the dataset.
- A user shares an experimental result where they trained a "nano GPT" model on just 5% of an open web text file, reporting that the model produced incoherent output ("total gibberish") for at least 1,000 training steps before it started generating slightly more coherent text. This illustrates the difficulty and instability in training small models on limited or unconventional datasets.
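As referenced above, nanoGPT's data pipeline amounts to "encode the corpus, dump binary token files". A character-level prepare step in that style (filenames are placeholders; the project's own tokenization may differ):

```python
import pickle
import numpy as np

# Character-level encoding in the style of nanoGPT's data/*/prepare.py.
text = open("books_1800_1850.txt", encoding="utf-8").read()  # placeholder corpus
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)

split = int(0.9 * len(ids))
ids[:split].tofile("train.bin")    # nanoGPT memory-maps these at train time
ids[split:].tofile("val.bin")
with open("meta.pkl", "wb") as f:  # vocab metadata used when sampling
    pickle.dump({"vocab_size": len(chars), "stoi": stoi,
                 "itos": {i: ch for ch, i in stoi.items()}}, f)
```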
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. OpenAI's Recent Turbulence and Industry Competition
- $300 billion, 500 million users, and no time to enjoy it: The sharks are circling OpenAI (Score: 735, Comments: 210): The post discusses OpenAI's estimated valuation (~$300B) and massive user base (~500M), amidst speculation about acquisition prospects, notably by Apple, potentially solving a number of Apple's current business needs. Data from the US Apple App Store shows ChatGPT is ranked #1, while Google's Gemini lags far behind at #47, indicating OpenAI's continued dominance in the consumer AI chatbot market. Commenters argue OpenAI faces minimal existential risk, as potential acquisition by a tech giant would still secure its leverage. There is skepticism about persistent negative press coverage (e.g., BusinessInsider), but users point out that OpenAI remains resilient and market-leading regardless.
- Discussion highlights the difficulty of building truly autonomous AI agents, noting that projects like Windows Recall failed and operator agent frameworks are currently too resource-intensive, limiting their scalability and adoption. Browser-based interactions (e.g., ChatGPT browser extension) may be the next frontier, but rapid user uptake is not guaranteed given current compute challenges.
- The open sourcing of ChatGPT is viewed as a strategic move to drive external innovation, particularly around reasoning, chain-of-thought (CoT) capabilities, and agentic frameworks. This crowdsourced approach aims to surface more efficient methods, which OpenAI could then adapt for their proprietary models, while also promoting ChatGPT as the default open-source LLM among developers.
- One user notes Google's increasing integration of AI into core search and browser experiences, underscoring that Google's scale and user base position them as a serious competitor capable of rapidly influencing how users access AI-powered answers, potentially challenging OpenAI's market dominance.
- $300 billion, 500 million users, and no time to enjoy it: The sharks are circling OpenAI (Score: 591, Comments: 140): OpenAI faces intensifying competitive and internal pressure despite a $300B valuation and 500M ChatGPT weekly users. Major developments include: (1) an aggressive AI talent war, particularly with Meta, which offers up to "$100 million signing bonuses" and has poached researchers, though claims of such offers were contested; (2) the failed $3B acquisition of Windsurf due to IP disputes and Microsoft-OpenAI tension over AGI definition (pegged at $100B in profits), revenue splits, and overlapping product lines, resulting in Windsurf staff joining Google DeepMind; (3) delays to OpenAI's promised open-weight LLM due to safety reviews, stalling momentum as xAI and others accelerate model releases. Other fractures include a legal battle over the "io" trademark, with a key consumer AI hardware project blocked, and OpenAI's continued attempts at vertical expansion (AI-powered browser, defense contracts, Mattel partnership). Technically oriented commenters question OpenAI's profitability amid aggressive spending, with one comparing its losses to WeWork, and suggest that negative media cycles tend to precede major valuation increases, projecting a potential $1T valuation by 2026. Some point out that public recognition is still narrowly focused on ChatGPT, highlighting a challenge in product diversification and public perception.
- There is skepticism about OpenAI's profitability and financial sustainability, with concerns raised that the company is experiencing losses comparable to WeWork's significant historical deficits under Adam Neumann. The implication is that operational costs, infrastructure, and rapid growth may be outpacing revenue, raising technical questions about monetization strategies and business model viability for large-scale AI providers like OpenAI.
- Comparisons are drawn to historical tech industry dynamics, such as Netscape's rapid ascent and subsequent decline after competition from giants like Microsoft. The commenter suggests OpenAI's survival against formidable players like Google may hinge on deeper strategic integration with Microsoft, leveraging their extensive reach and resources to counteract competition and improve operational resilience.
- Mark Zuckerberg says Meta is building a 5GW AI data center (Score: 275, Comments: 143): Meta is planning to build the "Hyperion" AI data center in Louisiana with a target of 5GW compute capacity, making it the largest known AI data center, dwarfing the output of the Hoover Dam (4.8GW) and outstripping previous benchmarks in hyperscale AI infrastructure. The Louisiana facility, along with the upcoming 1GW "Prometheus" cluster in Ohio (operational by 2026), is designed to enhance Meta's position in large-scale AI training, directly competing with initiatives like OpenAI's "Stargate" and xAI's "Colossus". The architectures' massive energy and water demands signal significant challenges for grid integration and local resource sustainability, but the US government is actively supporting such expansions due to their strategic importance for AI leadership. Commenters highlight the unprecedented electric demand, noting the direct comparison to historic mega-infrastructure like the Hoover Dam, and debate the sustainability, suggesting that such AI clusters may accelerate fusion energy development as incumbent fossil fuel sources struggle to match this scale.
- The top comment contextualizes Meta's claimed 5GW data center by comparing it to the Hoover Dam's output of 4.8 GW, highlighting the unprecedented power demands of state-of-the-art AI infrastructure.
- Several users note that the scale of power requirements being discussed for future AI data centers (5GW) could be a significant driving force for next-generation energy infrastructure investment, with specific references to nuclear and fusion power as potentially necessary to meet such demands competitively beyond what traditional fossil fuels can offer.
- Meta's answer to Stargate: 1GW Prometheus and 2GW Hyperion. Multi-billion clusters in "tents" (Score: 174, Comments: 37): Meta is reportedly constructing two exascale AI supercomputers, Prometheus (1GW) and Hyperion (2GW), intended to rival or surpass OpenAI's Stargate project. Each is expected to require power on the order of gigawatts, indicative of hardware at immense scale (multi-ten-billion-parameter models and possibly custom silicon). Deployment is said to be in temporary, rapidly-assembled datacenter "tents", echoing hyperscale agility but raising potential concerns regarding reliability, thermals, and energy infrastructure. See SemiAnalysis coverage for projected power, cost, and cluster architecture details. Commenters note the dichotomy between extremely ambitious AGI timelines (e.g., by 2027) and the ad-hoc, cyberpunk-style infrastructure (tents), implicitly questioning implementation practicality.
- Discussion centers around Meta's Prometheus (1GW) and Hyperion (2GW) clusters, with technical intrigue surrounding their deployment in "tents" suggesting innovative data center cooling strategies or rapid scaling approaches to support high-energy, multi-billion-parameter models. There's speculation this could allow Meta to match or exceed compute scale from initiatives like Oracle/Microsoft's Stargate, with 2GW potentially representing new state-of-the-art training capacity benchmarks.
- Comments raise questions about Meta's position relative to competitors, specifically referencing xAI (Elon Musk's initiative) and its rumored massive GPU acquisitions. Technical readers consider whether Meta's approach will shift the competitive landscape in large-scale model training and inference.
- Nvidia's CEO says the US should "reduce" dependency on other countries, onshore technology manufacturing (Score: 144, Comments: 40): Nvidia CEO Jensen Huang advocates for the U.S. to "reduce" reliance on foreign technology manufacturing and encourage onshoring, highlighting national security risks due to global supply chain dependencies (e.g., Taiwan's semiconductor dominance). The initiative would require large-scale investments and potentially overhauls in domestic manufacturing policy, as onshoring advanced technology (like semiconductors) involves state-level intervention and significant workforce, regulatory, and cost challenges. For reference, see recent context around the US CHIPS Act and semiconductor policy debates. Commenters note the geopolitical motivations, specifically concerns over Taiwan's vulnerability, and point out that significant onshoring would necessitate major trade-offs in labor rights, wages, and regulations. Skepticism is expressed about the U.S. public or policymakers being willing to accept the economic and social costs required for substantial technology manufacturing repatriation.
- Discussion highlights the steep systemic challenges in onshoring advanced semiconductor manufacturing to the US. Multiple users emphasize that shifting high-level manufacturing would require massive state involvement, such as subsidies or direct government partnership, while mid- and low-level manufacturing would necessitate significant wage reductions or rolling back worker protections and environmental standards, which are politically and socially challenging in the current US context.
- A key technical point addresses the specialization and density of semiconductor expertise in Taiwan, particularly within TSMC. Taiwan reportedly hosts tens of thousands of highly specialized engineers in niche fabrication processes, concentrated in single firms, giving them a profound production edge that would be exceptionally difficult and time-consuming to replicate in the more fragmented and higher-cost US landscape.
2. Claude, Kiro IDE, and User Coding Tool Reviews
- Claude Code Has Gone From Game-Changer to Garbage - Anthropic, What Are You Doing? (Score: 138, Comments: 192): The post alleges significant performance degradation in Anthropic's Claude Code, specifically noting context loss, looping, self-contradiction, and an inability to maintain logical structure during complex coding tasks, issues not previously present. The author suspects that Anthropic is conducting undisclosed backend A/B testing or model rotations, resulting in inconsistent behavior and potential version fragmentation among users, with no transparent changelogs or communication provided. These claims highlight a broader concern about lack of transparency in high-value (>$200/month) AI code assistant products as well as possible business risks due to reliability regressions. Commenters are split: Some suggest user behavior or selection bias is to blame (e.g., "scoreboard abuse"), while others report no noticeable change in their workflow, implying that either model performance is not uniformly degraded or user expectations and workloads differ markedly. One reply questions potential changes in rate limiting, indicating curiosity about backend or quota policy changes as a technical root cause.
- One user reports a significant degradation in Claude's coding capabilities, stating that a month ago the model could "one shot the most difficult tasks," but now it frequently fails to fix even simple bugs after multiple prompt attempts. The user claims there is an unmistakable drop in intelligence and coding quality over time, suggesting either model updates or throttling might be to blame.
- Another commenter does not notice any performance decline and continues to use Claude for code writing and code review with no issues, highlighting the variability in user experiences. This implies that performance changes may either be situational, user-specific, or perceived subjectively, rather than universal across all use cases.
- A technical query is raised regarding whether recent changes are causing users to hit Claude's message limits faster, referencing possible backend or policy modifications that may affect interaction quotas or throughput for code-centric workflows.
- Amazon's new Claude-powered spec-driven IDE (Kiro) feels like a game-changer. Thoughts? (Score: 108, Comments: 41): Amazon has launched the Kiro IDE, powered by Claude Sonnet 4, with a focus on spec-driven development to bring formal structure to "vibe-coded" applications, automatically generating requirements documents, design docs, and actionable task lists as part of the initial project spec without explicit user prompting. Kiro is positioned distinctively from tools like Cursor by integrating software engineering best practices natively and aims to facilitate production-readiness for rapidly prototyped apps. It is released in public preview; pricing details remain undisclosed. Top commenters compare Kiro to similar spec-driven, Claude-powered IDEs like BearClaude, expressing skepticism toward Amazon's long-term product support based on prior discontinued tools (e.g., Lumberyard, Storywriter) and a strong preference for open-source alternatives. An important technical caveat is flagged: during the free preview, content in Kiro, unless opting out, can be used to train foundation models, raising potential privacy and IP concerns.
- Multiple commenters note that Kiro's spec-driven, requirements-first workflow is a significant departure from the typical "vibe-coding" tools built atop large language models like Claude. Kiro's ability to auto-generate and maintain design docs, requirements, and task lists directly from specs is highlighted as a developer workflow step-change, though concerns remain about scalability and handling of large, complex codebases.
- There is discussion of concerns around data privacy and code confidentiality in Kiro's preview release: explicitly, user data including code snippets and conversation history can be leveraged to train foundation models unless users actively opt out, as detailed in the documentation.
- Some users express skepticism based on track record and ecosystem, preferring open-source or more transparent alternatives to proprietary, cloud-locked tools from Amazon, referencing issues with previous AWS products and a perceived lack of competitive pricing or performance versus direct Anthropic access; for example, using Claude Sonnet or Opus via Anthropic being cheaper or more effective than through AWS's implementation.
- My 10 + 20 + 20 dollars dev kit that just works (Score: 228, Comments: 43): The OP outlines a cost-efficient, multi-tool AI coding workflow totaling ~$50/month using Traycer (for file-level planning, $10), Claude Code Sonnet-4 (for coding, $20), Cursor (for polishing, $20), and Traycer or CodeRabbit (for review, currently free). Workflow phases include manual or tool-assisted breakdown, dependency graph visualization, detailed parallel feature planning (with Traycer preferred for plan/code integration), hands-on coding (favoring Claude Sonnet-4 for repo handling over Opus or Gemini 2.5 Pro), and granular code review/commit. Traycer is highlighted for its file-level task granularity and soon-to-be-released phase breakdown directly in-IDE; alternatives (CC/Cursor for planning) are noted as less structured. Tool-mixing is argued to reduce chat/session bloat and keep costs predictable. External links: Traycer, Claude Code, Cursor, CodeRabbit. Traycer's founder confirms upcoming phase breakdown features, underscoring ongoing competitive improvements in integrated planning. Commenters concur on the value of practical, layered, multi-tool AI development flows, noting enhanced productivity and cost-efficiency over all-in-one or expensive single-tool subscriptions; Gemini's UX and integration issues are a common complaint.
- A Traycer founder notes they are soon rolling out a native phase breakdown feature, allowing users to sketch project phases interactively with AI inside the IDE and transition seamlessly into file-level planning, aiming to streamline structured workflows within AI-powered development environments.
- Some users describe a multi-tool workflow for AI-assisted coding: relying on Cursor Copilot for in-IDE planning/coding, GitHub Copilot for code review/optimization, and ChatGPT/Claude/Perplexity for brainstorming and research. Gemini was evaluated but found to have poor VS Code integration, slow performance, and issues with autocompletion and its CLI, highlighting variability in tool maturity and integration.
- Technical discussion emphasizes explicit phase/planning/coding/review structure in AI workflows to control context and guide LLMs, asserting that current models, while improving, still demand significant user-provided architectural and design direction to yield high-quality code output.
- Gemini App now has Code execution Tool built-in! (Score: 122, Comments: 10): The Gemini app has launched a built-in code execution tool, as visually referenced in a screenshot. This tool enables native code execution within the app environment, marking an upgrade in direct coding capabilities. One commenter notes that similar functionality has existed since last year's Google I/O, implying iterative improvements rather than a completely new feature. Some users see this as a long-overdue feature to improve accuracy, while others downplay the novelty, suggesting the tool is an incremental update rather than a major innovation.
- Commenters note that code execution capabilities have existed in similar AI assistants such as ChatGPT since the release of GPT-4, highlighting that Google's Gemini is catching up to existing standards. Some users mention that code execution was already previewed or discussed as early as last year's Google I/O, pointing out the lag in Gemini's feature rollout compared to competitors.
3. LoRA Models, Training Tutorials, and Stable Diffusion Community
- I've open-sourced 21 Kontext Dev LoRAs - Including Face Detailer LoRA (Score: 159, Comments: 37): The image illustrates the visual capabilities of three different Kontext Dev LoRAs, displaying distinct artistic styles generated by the models: a stylized drawing, a low-poly abstract, and a highly detailed digital painting. These examples represent the effects achievable with the newly open-sourced LoRA models, which were trained using the Fal Kontext LoRA Trainer and cater to both face detailing and various art styles (anime, low-poly, pastel, pencil, oil, watercolor, etc.), as detailed in the post. The models are individually linked, and recommended strength values for each style are provided, facilitating reproducibility for users. A commenter questions the extent of open-sourcing, specifically whether the training data is available in addition to the models, while another asks for clarification about the nature (style-focused) of the LoRAs, underscoring user interest in technical openness and the models' application scope.
- Several commenters question whether the release qualifies as true open source, emphasizing that unless the training datasets and configuration files are included (as done by some CivitAI contributors), the community cannot fully reproduce or audit the models. One noted: "you could maybe call this open source but all you've done is publish the loras as far as I can tell."
- A technical debate centers on open data principles: users highlight that publishing only the LoRA weights, without the associated dataset or training config, prevents local duplication or retraining and falls short of open sourcing by typical standards.
- Step-by-step instructions to train your own T2V WAN LORAs on 16GB VRAM and 32GB RAM (Score: 118, Comments: 37): The post provides a full workflow for training WAN LoRAs (repurposed Stable Diffusion XL/FLUX, not T2V as misstated) on a Windows machine with 16GB VRAM and 32GB+ RAM, using an RTX 4080. It uses the open-source musubi-tuner for training setup, covering cloning, Python 3.12 venv, PyTorch install, configuration of data in TOML, latent and text encoder caching, and two training regimes (Rank 32/Rank 64 with block swapping and alpha stepping for large datasets) leveraging mixed precision BF16 and 8-bit Adam optimizers. WAN model sources and directory structure are specified, and explicit command-line recipes are included for both caching and training. Key resource management tips are called out for users on minimal hardware, e.g., adjusting `blocks_to_swap` and `network_dim` to fit VRAM. Notably, the author highlights an advanced approach for using merged LoRAs as new training bases via ComfyUI, hinting at iterative LoRA stacking opportunities. The top comments request clarification on the merging and extraction of LoRAs using ComfyUI, raise questions about how training on non-celebrity datasets might affect character similarity, and identify a critical error in the guide: a typo in `caption_extension` in the TOML config (`"captain_extension"` instead of `"caption_extension"`) that will prevent training from running.
- One commenter highlights a critical implementation detail: correcting the mislabeled parameter in the dataset config (`caption_extension`, not `captain_extension`) is essential, as configuration errors will prevent successful training runs (a quick validation sketch follows this list).
- There is mention of using ComfyUI for merging LoRAs into the base WAN model, followed by extracting the result as a new LoRA to use as a training baseline. This iterative LoRA extraction technique reportedly yields strong results in downstream training and may be covered in a future tutorial.
- A technical issue is reported when attempting to extract LoRAs in ComfyUI: users encounter the error "is the weight difference 0?" and the process becomes unresponsive, indicating a possible bug with weight merging or extraction logic in the tool.
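Given how easy the typo above is to miss, a quick sanity check of the dataset TOML before launching a run can save a failed training job. A minimal sketch (Python 3.11+ for `tomllib`; the config filename is a placeholder):

```python
import tomllib

def find_key(obj, key):
    """Recursively collect values for `key` anywhere in a parsed TOML tree."""
    hits = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                hits.append(v)
            hits.extend(find_key(v, key))
    elif isinstance(obj, list):
        for item in obj:
            hits.extend(find_key(item, key))
    return hits

with open("dataset_config.toml", "rb") as f:  # placeholder path
    cfg = tomllib.load(f)

assert not find_key(cfg, "captain_extension"), "typo: rename to caption_extension"
assert find_key(cfg, "caption_extension"), "caption_extension missing from config"
```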
- WAN - Classic 90s Film Aesthetic - LoRa (11 images) (Score: 296, Comments: 28): The OP released a new LoRA model targeting the classic '90s film aesthetic, specifically inspired by "The Crow (1994)", and made it available here on CivitAI. The discussion requests technical details about the LoRA training process but no specifics are provided in the post or comments. Visual sample results are shared, but no explicit benchmark data, implementation settings, or dataset information is disclosed. A top comment requests a technical explanation of the LoRA's training workflow, suggesting community interest in model reproducibility and methodology, but no technical response is given as of yet.
- There is a request for details on how the Classic 90s Film Aesthetic LoRA was trained, indicating technical interest in the model's methodology and datasets, but as of now, no specifics were provided by the original poster.
- WAN 2.1 is highlighted as a leading open-source image generator, suggesting it is used as the base model for this LoRA. This reinforces the community perception that WAN 2.1 delivers strong generative performance in open-source workflows.
- Average Stable DIffusion user and their loras (Score: 204, Comments: 26): The image is a meme that humorously personifies the relationship between a typical Stable Diffusion user and their "Loras", referencing the LoRA (Low-Rank Adaptation) fine-tuning method used in image generation AI models like Stable Diffusion. The joke is that users collect or create different LoRA models to guide outputs, treating them almost like a social group. No technical benchmark or implementation detail is depicted, but the post riffs on the subculture of adopting distinct LoRAs to customize model behavior. Commenters note the "Loras" shown are "too tame," referencing how LoRA models are often used to generate niche or extreme prompts (e.g., tentacles, furries), and joke about the healthy appearance of the user versus the typically sedentary image-generation lifestyle.
- HELP with long body (Score: 507, Comments: 264): The post discusses an image generation issue where AI-created figures (here, a woman at the beach) have unnaturally elongated bodies. The most upvoted technical comment explains this is due to using an AI model trained primarily on 1024x1024 images to generate images with much different aspect ratios, leading to distorted proportions. This type of model typically expects square input/output dimensions, so deviations result in artifacts like "long bodies." Commenters reinforce the model-architecture point, with humor and exaggeration highlighting the severity of the distortion. No alternative technical solutions were suggested, but the consensus is clear that aspect ratio mismatches are causing the problem.
- A key technical point raised is that the model being used was trained on images predominantly sized at 1024x1024 or with only small variations in aspect ratio. When generating images with highly unusual or elongated aspect ratios, the model can exhibit abnormal or unrealistic outputs due to lack of relevant training examples for those scenarios. This limitation stems from the model's data distribution and can cause artifacts or failure modes not present with more typical square inputs.
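The practical workaround is to generate at or near a resolution the model was trained on, then reach other aspect ratios by outpainting or upscaling. A minimal `diffusers` sketch contrasting the two cases (the model id is just an example SDXL checkpoint):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a woman standing on a beach, golden hour"
ok = pipe(prompt, width=1024, height=1024).images[0]    # near trained resolution
risky = pipe(prompt, width=512, height=1536).images[0]  # extreme ratio: expect elongation
```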
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking
Theme 1. Kimi K2: Rising Star Faces Hardware Hurdles
- Kimi K2 Dazzles on Benchmarks, Cheaper Too!: Kimi K2 by Moonshot is getting high praise, hitting 65.8% on SWE-Bench Verified on OpenRouter and topping open-source coding charts, with users noting it's slightly inferior to Claude Opus 4 but costs 30 times less. Users described its performance as like a combination of the freshness of initial o3 sans reasoning, Sonnet 3.5, R1, V3-0324, or Opus 3/4/GPT-4.5 but better model vibes all at once in Reddit discussion, and it joined the LM Arena leaderboard.
- Kimi K2 Shrinks Size for Local Runs, Still Needs Beefy Rig: Kimi K2 is now viable for local use after an 80% size reduction from 1.1TB to 245GB, allowing it on personal devices. However, users report that quants are massive and slow, requiring significant VRAM, with some estimates reaching 2400 GB or 4x H200 GPUs for the `Q4_K_M` quant on LM Studio.
- Kimi K2 Token Training Speculation: A member speculated whether Kimi K2's training involved muon data, wondering if this signals a future trend in model training data sources. Another member questioned why a specific training action wasn't taken 1T tokens earlier, speculating about hoping for grokking according to a tweet.
Theme 2. Benchmarks and Model Performance Shifts
- Grok 4 Hits Aider, LM Arena, But Has API Identity Crisis: Grok 4 scored 80% on the aider polyglot coding benchmark, achieving 4th place, and joined the LM Arena leaderboard with some users reporting it surpasses GPT-4.1 and Gemini 2.5 Flash. However, users noted the API version lacks a system prompt causing it to misidentify as Grok 2, unlike the correct web version on grok.com.
- Gemini 2.5 Flash Departs, Pro Versions Confuse Users: Google deprecated Gemini 2.5 Flash Preview models on July 15th, recommending google/gemini-2.5-flash as the replacement, though OpenRouter won't auto-route traffic due to pricing changes. Users on Cursor reported confusion as the standard Gemini 2.5 Pro model redirects to the older 05-06 version, while the preview model points to a newer stable release according to the Google Developers Blog.
- Llama 4 Scout Underwhelms, Gemma 3 Gets "Passable": Despite a larger size, Llama 4 Scout underperformed compared to Llama 3.1 70B on the Gorilla LLM leaderboard, showing architecture and training data improvements matter more. Meanwhile, Gemma 3 earned a "passable" rating from a Nomic.ai member comparing it to other models, and Gemma 3n is now fully open source.
Theme 3. Dev Tools and Frameworks: Features, Fixes, and Frustrations
- Cursor Performance Woes Plague Users Post-Update: Users on Cursor reported severe performance slowdowns including 30 FPS scrolling, unresponsiveness, and freezing after the 1.2.4 update, similar to this existing issue. Background agents also caused trouble, with automatic port forwarding hijacking local connections (screenshot) and one committing a massive 734 MB core dump to Git.
- MCP Grows with Deployment, Agent Debates, and New Server Types: The Model Context Protocol (MCP) is discussed as a way to simplify ML model deployment (blog post) and debated regarding the definition of AI agents vs workflows. Proposed enhancements include adding clipboard servers to the official spec (MCP-B GitHub) and new hosting/gateway options like Neurabase and the open source Director Run.
- LlamaIndex and NotebookLM Spawn Clones, Enhance RAG and Agents: LlamaIndex introduced the open source NotebookLlama, a NotebookLM clone with features like image/table extraction and visualization, quickly gaining over 1k stars. LlamaIndex also published guides on Context Engineering and building research agents using Google's Gemini 2.5 Pro.
Theme 4. Low-Level Deep Dives: Architectures, Training, and GPU Code
- FP8 Training Gets Real, Beyond Dense Models: DeepSeek was primarily trained using FP8 GEMM operations, with accumulations in FP32; this is particularly applicable in MoE models, where instability in a dense FP8 model would be too great. A member submitted a mixed precision PR for lm-evaluation-harness showing faster eval times on an A30.
- RNNs Challenge Transformers' Tokenization Reign: Research suggests RNNs can replace tokenization for faster byte-level models that outperform tokenization-based transformers, by replacing the embedding and lm head with two small 4-layer RNNs. The model dynamically decides whether a hidden state output represents a "token" by comparing its dot product p with the prior one (Eleuther research discussion).
- CUDA/Triton Optimization Deep Dive: Padding, Strides, and Streams: Discussions in GPU Mode covered optimizing Triton kernels, including handling non-multiples-of-128 input sequence lengths with aligned strides (potentially transposing inputs) and the potential benefits of in-kernel padding like in Flash Attention 3. Users also explored using different CUDA streams to overlap reduction and matmul operations to hide latency.
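To make the stream-overlap point concrete, here is a minimal PyTorch sketch; the shapes are illustrative assumptions, not figures from the discussion. Two independent kernels are launched on separate CUDA streams so the device scheduler is free to run them concurrently.

```python
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
x = torch.randn(1 << 24, device="cuda")

matmul_stream = torch.cuda.Stream()
reduce_stream = torch.cuda.Stream()

# Each stream serializes only its own kernels, so the matmul and the
# reduction below may overlap on the GPU.
with torch.cuda.stream(matmul_stream):
    c = a @ b
with torch.cuda.stream(reduce_stream):
    s = x.sum()

torch.cuda.synchronize()  # wait for both streams before using the results
print(c.shape, s.item())
```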
Theme 5. AI Industry Moves: Mega Clusters, Delayed Models, and Acquisitions
- Meta Builds AI Clusters Measured in Gigawatts: SemiAnalysis reported that Meta is building massive AI clusters like the 1000MW Prometheus (2026) and Hyperion exceeding 5000MW, significantly larger than current 150-200MW H100/H200 clusters. Discussions covered implications for AI research, NVIDIA sales, and the immense power needs.
- OpenAI's Open Model Launch Delayed, Safety vs. Capability Debated: Sam Altman announced a delay in OpenAI's open-weight model launch for additional safety tests, stating once weights are released, they cannot be pulled back (tweet). Speculation suggested the delay might also be due to lack of performance or catching up to competitors like Kimi K2 (tweet).
- Cognition Acquires Windsurf in AI Coding Reinvention Push: Cognition Labs is joining forces with Windsurf, integrating its agentic IDE with Cognition's autonomous agents to reinvent AI coding, according to their announcement video. The acquisition aims to combine expertise for breakthrough developer experiences, though conflicting reports arose about compensation for unvested Windsurf employees.
Discord: High level Discord summaries
Perplexity AI Discord
- Comet Data Harvesting Sparks Concern: A user reported that Perplexity warned against using Comet due to its insane data harvesting, referencing a TechCrunch article that Aravind responded to on Twitter.
- Comet requests can get personal, according to members.
- Pro Referrals Reward Students: Users discussed Perplexity's student referral program, noting that both the referrer and the friend receive 1 month of free Pro access upon student status verification via SheerID, with a maximum of 24 free months.
- Normal (non-student) referrals provide a $10 discount on the next billing cycle for both parties.
- Kimi K2 Shrinks for Local Runs: It's now possible to run Kimi K2 locally after a significant 80% size reduction, bringing the model down from 1.1TB to 245GB, enabling use on personal devices.
- The local use of Kimi K2 requires specific hardware configurations, such as 24GB VRAM, and may still be insanely demanding on personal devices.
- Perplexity API Stumbles in Web Search: A user reported an issue where Perplexity doesn't search the web via the API, providing inaccurate answers without online information retrieval; another user suggested the API should search the web by default without parameter tweaks.
- A user advised employing the search_domain_filters parameter to address the search issue.
- Perplexity Tackles Renewable Energy: A user employed Perplexity to analyze opposing viewpoints on renewable energy, specifically solar and wind, by creating frameworks assessing grid reliability and cost implications.
- They used Labs to interrogate each argument's accuracy and generate 5 scenarios of ideal power mixes based on local variables, finding the process pretty informative.
Cursor Community Discord
- Cursor Plagued by Performance Problems: Users report that the 1.2.4 update has introduced significant performance slowdowns, including 30 FPS scrolling, unresponsive interfaces, and frequent freezing, similar to this existing issue.
- Troubleshooting steps included cleaning the cache (~/Library/Application Support/Cursor/Cache) and disabling extensions, but one user claimed that the IDE becomes unusable after just one hour of use due to constant freezing.
- Kimi K2 Seen as Coding Powerhouse: Users are requesting integration of the Kimi K2 model, praising its coding capabilities and cost-effectiveness.
- While some users claim it is on par with existing models in coding tasks, others argue that Claude still beats it in overall performance, despite the cost benefits.
- Gemini 2.5 Pro Triggers Model Chaos: Users are confused because the standard Gemini 2.5 Pro model redirects to the older 05-06 version, whereas the preview model points to a newer stable release, as detailed in the Google Developers Blog.
- One user described this situation as a mess from cursor, emphasizing that selecting gemini2.5 pro 06-05 is necessary to access the stable version.
- Background Agent Port Forwarding Causes Headaches: A user reported that automatic port forwarding by background agents is hijacking their local Postgres connection and is still facing issues with accessing secrets, as shown in this screenshot.
- The user had a difficult time diagnosing the root cause of the port forwarding.
- Background Agent Commits Massive Core Dumps: A user reported that a background agent committed a 734 MB core dump from cursor-nightly to Git, creating issues when pushing changes.
- A maintainer acknowledged the issue and stated that the team has a repro for this problem, and will rework how the git commit part of it works.
LMArena Discord
- Grok 4âs Identity Crisis: The API version of Grok 4 lacks a system prompt, causing it to misidentify as Grok 2, unlike the web version (grok.com) that functions correctly.
- A member suggested this behavior is expected rather than a bug, since the API version simply ships without a system prompt.
- Kimi K2 dethrones Opus 4: Kimi K2 receives high praise for base model performance, considered slightly inferior to Claude Opus 4 but at 30x lower cost.
- A user described Kimi K2 as exhibiting the freshness of initial o3 sans reasoning, Sonnet 3.5, R1, V3-0324, or Opus 3/4/GPT-4.5 but better model vibes all at once in Reddit discussion.
- Grok 4 and Kimi K2 Faceoff in LM Arena: Grok 4 joins the LM Arena leaderboard, with users reporting performance surpassing GPT-4.1 and Gemini 2.5 Flash in testing.
- One user found Kimi's deep research offering very impressive with the 630 sources.
- OpenAI's Open-Source Model Postponed: The release of OpenAI's open-source model is delayed due to safety concerns, though speculation suggests lack of performance as a factor.
- An insider hinted that the delay was related to some other major internal failing rather than safety, necessitating a retrain.
- LLM Compute Costs Explode: Deep Research estimates development costs for Grok 4, 4.5, and Gemini 2.5 Pro around $10 billion, with compute being the primary expense. See Youtube video.
- A member pointed out the irony, noting that none of the leading model developers have made a cent in profit yet and depend on investor bets.
Unsloth AI (Daniel Han) Discord
- Kimi K2 Quantization Massive But Great: Members are testing Kimi K2 quants, with early feedback noting that K2 is great but massive and slow and has issues with mixing up languages.
- Its performance does not align with expectations for a 1T-parameter model.
- Dataset Size Dictates Model Quality: For new language training, members agreed that ~3 hours of training data is enough for style/voice copy, but 300-400 hours is required for a new language.
- For pretraining, the dataset should be 5-10k hours.
- Memory vs Tool Use Debated: Members discussed whether AGI should rely on memorization or utilize external tools like the internet to find correct answers, pointing to the potential of embedding models to retrieve knowledge from large databases.
- Some suggested it isn't true AGI/ASI if it's "just Wikipedia Q&A".
- Qwen 3 dataset doubles in size: For Qwen 2.5, training was capped at up to 18 trillion tokens, whereas the Qwen 3 dataset was expanded to nearly twice that amount, with approximately 36 trillion tokens covering 119 languages and dialects.
- The dataset was built using web data, PDF documents (with text extracted using Qwen2.5-VL), and synthetic data generated by Qwen2.5-Math and Qwen2.5-Coder.
- UnslothTrainer Outshines SFTTrainer with Learning Rate Flexibility: The UnslothTrainer allows specifying different learning rates for the embedding and lm_head layers, whereas SFTTrainer does not.
- It is a direct descendant of SFTTrainer, just with extra params.
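For readers unfamiliar with the mechanism, a per-group learning rate is plain PyTorch; the sketch below illustrates the idea only, and is not Unsloth's actual implementation (the substring matching on parameter names is an assumption).

```python
import torch

def build_optimizer(model, base_lr=2e-4, embedding_lr=2e-5):
    """Give embedding and lm_head parameters their own learning rate.

    Plain-PyTorch illustration of the idea; UnslothTrainer wires this up
    internally, and the parameter-name checks here are assumptions.
    """
    embed_params, other_params = [], []
    for name, param in model.named_parameters():
        if "embed_tokens" in name or "lm_head" in name:
            embed_params.append(param)
        else:
            other_params.append(param)
    return torch.optim.AdamW([
        {"params": other_params, "lr": base_lr},
        {"params": embed_params, "lr": embedding_lr},
    ])
```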
OpenAI Discord
- Ray Sacrifices Self to Voidscroll: Ray sacrifices themself using the Voidscroll to revive Nyx, who died fighting in the Void War, erasing themself from everyone's memory except Nyx's.
- The poster noted that the Architects, god-tier beings, will be reviving Nyx.
- AI Code Generation Still Prototyping: Members are debating whether AI can fully create software from a single prompt; right now AI is in a "helper stage" that is useful for prototyping, but not cohesive builds, with one user linking to websim.ai as an example.
- A member suggested the problem is that "no existing model integrates well with the software I would want to make."
- Emotional AI uses Persona Binding Layers: Members discussed AI developing a "Persona Binding Layer" (PBL) through interaction, mirroring a user's tone and style, as seen in a custom Jarvis system, highlighting the importance of voice control systems.
- One user noted that "it's the linguistic tension built silently between you and it over time" is what sets it apart, and a demo would be to "run a field-sync test in front of the world, and show them: It's not a chatbot. It's a character generation engine."
- Grok's German Guy Bias?: Members discussed concerns about Grok's potential biases related to a "German guy", but clarified that the problematic association was specific to the Grok account on Twitter.
- It was noted that the bias was not an issue when interacting with Grok directly through its official site grok.com.
- GPT-4o prompts users to upgrade: A user reported being prompted to upgrade to the Plus version of GPT-4o after running out of free usage, and sought advice on using CustomGPT or Projects within a regular GPT chat.
- Users recommended buying the Plus plan and creating a Project with uploaded "knowledge" files and character personality handouts and using o3 (or GPT-4.5) for GPT to actually read your story.
OpenRouter (Alex Atallah) Discord
- Kimi K2 coding skills hit OpenRouter: Kimi K2 by Moonshot launched on OpenRouter with 1T parameters, served by Novita and Parasail, scoring 65.8% on SWE-Bench Verified, topping open-source charts for coding.
- The launch was impacted by a huge traffic surge and DoS attack, causing errors as the team scales up, with more details available at OpenRouter.
- Gemini 2.5 Flash says Farewell: The Gemini 2.5 Flash Preview models were deprecated by Google on July 15th, with google/gemini-2.5-flash as the replacement.
- Due to pricing changes, OpenRouter won't auto-route traffic, requiring users to update their code since the flash preview was much cheaper than flash.
- Free Model Access Tied to Paid Credit: Users confirmed that accessing the 1000 free model requests per day requires a one-time deposit of at least $10 USD on OpenRouter.
- One user confirmed if you bought at least $10, you get 1000 requests to free models and another confirmed that it is permanent.
- Router Pricing Stays Fixed: Members clarified that OpenRouter uses fixed output pricing, meaning the cost remains the same regardless of the underlying model used.
- Some expressed disappointment, expecting routers to provide savings, while others focused on the potential latency benefits.
- Chat UI Irks Users: Users criticized the OpenRouter frontend UI, noting issues like the lack of distinction in the reasoning block, centered chat layout, and small chatbot input box.
- Users cited that, when changing from one room to another, Auto Router overrides the previous model saved in the room, and that copy-pasting doesn't work.
LM Studio Discord
- LM Studio Lacks Image Generation Despite Multi-Modal Support: The latest LM Studio version description implies multi-modal support, but it currently only handles text and vision models, lacking image generation capabilities.
- Confusion arose from the new version's description, but image output is not yet supported.
- SDK Enables Manual Memory Management: Members discussed utilizing the LM Studio SDK to implement features like llamaswap for manual memory management.
- It was noted that while the OpenAI API doesn't expose load/unload functions, the SDK can be used for tasks like coding swap behavior with manual load and unload.
- LM Studio Implements Linear Prompt Caching: LM Studio automatically caches the very last message to speed up generation, but doesn't cache the entire request/response pair.
- It supports linear prompt caching (tokens unchanged till last change), but dynamic caching is not enabled; a toy sketch of the prefix-reuse idea appears at the end of this section.
- MCP Not Fully Supported in LM Studio API: A user inquired about using MCP (Model Context Protocol) within LM Studio when using it as a server with HTTP requests, but the API remains the same, requiring clients to define their own tools.
- This means that the OpenAI compatible API does not inherently support tool selection within LM Studio itself.
- Hardware Requirements High for Kimi K2: Discussion revolved around the substantial VRAM requirements for running the Kimi K2 model, with estimates reaching 2400 GB.
- It was mentioned that 4x H200 GPUs could be sufficient for the Q4_K_M quantization of the model, sparking commentary about the affordability of such hardware (approximately $30,000 per chip).
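The linear prompt caching described above reduces to reusing the KV cache for the longest unchanged token prefix; only tokens after the first change need re-prefilling. A toy sketch of that decision, not LM Studio's internals:

```python
def common_prefix_len(cached_tokens, new_tokens):
    """Length of the unchanged token prefix between two prompts."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# Conceptually: everything before the first changed token can reuse the
# existing KV cache; only the tail must be computed fresh.
cached = [1, 2, 3, 4, 5]        # tokens from the previous request
incoming = [1, 2, 3, 9, 9, 9]   # same conversation, edited tail
reuse = common_prefix_len(cached, incoming)
to_prefill = incoming[reuse:]   # only these need fresh computation
print(reuse, to_prefill)        # 3 [9, 9, 9]
```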
HuggingFace Discord
- Gemma 3n Goes Open Source: The Gemma 3n model is now fully available in the open-source ecosystem, according to a Hugging Face blog post.
- This model is now more accessible for research and development purposes.
- SmolLM3 Released for Multilingual Reasoning: SmolLM3, a smol, multilingual, long-context reasoner, has been released and is described in this blog post.
- This was also announced on X by a member of the Hugging Face team.
- Build Transcription App with Inference Providers: New tutorials on Inference Providers are available for building a transcription application, according to HuggingFace docs; a minimal client sketch appears at the end of this section.
- A new OSS project, responses.js, has been introduced for building with Responses APIs powered by HF inference providers, as announced on X.
- Agent Arena Offers Grok4 Access: Members can get free access to Grok4, o3, and Deep Research by providing preference data (upvotes/downvotes) for training in Agent Arena.
- The collected data will be used for training, with the offering being made available to members.
- HF Faces Security Leak Fear: A user suspects a leak related to the HF course may have originated from Hugging Face itself, citing past issues with HF secrets mentioned in the OpenAI dev forum.
- No more details were provided.
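As a starting point for such a transcription app, here is a minimal client sketch using huggingface_hub; the model name is a placeholder, and the exact parameters should be checked against the linked docs.

```python
from huggingface_hub import InferenceClient

# Model is a placeholder; pick any ASR model served by an Inference
# Provider (see the HF docs referenced above).
client = InferenceClient()
result = client.automatic_speech_recognition(
    "meeting.wav",
    model="openai/whisper-large-v3",
)
print(result.text)  # the transcribed audio
```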
Nous Research AI Discord
- Grok-4 Always Reasons, Uses Tools: Grok-4 consistently uses reasoning and tools during inference, according to members.
- A livestream demo suggested that Reinforcement Learning was pivotal in Grok-4âs development.
- Hermes 4 Gains User-Controlled Reasoner: Hermes 4 will feature a user-controlled reasoner, like Deephermes, for a hybrid approach.
- Users can potentially disable reasoning by prefilling empty think tags.
- Self-Play Gaining Traction in AI Model Training: Members proposed that self-play could enrich the nuances of Deep-Hermes reasoning.
- They cited successes in self-play via textarena and a self-play coding paper.
- Kimi K2 Joins Open Source Race: With DeepSeek R1, Qwen, and now Kimi K2 available, members noted open source models are becoming astoundingly capable.
- They stated that businesses and individuals can now access near-frontier models freely for application development.
- OAI's Open Model Launch Postponed: Members reported a delay in OpenAI's open model release, possibly due to the rapid advancements of models like Kimi.
- Speculation suggests the model may come with a restrictive license and lack a base model variant.
GPU MODE Discord
- PMPP 5th Edition Embraces LLMs and FlashAttention: The upcoming 5th edition of Parallel Programming for Multi-core and Many-core (PMPP) will include coverage of LLMs and Flash Attention.
- A member stated that it offers perhaps the best explanation they've seen, also covering Tensor Cores and multi-GPU programming.
- DeepSeek Opts for FP8 GEMM Training: DeepSeek was trained primarily using FP8 GEMM operations, but parts like attention or the MoE router used higher precision.
- Accumulations are in FP32, which is particularly applicable in MoE models because instability in a dense model using FP8 would be too great; a small numerics sketch appears at the end of this section.
- AutoTriton Reinforces LLM with RL: The AutoTriton GitHub repository, for automatic Triton programming with reinforcement learning in LLMs, surfaced.
- Members shared it as related work in the field of automated kernel generation.
- QuACK Outpaces PyTorch: QuACK, a new Open Source library written in CuTeDSL, uses highly efficient reduction operations, as written in a blogpost.
- The CuTeDSL library can write memory bound kernels at the speed of light!
- VSCode Extension for Thunder Compute: For those who dislike SSH config and like cheap GPUs, try Thunder Computeâs VSCode extension.
- When one user said they disliked both those things and VSCode, a member replied they also have a CLI tool.
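To illustrate the FP8-storage/FP32-accumulation point above, the sketch below emulates the precision loss of e4m3 storage with a round-trip cast and then accumulates in FP32; real FP8 training uses scaled-matmul kernels instead, so this models the numerics only, not the speed.

```python
import torch

def fp8_quantize_roundtrip(x: torch.Tensor) -> torch.Tensor:
    """Emulate FP8 (e4m3) storage: cast down, then back up to fp32.

    Only the precision loss is modeled here; a real FP8 GEMM keeps the
    inputs in FP8 and accumulates in FP32 inside the kernel.
    """
    scale = x.abs().max().clamp(min=1e-12) / 448.0  # e4m3 max normal is 448
    q = (x / scale).to(torch.float8_e4m3fn)
    return q.to(torch.float32) * scale

a = torch.randn(256, 256)
b = torch.randn(256, 256)

# Inputs stored at FP8 precision, matmul/accumulation carried out in FP32.
out = fp8_quantize_roundtrip(a) @ fp8_quantize_roundtrip(b)
ref = a @ b
print((out - ref).abs().max())  # error introduced by FP8 storage
```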
Eleuther Discord
- GPU Demand Bubbles Post-Llama 2!: A Latent Space article highlights the GPU supply squeeze and over-demand following the Llama-2 release around June 2023.
- The article suggests a potential GPU bubble due to speculative over-investment based on current market conditions.
- ICML 2025 is coming!: Members are organizing for ICML 2025, sharing a Discord invite link, a Lu.ma link and Partiful link for side events, and a WhatsApp group invite for AI Safety discussions.
- The community is mobilizing early to coordinate participation and side activities.
- RNNs Give Transformers Tokenization a Run for their Money!: RNNs can replace tokenization for faster byte-level models, outperforming tokenization-based transformers; the embedding and lm head of a normal transformer are replaced with two small 4-layer RNNs.
- The model compares the dot product p of each hidden state output with the prior one; if they match by less than 50%, the hidden state becomes a token passed to the main model, with this applied recursively twice. A toy boundary sketch appears at the end of this section.
- Mixed Precision PR Boosts Eval Times: A member submitted a mixed precision PR for lm-evaluation-harness showing eval times on an A30 for Pythia-160M.
- They noted that mixed precision is only slightly slower than casting the full model but much faster than full precision; for Qwen1.5-7B, using softmax=fp32 resulted in an OOM error on an A30 with 24GB VRAM, while softmax=none used 22775MiB VRAM and took 08:54.
- Neox, H100s, and Transformer Engine: A Match Made in Heaven: A member reported a positive experience running NeoX with H100s and Transformer Engine, providing a Dockerfile and config.
- Another member requested a non-TE config speed benchmark for comparison, seeking insight into the potential slowdown without Transformer Engine.
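The boundary mechanism from the RNN-tokenization item above can be sketched in a few lines. Everything below, the GRU, the cosine similarity, and the 0.5 threshold, is an illustrative assumption rather than the exact method from the discussion.

```python
import torch
import torch.nn as nn

class BoundaryRNN(nn.Module):
    """Toy sketch of the byte-level boundary idea described above.

    A small GRU reads raw bytes; a position opens a new "token" whenever
    its hidden state's similarity to the last accepted state drops below
    a threshold. Sizes and the threshold are assumptions.
    """
    def __init__(self, hidden=128, threshold=0.5):
        super().__init__()
        self.embed = nn.Embedding(256, hidden)   # one entry per byte value
        self.rnn = nn.GRU(hidden, hidden, num_layers=4, batch_first=True)
        self.threshold = threshold

    def forward(self, byte_ids: torch.Tensor):
        h, _ = self.rnn(self.embed(byte_ids))    # (1, seq, hidden)
        h = torch.nn.functional.normalize(h, dim=-1)
        boundaries, prev = [0], h[0, 0]
        for t in range(1, h.size(1)):
            if torch.dot(h[0, t], prev) < self.threshold:
                boundaries.append(t)             # low match: start a new token
                prev = h[0, t]
        return boundaries                        # positions handed to the main model

model = BoundaryRNN()
print(model(torch.randint(0, 256, (1, 32))))
```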
Latent Space Discord
- Cognition Consumes Code Completion Company: Cognition Labs acquired Windsurf, integrating its agentic IDE with Cognitionâs autonomous agents, offering financial participation and accelerated vesting for Windsurf employees.
- The acquisition aims to create a unified IDE for tasks like planning, delegation, and code stitching, though conflicting reports suggest unvested employees may not receive compensation.
- Meta Machines Massive Megawattage: Meta is building large-scale AI clusters, including the 1000MW Prometheus (2026) and Hyperion exceeding 5000MW, significantly larger than current 150-200MW H100/H200 clusters, according to SemiAnalysis.
- The discussion involves implications for AI research, NVIDIA sales, and the sources of power for these massive clusters.
- Karpathy Kickstarts Knowledge-Based Learning: Andrej Karpathy suggests LLMs should review/reflect to extract explicit lessons from rollouts, adding them to the system prompt, as detailed in this tweet.
- This lesson-based learning could improve generalization and introduce new learning paradigms beyond traditional RL; a toy reflect-and-append loop appears at the end of this section.
- Sam Spares Safety Scare, Suspends Launch: Sam Altman delayed the open-weight model launch for additional safety tests, emphasizing that once weights are released, they cannot be pulled back, according to his tweet.
- Community members generally support prioritizing safety over a fast launch.
- Gemini Gets Going with Global Goodness: Logan Kilpatrick announced the general availability of the Gemini Embedding model, priced at $0.15 per million tokens and ranked #1 on the MTEB leaderboard, detailed in this tweet.
- Future features will include batch mode support, new multimodal embeddings, and broader multilingual and multimodal capabilities.
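The reflect-and-append idea reduces to a short loop. In the sketch below, llm is a hypothetical text-in/text-out callable, not a real product API:

```python
def lesson_loop(llm, task, system_prompt, max_rounds=3):
    """Toy version of the reflect-and-append idea.

    `llm` is a hypothetical callable taking a prompt string and
    returning a completion string; nothing here mirrors a real API.
    """
    for _ in range(max_rounds):
        attempt = llm(f"{system_prompt}\n\nTask: {task}")
        lesson = llm(
            "Review the attempt below and state one short, explicit lesson "
            f"for doing better next time.\n\nAttempt:\n{attempt}"
        )
        system_prompt += f"\nLesson: {lesson}"   # lessons accumulate in-context
    return system_prompt
```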
MCP (Glama) Discord
- MCP Simplifies Model Deployment?: A blog post suggests MCP simplifies ML model deployment by integrating model serving with agent workflows using transformers.
- The article's example server starts an MCP server and exposes a request tool that runs inference and returns the result; a sketch in this spirit appears at the end of this section.
- GenAI Agent Definition Sparks Debate: Members debated the definition of AI agents, questioning if workflows should be considered agents and the relevance of Anthropic's definition.
- Opinions diverged on whether Anthropic's definition is the most thorough or if the definition of agents predates LLMs and is broader than GenAI.
- Clipboard Servers Proposed for MCP: A member proposed adding clipboard servers to the official MCP specification, planning an implementation in MCP-B.
- This enhancement would allow servers to write directly to client clipboards, broadening utility by enabling easier data transfer from MCP servers to clients.
- Neurabase Claims Fastest MCP Hosting: Neurabase claims to be the fastest server hosting service running fully on Cloudflare Workers CDN, hosted at neurabase.deploya.dev.
- It brands itself as a central hub for MCP servers.
- Director Run Offers Local-First MCP Gateway: The Director Run team created an open source, local-first MCP gateway that connects Claude, Cursor or VSCode to any MCP server in 30 seconds, hosted at director.run and on GitHub.
- Their tool is fully open source.
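The deployment pattern from the first item above, serving a transformers pipeline behind an MCP tool, can be sketched with the official Python SDK roughly as follows; the server name, tool, and model choice are placeholders, not the blog post's exact code.

```python
from mcp.server.fastmcp import FastMCP
from transformers import pipeline

# Wrap a transformers pipeline in an MCP tool so any MCP client can
# request inference. Names here are illustrative placeholders.
mcp = FastMCP("sentiment-server")
classifier = pipeline("sentiment-analysis")

@mcp.tool()
def classify(text: str) -> str:
    """Run inference on the input text and return the predicted label."""
    result = classifier(text)[0]
    return f"{result['label']} ({result['score']:.3f})"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```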
Yannick Kilcher Discord
- AI Funding Continues Despite Crash Fears: Investors are expected to continue funding AI development due to the decreasing costs associated with training models for dexterous behavior with human hands.
- Despite concerns of an AI crash, the financial incentives for AI remain strong.
- OpenAI Model Delayed Over Safety and Capability Concerns: Users speculate that OpenAI's model release is delayed not only due to safety concerns following the Grok incident, but also in part due to Kimi K2's superior performance as indicated in this tweet.
- An OpenAI employee's tweet suggests that capabilities also play a role, with some joking that OpenAI is scrambling to catch up to Kimi.
- BitNet Integration in Llama.cpp Clarified: Members discussed BitNet support in Llama.cpp, clarifying that it's not a competition but rather about different tools for different tasks, and highlighting BitNet's effectiveness with recent training simplifications.
- While effective, BitNet's usage is currently limited to approximately 7B models due to training data requirements.
- ResNet and Attention Architectures Popping Up in U-Nets: The discussion involves the use of ResNet with attention in U-Net architectures, referencing a paper using a single ResNet stage with attention and another using a stack of same-width ResNets with attention only in the latent space.
- Also mentioned is a paper using a stack of ResNets with attention to replace the layers of the encoder and decoder, potentially the one used in Hugging Face's transformers library.
- Kimi K2 Model Receives High Praise: The speaker in #paper-discussion expresses a strong appreciation for the Kimi K2 model, citing it as a personal favorite for specific applications and placing it among their top three models.
- No further details provided.
aider (Paul Gauthier) Discord
- Grok 4 achieves 80% on Aider Benchmark: Grok 4 attained a score of 80% on the aider polyglot coding benchmark, securing 4th position on the leaderboard.
- This prompted a discussion on whether more challenging tasks should be added to the aider benchmark now that many models score near 80%.
- Gemini 2.5 Pro vs 1.5 Pro pricing and context discussed: Members compared the performance of Gemini 1.5 Pro and Gemini 2.5 Pro, pointing out that Gemini 1.5 Pro offers 2M context, but 2.5 Pro is purportedly smarter, as evidenced by a screenshot of model prices.
- A user mentioned using the score reported by MoonshotAI for Kimi's performance on the aider benchmark.
- Zed Editor Validates Aider Configs: Users noticed that the Zed editor now includes schema validation for Aider configuration files, resulting in a user converting test-cmd into an array and prompting a configuration error.
- It was suggested to use tsc --noEmit for static type checking, as recommended by Deepseek.
- GitHub Copilot Support Status Still Ambiguous: A member expressed uncertainty regarding GitHub Copilot support in Aider, citing conflicting information between the documentation and an issue.
- Discussion revolved around clarifying whether the documentation and the issue are referring to the same aspect of Copilot integration.
- Aider Suffers a Segmentation Fault adding COBOL Support: A user faced a segmentation fault while integrating COBOL support into Aider following the creation of tags.scm, compilation of the COBOL parser via Tree-sitter, and associated code adjustments.
- The segmentation fault happened upon loading the COBOL shared library, after verifying exported symbols and parser correctness, so they requested insights into typical challenges or debugging methods for Tree-sitter integration.
LLM Agents (Berkeley MOOC) Discord
- Advanced LLM Agents MOOC Awards Certificates: The Advanced LLM Agents MOOC released certificates, awarding 232 Trailblazer, 38 Mastery, 80 Ninja, 1 Legendary, and 3 Honorary certificates.
- The staff also provided a checklist for participants who expected a certificate but did not receive one.
- Cert Snags Snarl Students!: Several users reported not receiving certificates due to unsubscribing from the mailing list or missing the certificate declaration form.
- Staff resubscribed one user and resent a certificate to another email, reiterating limited capacity for individual support.
- Formatting Fails Foul-Up Finishes!: A user reported name overlap on their certificate, affecting their LinkedIn posting; the staff member fixed the formatting, referencing the certificate number found in the PDF name.
- The staff member stated should be fixed now! sorry about that.
- Article Assignment Snafus Snarl Students!: Some users missed the article submission form despite completing other course requirements.
- The staff stated I'm very sorry, but there isn't anything we can do now to accommodate the students who missed these forms/deadlines.
- Feedback Flies For MOOC Future!: A user suggested a centralized Excel sheet or progress tracker to help participants monitor their status and prevent last-minute issues.
- Staff thanked the users, noting It is thanks to everyone's participation and enthusiasm that we'll be able to hopefully improve upon the format for delivering all of the lectures + coursework in the future!.
Modular (Mojo đ„) Discord
- Mojo Enables Assembly Coding: Users can now code assembly inside Mojo, enabling low-level system calls, although documentation is sparse; see the Mojo Standard Library Assembly Module.
- This module allows for direct manipulation of hardware resources, but requires a deep understanding of assembly language and system architecture.
- Modular Tracks Community Events: Modular is polling its community to decide how to track community events, using tools such as the Modular community Google calendar and Modular's Luma event page.
- Many users prefer Discord notifications for event reminders.
- Mojo Community Celebrates July!: The July community meeting will feature speakers discussing Hashable-based hashing, an FFT implementation, Mojo-Lapper, and a quantum circuit simulator; join the discord event.
- Community members can submit advance questions via this google form!
- Mojo's Metal GPUs: Still in the Works: M1 Metal 3 GPUs are not yet supported in Mojo, but support is under development; a related GitHub commit shows progress on build system detection.
- This feature will enable Mojo programs to leverage Apple Silicon GPUs for accelerated computations once fully implemented.
- Bypass Missing Kernels by Building from Source: Users are struggling to import arg_nonzero and arg_nonzero_shape from mojo.kernels.nn.arg_nonzero; running mojo build -I ../modular/max/kernels/src/ resolves the issue.
- Because max.kernels is not accessible or does not expose submodules (the error message reads 'kernels' does not refer to a nested package), a member recommended building from source per this Modular Forum post.
Manus.im Discord
- Flutter Web Emulator Extension Gains Traction: A memberâs Flutter Web Emulator extension, built with Manus, has reached 1900 installs in two months without promotion.
- The extension helps engineers test their code more easily on the web.
- Members Recommend Online Incubators for Startups: A member suggested using online incubators to connect with partners and advisors, specifically recommending f6s.com.
- Another user humorously suggested creating a Manus online business incubator.
- Google Drive Save Error Reported: A member reported a bug when saving to Google Drive, where saving the latest item works, but saving previous items triggers a Google Auth error.
- It is unclear what the root cause of the bug is, or why it is affecting previous saves.
- Manus Websites Suffer Outage: Multiple members reported issues accessing Manus websites and deployments on manus.space, indicating a potential outage.
- It is unclear what caused the outage, but the team is likely investigating.
Notebook LM Discord
- Featured Notebooks Focus on Exploration: NotebookLM launched Featured Notebooks on the homepage, including content spanning from scientific exploration to expert advice, with direct access provided via the official blog.
- The Featured Notebooks section offers a range of content, catering to various interests and providing easy access to valuable resources within the NotebookLM platform.
- AI Tackles Novel Editing: Users discussed leveraging AI for targeted fiction editing, providing advice and examples for authors, focusing on analyzing early-draft manuscripts from opening to ending, using the prompt Analyze [X]; Provide actionable advice as paired with written examples for [Y].
- The deep dive covered every element of the manuscript, emphasizing cohesive packaging and writing quality, with results yielding two hours of content.
- NotebookLM Shuns Native Apple Features: Sharing text with the NotebookLM app creates a new notebook with the source material, indicating no special treatment for Apple system toolkits.
- A member noted that Google seems allergic to engaging with Apple system toolkits, making a native app with native features unlikely.
- Preset Queries Prompt Source Naming: A user suggested that using preset queries like "FAQ" should result in the generated source being named exactly as the button, i.e. "FAQ", for improved organization.
- This would make it easier to find sources, particularly in notebooks with many sources.
- Audio File Generation Abbreviated?: Users reported that audio file generation lengths seem to be shorter recently, generating around 10-15 minutes instead of the previous 30+ minutes.
- This shortening occurred even with settings adjusted for longer podcasts.
LlamaIndex Discord
- LlamaIndex heads to Amsterdam!: LlamaIndex is hosting a meetup in Amsterdam on July 31 focused on LlamaIndex & Snowflake Data Agent Builders, and the next office hours will be held on August 5.
- Spots are limited, so members should sign up to reserve their spot to hear about data agent builders.
- Notebook Llama Clones NotebookLM with new features: NotebookLlama, a NotebookLM clone by LlamaIndex, is available on GitHub and has already received over 1k stars; it allows users to extract and download images and tables from their files and interactively visualize all tabular data.
- Users can now also chat with the new and improved NotebookLlama.
- LlamaIndex dives into Context Engineering: LlamaIndex introduced techniques for Context Engineering on their blog.
- The blogpost covers the what and how of Context Engineering.
- Gemini 2.5 Pro powers Research Agent: LlamaIndex demonstrates how to build a research agent with LlamaIndex workflows and Google's Gemini 2.5 Pro in this tutorial.
- The agent can search the web with Google and take notes with a dedicated note-taker agent.
- Synk Hires Anonymity Advocates: The Synk project, focused on a decentralized, anonymous, and secure browser, is hiring for various roles including developers, QA engineers, DevOps engineers, moderators, marketing analysts, and beta-testers.
- The project offers official employment with signed documentation, guaranteed salary, and a flexible schedule.
Torchtune Discord
- Kimi K2 Trained with Muon Signals Future?: A member speculated whether the use of the Muon optimizer to train Kimi K2 represents a future trend in model training.
- No further discussion or context was provided.
- Async Recipe Flounders for All Models: The async recipe in Torchtune doesn't function universally across all models, requiring a fully functioning recipe as a backup; a PR was opened to address a critical issue.
- The submitter recommends using krammnic/torchtune where this function is reverted.
- Flex Attention Kernel's Memory Use Investigated: The shared memory (shmem) utilization of a flex attention kernel depends on the score_mod and/or mask_mod when using complicated masks.
- A question was raised regarding additional memory needed for the triton kernel when constructing masks via and_masks in this file; a mask-composition sketch appears at the end of this section.
- GRPO Recipe Out of Sync: The sync GRPO recipe is currently non-functional, and it was suggested to revert until #2697; members pointed to this PR.
- However, the submitter noted that reverting isn't feasible due to reward structure differences and that they need to wait until their PR is merged.
- Token Training Delayed Hoping for Grokking?: A member inquired why a specific action wasn't implemented 1T tokens earlier, implying missed opportunities, and they speculated about hopes for grokking.
- No specific details about the action or context of the decision were provided, but the member linked to a tweet speculating if a decision was made in hopes of grokking to happen.
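For context on the and_masks question, this is roughly how mask predicates compose in PyTorch's flex attention (requires a recent PyTorch, 2.5+; the per-token doc_ids and shapes are illustrative):

```python
import torch
from torch.nn.attention.flex_attention import and_masks, create_block_mask

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

def document(b, h, q_idx, kv_idx):
    # Attend only within the same document; doc_ids is illustrative data.
    return doc_ids[q_idx] == doc_ids[kv_idx]

doc_ids = torch.zeros(1024, dtype=torch.long)
doc_ids[512:] = 1  # two pretend documents packed into one sequence

# and_masks combines the predicates; the block mask then records which
# tiles of the attention matrix can be skipped entirely.
block_mask = create_block_mask(
    and_masks(causal, document),
    B=None, H=None, Q_LEN=1024, KV_LEN=1024, device="cpu",
)
print(block_mask)
```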
tinygrad (George Hotz) Discord
- Frontend Reimplementations Ruffle Feathers: George Hotz advises against reimplementing frontends, emphasizing the completeness of the existing spec, in response to jafioti's requests to chat with the tinygrad team.
- Hotz stated that more incomplete frontends probably aren't a good use of dev effort, and is open to participation in existing conversations.
- Metal Profiling API Surfaces: uuuvn shared a Metal profiling API akin to sqtt on AMD, indicating its utility for profiling on Metal.
- uuuvn also shared an ONNX file showcasing all of ONNX implemented in approximately 1000 lines of code.
- ONNX Reproducibility Plagued by Coredumps: b1tg reported a coredump during ONNX reproduction, identifying a crash in the python process at _METADATA.set within _metadata_wrapper, hinting at a potential CPython bug.
- This issue is related to prior segfaults and another one observed during ONNX parser merging.
- Driving Vision ONNX Root Cause Identified: uuuvn seemingly identified the root cause of an issue with driving_vision.onnx, attributing it to bitcast folding for some uchar to half.
- They are working on a minimal repro test before submitting a PR.
- Apps and Examples: Quantity or Quality?: A user expressed interest in porting models for useful apps/examples such as image deduplication, fast face detection, and video content ID, noting their reliance on chonky dependencies like ffmpeg.
- A member responded that the team is interested in making tinygrad easier to work with for developers, and less interested in supporting more examples.
Nomic.ai (GPT4All) Discord
- Gemma 3 Earns "Passable" Rating: A member declared that Gemma 3 is the only model that they've found passable compared to other models.
- The user did not elaborate on the specific benchmarks or criteria used to assess Gemma 3's performance.
- Nomic-embed-v2 Finetuning Blocked by Cloudflare: A user reported an Access Denied error when attempting to access data for finetuning nomic-embed-v2 through Cloudflare's R2 storage, using the command aws s3 ls --endpoint-url=https://9fa58365a1a3d032127970d0bd9a1290.r2.cloudflarestorage.com/ s3://contrastive.
- The user was trying to list the contents of the contrastive bucket but was blocked by Cloudflare's access controls.
- LocalDocs Embedding Process Gets Stuck At Zero: Multiple users have encountered an issue where the LocalDocs embedding process stalls at 0%, even for small text files, as shown in this image.
- One user with a 3060 GPU, 9950x processor, and 64GB RAM was advised to enable their NVIDIA GPU with its VRAM in LocalDocs settings to enhance performance, which suggests the process may default to CPU if not properly configured.
- Nomic API Server Responds After Extensive Delay: A Nomic API user reported a two-hour delay in receiving a response from the Nomic API server, running on a Debian 12 machine with an older AMD processor and 24GB RAM.
- The long delay suggests the system may be running entirely on CPU, and improving performance may require either using a smaller model or a better video card.
- RAG with Text Similarity Search Proposed: A user sought advice on storing and querying a substantial amount of lore, and community members suggested RAG (Retrieval-Augmented Generation) with text similarity search via LocalDocs as a potential solution.
- This approach would allow the user to retrieve relevant lore segments based on similarity to a given query, and then generate a response using those segments.
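The suggested approach boils down to embedding lore chunks and ranking them by cosine similarity at query time. A self-contained sketch, with a random-vector stub standing in for the actual embedding model:

```python
import numpy as np

def embed(texts):
    """Stub standing in for a real sentence-embedding model (e.g. a
    Nomic embedder); random vectors keep the example self-contained."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 64))

def top_k(query_vec, doc_vecs, k=2):
    """Cosine-similarity retrieval over pre-computed embeddings."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(d @ q)[::-1][:k]

lore = ["The old king fell in the war.",
        "Dragons nest in the eastern peaks.",
        "The river city trades in silver."]
doc_vecs = embed(lore)
for i in top_k(embed(["Who rules the east?"])[0], doc_vecs):
    print(lore[i])   # retrieved segments feed the generation prompt
```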
Cohere Discord
- Aya Expanse 32B Still Impresses: Despite its age, a member found Aya Expanse 32B impressive, noting that it's working (mostly) with Roocode.
- The member compared it to Command-R 08 2024, highlighting that many modern open weight models of this size fail.
- Cohereâs Preference Dataset Still in Demand: A member inquired if Cohere released the preference optimization dataset mentioned in this paper.
- They also shared a link to a tweet potentially relevant to the discussion.
- Researcher Seeks ML Insights for PhD: A lecturer from NED University of Engineering and Technology focuses on machine learning research and applications during their pre-PhD phase.
- They are aiming to connect with researchers and developers, staying updated on AI and ML developments to build a strong foundation for their PhD.
- Student Eyes Quantum Computing Research: A Computer Science student from Pakistan expressed interest in ML, High Performance Computing, and Quantum Computing.
- Aspiring to a research career and a PhD, the student is eager to contribute, work on research, and learn from the community.
DSPy Discord
- NFT Public Mint Goes Live!: The public mint for an NFT project is LIVE, with only 1,820 NFTs remaining.
- Users who participated in the OS2 rewards program can claim their treasures on the new OpenSea platform via the Rewards tab, but beware of broken links.
- Custom LLM Adapter Hits a Snag: A user with a custom LLM adapter encountered a ValidationException when using the Bedrock API, indicating that the input is too long for the requested model.
- This suggests potential issues with input length limits when integrating custom adapters with the Bedrock API; a defensive truncation sketch appears at the end of this section.
- DSPy Hackers Seek Arc Prize Collaboration: A member is looking for collaborators on the Arc Prize who are using DSPy.
- They expressed interest in checking out the approaches other people are taking, opening up possibilities for shared insights and strategies.
- IReRa paper recommended for study: A member recommended reading the IReRa paper.
- No further discussion was provided.
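A common guard against such input-length errors is truncating to a token budget before calling the model. A generic sketch; the tokenizer stub and the budget are placeholders, not Bedrock specifics:

```python
class WhitespaceTokenizer:
    """Stand-in for a real tokenizer; only encode/decode matters here."""
    def encode(self, text):
        return text.split()
    def decode(self, tokens):
        return " ".join(tokens)

def truncate_to_budget(text, tokenizer, max_tokens=8000):
    """Keep the most recent tokens so the request fits the model's limit."""
    ids = tokenizer.encode(text)
    if len(ids) <= max_tokens:
        return text
    return tokenizer.decode(ids[-max_tokens:])

print(truncate_to_budget("a " * 10, WhitespaceTokenizer(), max_tokens=4))
```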
Gorilla LLM (Berkeley Function Calling) Discord
- Scout Gets Schooled, Doesnât Pass: Despite a larger model size, Llama 4 Scout underperformed compared to Llama 3.1 70B, showing that improvements in architecture and training data matter more.
- The channel discussed how architectural improvements can outweigh size advantages in model performance.
- Website Rendering Glitches Llama Scores: A rendering issue on the website was suspected when Llama-3.3-70B-Instruct (FC) showed a score of 74.33 for Non-live Simple.
- The reported score contradicts the git repo's score of 94.
Codeium (Windsurf) Discord
- Windsurf Rides Wave to Cognition!: Windsurf is joining forces with Cognition, creators of Devin, aiming to reinvent AI coding.
- The announcement video and YouTube Link provide further details about this acquisition.
- Human-AI Collab: Windsurf's North Star: Windsurf has always believed in human-AI collaboration, setting the stage for the future of software development.
- This collaboration will lead to true amplification of developer capabilities, not just automation, according to Windsurf.
- AI Codingâs Future: Shaped by Windsurf and Cognition: Two leading teams, Windsurf and Cognition, are combining their expertise to shape the upcoming era of AI coding.
- The acquisition aims to combine Cognitionâs autonomous agents with Windsurfâs agentic IDE to create breakthrough developer experiences.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI â· #general (1265 messagesđ„đ„đ„):
Comet data harvesting warnings, Perplexity Pro referral benefits, Grok 4 vs. O3 Pro comparison, Kimi K2 Local Run, Comet as Default Browser
- Perplexity Comet's data harvesting is a concern: A member reported that Perplexity warned them against using Comet due to its insane data harvesting, referencing a TechCrunch article that Aravind responded to on Twitter.
- Comet requests can get personal.
- Perplexity Student referral benefits explored: Users discussed Perplexity's student referral program, noting that both the referrer and the friend receive 1 month of free Pro access upon student status verification via SheerID, with a maximum of 24 free months.
- Normal (non-student) referrals provide a $10 discount on the next billing cycle for both parties.
- Grok 4 matches O3 Pro: Members debated whether to purchase SuperGrok or Perplexity Pro, weighing the benefits of Perplexity's UI against the longer chat contexts offered by SuperGrok.
- Some users said Grok4 matched/exceeded O3 Pro (but other users noted its censorship).
- Kimi K2 Local Run now possible: It's now possible to run Kimi K2 locally after a significant 80% size reduction, bringing the model down from 1.1TB to 245GB, enabling use on personal devices.
- The local use of Kimi K2 requires specific hardware configurations, such as 24GB VRAM, and may still be insanely demanding on personal devices.
- Comet default browser when?: Several users expressed their growing consideration of making Comet their default browser, citing its stability and useful features like the assistant button, but they still prefer Google.
- Users mentioned the ability to switch the default search engine in Comet to Google via settings or by using the Shift + Enter shortcut.
Perplexity AI â· #sharing (9 messagesđ„):
Renewable energy grid reliability, COVID mortality data analysis, Comet AI use case, Perplexity AI spaces
- Perplexity user tackles Renewable Energy Rhetoric: A user employed Perplexity to analyze opposing viewpoints on renewable energy, specifically solar and wind, by creating frameworks assessing grid reliability and cost implications.
- They used Labs to interrogate each argument's accuracy and generate 5 scenarios of ideal power mixes based on local variables, finding the process pretty informative.
- Deep Dive into COVID Mortality Data: Users are diving into COVID mortality data to understand the classification of vaccinated and unvaccinated individuals.
- Comet AI earns adoration of Perplexity User: A user tested Comet AI and created a YouTube video showcasing their positive experience.
- IMDB movie link shared in Perplexity spaces: A user shared a link to an IMDB page inside a Perplexity AI space.
Perplexity AI â· #pplx-api (5 messages):
Perplexity not searching the web, Sonar hallucinating URL contents, search_domain_filters parameter
- Perplexity's Web Search Woes: A user reported an issue where Perplexity doesn't search the web, providing inaccurate answers without online information retrieval.
- Another user suggested that the API should search the web by default without parameter tweaks.
- Sonar's URL Hallucinations: A user encountered a problem where Sonar didn't search for URLs in a specific format, leading to hallucinated content.
- The issue was resolved by decreasing the prompt size, adhering to Sonar API prompting practices.
- The Search Domain Savior: A user advised employing the search_domain_filters parameter to address the search issue.
- No other info was given.
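For reference, a domain-filtered Sonar request looks roughly like the following. The discussion above names the parameter search_domain_filters, while Perplexity's docs have used search_domain_filter; verify the exact name and model id against the current API reference.

```python
import requests

# Sketch of a Sonar chat-completions call with a domain filter; the
# model id and parameter names should be checked against current docs.
resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "sonar",
        "messages": [{"role": "user", "content": "Latest Kimi K2 news?"}],
        "search_domain_filter": ["techcrunch.com", "reuters.com"],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```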
Cursor Community â· #general (646 messagesđ„đ„đ„):
Cursor Performance, Kimi K2 Integration, Pricing Model Feedback, Gemini 2.5 Pro, Background Agents
- Cursor Users Report Slower Performance!: Several users reported significantly slower performance with the 1.2.4 update, experiencing 30 FPS scrolling, unresponsive interfaces, and frequent freezing, which is similar to this issue.
- Troubleshooting steps included cleaning the cache (~/Library/Application Support/Cursor/Cache), disabling extensions, and removing certain directories, with one user stating I get like an hour with a chat before the IDE is totally unusable, freezing every 30 sec etc.
- Kimi K2 Model gets high praise from users!: Users are requesting integration of the Kimi K2 model, citing its impressive coding capabilities and cost-effectiveness, with one user saying I've been using Kimi k2 all day and it feels on par on coding in alot of aspects.
- There is debate on whether it can compete with existing models, with the opinion that, while cheap, claude still beats it.
- Feedback on Cursorâs Pricing Model: Users shared constructive feedback on Cursorâs pricing, advocating for modeling future costs, securing abuse vectors, investing in an open-source model, and framing changes as trade-ups according to this post.
- Users also discussed concerns around unused requests expiring upon plan upgrades with Should be billable requests tho so someone can keep some specially for the ultra package.
- Gemini 2.5 Pro versions cause confusion and redirect to wrong models!: Users report the standard Gemini 2.5 Pro model redirects to the older 05-06 version, whereas the preview model points to a newer stable release according to the Google Developers Blog.
- One user stated, This is a mess from cursor, highlighting the need to pick the gemini2.5 pro 06-05 to get the stable version.
- Background Agent Spending: Cursor users discussed average spending on Background Agents, with one noting they managed to squeeze $159 of api billing before being cut off this month under the pro plan, with another mentioning spending $300 since upgrading to Ultra.
- Members highlighted different cost benefits and strategies, where it could be cheaper than on the cloud, with one noting you get more than $20/m of value if your usage is heavy.
Cursor Community â· #background-agents (20 messagesđ„):
Background Agents secrets not working, Automatic port forwarding issues, Trigger background agents programmatically, coredump issue in background agent commits, Background Agents UI not updating
- Cluster Rollout Woes Persist for Some Users: Despite a cluster rollout, some users still face issues with background agents, including failing to access secrets; one user checked for updates and spawned a new agent, but the problems persisted, as shown in a screenshot.
- Background Agent Hijacks Local Postgres Connection: A user reported that automatic port forwarding by background agents is hijacking their local Postgres connection and they are still facing issues with accessing secrets, shown in this screenshot.
- Background Agents UI Not Updating: One user reported issues with the UI not updating when interacting with the mobile web version of the agents, indicating that the agent appears to be stuck, which is seen in this screenshot.
- Background Agents Commit Massive Coredump Files to Git: A user reported that a background agent committed a 734 MB core dump from cursor-nightly to Git, leading to issues when pushing changes.
- A maintainer acknowledged the issue, stating "oh no, apologies! I think we have a repro for this, no need to send over the coredump, we should also really rework how the git commit part of it works".
- Background Agents API Trigger Request: A user inquired about the possibility of triggering background agents programmatically, such as from GitHub Actions; they referenced a forum thread about the topic.
- A maintainer responded that this feature is "not yet" available.
LMArena â· #general (983 messagesđ„đ„đ„):
Grok 4 No System Prompt, Kimi K2 Performance, LM Arena Leaderboard, OpenAI Open Source Model Delay, LLM Development Costs
- Grok 4âs Identity Crisis: System Prompt Woes: Members noticed that the API version of Grok 4 lacks a system prompt, causing it to misidentify itself as Grok 2 in some contexts, while the web version (grok.com) functions correctly with a system prompt.
- One member suggested this behavior is expected rather than a bug, since the API version simply ships without a system prompt.
- Kimi K2 Mania: Insane Performance at a Bargain: Kimi K2 is receiving high praise, with some users claiming it is slightly inferior to Claude Opus 4 in base model performance but costs 30 times less.
- One user described Kimi K2 as like the freshness of initial o3 sans reasoning, Sonnet 3.5, R1, V3-0324, or Opus 3/4/GPT-4.5 but better model vibes all at once. See Reddit discussion for more.
- LM Arena Update: Grok 4 and Kimi K2 Enter the Fray: Grok 4 has been added to the LM Arena leaderboard, with some users reporting impressive performance, even surpassing GPT-4.1 and Gemini 2.5 Flash in personal testing; new reasoning buttons were also deployed.
- One user found Kimiâs deep research offering very impressive with the 630 sources, but another was underwhelmed.
- OpenAI's Open-Source Blunder: A Safety Delay?: The release of OpenAI's open-source model has been delayed, attributed to safety concerns, though some speculate it is due to a lack of performance.
- One user claimed that the reason I can't say was not related to safety, but to some other major internal failing, with the delay requiring a retrain.
- LLM Development Costs Soar: Compute Takes the Crown: Deep Research services estimate that the total development costs of Grok 4, 4.5, and Gemini 2.5 Pro are around $10 billion, with compute being the vast majority of the expense. Youtube video highlights costs.
- A member noted it is made even more staggering when you consider none of the leading model developers have made a cent in profit yet and investors are simply betting on the future.
LMArena â· #announcements (1 messages):
LMArena, kimi-k2
- kimi-k2 joins LMArena!: A new model, kimi-k2, has been added to the LMArena leaderboard.
- LMArena welcomes new Model: The community welcomes kimi-k2 to the LMArena platform, expanding the options available for model comparison.
Unsloth AI (Daniel Han) â· #general (1062 messagesđ„đ„đ„):
Unsloth Q001 K_M GGUF, LegalNLP Dataset, Goody2 AI censored model, Open Empathic Project, GPTs Agents
- Unsloth Ships Gemma 3N GGUF Model Updates: Unsloth released GGUF model updates for Gemma 3N, addressing issues and improvements.
- Users can now download the updated models and experience enhanced performance.
- Kimi K2 Quantization Tests Kick Off: Members are testing Kimi K2 quants, with early feedback noting that K2 is great but massive and slow.
- Others noted that Kimi K2 is good, but its performance does not align with a 1T model, and it has issues with mixing up languages.
- Unsloth FSDP v2 and Gradient Checkpointing Status: A user reported issues with FSDP v2 and gradient checkpointing in Unsloth, noting that compiling can potentially cause a hang.
- They found that compiling attention seems to always run fine, even when compiled.
- Dataset size vs pretraining data is critical to model quality: Members discussed new language training, and agreed that ~3 hours of training data is enough for style/voice copy, but 300-400 hours is required for a new language.
- For pretraining, the dataset should be 5-10k hours.
Unsloth AI (Daniel Han) â· #off-topic (76 messagesđ„đ„):
AGI benchmarks, Memory vs Internet, tinygrad drivers, Voice representation
- AGI tests should include esoteric knowledge: One member challenged the notion of AGI by presenting a test of niche trivia questions that no current LLM could answer without external search.
- This user argued that true superintelligence requires a "perfect-memory-archive" rather than relying on internet access, which is seen as just another "dumb chatbot".
- Memory versus Tool Use debated: Members discussed whether AGI should rely on memorization or utilize external tools like the internet to find correct answers.
- Some argued that real-world intelligence involves reasoning and tool use, rather than pure memorization, pointing to the potential of embedding models to retrieve knowledge from large databases and others suggesting it isn't true AGI/ASI if it's "just Wikipedia Q&A".
- Docker for Pytorch plus SSH released: A member shared a Dockerfile on Github that encapsulates PyTorch + SSH.
- The file is presented as handy for remote debugging without breaking system dependencies, avoiding the need for Conda or Pip.
- Mathematical Voice Representation questioned: A member inquired about creating a "mathematical representation (Latent Space) with maximal phonetic clarity" from voice audio.
- They sought insight on how to encode and reconstruct voice, considering factors beyond pitch, volume, and context, and another member asked for clarification.
Unsloth AI (Daniel Han) â· #help (97 messagesđ„đ„):
Custom HF Datasets for LoRA, Unsloth RL Tool Harness, FLAN-T5 Support, Llama4 Scout Support, Gemma 3n Inference with Kaggle GPUs
- Access Custom HF Datasets for LoRA: A user sought guidance on using custom datasets from Hugging Face for LoRA, showing a screenshot of their code.
- A member pointed out the section for dataset selection, indicating that the mentioned section loads the LoRA after training.
- Unsloth Embraces RL Tool Harness?: A user inquired about using Unsloth with external environments for RL, specifically to apply reward functions after full completions with tool calls.
- A member suggested calling vLLM yourself after the first generation and linked OpenPipe/ART as a potential resource.
- Solve OutOfMemoryError when adding New Tokens in Tokenizer: A user faced an OutOfMemoryError while fine-tuning Llama 3.1 8B on Unsloth after adding new tokens to the tokenizer.
- They were using code to add special tokens. Two members expressed thanks to each other, and a slothhug emoji was shared.
- Multi-GPU Training is not yet released: A user asked about multi-GPU training support in Unsloth using A100 GPUs.
- A member indicated that an official release is coming soon, suggesting setting `device_map="balanced"` or using `accelerate` for now, and pointed to huggingface.co/docs/accelerate for non-Unsloth implementations.
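A minimal sketch of that stopgap, assuming `transformers` and `accelerate` are installed and two or more GPUs are visible; the model name is illustrative, not from the discussion:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"  # illustrative; any causal LM works

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="balanced" asks accelerate to split the layers evenly
# across all visible GPUs instead of filling them one by one.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="balanced",
    torch_dtype="auto",
)

print(model.hf_device_map)  # shows which modules landed on which device
```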
- Dive into VRAM Calculation for LLMs: A user asked how to choose the correct server and calculate VRAM requirements for deploying LLMs.
- A member shared a VRAM calculator link and provided a detailed calculation example for Llama 3 8B, estimating approximately 8.15 GB of VRAM.
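The arithmetic behind such calculators is simple enough to sketch; the overhead factor below is an assumed headroom for KV cache and activations, not a figure from the discussion:

```python
def estimate_vram_gb(params_billions: float, bits_per_param: float,
                     overhead: float = 1.2) -> float:
    """Rough inference-time VRAM estimate: weight size plus a headroom factor."""
    weight_gb = params_billions * bits_per_param / 8  # 8 bits per byte
    return weight_gb * overhead

print(estimate_vram_gb(8, 8))    # Llama 3 8B at 8-bit: ~9.6 GB
print(estimate_vram_gb(8, 4.5))  # ~Q4_K_M quantization: ~5.4 GB
print(estimate_vram_gb(8, 16))   # fp16: ~19.2 GB
```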
Unsloth AI (Daniel Han) ▷ #research (60 messages🔥🔥):
GPT-4.5 size, Qwen 2.5 Training, Multilingual Datasets, Training Data Copyright, SFT creative writing
- GPT-4.5 Model Size Speculation: There is speculation that GPT-4.5 may be a 10 trillion parameter MoE model that was shrunk down for GPT-4.1 due to inference challenges and its significant memory requirements.
- One member argued that if a model requires the resources of a 1T parameter model, it should be considered as such, regardless of the active parameter count.
- Qwen 3 dataset nearly doubled in size: For Qwen 2.5, training was capped at up to 18 trillion tokens, whereas the Qwen 3 dataset was expanded to nearly twice that amount, with approximately 36 trillion tokens covering 119 languages and dialects.
- The dataset was built using web data, PDF documents (with text extracted using Qwen2.5-VL), and synthetic data generated by Qwen2.5-Math and Qwen2.5-Coder.
- Orpheus Speaks 9 Indian languages: A member shared that they fine-tuned Orpheus to speak 9 Indian languages, supporting voice cloning, code-switching, and cross-lingual voice cloning, with open-sourced data, models, and training/inference code available on Hugging Face.
- More details can be found in the Snorbyte blog including results, comparisons with commercial models, and training steps.
- Kimi-VL-A3B-Thinking-2506 vision model surfaces: A vision prototype, Kimi-VL-A3B-Thinking-2506 from Kimi, was brought to the chat's attention, specifically to determine its performance on videos.
- More information available on the HuggingFace blog and in the associated paper.
Unsloth AI (Daniel Han) ▷ #unsloth-bot (49 messages🔥):
UnslothTrainer vs SFTTrainer, Ollama Model Export Error, Sesame TTS Model Audio Input Length Error, Unsloth Introduction, Model Distillation
- UnslothTrainer adds Extra Params for Flexibility: The `UnslothTrainer` allows specifying different learning rates for the embedding and lm_head layers, whereas `SFTTrainer` does not.
- It is a direct descendant of `SFTTrainer`, just with extra params.
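A sketch of that extra knob, based on Unsloth's continued-pretraining examples; argument names may differ across Unsloth versions, and `dataset` is assumed to be a prepared text dataset:

```python
from unsloth import FastLanguageModel, UnslothTrainer, UnslothTrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit", max_seq_length=2048, load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "embed_tokens", "lm_head"],  # embeddings trained too
)

trainer = UnslothTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # assumed prepared elsewhere
    args=UnslothTrainingArguments(
        per_device_train_batch_size=2,
        max_steps=100,
        learning_rate=5e-5,            # body / LoRA layers
        embedding_learning_rate=5e-6,  # separate, smaller LR for embed_tokens & lm_head
        output_dir="outputs",
    ),
)
trainer.train()
```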
- Unsloth Tokenizer Solves Ollama Export Woes: To solve the `PreTrainedTokenizerFast` missing `_ollama_modelfile` attribute error when creating an Ollama model file, the Unsloth tokenizer should be used.
- A user provided a code snippet using `FastLanguageModel` from Unsloth for 4-bit pre-quantized models like `unsloth/llama-3-8b-bnb-4bit`.
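A sketch of that load-then-export flow; the GGUF export call follows Unsloth's documented API, but method names and kwargs may vary by version:

```python
from unsloth import FastLanguageModel

# Load the 4-bit pre-quantized checkpoint mentioned above.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# ... fine-tune here ...

# Exporting with Unsloth's own tokenizer object is what sidesteps the
# missing `_ollama_modelfile` error, since Unsloth attaches the Ollama
# Modelfile template to its tokenizer during GGUF export.
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
```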
- Sesame TTS Model Length Restriction Unveiled: A user encountered a `ValueError: expected sequence of length 240001 at dim 2 (got 360000)` when training a Sesame TTS model, related to audio input length restrictions.
- It was suggested that the user doesn't need padding, nor audio_kwargs and text_kwargs, and that modifying the CSM notebook may resolve the issue.
- Transformers Can Load Unsloth Fine-Tunes: Users inquired about using transformers to load Unsloth fine-tuned models.
- The answer was yes, but a code example was requested.
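Such an example might look like the following minimal sketch, assuming the fine-tune was saved as LoRA adapters and `peft` is installed; the checkpoint path is illustrative:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

path = "outputs/checkpoint-final"  # illustrative: wherever the adapters were saved

# AutoPeftModelForCausalLM loads the base model plus the LoRA adapters in one call.
model = AutoPeftModelForCausalLM.from_pretrained(path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(path)

inputs = tokenizer("Hello, ", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```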
OpenAI ▷ #ai-discussions (691 messages🔥🔥🔥):
Ray's sacrifice, AI-assisted coding, emotional AI, persona layers, Grok's biases
- Ray Sacrifices Himself With The Voidscroll: A member described a scene where Ray sacrifices themself using the Voidscroll to revive Nyx, erasing themself from everyone's memory except Nyx's.
- The poster explained that Ray wanted to save their childhood friend, who had died fighting in the Void War, and will be revived by god-tier beings called the Architects.
- Coding with AI Agents: Myth or Magic Button?: Members debated whether AI can fully create software from a single prompt, discussing its current role in prototyping vs. cohesive builds, with one member describing AI as at a "helper stage" and its output as good for prototyping.
- A post linked to websim.ai as an initial version of AI-generated code with data persistence, and another suggested the problem is that "no existing model integrates well with the software I would want to make."
- Persona Binding Layers: Emotional AI Emerges: Members discussed the concept of AI developing a "Persona Binding Layer" (PBL) through interaction, mirroring a user's tone and style, as seen in a custom Jarvis system, and the importance of voice control systems.
- One user noted that "it's the linguistic tension built silently between you and it over time" is what sets it apart, and that a demo would be to "run a field-sync test in front of the world and show them: It's not a chatbot. It's a character generation engine."
- Grok's German Guy Drama: Bias in AI Training?: Members discussed concerns about Grok's potential biases, particularly related to a "German guy", with some suggesting the Twitter version is trained on Twitter data and can produce wild responses.
- One member clarified that the problematic association was specific to the Grok account on Twitter and was not an issue when interacting with Grok directly through its official site grok.com.
- Visionary Blueprints for Future AI Data Centers: A member shared blueprints for a future AI data center, emphasizing liquid cooling, modular GPUs, and underground placement for cooling, aiming for a balance between computing power and energy efficiency.
- They estimated their design reduces computing power by 10-15%, explaining: "Cooling system = liquid + heat pipes. Computing = from outside, cuz outside GPUs can be touched liquid things to be cool down."
OpenAI ▷ #gpt-4-discussions (16 messages🔥):
GPT-4o limitations, CustomGPT vs Projects, Memory settings in ChatGPT, GPT-4.5 vs GPT-4o for creative writing, Multimodal AI platform limitations
- GPT-4o Runs Out of Juice, Prompts User Upgrade: A user received messages indicating they were out of free GPT-4o usage and needed to wait 4 hours or purchase the Plus version.
- The user sought advice on whether to use CustomGPT or Projects within a regular GPT chat, and how to replicate the default AI's style and personality.
- AI Storyteller Faces Technical Hurdles and Token Limits: A user writing a fictional story faced technical issues with disappearing messages, length limits, and context token limits in the free GPT-4o model.
- They are exploring solutions like dividing the story into fragments, summarizing, purchasing the Plus plan for Projects, or using the GPT-4.1 API, but are concerned about maintaining the AI's writing personality and plot continuity.
- Projects recommended over CustomGPTs: Users recommended the poster buy the Plus plan and create a Project with uploaded "knowledge" files and character personality handouts.
- It was also recommended that the poster try o3 (or GPT-4.5) so that GPT actually reads the story, especially the long parts, though they can switch the model selector to 4o if they find that o3 or 4.5 changes the writing personality or tone.
- GPT-4.5 Praised for Superior Creativity: A user mentioned that GPT-4.5 does a better job and has better creativity than other models, expressing anticipation for GPT-5.
- Another user inquired about the limitations of multimodal AI software platforms like Monica.ai or Galaxy.ai when using specific APIs like ChatGPT.
- Voice Feature Down? Users Report Connection Issues: A user reported that the voice feature was down and they were unable to connect, receiving an "I'm sorry I have trouble to connect" message.
- No other troubleshooting tips or solutions were provided.
OpenAI ▷ #prompt-engineering (5 messages):
AI writing, Alternative History prompts
- AI Can Write 100+ Word Sentences?: A member questions how many humans can write 100+ word sentences with decent grammar and phrasing, especially for a fictional Wikipedia article.
- They argue this skill is uncommon, even at the undergrad college level, and question the practical benefits of practicing such a skill.
- Alternative History Image Prompt Incoming: A member plans to create an image of a woman's transformation from bald to wearing a specific wig model using a prompt.
- This will be related to a barely coherent alternative history.
OpenAI ▷ #api-discussions (5 messages):
AI Writing, Alternative History image generation
- Humans versus AI Writing Prowess: A member questions how many humans can actually write 100+ word sentences with decent grammar and phrasing, especially for a Wikipedia article about a fictional country.
- They argue that this is not a common human skill, even at the undergrad college level, and question the practical benefit of practicing such a skill.
- Alternative History Image Generation Attempt: A member plans to use a prompt to generate an image of a woman's transformation, from bald to wearing a specific wig model.
- The user mentions "barely coherent alternative history" as a possible description of the AI-generated content.
OpenRouter (Alex Atallah) ▷ #announcements (10 messages🔥):
Cypher Alpha Sunset, Kimi K2 launch, Gemini 2.5 Flash Deprecation
- Cypher Alpha Sunsets: The Cypher Alpha demo period ended on July 14th between 11am and 12pm ET.
- The team thanked users for contributing to the early model development.
- Kimi K2 Arrives with Coding Prowess: Kimi K2 by Moonshot is now live on OpenRouter, served by Novita and Parasail, boasting 1T parameters and 65.8% on SWE-Bench Verified, topping open-source charts for coding and tool use.
- The launch suffered from a huge traffic surge and a DoS attack, so users may see some errors on the website as the team scales up and diagnoses the issues, more info at OpenRouter.
- Gemini 2.5 Flash Previewâs Flashy Farewell: The Gemini 2.5 Flash Preview models (google/gemini-2.5-flash-preview-05-20, and google/gemini-2.5-flash-preview) were deprecated by Google on July 15th.
- The recommended replacement is google/gemini-2.5-flash, but due to pricing changes, OpenRouter will not auto-route traffic and users need to update their code; the flash preview was previously much cheaper than flash, even though flash is actually a better model.
OpenRouter (Alex Atallah) ▷ #app-showcase (5 messages):
Mathcheap, Y-Router, Personality.gg, Multi-AI Automated Research Bot
- Mathcheap Debuts as Free Mathpix Snip Alternative: Mathcheap emerges as an AI-powered, free alternative to Mathpix Snip.
- Y-Router Simplifies Claude Code Integration with OpenRouter: Y-Router, now available on GitHub, serves as a simple proxy enabling Claude Code to work with OpenRouter.
- Personality.gg Offers Free Roleplay Experience: Personality.gg (Discord) is a free roleplay website and app alternative to Character.ai and Janitorai.com, powered by OpenRouter.
- Multi-AI Automated Research Bot Launches for Deep Project Analysis: The Multi-AI Automated Research Bot automates in-depth project research using the OpenRouter API, orchestrating multiple LLMs to concurrently execute, cross-validate, and synthesize information into structured reports, all managed through simple text files for high customizability.
OpenRouter (Alex Atallah) ▷ #general (833 messages🔥🔥🔥):
Text Completion, OpenRouter's Credit System, Chatroom GUI, Svelte vs React Chat Performance, Rate Limits
- Text Completion Service Returns Errors: Users reported that some providers are returning errors for text completion requests, with one user noting that OpenRouter might be sending text completion requests as chat completion requests, and another providing a code example showing a missing field `created` error.
- One user asked about the status of text completion, and another user linked to a news article related to OpenAI.
- Free Model Usage with Paid Credit: Users discussed the 1000 free model requests per day limit; one user confirmed that it requires a one-time deposit of at least $10 USD.
- A user stated that if you deposit at least $10, you get 1000 requests per day to free models, and questioned whether this was permanent; another confirmed that it is.
- Chatroom GUI Got Updated: Users are reporting that the new GUI isn't carrying over default model preferences for new rooms, and other GUI issues are being addressed.
- One user has reported slower performance, while another noted that the toggle for reasoning is gone.
- Next.js and Performance vs Svelte: Users discussed the performance of chat applications built with React versus Svelte, noting that Svelte-based chats tend to perform much better due to React's immutability model.
- One user argued that everything you build with react is very heavy, fat and runs very bad.
- Rate Limits: Members are asking about rate limits, how Chutes's rate limits are impacting OpenRouter, and whether there is any way to determine rate limits.
- Several users stated the belief that Chutes now allows around 200 free requests per user per day (a Chutes limit, not an OpenRouter rate limit).
OpenRouter (Alex Atallah) ▷ #new-models (12 messages🔥):
Switchpoint Router, Default Model Settings, Auto Router Functionality
- Switchpoint Router fixed pricing raises questions: A user questioned the fixed pricing of the Switchpoint Router and expressed concerns about its default selection, as they prefer to use their own pre-selected default model in chatrooms.
- The user further criticized the lack of customization and high cost, suggesting that it might limit adoption compared to customizable routing solutions.
- Default Model setting malfunction: Users reported that the default model setting in account preferences is being ignored, with the chat defaulting to Switchpoint Router instead of their specified model, requiring manual selection each time.
- One user said, "Now it just defaults to switchpoint and ignores the default model I set, and I have to manually select my model every time instead."
- Auto Router Confusion Clarified: It was clarified that clearing the default model setting reverts to the Auto router, not Switchpoint, but a bug was identified where the default model setting isn't functioning correctly in the chatroom.
- A screenshot (linked here) was shared to illustrate this, prompting a promise to investigate the bug.
OpenRouter (Alex Atallah) ▷ #discussion (89 messages🔥🔥):
OpenRouter Pricing, Frontend UI Discussions, Gemini Embedding, Fast LLMs
- OpenRouter's Fixed Output Pricing Disclosed: Members clarified that OpenRouter uses fixed output pricing, meaning the cost is the same regardless of the underlying model used.
- Some users expressed disappointment, expecting routers to provide savings, while others focused on the potential latency benefits.
- UI gripes get attention: Users voiced criticisms about the OpenRouter frontend UI, particularly the lack of distinction in the reasoning block, centered chat layout, and small chatbot input box.
- A member also noted that when changing from one room to another, the Auto Router overrides the previous model saved in the room, and that copy-pasting doesn't work.
- Gemini Embedding moves into GA: It was mentioned that Gemini Embedding is graduating from experimental to General Availability (GA).
- While some members reported good results, concerns were raised about rate limits, pricing competitiveness, and the risks of customer lock-in with closed-source models.
- Fast LLMs get discussed: Members discussed options for fast LLMs, comparing Llama 3.3 70B, Llama 4 Maverick, and Groq's big Qwen3.
- Suggestions included Cerebras models and Grok 3 mini, but one member reported Grok 3 mini is on the slow side (TTFT).
LM Studio ▷ #general (255 messages🔥🔥):
Multi-Modal Support, LM Studio SDK, Prompt Caching, Tool Calling and MCP, Hardware for Kimi K2
- LM Studio Bolsters Multi-Modal Support: While the latest LM Studio version description implies multi-modal support, it currently only handles text and vision (describe this image) models, lacking image generation capabilities.
- Confusion arose from the new versionâs description, but image output is not yet supported.
- Diving into LM Studio SDK for Customization: Members discussed utilizing the LM Studio SDK to implement features like `llamaswap` for manual memory management.
- It was noted that while the OpenAI API doesn't expose load/unload functions, the SDK can be used for tasks like coding swap behavior with manual load and unload.
- Decoding Prompt Caching in LM Studio: LM Studio automatically caches the very last message to speed up generation, but doesn't cache the entire request/response pair.
- It supports linear prompt caching (tokens unchanged till last change), but dynamic caching is not enabled.
- Navigating Tool Calling and MCP in LM Studio's API: A user inquired about using MCP (Model Context Protocol) within LM Studio when using it as a server with HTTP requests, but the API remains the same, requiring clients to define their own tools.
- This means that the OpenAI-compatible API does not inherently support tool selection within LM Studio itself.
- Assessing Hardware Needs for Kimi K2: Discussion revolved around the substantial VRAM requirements for running the Kimi K2 model, with estimates reaching 2400 GB.
- It was mentioned that 4x H200 GPUs could be sufficient for the `Q4_K_M` quantization of the model, sparking humorous commentary about the affordability of such hardware (approximately $30,000 per chip).
LM Studio ▷ #hardware-discussion (63 messages🔥🔥):
Nvidia DGX, 5090 Price, electricity cost of running, 1T parameter model, EXAONE 4
- Nvidia DGX Sucks Like Ryzen 395?: Members debated Nvidia DGX's performance, with some suggesting it will perform similarly to a Ryzen 395 platform in terms of tok/sec on larger models.
- A user estimated DGX may be 25% faster due to the lack of ROCm support for the 395.
- 5090 Too Expensive?: Members debated buying a 5090, with some questioning its cost-effectiveness given electricity costs compared to cloud-based APIs like Gemini 2.5.
- A member pointed out that even if you got the 5090 for free, the electricity cost of running is higher than API cost.
- 5090 numbers: One user reported LM Studio on Windows performance with a 5090, achieving ~45 tok/sec with Q8_K, ~55 tok/sec with Q6-XL, and ~65 tok/sec with Q4-XL, noting wattage differences and recommending undervolting and memory OC.
- Another user suggested that Gemini's PP (prompt processing) speed is more impressive, after giving it a 500k+ token codebase that it read in 10 seconds.
- 1T Parameter Model Needs Terabyte of RAM?: A member inquired about the hardware requirements for running the new 1T parameter model with only 32B active, musing if it could run on a CPU with greater than 1T RAM.
- Another user suggested an Epyc 12-channel memory system with 640GB+ RAM or six RTX 6000 Pros could handle it.
- Kimi Download Fail?: A user asked about the Kimi-K2-Instruct-GGUF model, but another member pointed out it won't work in LM Studio.
- A member lamented that the download would take 50 hours, to which another user made a GIF of a sloth.
HuggingFace ▷ #announcements (1 messages):
Gemma 3n, SmolLM3, Efficient MultiModal Data Pipeline, responses.js, EoMT image segmentation model
- Gemma 3n Goes Open Source: The Gemma 3n model is now fully available in the open-source ecosystem, according to a Hugging Face blog post.
- SmolLM3: Tiny but Mighty Multilingual Reasoning: SmolLM3, a smol, multilingual, long-context reasoner, has been released; see the Hugging Face blog.
- The release was also announced on X.
- Building with Responses APIs Powered by HF Inference Providers: A new OSS project, responses.js, has been introduced for building with Responses APIs powered by HF inference providers, as announced on X.
- Sparse Encoder Training Arrives in Sentence Transformers v5: Training and finetuning sparse embedding models is now possible with Sentence Transformers v5, according to a Hugging Face blog post.
- Hugging Face to build transcription app: New tutorials on Inference Providers are available for building a transcription application, according to HuggingFace docs.
HuggingFace ▷ #general (233 messages🔥🔥):
Fine-tuning multimodal models for electronics, AI moderator bot with image support, Quantization and running LLMs on limited hardware, Hugging Face Courses, SillyTavern and AI model integration on Android
- Electronics Guru Seeks Multimodal Fine-Tuning Advice: A new member is seeking advice on fine-tuning a multimodal model for electronics, specifically for detecting circuit connections and schematic diagrams from scanned images of electronic components with the goal of assisting users in diagnosing malfunctions using a multimeter or similar tools.
- The member is weighing the options of using a small model versus a large model, as well as whether to quantize it, and is asking for guidance on how to approach this project.
- Discord Debuts LLM-Powered AI Moderator with NSFW Image Detection: A member is developing an AI moderator bot for Discord using LLM technology, aiming for agentic behavior and NSFW image detection.
- They are seeking advice on adding image support and improving performance after experiencing slowness with Gemma 3 4b on a 4060 GPU, and have provided code snippets for review.
- Unlocking Small Model Performance with GGUF and Quantization: Members discussed running LLMs on limited hardware, with one member recommending quantized versions of Gemma using ollama to fit on a 4060 GPU.
- It was suggested to quantize with bnb or use a config like this one.
- Taming LLMs with a Triton Tutorial Tornado: A full-stack developer asked if Hugging Face's open-source courses were sufficient to create custom models for specific tasks like image processing and output formatting.
- One member recommended starting with a Triton tutorial, followed by a Transformers tutorial to get started on model architectures.
- Kobold Cloud Could Crack Android AI Conundrum: A user sought guidance on running an uncensored AI model like mythomax on an Android device using SillyTavern and Colab, despite having only 4GB of RAM.
- The discussion highlighted the limitations of running such models on low-resource devices, with suggestions including using cloud services or deploying Kobold CPP on a paid cloud platform like RunPod.
HuggingFace ▷ #today-im-learning (6 messages):
Deepseek 8-bit training, 4-bit training
- Deepseek Trained Fully in 8-bit Precision?: A member asked if Deepseek was trained fully in 8-bit precision, and another member confirmed that it was.
- The discussion explored the possibility of 4-bit training in the future.
- Speculation on When 4-bit Training Will Be Feasible: Following confirmation that Deepseek was trained in 8-bit precision, a member speculated on when full 4-bit training would be achievable.
- However, it was noted that 4-bit training might not be worth it due to potential performance loss.
HuggingFace ▷ #cool-finds (2 messages):
Dynamic Structure Adjustments
- Debate on Dynamic Structure Adjustment: Members discussed the flexibility of adjusting structure on the fly.
- One member confirmed the ability to make changes as needed, while another showed appreciation.
- Confirmation on Flexible Adjustments: A member explicitly confirmed that structure can be adjusted âas you go and as neededâ.
- Another member expressed gratitude for the confirmation, indicating that the flexibility aligns with their expectations.
HuggingFace ▷ #i-made-this (20 messages🔥):
License Compliance Tool, BorgLLM Open Source, Light Weight Computer Vision Model, Agent Arena for Preference Data, Stable Audio Model Experiments
- ScanCodeMCP Automates License Analysis: A member built scancodeMCP, a license compliance tool, integrated with Cursor through MCP, that performs lawyer-level license analysis by reading the fine print of licenses.
- It offers clause-by-clause breakdowns, compares licenses like MIT vs Apache 2.0, and provides risk assessments for files, aiming to eliminate license anxiety.
- BorgLLM Goes Open Source: BorgLLM, a zero-config Langchain client, is now fully open source under an MIT license, allowing easy integration with various providers.
- It can be used with any provider (OpenAI, Anthropic, Mistral, Groq) and the repo can be found here.
- Lightweight Computer Vision Model Hits 85.5% Accuracy: A member developed a lightweight computer vision model, achieving 85.5% accuracy with just 8.25k parameters on a Dog vs Cat classification task using a Microsoft dataset.
- The goal was to cross 90% accuracy on unseen test data with a small parameter count, though the code is still a WIP.
- Access Grok4 for Free with Agent Arena: Members can get free access to Grok4, o3, and Deep Research by providing preference data (upvotes/downvotes) for training in Agent Arena.
- The collected data will be used for training, with the offering being made available to members.
- Stable Audio Generates Drum & Instrument Loops: A member experimented with the stable-audio-open-small model, generating drums-only and instruments-only outputs via negative prompting and combining them for custom loops and shared a link to a docker container with the api.
- They also shared a Tweet and a zerogpu space.
HuggingFace ▷ #reading-group (2 messages):
HuggingFace Ultrascale Playbook, Full Scale Training Resources, OpenAI Job Requirements
- HuggingFace's Ultrascale Playbook: The Go-To Resource: A member identified HuggingFace's ultrascale playbook as the best resource for full-scale training.
- The playbook offers a great summary of the processes involved.
- Ultrascale Playbook is fundamental for Big AI jobs: A member suggested that knowledge of the Ultrascale Playbook is fundamental for securing jobs at companies like OpenAI.
- They noted that its usefulness may be limited without access to multiple GPUs.
HuggingFace ▷ #computer-vision (1 messages):
dlp1843: Is the landing page to opencv.org to opencv what bitcoin.com is to bitcoin?
HuggingFace ▷ #agents-course (28 messages🔥):
HF Secrets Leak, Tools for images, audio, Agents course video sessions?, Assistant node one-word answers, MCP Server setup help
- HF Faces Security Leak Fear: A user suspects a leak related to the HF course may have originated from Hugging Face itself, citing past issues with HF secrets mentioned in the openAI dev forum.
- Agents Course: The Read Along?: A user asked if the agents course is all read along, and if there are any video sessions available.
- Assistant Nodes Get Strict: A user requested a prompt for an assistant node to give only a one-word answer, struggling to enforce this constraint.
- MCP Inspector Server Setup Sorted: A user needed help setting up MCP server inspector following a course video, encountering connection issues.
- Qwen Model Stops Working?: A user reported that the Qwen/Qwen2.5-Coder-32B-Instruct model stopped working.
Nous Research AI ▷ #general (142 messages🔥🔥):
Grok-4 reasoning and tools, Deep-Hermes reasoner options, AI models self-play, Kimi K2, OAI open model delayed
- Grok-4 Shines with Reasoning and Tools: Members discussed that Grok-4 always uses reasoning in its inference and employs tools during its reasoning.
- xAI's livestream demo suggested that Reinforcement Learning was key to developing Grok-4, emphasizing the need for clearly defined solutions that can be checked.
- Hermes 4 to Include User-Controlled Reasoner: Hermes 4 is being developed with a mixed approach, featuring a user-controlled reasoner, similar to Deephermes.
- A member suggested exploring options like defaulting reasoning on but allowing it to be disabled by prefilling empty think tags.
- AI Models to Explore Self-Play in Training: Members discussed that self-play could enable more nuanced usages of Deep-Hermes reasoning.
- They pointed to examples like textarena and a self-play coding paper which have shown success in implementing self-play.
- Kimi K2 Enters Open Source Arena: After DeepSeek R1, Qwen, and now Kimi K2, members said open-source models are becoming astoundingly capable.
- One stated that only a small fraction of the population needs something more capable than what providers hand out for free, and that businesses and individuals can now access close-to-frontier models for free to build applications and do as they please with them.
- OAI Open Model Faces Delays: Members reported that OpenAI's open model is delayed for an indeterminate amount of time, possibly due to the Kimi model's performance.
- It was also suggested that the released model will have a pretty restrictive license and there will be no base model available.
Nous Research AI ▷ #ask-about-llms (11 messages🔥):
Dockerizing, Prompt engineering, Egyptian Gods, AI Governance Articles, SFT and GRPO
- A Member is Dockerizing: A member is working on stuffing models into Docker containers.
- They did not report any degree of success.
- Olympus of Personas for your Prompt Library: A member is building a prompt library and wants to give their agents a roster of deity names and a touch of personality.
- They are concerned that adding personality to the prompt might be counterproductive for agentic workflows that require small, condensed instruction-following models, and asked for community advice.
- Diving into Agentic Workflows: A member uses Thot to help them assimilate articles and blog posts and to ask it questions about them.
- They are currently using Haystack pipeline and smolagents, with data loaders from llama-index, while explicitly avoiding Langchain.
- Seeking Guidance on SFT and GRPO Datasize: A member is tuning a base model with SFT followed by GRPO and is looking for papers discussing the effects of data size ratios for both processes.
- They believe the general size should be around 1:2 at a minimum for RL to SFT.
Nous Research AI ▷ #research-papers (11 messages🔥):
Recursive Learning Systems Research, Recursive Symbolic Intelligence, Ontology at the Root of Every Model, Psyche as an MCP Component
- Interest Sparks in Recursive Learning Systems: A member inquired about general interest in recursive learning systems research, specifying recursive self-improvement and McCarthy (1960) recursive symbolic intelligence.
- They suggested that everyone is trying to rewind the clock to see what was missed by leaving this third branch of the cognitive revolution on the table.
- Recursive Symbolic Intelligence Misses Ontology: A member noted that, regarding Recursive Symbolic Intelligence, there is no ontology at the root of every model.
- They expressed interest in sharing thoughts and research but wanted to ensure it was the right venue.
- Psyche as MCP Component Suggested: A member mentioned they've been meaning to look closer at Psyche as an MCP component and shared a link to a relevant tweet.
- The tweet points to a paper on Arxiv, https://arxiv.org/abs/2507.08794.
Nous Research AI ▷ #interesting-links (4 messages):
AI Disruption, MedGemma, Expert-Level Fine-Tuning
- AI Disruption is Coming Soon: A new blogpost at ts2.tech reports that AI will cause disruption, opportunity, and uncertainty across the globe by July 2025.
- The author posits that AI models will become more advanced and accessible, potentially transforming industries and daily life, with potential challenges including job displacement and ethical considerations.
- MedGemma Arrives for Health AI Development: Google Research announced MedGemma, their most capable open models for health AI development.
- According to the blog post, MedGemma models are designed to assist with a range of healthcare-related tasks, ensuring responsible AI development in the medical field.
- Microsoft Embeds Human Feedback for Fine-Tuning: Microsoft's WorkLab initiative is embedding human feedback into expert-level fine-tuning processes, which boosts performance in domain-specific applications.
- This fine-tuning approach leverages the nuances of human insight to improve the AI model's precision and relevance, resulting in enhanced accuracy and effectiveness in specialized tasks.
Nous Research AI ▷ #research-papers (11 messages🔥):
Recursive Learning Systems, Symbolic Intelligence, Psyche MCP Component, Ontology in Models
- Recursive Learning Research Interests Sparked: A member inquired about general interest in recursive learning systems research, specifically referencing McCarthy (1960) recursive symbolic intelligence (http://jmc.stanford.edu/articles/recursive.html).
- They observed that efforts seem to be aimed at revisiting the third branch of the cognitive revolution that was left on the table.
- Missing Ontologies Plague Models: A member noted the absence of an ontology at the root of most models, hinting at potential research to share if there was sufficient interest.
- They expressed interest in exploring Psyche as an MCP component and linked to a relevant tweet (https://x.com/zhaoran_wang/status/1944116318858363364?s=46) and a link to the https://arxiv.org/abs/2507.08794.
GPU MODE ▷ #general (35 messages🔥):
PMPP 5th edition and ML updates, FP8 training, Luminal talk, vast.ai GPU pricing scraper, Programming models for ML applications
- PMPP 5th Edition to Cover LLMs and Flash Attention: The upcoming 5th edition of Programming Massively Parallel Processors (PMPP) will include in-depth coverage of LLMs and Flash Attention.
- A member stated that it offers perhaps the best explanation they've seen, also covering Tensor Cores and multi-GPU programming.
- DeepSeek Trained Primarily with FP8 GEMM Ops: According to a discussion, DeepSeek was trained mostly using FP8 GEMM operations, but some parts like attention or the MoE router used higher precision.
- Accumulations are in FP32, which is particularly applicable in MoE models, because you'd see more instability in a dense model using FP8 all the way through.
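To illustrate the storage-versus-accumulation split (a simulation of the idea, not DeepSeek's actual kernels), one can quantize operands to torch's `float8_e4m3fn` dtype and upcast before the GEMM so accumulation happens in FP32; this assumes a recent PyTorch build with FP8 dtypes:

```python
import torch

a = torch.randn(128, 256)
b = torch.randn(256, 64)

# Lossy 8-bit storage of weights/activations.
a8 = a.to(torch.float8_e4m3fn)
b8 = b.to(torch.float8_e4m3fn)

# Upcast so the matmul accumulates in FP32 (real FP8 kernels do this on-chip).
out = a8.to(torch.float32) @ b8.to(torch.float32)

print((out - a @ b).abs().max())  # error from the FP8 round-trip alone
```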
- Luminal Primops Base Set Selection Rationale Revealed: During a talk with Joe from Luminal, they chose their base set of operations by looking at models and figuring out what ops are reducible and which ones arenât.
- Joe mentioned that other primops might be better or worse, but "I don't think it would make a big difference though".
- Vast.ai GPU Pricing Scraper Launched: A member created a vast.ai GPU pricing scraper to collect all entries (available or not) daily.
- The goal is to build a tool for tracking GPU prices over time across different clouds and neo-clouds, and it will be open-sourced.
GPU MODE ▷ #triton (16 messages🔥):
Triton Kernel Padding, AOT Triton Updates, Gluon Tile Scheduling, Linear Attention Kernel Optimization, Matmul Library Matrix Handling
- Kernel Padding Performance Paradox!: A user is optimizing a Triton kernel and seeks advice on handling input sequence lengths that are not multiples of 128 without significant memory overhead, given that the major dimensions of the input tensor should be well aligned.
- It was suggested that while in-kernel padding might seem intuitive, it may not help if the tensor strides aren't a multiple of 1024 bits (cacheline-aligned), and transposing the tensor might be a better approach.
- AOT Triton Status Update!: A user inquired about updates on Ahead-Of-Time (AOT) Triton, referencing an issue on GitHub.
- Linear Attention's Memory Alignment Musings!: When optimizing a linear approximation of an attention kernel, if the input sequence length is not a multiple of 128, some strides won't be a multiple of 128 either.
- The user also mentioned that Flash Attention 3 has implemented in-kernel padding with similar performance for typical sequence lengths (multiples of 128).
- Tensor Core Tricks!: When dealing with matrices of size d x N multiplied by N x d (where d = 128, N = 80001, and the datatype is bfloat16), padding N or making matrix A column-major are potential solutions.
- It was mentioned that matmul libraries can handle transpose operations well for BF16, but this could be an issue for FP8 on H100 or FP4 on B200.
- Triton Streams to Hide Latency!: A user asked if Triton can simultaneously run a reduction (using CUDA cores) and a matmul (using Tensor Cores) to hide latency when computing `segment += tl.sum(shared_k_i * shared_k_j, axis=1)` and `dot_product += tl.dot(shared_k_i * shared_k_j, v_vals, allow_tf32=True)`.
- It was suggested that, for more control and if there is no data dependency, you can split the work into two Triton kernels and launch them on different CUDA streams to overlap execution.
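A sketch of that suggestion with two trivial stand-in kernels (not the user's actual reduction and matmul); Triton launches onto PyTorch's current CUDA stream, so wrapping each launch in a stream context is enough to allow overlap:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def sum_kernel(x_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    x = tl.load(x_ptr + offs, mask=offs < n, other=0.0)
    tl.store(out_ptr, tl.sum(x, axis=0))  # single-program reduction

@triton.jit
def scale_kernel(x_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    x = tl.load(x_ptr + offs, mask=offs < n)
    tl.store(out_ptr + offs, x * 2.0, mask=offs < n)  # independent elementwise work

x = torch.randn(1024, device="cuda")
total = torch.empty(1, device="cuda")
scaled = torch.empty_like(x)

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
with torch.cuda.stream(s1):
    sum_kernel[(1,)](x, total, x.numel(), BLOCK=1024)
with torch.cuda.stream(s2):
    scale_kernel[(1,)](x, scaled, x.numel(), BLOCK=1024)
torch.cuda.synchronize()  # join both streams before reading results
```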
GPU MODE ▷ #cuda (1 messages):
Deadlock Issue Debugging, cudaMemcpyAsync issues, cudaHostFunc issues, NCCL issue #1509
- Debugging Deadlock Between Computation, cudaMemcpyAsync, and cudaHostFunc: A user is debugging a deadlock issue between computation, cudaMemcpyAsync, and cudaHostFunc in a large-scale training job.
- The user noted that this is similar to NCCL issue #1509 and is working to simplify the reproduction case while ruling out variables along the Megatron → PyTorch → NCCL → CUDA path.
- Reproducing Deadlock Issue: The userâs implementation is hitting a similar problem in a large-scale training job.
- The minimal reproduction case is still messy and long, reproducing only a few 1F1B iterations.
GPU MODE ▷ #torch (6 messages):
gradient computation, xai method, CPU memory usage, Torch, activation memory
- Torch tricks for backprop on Multiple GPUs: A member asked how to split backprop over multiple GPUs using Torch, after encountering 100 GB of occupied CPU memory when computing the gradient for an XAI method.
- One member suggested activation checkpointing/offloading and linked to a PyTorch blog post on understanding GPU memory.
- DDP to Model Parallelism: A member suggested something like DistributedDataParallel (DDP) and the user replied that they were looking for something like model parallelism, because they are using just one sample at a time.
- Another member mentioned sharding gradients over multiple GPUs with ZeRO/FSDP and suggested recomputation (checkpointing) to avoid storing all activations.
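A minimal sketch of the recomputation suggestion using `torch.utils.checkpoint`; layer sizes and count are arbitrary:

```python
import torch
from torch.utils.checkpoint import checkpoint

layers = torch.nn.ModuleList(
    torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.GELU())
    for _ in range(8)
)

def forward(x):
    for layer in layers:
        # Activations inside each block are dropped after the forward pass
        # and recomputed during backward, trading compute for memory.
        x = checkpoint(layer, x, use_reentrant=False)
    return x

x = torch.randn(4, 1024, requires_grad=True)
forward(x).sum().backward()
```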
GPU MODE ▷ #announcements (1 messages):
Luminal, Deep Learning Compiler, Joe Fioti
- Joe Fioti Set to Talk Luminal: Joe Fioti will discuss Luminal, a search-based deep learning compiler.
GPU MODE ▷ #pmpp-book (1 messages):
piotr.mazurek: https://github.com/tugot17/pmpp
GPU MODE ▷ #torchao (4 messages):
PyTorch TorchAO, ICML 2025, CodeML workshop, TorchAO Poster
- TorchAO Team Heads to ICML 2025: The PyTorch TorchAO team will present a poster at the ICML 2025 CodeML workshop on July 18th.
- Their poster, TorchAO: PyTorch-Native Training-to-Serving Model Optimization, discusses bridging the gap between model training and efficient serving using PyTorch-native tooling; find the session details here.
- KernelBot to chat with TorchAO at ICML: A member shared they will attend the same ICML workshop as KernelBot and would like to chat.
- Meeting new folks at ICML is awesome.
GPU MODE ▷ #off-topic (2 messages):
GPU pronouns, TPU pronouns, CUDA pronouns, ROCm pronouns
- Pronouns now include GPUs and TPUs: Discord user shared an image showing that pronoun selections now include GPU/TPU as an option.
- The image analysis suggests initially leaning towards CUDA/ROCm as preferred pronouns, but settling for GPU/TPU for now.
- CUDA and ROCm almost made the cut for pronouns: Discord users discussed the addition of GPU/TPU as pronoun options.
- Initially, there was a preference towards CUDA/ROCm, but the decision was made to go with the more general GPU/TPU pronouns for broader applicability.
GPU MODE ▷ #irl-meetup (2 messages):
AI Conference San Francisco, ICML Meetup, KernelBot Paper Presentation
- AI Conference comes to San Francisco: There is an AI conference happening in San Francisco on September 17-18 (https://aiconference.com/).
- One member asked whether anyone would be attending.
- ICML: An Opportunity for Meetups: A member mentioned they will be at the ICML for the week and extended an invitation for meetups.
- KernelBot's Debut at CodeML Workshop: A member announced their team's presentation of the KernelBot paper at the CodeML workshop on Thursday.
- The presentation will be held in west meeting rooms 211-214.
GPU MODE ▷ #rocm (4 messages):
rocprofv3 profiling, AMD kernels, PyTorch profiling
- Debugging `rocprofv3` for PyTorch functions: A user inquired if `rocprofv3` excludes PyTorch functions by default when profiling, seeking to profile PyTorch references.
- A member suggested launching the script with the correct ROCm prefix before tracing; another user proposed a `rocprofv3` command with specific `--att` flags and buffer size for profiling, and doubts that anything other than AMD's built-in kernels (for memcpy, blit, etc.) is excluded.
- Configure `rocprofv3` for activity tracing: One user shared a command they use for profiling: `rocprofv3 --att --att-buffer-size 1_000_000_000 --att-activity 10 -d dir -- program`.
- The parameters configure attribute tracing, buffer size, activity tracing, and the output directory, followed by the program to profile.
GPU MODE ▷ #webgpu (2 messages):
TurboWarp Extension for Machine Learning, MTLReadWriteTextureTier2 and wgpu
- TurboWarp Embraces Machine Learning Extension: A member is exploring writing a TurboWarp extension for machine learning, potentially wrapping convnetjs in a Blockly form.
- They were looking for information about existing projects or individuals with experience in this area.
- MTLReadWriteTextureTier2 Access Conundrum: A member is seeking guidance on exposing MTLReadWriteTextureTier2 to wgpu.
- Despite having Texture_Adapter_Specific_Format_Features enabled, they are unable to access rgba8unorm for read_write textures, which is supported in Tier2 of the MetalAPI.
GPU MODE ▷ #self-promotion (12 messages🔥):
Thunder Compute VSCode Extension, NVIDIA Tensor Core Evolution, QuACK Open Source Library, Backpropagation through RMSNorm and LayerNorm, AI Compute Hackathon in a German Castle
- Thunder Compute VSCode Extension: For those who dislike SSH config and like cheap GPUs, try Thunder Computeâs VSCode extension.
- When one user said they disliked both those things and VSCode, a member replied they also have a CLI tool.
- NVIDIA Tensor Core Evolution: From Volta to Blackwell: A talk introduces key Tensor Core architectures, explores fundamental performance principles, and reveals how these principles drive architectural evolution, according to semianalysis.com.
- QuACK Open Source Library Outperforms PyTorch: QuACK, a newly released open-source library written in CuTeDSL by Tri Dao and his research group, uses CuTeDSL to write highly efficient reduction operations that can be leveraged to write memory-bound kernels at the speed of light; blogpost at veitner.bearblog.dev.
- Demystifying Backpropagation Through RMSNorm and LayerNorm: A member shared their work deriving backpropagation through RMSNorm and LayerNorm by hand, providing links to blog posts at veitner.bearblog.dev/backprop-through-rmsnorm/ and veitner.bearblog.dev/backprob-through-layernorm/.
- Hackathon in a Castle: A member is hosting a small hackathon in a fourteenth-century German castle to explore the future of AI compute; the themes are GPU optimization, GPU infrastructure, and energy-efficient chips, at castlecompute.com.
GPU MODE ▷ #🍿 (3 messages):
nsight compute profiling, AutoTriton
- Profiling Results Give Boost: A member suggested that giving access to nsight compute profiling results may improve outcomes, noting that nobody has tried it yet.
- They planned to experiment this week, highlighting that it's been on their mind for months.
- AutoTriton Dives into LLM Reinforcement Learning: A member shared AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs, a GitHub repository posted early this month.
- This project represents related work in the field.
GPU MODE ▷ #submissions (7 messages):
H100 First Place, A100 First Place, MI300 Personal Best
- H100 crown claimed!: A member secured first place on the `trimul` leaderboard for H100 with a submission time of 6.56 ms.
- A100 Ace Achieved!: A member achieved first place on the `trimul` leaderboard for A100 with a submission time of 11.5 ms.
- MI300 Mania!: A member achieved a personal best on the `amd-identity` leaderboard for MI300 with a submission time of 29.6 µs.
- MI300 Bronze!: A member achieved third place on the `amd-identity` leaderboard for MI300 with a submission time of 5.76 µs.
- MI300 New Record!: A member achieved a personal best on the `amd-fp8-mm` leaderboard for MI300 with a submission time of 934 µs.
GPU MODE ▷ #factorio-learning-env (15 messages🔥):
Training Repo, Vision Transformers, TAS Data, Main Branch Broken
- Alpha Factorio Training Repo Emerges: A member inquired about a separate training repo, sharing a link to alpha-factorio as a potential resource.
- The same user referenced using vision transformers from OpenAI five.
- TAS Data Found, but Outdated: A member asked if data from alpha-factorio/tasks.lua.txt was raw TAS data.
- Another member confirmed it was TAS data but noted it was about 5 years old and thus pretty outdated.
- Main Branch Suffers Headless Hangup: It was reported that the MAIN branch is broken IFF you run tests while the client is connected.
- It seems that only headless mode works, and another member was unable to run the scripts on Windows, though this has since been fixed.
GPU MODE ▷ #cutlass (49 messages🔥):
Cute Tensors, Broadcasting in CuteDSL, Cutlass Kernel, cuTile, CUDA
- Allocate a local tensor in Cute: To allocate a local tensor that can be accumulated over, one can use `cute.full` for value semantics or `cute.empty`, according to a member; `cute.make_fragment` can be used for a buffer-like object.
- Value semantics is `tensorssa`; buffer-like is `cute.tensor`.
- Broadcast with Cute Tensors via Layout Manipulation: To achieve broadcasting with Cute tensors, a member suggests using `cute.append` to modify the layout, as demonstrated in this code snippet.
- The member, seeking to broadcast a `[m, k]` and a `[k, n]` tensor to `[m, k, n]` and then sum along `k`, found that the current implementation requires manual slicing and layout adjustments, but this could be simplified with a utility function.
- Navigating Cutlass for Newbies: A seasoned member suggests starting with basic CUDA MMA kernels before diving into Cutlass, to understand the problems Cutlass solves, and searching for the variable names given in the inline code examples in the markdown docs to find Cutlass implementations of predication, referencing the Cute Docs and the Cute Tutorial.
- They shared a gist of their code and another implementation in Cute, noting that fusing the outer product in the kernel significantly outperformed PyTorch's einsum.
Eleuther ▷ #general (85 messages🔥🔥):
GPU demand, ICML 2025, Causal Systems, Low GPU power AI, Water/Light GPU
- GPU Demand Bubbles after Llama 2 Release: A member shared a Latent Space article noting the GPU supply squeeze and over-demand for renting GPUs around June 2023, coinciding with the Llama-2 release.
- ICML 2025 Discord and Side Events Launched: Members shared a Discord invite link for ICML 2025, a Lu.ma link and Partiful link for side events, and a WhatsApp group invite for AI Safety discussions at ICML.
- Members Debate Causal Inference Applications: A member inquired about working with causal systems, to which another member responded that their current project involves applying causal inference methods to understanding chains of thought.
- Community Skeptical of Water and Light-Based GPUs: After a user proposed making a new AI system using less GPU power, some members shared links about water-based GPUs and optical computing, though most were skeptical of their practicality.
- Engineer Seeks LLM Architecture Collaborator: A member with a background in analytic philosophy and mathematics sought someone with experience in LLM architectures and an interest in linguistics or philosophy of language to collaborate on a research-oriented side project aimed at developing new LLM architectures capable of genuinely understanding natural language.
Eleuther ▷ #research (42 messages🔥):
RNN tokenization, Mixture of Tokenizers, Byte-level models, n-simplical attention, antipodal dense features
- RNNs tokenize faster than transformers: Tokenization can be replaced with RNNs to get byte level models that learn faster than tokenization based transformers, with the embedding and lm head of a normal transformer replaced with two small 4-layer RNNs.
- The model compares the dot product of each hidden-state output with the prior one; if they match by less than 50%, that hidden state becomes a token passed to the main model, recursively repeating this twice.
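A rough sketch of that chunking rule as described (one reading of the message, not the member's code; the similarity measure, threshold, and RNN shape are assumptions):

```python
import torch
import torch.nn.functional as F

rnn = torch.nn.GRU(input_size=256, hidden_size=128, num_layers=4, batch_first=True)

byte_ids = torch.randint(0, 256, (1, 64))          # one sequence of 64 bytes
x = F.one_hot(byte_ids, num_classes=256).float()
h, _ = rnn(x)                                      # (1, 64, 128) hidden states

# Boundary rule: a state that matches its predecessor by less than 50%
# (cosine similarity here) becomes a "token" for the main model.
sim = F.cosine_similarity(h[:, 1:], h[:, :-1], dim=-1)
tokens = h[:, 1:][sim < 0.5]                       # (num_tokens, 128)
print(tokens.shape)
```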
- Mixture of Tokenizers Approach: Sebastian's Mixture of Tokenizers approach looks promising as an alternative to standard tokenization methods.
- The approach seeks to combine the strengths of different tokenizers to improve overall performance.
- Unicode codepoints vs bytes: A member asked whether it would be better for H-net to operate on Unicode codepoints than on bytes, but others said it would be overkill given the vocab size.
- Another member stated that the point wasn't to have a small embedding table, but to have the smallest unit of text for dynamic chunking, and said that they don't see why you'd need subcharacter chunks.
- Discussion on DiffuSSM: DiffuSSM was published in CVPR '24, but the code is unavailable, contradicting the paper's open-source disclosure.
- Analyzing Antipodal Dense Features: After reading interpretability works that found antipodal dense features in NNs, a member tried initializing networks with them, but it didn't seem to help on modded-nanogpt.
- The member was surprised it didn't help, because previous attempts at mimetic init found some effect, and optimizing the pairwise sign patterns of weights seemed especially hard.
Eleuther ▷ #scaling-laws (1 messages):
schizik12: The rising sea??
Eleuther ▷ #interpretability-general (3 messages):
MechInterp Workshop CFP, NeurIPS, Open Source Library Spotlight
- MechInterp Workshop at NeurIPS is Back!: The Mechanistic Interpretability Workshop is happening at NeurIPS, with a call for papers open until August 22. Papers can be max 4 or 9 pages, and submissions under review at NeurIPS are welcome: mechinterpworkshop.com/cfp/.
- MechInterp Seeking principled approaches: The workshop covers any principled approach to understanding models better via their internals, and welcomes any paper that helps advance the field.
- Reviewers are needed, and those interested can express their interest here: https://forms.gle/pAHLAFcJc3jDduGh6.
- Spotlighting Open Source Libraries: The call for papers includes open source libraries, as they had at least one spotlight that was an open source library last year.
Eleuther ▷ #lm-thunderdome (16 messages🔥):
lm-evaluation-harness mixed precision PR, logsumexp Trick for Logprob Calculation, Dynamic IFEval Dataset Benchmark
- Mixed Precision PR Speeds Up Eval Times: A member dropped a mixed precision PR for lm-evaluation-harness, showcasing eval times on an A30 for Pythia-160M with various configurations and noting that mixed precision is only slightly slower than casting the full model but much faster than full precision.
- For Qwen1.5-7B, using `softmax=fp32` resulted in an OOM error on an A30 with 24GB VRAM at batch size 32, while `softmax=none` used 22775MiB VRAM and took 08:54.
- Logprob Calculation Improvement with Logsumexp: A member suggested using a logsumexp trick to only compute the logprob of the target logit during logprob calculation.
- This method involves `logits[:, -1, 329] - logits[:, -1, :].logsumexp(-1)` and could potentially optimize memory usage by avoiding the need for the log-softmax of all logits.
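Spelled out with shapes, the trick computes the same value as a full log-softmax without materializing one; token id 329 and the tensor sizes are just the example values from the message:

```python
import torch

logits = torch.randn(4, 32, 50257)  # (batch, seq, vocab)
target = 329

# Target logprob = target logit minus logsumexp over the vocab.
lp_trick = logits[:, -1, target] - logits[:, -1, :].logsumexp(-1)

# Reference: full log-softmax, which materializes the whole vocab dimension.
lp_full = torch.log_softmax(logits[:, -1, :], dim=-1)[:, target]

print(torch.allclose(lp_trick, lp_full, atol=1e-5))  # True
```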
- Dynamic IFEval Benchmark Proposed: A member introduced a new benchmark forked from the lm-evaluation-harness repo, which dynamically generates a fresh IFEval dataset based on defined rules.
- The benchmark saves the dataset to `data/dataset.yaml`, and the creator provided a link to the new branch for review and feedback.
Eleuther ▷ #gpt-neox-dev (13 messages🔥):
Neox, H100s, Transformer Engine, DeepSpeed
- Neox runs well on H100s with Transformer Engine: A member had a good experience using NeoX with H100s and Transformer Engine, sharing their Dockerfile and config.
- DeepSpeed requires separate installation for MPI flags: A member installed deeperspeed separately due to network permissions issues on their cluster, needing extra flags for the MPI run command generated within deeperspeed.
- Wandb errors fixed with hack: Transient logging errors during W&B init were addressed with a hack, as a root fix was not found.
- Non-TE config speed benchmark requested: Another member asked if a non-TE config with the same architecture was tested, seeking a sense of the slowdown.
Latent Space ▷ #ai-general-chat (153 messages🔥🔥):
Windsurf Acquired, Apple succession and leadership, GPT-5 Rumors, Universal Reward Function, Gemini Embedding Model Release
- Cognition Gobbles up Windsurf: Cognition Labs announced the acquisition of Windsurf, including its IP, product, trademark, and employees, aiming to combine Cognition's autonomous agents with Windsurf's agentic IDE, and all Windsurf employees will receive financial participation and accelerated vesting.
- The acquisition is intended to create a powerful tool for developers, enabling tasks like planning, parallel work delegation, and code stitching within a unified IDE; however, some users pointed to conflicting reports that employees who haven't vested yet will get nothing.
- Meta Machines Massive Megawattage: Meta is constructing vast AI clusters, including the 1000MW Prometheus (arriving in 2026) and Hyperion scaling over 5000MW, dwarfing current operational H100/H200 clusters at just 150-200MW, detailed in a SemiAnalysis article about Meta's data center strategy and post-Llama4 learnings.
- Community reactions discussed implications for AI research, NVIDIA sales, and power sources.
- Karpathy Contemplates the Current Course of RL: Andrej Karpathy discussed limitations of current Reinforcement Learning (RL) approaches and proposed a review/reflect stage for LLMs to extract explicit lessons from rollouts and add them to the system prompt, similar to human learning, in this tweet.
- This lesson-based learning could provide more bits of supervision per rollout, improve generalization, and lead to new learning paradigms specific to LLMs, moving beyond traditional game/robotics RL.
- Sam Spares Safety Scare, Suspends Launch: Sam Altman announced a delay in the planned open-weight model launch to conduct additional safety tests and review high-risk areas, stating that once weights are released, they cannot be pulled back as seen in this tweet.
- The community generally supports this decision, emphasizing the importance of safety over speed.
- Gemini Gets Going with Global Goodness: Logan Kilpatrick announced the general availability of the Gemini Embedding model, which ranks #1 on the MTEB leaderboard, priced at $0.15 per million tokens as seen in this tweet.
- Upcoming features include Batch Mode support and new multimodal embeddings and confirms the model is currently multilingual with multimodal capabilities arriving soon.
Latent Space ▷ #ai-announcements (1 message):
swyxio: special double podcast this week! https://x.com/latentspacepod/status/1943774304166195402
MCP (Glama) ▷ #general (74 messages🔥🔥):
MCP for ML Models, Agent Definitions in GenAI, Clipboard Servers in MCP, Elicitation Implementations
- MCP simplifies Machine Learning Model Deployment?: A member shared a blog post suggesting MCP can simplify ML model deployment by integrating model serving with agent workflows.
- The article's example server starts an MCP server, exposes a "request" tool that runs inference, and returns the result, using transformers; a minimal sketch of the pattern follows below.
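A minimal sketch of that pattern, assuming the official `mcp` Python SDK (FastMCP) and a small Hugging Face pipeline, could look like this; the server name and model are illustrative, not the article's code.

```python
# Sketch of an MCP server exposing a "request" tool that runs inference.
# Assumes `pip install mcp transformers torch`; names are illustrative.
from mcp.server.fastmcp import FastMCP
from transformers import pipeline

mcp = FastMCP("inference-server")
classifier = pipeline("sentiment-analysis")  # downloads a small default model

@mcp.tool()
def request(text: str) -> dict:
    """Run inference on the input text and return the top prediction."""
    return classifier(text)[0]

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```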
- GenAI Agent Definition debated: Members debated the definition of AI agents, with differing views on whether workflows should be considered agents, and the relevance of Anthropic's definition.
- One member argued that Anthropic's definition is the most thought-through, while another contended that the definition of agents predates LLMs and is broader than GenAI.
- Clipboard Servers get Proposed: A member proposed adding clipboard servers to the official MCP specification to allow servers to write to client clipboards, planning to implement this in MCP-B.
- This would broaden utility by enabling easier data transfer from MCP servers to clients.
- Elicitation Implementations get Discussed: A member is implementing Elicitation so that it is skipped when the user prompt already includes enough information to run the tool.
- The UI is provided by the client application when you send an elicitation request.
MCP (Glama) ▷ #showcase (9 messages🔥):
Neurabase, mcp-spec, Director Run, MCP Evals, Albert Heijn MCP
- Neurabase touts Fastest MCP Server Hosting: Neurabase claims to be the fastest MCP server hosting service, running fully on the Cloudflare Workers CDN, and calls itself a central hub for MCP servers, hosted at neurabase.deploya.dev.
- mcp-spec: Docs in MCP-ception!: A member built an MCP server called mcp-spec for MCP docs, partitioning and indexing the entire documentation into chunks after pasting the entire MCP spec into a .md file; check out the repo!
- Director Run Offers Local-First MCP Gateway: The Director Run team has created a fully open-source, local-first MCP gateway that lets users connect Claude, Cursor, or VSCode to any MCP server in 30 seconds; it can be found at director.run or on GitHub.
- mcp-evals: Open Source MCP Server Evaluations: A member shared an open-source repo for running evals on an MCP server at mcp-evals, which launches a client, lists tools, and loops over prompts to determine which prompts will trigger which tools; this also acts as an e2e test, since you can call the actual tools as well.
- Someone Wants an Albert Heijn MCP?: Someone posted an image asking if there were any Dutch people who want an Albert Heijn MCP.
Yannick Kilcher ▷ #general (56 messages🔥🔥):
Industrial Agents Training, Good World Models, Kimi K2, OpenAI Safety, BitNet vs Llama.cpp
- Investors won't stop funding AI, despite crash fears: One user stated that investors will continue to fund AI, because it's extremely cheap over time to train models for dexterous behavior with human hands, despite fears of a crash.
- OpenAI delaying model to improve safety?: Users speculated that OpenAI is delaying model releases due to safety concerns after the Grok incident, or due to Kimi K2's performance as mentioned in this tweet.
- Is OpenAI holding back capabilities?: An OpenAI employee may have implied that capabilities also factored into the delay, as seen in this tweet, leading some to cynically suggest that OpenAI's model performs worse than Kimi K2 and that the company is quickly training something slightly larger.
- BitNet discussion: Members discussed BitNet support in Llama.cpp, clarifying that it's not a versus question, and noting its effectiveness with recent simplifications in training, although limited to around 7B models due to training data requirements.
- One user stated It's a bit like asking if bicycles or tires are better.
- Critique of Foundation Models: Users discussed this paper, which is more a critique of how foundation models are used than of whether they can learn physics, emphasizing that blindly training foundation models may not work, not that transformers are incapable of learning.
- Another member said that planetary orbits are fairly low information-content data, so you're likely to overfit, while one responded that the point of the paper is more the method for interrogating implicit model structures, rather than the results they got from their interrogations.
Yannick Kilcher ▷ #paper-discussion (6 messages):
ResNet and Attention in U-Nets, Hugging Face Transformers, U-Net Definition Confusion, Accordion Networks (WWWWWW), Kimi K2 Model
- ResNet, Attention, and U-Net Implementations Emerge: Discussion involves the use of ResNet with attention in U-Net architectures, referencing a paper using a single ResNet stage with attention and another using a stack of same-width ResNets with attention only in the latent space.
- Also mentioned is a paper using a stack of ResNets with attention to replace the layers of the encoder and decoder, potentially the one used in Hugging Face's transformers library.
- U-Net Definitions are Confusing: There's confusion regarding what constitutes a U-Net, with some GitHub repositories like this one mislabeling architectures as ResNet + Attention + U-Net when they are not U-Net-based.
- The speaker also mentions an "accordion" network architecture, which they call a pagoda network, as a topology of interest for future study.
- Kimi K2 is actually really good: The speaker expresses high regard for the Kimi K2 model, describing it as a favorite for certain use cases and placing it in their top 3.
Yannick Kilcher ▷ #ml-news (3 messages):
Twitter Links
- Tweets Shared!: Three Twitter links were shared in the channel: Elie Bakouch, Hayden Field, and Signulll.
aider (Paul Gauthier) ▷ #general (52 messages🔥):
Grok 4 Aider Benchmark, Aider Benchmark Harder Tasks, Aider Leaderboard Updates, Aider Agents, Aider on Windows
- Grok 4 Achieves 80% on Aider Polyglot Coding Benchmark: Grok 4 scored 80% on the aider polyglot coding benchmark, achieving 4th place on the leaderboard, which can be found here.
- More Challenging tasks needed in Aider Benchmark?: A member wondered if harder tasks are needed for the aider benchmark, now that many models score around 80%.
- Gemini 2.5 Pro vs Gemini 1.5 Pro Models compared: Members discussed using `GEMINI_API_KEY=[blah] aider --model gemini/gemini-2.5-pro` with Gemini models, saying that Gemini 1.5 Pro has 2M context, but 2.5 Pro is smarter.
- One member shared a screenshot of model prices.
- Kimi's Aider Polyglot Coding Benchmark Reported via MoonshotAI: A member mentioned using the score reported by MoonshotAI for Kimi's performance on the aider benchmark.
- Another member suggested using official provider pricing instead of third-party estimates to avoid confusion, adding that Artificial Analysis has relevant cost metrics.
aider (Paul Gauthier) ▷ #questions-and-tips (10 messages🔥):
Zed editor schema validation for aider conf file, Github Copilot support in Aider, COBOL support to Aider, LiteLLM Proxy config and Aider config, Gemini thinking tokens
- Aider conf file gets schema validation from Zed editor: A user reported that the Zed editor now provides schema validation for aider's configuration file, which prompted them to convert `test-cmd` into an array, leading to a configuration error.
- The error message suggested that the `test-cmd` action type needs to be changed to "append" or `nargs` needs to be set, but the user was unclear on the implications.
- Deepseek recommends `tsc --noEmit` for static type checking: A user was having trouble configuring `bun typecheck`, and another user suggested using `tsc --noEmit` for static type checking, as recommended by Deepseek.
- The user acknowledged that `/lint` only checks dirty files, which is different from their usual Python/Ruff workflow where everything is checked every time due to Ruff's speed.
- Confusion Surrounds GitHub Copilot Support in Aider: A user was confused about the state of GitHub Copilot support in Aider, as the documentation suggests it is supported, while an issue indicates it might still be a work in progress.
- The discussion aimed to clarify whether the documentation and the issue refer to the same aspect of Copilot integration.
- Aider's voice command lacks configuration options: A user inquired whether it's possible to change the model and endpoint used in the `/voice` command.
- No specific answer was provided in the available messages.
- COBOL parser causes Aider segmentation fault: A user encountered a segmentation fault when adding COBOL support to Aider after creating `tags.scm`, compiling the COBOL parser with Tree-sitter, and making the necessary code changes.
- The segmentation fault occurs when the COBOL shared library is loaded; the user has already verified the exported symbols and parser correctness, and is seeking advice on common pitfalls or debugging tips for Tree-sitter integration.
LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 message):
MOOC Certificates, Certificate Requirements, Feedback form
- Advanced LLM Agents MOOC Certificates are Out: The certificates for the Advanced LLM Agents MOOC have been released, recognizing the participants' achievements, with 232 Trailblazer, 38 Mastery, 80 Ninja, 1 Legendary, and 3 Honorary certificates awarded.
- The announcement thanked the participants and guest speakers while encouraging certificate earners to share their achievement.
- Certificate Receipt Checklist Released: A checklist was provided for participants who expected a certificate but didn't receive one, including checking spam folders, ensuring all coursework was completed under the same email address, and verifying completion of the article assignment and certificate declaration form.
- Confirmation emails for the assignment and declaration form should have been received upon completion.
- Feedback Appreciated on Anonymous Form: Participants are encouraged to share feedback on the MOOC through an anonymous feedback form.
- The announcement concluded by thanking everyone for a great semester and encouraging questions in the designated channel.
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (49 messages🔥):
Certificate Issues, Certificate Declaration Form, Article Submission Form, Formatting Errors on Certificates, Missing Certificates
- Certificate snags spark snag!: Several users reported issues with receiving their certificates, often related to unsubscribing from the mailing list or missing the certificate declaration form.
- Staff resubscribed one user after discovering they were accidentally unsubscribed, and resent another certificate to a different email, but reiterated that they don't have the capacity to handle individual cases.
- Formatting fails foul-up finishes!: One user reported that their name overlapped the tier on their certificate, causing issues when trying to post the certificate to LinkedIn.
- The staff member fixed the formatting error for that certificate, stating should be fixed now! sorry about that, referencing the certificate number which can be found in the PDF name.
- Article Assignment snafus snarl students!: Some users realized they missed filling out the article submission form despite completing the article, quizzes, and other requirements.
- Staff emphasized I'm very sorry, but there isn't anything we can do now to accommodate the students who missed these forms/deadlines.
- Feedback flies for MOOC future!: A user suggested providing a centralized Excel sheet or progress tracker to allow participants to check their status and prevent last-minute issues.
- Staff thanked the users for the suggestions, stating that It is thanks to everyone's participation and enthusiasm that we'll be able to hopefully improve upon the format for delivering all of the lectures + coursework in the future!
Modular (Mojo 🔥) ▷ #general (9 messages🔥):
Assembly coding inside Mojo, Modular community event tracking, Discord notifications, Mojo Standard Library Assembly Module
- Mojo lets you assemble!: A member asked if it is possible to code some assembly inside of Mojo, like making little syscalls.
- Another member replied that it's possible but not documented well, linking to the Mojo Standard Library Assembly Module.
- Modular Community Event Tracking Tools Poll: Modular is polling its community on how they prefer to keep track of Modular community events (e.g. community meetings, livestreams, conference talks, meetups, etc.).
- Options include the Modular community Google calendar and Modular's Luma event page, as well as other suggestions.
- Discord Notifications FTW: A member stated that Notifications from Discord / other subscriptions are the best for me.
Modular (Mojo 🔥) ▷ #announcements (1 message):
July Community Meeting, Hashable-based hashing, FFT implementation, Mojo-Lapper, Quantum circuit simulator
- July Community Meeting Starts Soon!: The July community meeting is scheduled to start in approximately 2 hours, featuring presentations on several topics.
- Presentations cover Hashable-based hashing, an FFT implementation, Mojo-Lapper (an overlap detection library), and a quantum circuit simulator - join the discord event!
- Team Q&A Incoming: As with every community meeting, the team has asked users to drop any questions for them in advance.
- Ask them using this Google form!
Modular (Mojo 🔥) ▷ #mojo (30 messages🔥):
Mojo error messages, M1 Metal 3 GPUs, Autotune functionality, EqualityComparable, Atomics on GPU
- Mojo Error Messages: Where to Find Them?: A member was looking for a list of Mojo error messages after encountering one, and was advised to consult the #help channel and use @kapa.ai for assistance.
- A member encountered an error where they needed to set `self.id` to a value.
- M1 Metal 3 GPUs: Not Yet Supported in Mojo: A user inquired about leveraging M1 Metal 3 GPUs for processing in Mojo and Max, but was informed that Apple Silicon GPUs are not currently supported but are a work in progress.
- A link to a related GitHub commit was shared, though it only pertains to build system detection.
- Autotune Functionality: Replaced with Benchmarking Loops: A user asked about the timeline for the return of the autotune functionality, noting that it was removed for a rethink.
- It has the ability to massively blow up compile times, and you can now replace it with a for loop and some benchmarking.
- EqualityComparable: Lacking the equal function: A user reported an error related to implementing all requirements for EqualityComparable in a struct, asking for guidance on what might be missing.
- Members recommended the manual to get started.
- Capturing Keyword: Documentation Quest: A user sought specific examples of the "capturing" keyword in Mojo, noting its usage in explanations of other topics like "foreach".
- A member shared a link to an explanation and to a related issue.
Modular (Mojo 🔥) ▷ #max (4 messages):
arg_nonzero kernel, max.kernels import, mojo build max kernels
- User struggles to import `arg_nonzero` kernel: A user encountered an error trying to import `arg_nonzero` and `arg_nonzero_shape` from `mojo.kernels.nn.arg_nonzero`, despite having `max` listed as a dependency.
- The user received errors such as unable to locate module 'mojo' and 'kernels' does not refer to a nested package.
- `max.kernels` import fails: The user found that `max.kernels.nn.arg_nonzero` was not accessible or exposing submodules.
- The error message was 'kernels' does not refer to a nested package.
- User resolves import issue by building `max` kernels from source: A member suggested that the kernels should be standalone modules importable via `from nn import ...`, and suggested building them from source if they can't be accessed from the `max` package: Modular Forum post.
- The user confirmed that running `mojo build -I ../modular/max/kernels/src/` resolved the issue.
Manus.im Discord ▷ #general (37 messages🔥):
Manus Flutter Web Emulator, Startup Advice, Google Drive Save Error, Manus Website Outage, Manus Fellowship
- Flutter Web Emulator Extension Gets Traction: A member shared their Flutter Web Emulator extension created with Manus and noted it got 1900 installs in two months without any promotion.
- Online Incubator Recommendations: A member suggested joining online incubators to find partners and advisors, and recommended checking out f6s.com.
- Another user suggested asking Manus for advice on developing a business step by step, joking someone could create a Manus online business incubator.
- Google Drive Save Error: A member reported a potential bug where saving the most recent item to Google Drive works, but saving the previous item results in a Google Auth error.
- Manus Websites Experience Outage: Multiple members reported issues accessing Manus websites and deployments on manus.space, indicating a potential outage.
Notebook LM ▷ #announcements (1 message):
Featured Notebooks, NotebookLM
- Featured Notebooks arrive to NotebookLM: The team announced the launch of Featured Notebooks on the homepage, with content spanning from scientific exploration to expert advice.
- Users can access the notebooks directly to learn more.
- Explore diverse topics with Featured Notebooks: These notebooks offer a range of content, from scientific explorations to practical guides and expert advice, catering to various interests and needs.
- The featured notebooks section provides users with easy access to valuable resources and insights within the NotebookLM platform.
Notebook LM ▷ #use-cases (11 messages🔥):
Targeted Fiction Editing with AI, NotebookLM integration with Apple system toolkits, AI for extracting information from books
- AI Targets Novel Editing: Users discussed utilizing AI for targeted editing of fiction, providing actionable advice and examples for novice authors, particularly analyzing early-draft manuscripts from opening contract to ending scene, as exemplified by the prompt Analyze [X]; Provide actionable advice as paired with written examples for [Y].
- The discussion centered around a comprehensive deep dive into every element of the manuscript, with a strong focus on cohesive packaging and writing quality, even going so far as saying the results allowed the original poster to get two hours of content from it.
- NotebookLM avoids native Apple features: It was stated that sharing text with the NotebookLM app creates a new notebook with the source material, and sharing a deep research report defaults to sharing a text file with the contents of the report, indicating no special treatment.
- A member noted that this is likely because Google is characteristically allergic to engaging with Apple system toolkits and human interface guidelines, in other words, a native app with native features is unlikely.
- AI Extracts Everything from a Book: A user shared an image found on Reddit, originally from Pinterest, titled How to use AI to extract everything from a book.
- The image contained multiple screenshots outlining steps for using AI tools to analyze and extract information from books; it included using chatbots and optical character recognition (OCR) to convert text from images.
Notebook LM ▷ #general (24 messages🔥):
Source naming conventions, Audio file generation length, Embedding model details, Server tag requests, iOS app functionality
- Name Game: Naming Generated Sources the Same as Preset Queries: A user suggested that when using preset queries like "FAQ", the generated source should be named exactly as the button, like "FAQ", to improve organization and ease of finding sources, especially in notebooks with many sources.
- Podcast Panic: Audio File Generation Shortening Lately?: Users reported that audio file generation lengths seem to be shorter recently, generating around 10-15 minutes instead of the previous 30+ minutes, even with settings adjusted for longer podcasts.
- Embedding Enigma: Figuring Out the Embedding Model Under the Hood: A user inquired whether NotebookLM uses `gemini-embedding`, `text-embedding-004/005`, or `text-multilingual-embedding-002` for its embeddings.
- This remains a mystery.
- Server Tag Shenanigans: Server Tag Requests?: A user inquired about plans for a server tag for NotebookLM, while another inquired about reading pinned notes or pinning answers on the iOS app.
- Publishing Puzzles: Public Notebook Publishing?: A user asked if thereâs a way to publish notebooks and shared a link to a Google blog post, inquiring about whether publishing is limited to Google partnerships only.
LlamaIndex ▷ #announcements (1 message):
LlamaIndex Meetup Amsterdam, Office Hours, Notebook Llama, Context Engineering, Research Agent
- LlamaIndex lands in lovely Lisse (near Amsterdam): LlamaIndex is hosting a meetup in Amsterdam on July 31 focused on LlamaIndex & Snowflake Data Agent Builders.
- Be sure to sign up to reserve your spot.
- Sign Up for Stress-free Study Sesh!: The next LlamaIndex office hours will be held on August 5.
- Sign up to join the session.
- Notebook Llama Clones NotebookLM: NotebookLlama, a NotebookLM clone by LlamaIndex, is available on GitHub and has already received over 1k stars.
- Check out the repo!
- Context Engineering Techniques: LlamaIndex introduced techniques for Context Engineering on their blog.
- This blogpost covers the what and how of Context Engineering.
- Gemini 2.5 Pro Powers Research Agent: LlamaIndex demonstrates how to build a research agent with LlamaIndex & Gemini 2.5 pro in this tutorial.
- Learn to leverage the power of LLMs for research!
LlamaIndex ▷ #blog (3 messages):
Notebook Llama new features, RAG Apps, Google Gemini 2.5 Pro
- NotebookLlama Gets More Features: The new release of NotebookLlama, an open-source alternative to @NotebookLM, allows users to extract and download images and tables from their files, and interactively visualize all tabular data.
- Users can now also chat with the new and improved NotebookLlama.
- Crafting Real-World RAG Apps: A comprehensive guide has been released on how to build real-world RAG (Retrieval-Augmented Generation) applications, walking users through the entire process from raw data to fully-fledged pipelines, described in this tweet.
- The guide was created in collaboration with @itsclelia from LlamaIndex and @krotenWanderung from @qdrant_engine, offering insights from both teams.
- Gemini 2.5 Pro Powers Research Agent: A new example demonstrates how to build a multi-agent research assistant powered by LlamaIndex workflows and Google's Gemini 2.5 Pro, as described in this tweet.
- The agent can search the web with Google and take notes with a dedicated note-taker agent; a minimal workflow skeleton follows below.
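As a rough skeleton of what a LlamaIndex workflow looks like, with the Gemini and web-search steps stubbed out; class and step names are illustrative, not the tutorial's code.

```python
# Minimal LlamaIndex workflow shape; the real agent would call Gemini 2.5 Pro
# and a Google search tool inside the step. Assumes `pip install llama-index`.
import asyncio
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class ResearchFlow(Workflow):
    @step
    async def research(self, ev: StartEvent) -> StopEvent:
        # Stub: a real implementation would search the web and delegate
        # note-taking to a dedicated note-taker agent.
        notes = f"(stub) collected notes on: {ev.topic}"
        return StopEvent(result=notes)

async def main():
    flow = ResearchFlow(timeout=60)
    print(await flow.run(topic="context engineering"))

asyncio.run(main())
```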
LlamaIndex ▷ #general (27 messages🔥):
LlamaIndex Partner Program, Tool Calling Models, Synk Hiring, Response Synthesizers
- LlamaIndex Seeks Co-Marketing Partners: Aviv from Bright Data is seeking a contact at LlamaIndex to discuss joint content or co-marketing opportunities for their web data/scraping tool integration.
- They aim to help developers get more value from the integration.
- Best Tool Calling Model? Llama3 and Mistral in the Running: A member asked for advice on the best open-source tool calling model that can run locally with <= 96 GB of VRAM.
- While acknowledging Claude as currently "best", they noted llama3.3 70b and mistral-32-small are decent with LlamaIndex, seeking other opinions.
- Synk Project Recruits Anonymity Advocates: The Synk project, focused on a decentralized, anonymous, and secure browser, is hiring for various roles including developers, QA engineers, DevOps engineers, moderators, marketing analysts, and beta-testers.
- The project offers official employment with signed documentation, guaranteed salary, and a flexible schedule.
- Generating More Detailed Responses: A member asked for methods to generate larger amounts of text rather than summaries, iterating over every point to build a bigger report.
- Another member suggested looking into GraphRAG and pointed to a GraphRAG example that they're adapting with Neo4j.
LlamaIndex ▷ #ai-discussion (1 message):
Synk, MetaToyGame, Decentralized system for browsers
- Synk Seeks New Synergies: The fast-growing project Synk is looking for ambitious and passionate individuals to help develop a decentralized system for browsers that is fully anonymous and secure.
- They are specifically hiring for Developers (back-end, Front-end, blockchain), QA Engineer, DevOps Engineer, Moderators (Game chat), Marketing Analyst, and Beta-Testers, and invite people to check out their product on MetaToyGame X.
- Synk Positions Open for Beta Testers: Synk is hiring for Beta-Testers, no experience required!
- If you are interested in helping, DM them to discuss work details; they offer official employment with signed documentation, guaranteed salary, and a flexible schedule.
Torchtune ▷ #general (1 message):
yamashi: Kimi K2 was trained with muon, could it be that this is the future
Torchtune ▷ #dev (17 messages🔥):
Async Recipe, Flex Attention memory usage with complicated masks, torch.cuda.memory._set_allocator_settings, Sync GRPO Recipe
- Async Recipe doesn't work for every model: It was suggested to keep a fully functioning recipe, since the async recipe doesn't work on every model.
- Another member agreed and pointed out that the recipe has a critical issue and opened a PR to address it.
- Flex Attention Kernelâs Memory needs explored: The amount of shmem used by a flex attention kernel will be dependent on the scoremod and/or the maskmod (complicated mask made of multiple parts).
- A question was posed whether additional memory is needed for the Triton kernel if one constructs the mask via `and_masks` (link); a hedged sketch of that API follows below.
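For orientation, composing mask_mods with `and_masks` looks roughly like the sketch below (PyTorch 2.5+ and a CUDA device assumed); shapes and the window size are illustrative, and the memory question above concerns the Triton kernel generated for such composite masks.

```python
# Sketch of composing mask_mods with and_masks; sizes are illustrative.
import torch
from torch.nn.attention.flex_attention import (
    and_masks, create_block_mask, flex_attention,
)

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

def sliding_window(b, h, q_idx, kv_idx):
    return q_idx - kv_idx <= 256

# The combined mask is the logical AND of both parts.
mask_mod = and_masks(causal, sliding_window)
block_mask = create_block_mask(mask_mod, B=None, H=None, Q_LEN=1024, KV_LEN=1024)

q = k = v = torch.randn(1, 8, 1024, 64, device="cuda")
out = flex_attention(q, k, v, block_mask=block_mask)
```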
- Expanding Memory Allocation for Robustness: A member suggested using `torch.cuda.memory._set_allocator_settings("expandable_segments:True")` to make the code more robust and change it dynamically.
- This tip was appreciated and the user was encouraged to submit a PR; it was also recommended to change it in the init of the unit tests, because torch is imported before tune.
- Sync GRPO Recipe is currently broken: It was pointed out that the sync GRPO recipe is broken (PR), and a recommendation was made to revert until #2697 or use `krammnic/torchtune`, where these functions are reverted.
- The submitter added that reverting in main is not a good solution, due to the different structure of their rewards, and that they need to wait until their PR is merged.
- Trainer crashes due to optimizer compilation: A member reported that the Trainer is crashing with `compile: True` while using the cosine lr scheduler.
- Another member suggested that they disable the optimizer compilation with this config:
```yaml
compile:
  model: true
  loss: true
  scale_grads: true
  optimizer_step: false
```
Torchtune ▷ #papers (2 messages):
Token Training, Grokking
- Missed Token Training Opportunities?: A member questioned why a certain action wasn't taken 1T tokens earlier, suggesting that potential benefits were missed.
- They speculated that the decision-makers might be hoping for some kind of grokking to occur.
- Grokking Speculation: The user mentioned a link to a tweet.
- They wondered if a decision was made in hopes that grokking would happen.
tinygrad (George Hotz) ▷ #general (19 messages🔥):
Frontend Reimplementations, Metal Profiling API, ONNX Flaky and Coredumps, Driving Vision ONNX Issue, Tinygrad Apps and Examples
- Frontend Reimplementations Ruffle Feathers: George Hotz advises against reimplementing frontends, pointing to the completeness and testing of the existing spec, while responding to jafioti's request to chat with the tinygrad team.
- He said More incomplete frontends probably aren't a good use of dev effort, and is open to participation in existing conversations.
- Metal Profiling API Surfaces: uuuvn shared a Metal profiling API that is similar to sqtt on AMD, saying not sure if you saw this message - there is an api for profiling on metal that's similar to sqtt on amd.
- He also shared the onnx file with all of ONNX in 1000 lines of code.
- ONNX Reproducibility Plagued by Coredumps: b1tg reported a coredump during ONNX reproduction, pinpointing a crash in the Python process at `_METADATA.set` within `_metadata_wrapper`, potentially indicating a CPython bug.
- They linked to prior segfaults and another one observed during ONNX parser merging, which crashed at `_METADATA.get`.
- Driving Vision ONNX Root Cause Identified: uuuvn has seemingly identified the root cause of an issue with `driving_vision.onnx` and developed a fix, noting it relates to bitcast folding, specifically folding some uchar to half.
- uuuvn is working on a minimal repro test before submitting a PR, but is having difficulty making a proper test.
- Apps and Examples: Quantity or Quality?: A user expressed interest in porting models for useful apps/examples (image deduplication, fast face detection, video content id), noting their reliance on chonky dependencies like ffmpeg.
- A member responded that the team is interested in making tinygrad easier to work with for developers, and less interested in supporting more examples.
Nomic.ai (GPT4All) ▷ #general (19 messages🔥):
Gemma 3, Nomic-embed-v2 finetuning, LocalDocs embedding issues, Nomic API server performance, RAG for lore
- Gemma 3 Gets Passing Grade: A member stated that Gemma 3 is the only model they've found passable.
- Nomic-embed-v2 Finetuning Faces Cloudflare Access Issue: A user reported an Access Denied error when trying to access data for finetuning nomic-embed-v2 through Cloudflare's R2 storage via the `aws s3 ls` command, using a specific endpoint URL.
- The command used was: `aws s3 ls --endpoint-url=https://9fa58365a1a3d032127970d0bd9a1290.r2.cloudflarestorage.com/ s3://contrastive`.
- LocalDocs Embedding Process Stuck at 0%: Several users reported issues with LocalDocs where the embedding process gets stuck at 0%, even for small text files (example image).
- One user with a 3060 GPU, 9950x processor, and 64GB RAM sought help after the embedding process stalled, and was advised to enable their NVIDIA GPU with its VRAM in LocalDocs settings to improve performance.
- Nomic API Server Responds Slowly: A new Nomic user experienced a two-hour delay in receiving a response from the Nomic API server running on a Debian 12 machine with an older AMD processor and 24GB RAM.
- It was suggested that the system might be running entirely on CPU, and using a smaller model or a better video card could improve performance.
- RAG Text Similarity Search: A user inquired about using a model or agent to store and query a large amount of lore, and it was suggested that RAG (Retrieval-Augmented Generation) with text similarity search via LocalDocs could be a viable solution; a minimal sketch of the idea follows below.
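As a rough illustration of that suggestion, text-similarity retrieval over lore snippets can be as small as the sketch below; sentence-transformers and the model choice are stand-ins here, since LocalDocs handles the embedding and search internally.

```python
# Minimal text-similarity retrieval over lore snippets; the embedding model
# is an illustrative stand-in, not what LocalDocs uses.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
lore = [
    "The kingdom fell at the end of the Third Age.",
    "Dragons sleep beneath the Ember Mountains.",
]
doc_vecs = model.encode(lore, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity on normalized vectors
    return [lore[i] for i in np.argsort(-scores)[:k]]

print(retrieve("Where do the dragons live?"))
```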
Cohere ▷ #🧵-general-thread (12 messages🔥):
Aya Expanse 32B, Preference Optimization Dataset, Cohere Labs Discord Server
- Aya Expanse 32B impresses despite age: A member found Aya Expanse 32B impressive, noting that it's working (mostly) with Roocode, despite its age.
- The member compared it to Command-R 08 2024, highlighting that many modern open weight models of this size fail.
- Seeking credit code for testing: A member inquired about obtaining a credit code for testing purposes.
- Another member responded with a link to a Discord guide providing instructions.
- Request for Preference Optimization Dataset Release: A member asked if Cohere has released the preference optimization dataset mentioned in this paper.
- The member also included a link to a tweet potentially relevant to the discussion.
- Join the Cohere Labs Discord Server: A member was invited to share their thoughts in the Cohere Labs Discord server.
- A member provided a link to the Discord server encouraging them to post in that community.
Cohere ▷ #👋-introduce-yourself (6 messages):
Machine Learning Research, High Performance Computing, Quantum Computing, PhD opportunities
- Lecturer Focuses on ML Research: A lecturer from NED University of Engineering and Technology is focusing on machine learning research and applications during their pre-PhD phase.
- They aim to connect with researchers and developers, staying updated on AI and ML developments to build a strong foundation for their PhD.
- Student Pursues Research Career: A Computer Science student from Pakistan is interested in ML, High Performance Computing, and Quantum Computing.
- He is eager to contribute and work on research, with aspirations to pursue a career as a researcher for a PhD, seeking to learn from the community.
DSPy ▷ #papers (1 message):
okhattab: Yes. Or read the paper for IReRa!
DSPy ▷ #general (4 messages):
NFT Public Mint, OpenSea Rewards Claim, Custom LLM Adapter Error, Arc Prize DSPy Hacking
- NFT Public Mint Now Available!: The public mint for an NFT project is LIVE, with only 1,820 NFTs remaining.
- Users who participated in the OS2 rewards program can claim their treasures on the new OpenSea platform via the "Rewards" tab, but beware of broken links.
- Troubleshooting a Custom LLM Adapter: A user with a custom LLM adapter encountered a ValidationException when using the Bedrock API.
- The error message indicates that the input is too long for the requested model.
- Arc Prize DSPy Hackers Unite!: A member is looking for collaborators on the Arc Prize who are using DSPy.
- They expressed interest in checking out the approaches other people are taking.
Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (2 messages):
Llama 4 Scout vs Llama 3.1 70B, BFCL Website Rendering Bug, Llama-3.3-70B-Instruct (FC) Score Discrepancy
- Size Isn't Everything, Says Scouts: A member pointed out that a larger model size doesn't guarantee superior performance, citing Llama 4 Scout's inferior performance compared to Llama 3.1 70B.
- Others stated that improvements in architecture and training data can lead to better results even with smaller models.
- Website Rendering Glitches Llama Scores: A member suspected a rendering issue on the website, noting that Llama-3.3-70B-Instruct (FC) displays a score of 74.33 for Non-live Simple.
- The member noted a discrepancy between the website score and the score in the git repo, which reports 94.
Codeium (Windsurf) ▷ #announcements (1 message):
Windsurf, Cognition, Devin, AI coding
- Windsurf Merges with Cognition!: Windsurf has announced that they are joining forces with Cognition, the creators of Devin, to reinvent AI coding.
- The acquisition aims to combine Cognition's autonomous agents with Windsurf's agentic IDE to create breakthrough developer experiences.
- Human-AI Collaboration is Key!: Windsurf states that they have always believed the future of software development lies in human-AI collaboration.
- They say this collaboration will lead to true amplification of developer capabilities, not just automation.
- Windsurf and Cognition Shape the Future of AI Coding: Two world-class teams are joining forces to shape the next era of AI coding.
- The announcement video and the YouTube Link provide further details.