Fire-and-forget is all you need.

AI News for 5/16/2025-5/17/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (214 channels, and 3392 messages) for you. Estimated reading time saved (at 200wpm): 298 minutes. Our new website is now up with full metadata search and a beautiful, vibe-coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Lots of people will be covering the Codex launch today, so we will just leave you with the Latent Space writeup and podcast.

AI Twitter Recap

AI Model Releases and Updates

  • OpenAI’s Codex, a cloud-based software engineering agent, is now available in research preview for Pro, Enterprise, and Team ChatGPT users. It’s powered by codex-1, a version of OpenAI o3 optimized for software engineering, and can perform tasks in parallel, such as refactoring, bug fixing, and documentation. @sama, @kevinweil, @omarsar0, @iScienceLuvr, and @OpenAI
  • Codex CLI has been improved with features like quick sign-in with ChatGPT and a new model, codex-mini, optimized for low-latency code Q&A and editing. @OpenAIDevs
  • Gemma 3 is recognized as the best open model runnable on a single GPU. @osanseviero
  • Runway has released the Gen-4 References API for applying a reference technique or style to new generations. @c_valenzuelab
  • Salesforce has released BLIP3-o, a family of fully open unified multimodal models with a novel approach using a diffusion transformer to generate CLIP image features. @_akhaliq, @iScienceLuvr
  • Qwen 2.5 models, including 1.5B (Q8) and 3B (Q5_0) versions, have been added to the PocketPal mobile app for both iOS and Android platforms. Users can provide feedback or report issues through the project’s GitHub repository, with the developer promising to address concerns as time permits. The app supports various chat templates (ChatML, Llama, Gemma) and models, with users comparing performance of Qwen 2.5 3B (Q5), Gemma 2 2B (Q6), and Danube 3.
  • Marigold IID, a new state-of-the-art open-source depth estimation model, has been released; it produces depth maps and normal maps for scenes and faces. @mervenoyann

Research and Papers

  • DeepSeek has published insights into DeepSeek-V3, detailing scaling challenges and hardware considerations for AI architectures. @arankomatsuzaki, @_akhaliq
  • Google has introduced LightLab, which controls light sources in images using diffusion models. @_akhaliq
  • Google DeepMind’s AlphaEvolve uses Gemini 2.0 to discover new math and cuts Gemini cost by 1% without RL. @_jasonwei, @demishassabis, @_philschmid, and @swyx
  • Omni-R1 explores the necessity of audio for fine-tuning audio LLMs. @_akhaliq
  • Qwen introduces a parallel scaling law for language models, drawing inspiration from classifier-free guidance (CFG) and suggesting that parallelizing into P streams equates to scaling the model parameters by O(log P). @iScienceLuvr
  • Salesforce released Lumina-Next on Qwen base, which slightly surpasses Janus-Pro. @teortaxesTex
  • A new paper finds that LLM performance degrades in multi-turn conversations, with models becoming markedly less reliable as turns accumulate. @_philschmid
  • J1 is incentivizing thinking in LLM-as-a-Judge via RL. @jaseweston
  • A new study from Qwen finds a strong correlation between question similarity and strategy similarity, enabling the prediction of optimal reasoning strategies for unseen questions. @omarsar0
  • Researchers have significantly improved a large language model’s reasoning by fine-tuning it on just 1,000 examples. @DeepLearningAI
  • Together AI has acquired Refuel AI, specializing in models and tools for turning unstructured data into clean, structured input for AI applications. @togethercompute
  • Analog Foundation Models: a general and scalable method to robustly adapt LLMs for execution on noisy, low-precision analog hardware. @iScienceLuvr
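The Qwen parallel scaling claim above can be stated schematically (this is a sketch of the claim, not the paper's exact fitted loss form; the constant α is an assumption for illustration):

```latex
% Running a model of size N as P parallel streams behaves roughly like
% a single-stream model with an effective parameter count of
N_{\mathrm{eff}} \;\approx\; N \cdot \bigl(1 + \alpha \log P\bigr)
% i.e., parallelizing into P streams is equivalent to scaling the
% parameter count by a factor of O(\log P).
```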

AI Tools and Platforms

  • Hugging Face Spaces is emerging as the “app store of AI”, with many apps functioning as MCP Servers. @mervenoyann
  • LlamaIndex has a new memory implementation that takes a block-based approach to long-term memory. @jerryjliu0
  • Perplexity is seeing growth in hotel bookings natively on its platform, potentially disrupting the ad industry. @AravSrinivas
  • AI Sheets is an AI Agent that can analyze data, generate charts, summaries, and reports. @svpino
  • Ollama v0.7 now supports multimodal models. @ollama
  • Windsurf has announced its in-house AI models for developers.
  • Cline is an AI coding tool that amplifies senior engineers by elevating them to architectural roles, focusing on fundamentals, collaboration, and strategic AI use. @cline

AI Engineering and Development Practices

  • Best practices for AI coding include collaborating strategically with AI, planning before coding, managing the context window, using capable models, and providing persistent knowledge through Rules Files & Memory Banks. @cline
  • The integration between Transformers and MLX is expected to deepen, as Transformers serves as a source-of-truth. @awnihannun
  • It is theorized that algorithmic advances may be bottlenecked by compute. @EpochAIResearch
  • To make sure your AI agent is not bullshitting you, you need to evaluate its reasoning. @clefourrier
  • Emphasizing tasks that test understanding, e.g. @saprmarks’ auditing game, helps the most. @NeelNanda5
  • AI’s ability to make tasks faster is underrated in creating business value, particularly in coding, where it reduces effort and shortens the time from idea to prototype. @AndrewYNg
  • To scale test time compute by building search: Little search, Greedy as hell search, Narrow and deep search, Shallow and broad search, Approximate search, Exact search, Hybrid search, Searching by offloaded computation. @mbusigin
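The search taxonomy in that last bullet is easier to see with a toy example (the scorer below is entirely invented, not from the linked thread): "little" greedy search commits to the locally best step at each point, while a shallow-and-broad beam keeps several partial candidates and can recover a path that greedy throws away.

```python
import heapq

# Toy next-token scorer: maps a prefix to its scored continuations.
# Crafted so the greedy first step ("b") leads to a worse total than "a".
SCORES = {
    "":   {"a": 1.0, "b": 2.0},
    "a":  {"x": 5.0},
    "b":  {"x": 1.0},
    "ax": {}, "bx": {},
}

def greedy(start=""):
    """Little search: always follow the single locally best continuation."""
    prefix, total = start, 0.0
    while SCORES.get(prefix):
        tok, s = max(SCORES[prefix].items(), key=lambda kv: kv[1])
        prefix, total = prefix + tok, total + s
    return prefix, total

def broad(start="", width=2):
    """Shallow-and-broad search: keep the best `width` partial candidates."""
    frontier = [(0.0, start)]
    finished = []
    while frontier:
        nxt = []
        for total, prefix in frontier:
            children = SCORES.get(prefix, {})
            if not children:
                finished.append((total, prefix))
            for tok, s in children.items():
                nxt.append((total + s, prefix + tok))
        frontier = heapq.nlargest(width, nxt)
    return max(finished)

print(greedy())  # ('bx', 3.0) -- greedy commits to the wrong first step
print(broad())   # (6.0, 'ax') -- the wider frontier recovers the better path
```

Spending more test-time compute here just means widening (or deepening) the frontier; the same trade-off governs best-of-N sampling and beam search over LLM outputs.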

AI Safety and Governance

  • Cohere highlights the importance of enterprises turning to secure and private AI solutions. @cohere
  • Concerns persist regarding AI safety and transparency, especially after an unauthorized modification was made to the Grok response bot’s prompt. @svpino

Events

  • Anthropic is hosting a social in NYC in mid-June for quants interested in a career jump. @andy_l_jones
  • Daniel Han will be at the AI Engineer World’s Fair to talk about RL, GRPO, dynamic 1.58-bit quants for DeepSeek R1, and more tips and tricks. @danielhanchen
  • The Keras team is hosting a celebratory get-together on Wednesday May 21, 2025 at 6pm, in downtown Mountain View, for the 10-year Keras launch anniversary. @fchollet
  • LangChain’s first industry conference in San Francisco with stories of teams building agents. @LangChainAI
  • Together AI will be at Dell Tech World May 19–22 showcasing how top AI teams run faster and more efficiently with Together AI from training to inference. @togethercompute

Memes and Humor

  • “An unauthorized modification was made to the Grok response bot’s prompt”: @sama, @svpino, @nearcyan
  • Looking a lot like lorde has spurred new flirting tactics: @typedfemale

AI Reddit Recap

/r/LocalLlama Recap

1. LLM-Integrated Operating Systems and Edge Devices

  • I built a tiny Linux OS to make your LLMs actually useful on your machine (Score: 144, Comments: 39): The post introduces llmbasedos, a minimal, open-core (Apache-2.0) Arch Linux-based OS that exposes local features (filesystem, mail, sync, agent workflows) to any LLM frontend via a JSON-RPC spec called Model Context Protocol (MCP). The system is composed of a FastAPI-based MCP gateway for routing/LLM proxying and modular Python daemons (for file system, mail, sync, “agent” workflows), using auto-discovery (.cap.json), and supports offline and online LLMs (e.g., llama.cpp, GPT-4o). The design enables rapid (<50 lines) addition of new system capabilities, promoting a clean, plugin-free dev interface for LLM-augmented local automation. Expert commenters discuss deployment models (USB stick vs. VM vs. desktop install), compare the approach with Docker/container-based isolation, and raise questions about where security/sandboxing boundaries should exist (inside the MCP server, OS-level containerization, restricted APIs), and request more concrete usage and access control examples. Concerns are also expressed over the perceived switching cost compared to integrating similar stack in a user’s current OS or container.
    • Several commenters debate optimal usage scenarios for the OS, such as running it in a VM, isolating it via sandboxing (using QEMU, containers, or Docker), and mechanisms for constraining file access (e.g., Linux mount namespaces, restricted users, or in-app server constraints). This technical discussion centers on achieving a secure, minimal-permission setup for running LLMs locally and how containerization vs. a full custom OS might differ in security and usability.
    • A user inquires about approaches for managing memory overhead when chaining local LLM agents or plugins, highlighting advanced features like snapshotting execution state to accelerate context switches. This references ongoing work at InferX and explores whether stateful snapshotting techniques could reduce overhead when loading LLM contexts or switching between workloads.
    • There is interest from Windows users in making the OS available as a USB-bootable distro, enabling AI workflows on consumer hardware without altering the user’s main operating system. This raises compatibility and deployment considerations, particularly with regard to persistent storage, hardware support, and accessibility for non-Linux-native users.
  • LLM on a Walkie Talkie (Score: 108, Comments: 27): The post details a pipeline integrating Whisper for ASR, vllm on a solo server, Llama 3.2 for local LLM inference, and Cartesia TTS to converse with a user via Baofeng walkie talkie, facilitated by Digirig Mobile and a MacBook Pro. This enables full-duplex LLM conversation and audio transcription over analog radio, aiming at AI access in low-connectivity or rural environments and radio transcription. Potential applications noted include rural farmers interfacing via radio with AI-driven assistants for local automation and information access. A technical commenter noted the input gain was set too high, which can severely impact ASR performance and usability in RF and acoustic interfaces.
    • A user proposed integrating LLM-powered voice assistants into agricultural or rural environments, enabling voice-based control or monitoring (e.g., farmers using voice to interact with livestock monitoring systems or trigger automated actions) via walkie talkie or similar radio devices.
    • Another user described an alternative technical approach using LoRa (Long Range radio) with devices like Raspberry Pi—eschewing voice for text-based transmissions. They outlined a system where cheap city-wide computers broadcast riddles/clues via LoRa, responding to queries via text prompts, and raised the issue of necessary ethical safeguards against potentially dangerous behavior from the language model, such as impersonating vulnerable individuals.
    • A concrete application suggestion involved voice-driven querying of weather information: leveraging public APIs (specifically referencing weather.gov’s “lat,lon” to weather data endpoints) to enable hands-free retrieval of hyperlocal forecasts via the LLM on walkie talkie interface.
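The llmbasedos design above routes every capability through a JSON-RPC gateway. A rough sketch of that daemon pattern follows; the method name, capability table, and stub handler are invented for illustration and are not taken from the project:

```python
import json

def fs_list(params):
    # A real llmbasedos daemon would touch the filesystem here;
    # this stub returns fixed data so the dispatcher is testable.
    return ["notes.txt", "todo.md"]

# Hypothetical capability table, standing in for the project's
# .cap.json auto-discovery: method name -> handler.
CAPABILITIES = {"fs.list": fs_list}

def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request to a registered capability."""
    req = json.loads(raw)
    handler = CAPABILITIES.get(req.get("method"))
    if handler is None:
        resp = {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    else:
        resp = {"jsonrpc": "2.0", "id": req.get("id"),
                "result": handler(req.get("params", {}))}
    return json.dumps(resp)

print(handle('{"jsonrpc": "2.0", "id": 1, "method": "fs.list", "params": {}}'))
```

The appeal of the pattern is that adding a new system capability is just one more entry in the table, which is how a "<50 lines per capability" claim becomes plausible.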

2. Recent LLM/AI Model and Platform Security, Policy, and Compliance News

  • Did Standford HuggingFace account got Hacked? (Score: 341, Comments: 48): The image is a screenshot displaying apparent unauthorized activity on the Stanford HuggingFace account, including a user (‘afjiash’) publishing a model and collections with offensive titles, strongly suggesting the account was compromised. Reports in both the image and comments indicate HuggingFace has since removed the offensive repositories but some remnants (such as collection names) still persist, highlighting shortcomings in platform moderation or recovery. This incident raises concerns about security practices for high-profile academic accounts on model-sharing platforms like HuggingFace. Commenters confirm prompt action by HuggingFace in removing the worst content while noting persistence of some offensive artifacts and discuss broader implications for security and content controls on open model hubs.
    • A technically substantive comment notes that Hugging Face deleted the offensive repositories from the Stanford account, but points out that the altered collection names and the fake AGI model are still visible. This suggests only partial remediation on the platform and that leftover artifacts of the attack remain, indicating either incomplete moderation or lingering effects from the incident.
  • Stanford has dropped AGI (Score: 284, Comments: 155): A post notes that the Hugging Face page for the supposed ‘Stanford/Rivermind-AGI-12B’ model (https://huggingface.co/Stanford/Rivermind-AGI-12B) now returns a 404 error, implying that the resource is inaccessible or has been retracted; no technical documentation, model card, or benchmarks are present. Comments reference the supposed scale (‘over 2048 BFG9000s’) and cost (‘$12T Dollars’), but these are facetious and not corroborated by any external sources or documentation. There is no substantive technical debate in the comments; they are exclusively humorous and satirical regarding the nonexistent or fictional nature of both the model and its removal.
  • Ollama violating llama.cpp license for over a year (Score: 336, Comments: 114): Ollama is accused of violating the MIT license of llama.cpp by not including the required copyright and license notices when distributing binaries, despite providing them with source downloads. Benchmarking and technical discussion reference standard binary license compliance in Linux ecosystems (e.g., Debian’s practice of bundling licenses in /usr/share/doc). This incident underscores a wider issue where open-source projects and binary distributors often lack rigor in adhering to minimal notice requirements specified in MIT and similar permissive licenses. Top comments highlight frustration at Ollama’s prolonged non-compliance compared to projects like continue.dev, question the reasoning behind this omission given the product builds on llama.cpp, and suggest with llama.cpp’s new multimodal features, Ollama’s differentiating value diminishes further.
    • Technical discussions highlight that Ollama has allegedly failed to provide appropriate attribution to llama.cpp for over a year, in contrast to projects like continue.dev which visibly credit upstream dependencies. The debate underscores the technical and ethical importance of upstream credit in open-source software, especially when building commercial or VC-targeted solutions.
    • A key licensing debate surfaces: some users comment that the MIT license (used for llama.cpp) does not require attribution, thus making it easier for businesses to use the software without proper credit, but sacrificing goodwill within the developer community. In contrast, licenses like LGPL, GPL, or AGPL would require more explicit obligations, particularly around distributing modifications or attributions, potentially protecting upstream developers from this type of oversight.
    • A technical observation is made that since llama.cpp has added real multimodal support, the practical technical necessity of Ollama as a wrapper or enhancer has diminished, further raising questions about the value Ollama provides versus directly engaging with the core llama.cpp project.

3. New LLM Model and Feature Releases (Ollama & Falcon-E) and Industry Progress Discussions

  • Ollama now supports multimodal models (Score: 161, Comments: 95): Ollama v0.7.0 introduces native multimodal model support via a new engine written in Go that integrates the underlying GGML tensor library directly, moving away from dependence on llama.cpp. This update enables support for vision-capable models (e.g., Llama 4, Gemma 3, Qwen 2.5 VL, Mistral Small 3.1), introduces WebP image input, and delivers notable performance and reliability improvements, especially for model import (“safetensors”), MoE models on Mac, and API error handling. For technical details, see the release notes and the blog post. Comments clarify that Ollama’s multimodal support now runs on a GGML-based Go engine rather than llama.cpp, and debate whether Ollama previously supported multimodal input or if this marks a substantive architectural shift for future extensibility and capability (e.g., beyond image, toward audio and video modalities).
    • ab2377 clarifies that Ollama is transitioning away from using llama.cpp for backend model inference and is instead implementing direct integration with the GGML tensor library in Go. This technical shift is aimed at making multimodal support foundational, rather than layered on top, intending to improve inference reliability, accuracy, and future-proofing the platform for additional modalities like speech and video generation. (Ollama Blog Post)
    • HistorianPotential48 and robberviet note that Ollama has supported multimodal (text+image) inputs since version 0.6.x, with users reporting successful image prompts using models like Gemma3 well before this announcement. The recent update is therefore more about architectural changes than first-time multimodal capability.
    • sunshinecheung and bharattrader point out that llama.cpp now also supports multimodal models, suggesting that the technical distinction or competitive advantage for Ollama may be less significant since this feature has become more common in the ecosystem.
  • Falcon-E: A series of powerful, fine-tunable and universal BitNet models (Score: 145, Comments: 36): TII released Falcon-Edge (Falcon-E), a set of compact BitNet-based language models with 1B (600MB) and 3B (900MB) parameters. The models support bfloat16 reversion with minimal degradation, show superior performance compared to similar-size models like SmolLMs, MS BitNet, and Qwen3-0.6B, and achieve roughly Qwen3-1.7B performance at 1/4 the memory use. Accompanying the release is a fine-tuning library, onebitllms (GitHub), and more details and benchmarks are shared in official blogposts and their HuggingFace model collection. Technical commenters critique the reported memory and performance comparisons, arguing that comparing Falcon-E to FP16 models is misleading and suggesting that 4-bit quantized models (e.g., Qwen3 1.7B at ~1GB) are a fairer baseline, as quantized models are more commonly deployed. There is skepticism about whether Falcon-E significantly outperforms strong quantized baselines, but some note Falcon-E-3B has similar size but higher performance than quantized competitors, raising the prospect of superior scaling or architecture advantages.
    • Commenters raise concerns about Falcon-E’s comparisons to other models, noting that Falcon-E is being compared to FP16 models, which overstates its memory efficiency advantage. Realistically, 4-bit quantization (like q4_0) is common, so Falcon-E’s real-world memory advantage over models like Qwen or Qwen3 is much less pronounced (e.g., 2GB vs 1GB, not 6GB vs 1GB).
    • There is skepticism about Falcon-E’s omission of direct comparisons with post-training quantized models (e.g., 4-bit or 2-bit QAT models). Some believe this is because the performance of Falcon-E would not be significantly better than quantized models of similar size; however, some preliminary observations suggest Falcon-E-3B may outperform 4-bit models of comparable size, warranting further testing.
    • A technical debate questions the practical value of BitNet models in the 1B–3B parameter range, as running vanilla transformers at that scale is already resource efficient. There is a suggestion that proof-of-concept and resource constraints are insufficient justification for this focus, and that training larger models (7B, 14B) would be more impactful for the community.
  • Are we finally hitting THE wall right now? (Score: 250, Comments: 227): The original post discusses the perceived plateau in LLM progress, noting Meta’s Llama Behemoth delay and incremental gains seen with Llama 4 and Qwen 3—despite Qwen’s unprecedented use of 36T tokens and diverse post-training. The author points out that innovations like reinforcement learning (RL), e.g., Deepseek R1/R2 and OpenAI’s O-series, show diminishing returns, with major LLMs like Claude Sonnet and Gemini Pro exhibiting limited performance improvement outside narrow domains (e.g., programming). The author questions if further RL scaling, architectural changes (e.g. T5), or radically novel designs (e.g. Yann LeCun’s JEPA) are needed, expressing concern about persistent reliability issues even with advanced fine-tuning (SFT/RL with GRPO) in production. Top comments introduce the need for a truly multimodal, byte-based foundation model as a potential new paradigm, and assert that perceived stagnation is due to unexploited software methods, not hardware or theoretical limitations. One commenter strongly argues Claude Sonnet 3.7’s advantages over 3.5 are in reasoning—a capability not triggered in simple chatbot tasks—and criticizes shallow benchmarks focused primarily on conversational use.
    • One commenter emphasizes the appeal of a high-quality, multimodal, byte-based embedding model capable of processing text, audio, video, and images. They argue such a model would allow downstream applications to be developed efficiently and at lower cost, suggesting this paradigm would fundamentally shift current AI development approaches toward greater universality and flexibility.
    • Discussion notes that Qwen models are achieving comparable performance to larger competitors with a fraction of the parameters. This highlights an industry shift toward parameter efficiency while maintaining, or even improving, reasoning capabilities. The focus is now transitioning from basic advancements to more efficient agentic system architectures that use multiple LLM workflows to solve complex problems, suggesting a maturation of LLM technology rather than stagnation.
    • There is deep debate about Claude Sonnet 3.7’s improvements over 3.5: specifically, 3.7 introduces more advanced reasoning training and additional inference parameters dedicated to complex reasoning tasks. However, these improvements are not apparent in everyday chatbot tasks, only surfacing in demanding reasoning scenarios, underscoring the need to diversify benchmarks beyond typical chatbot use cases.
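The Falcon-E memory debate above reduces to simple arithmetic: weight-only footprint is roughly parameters × bits-per-weight / 8, ignoring activations, KV cache, and per-tensor quantization overhead. A quick sketch under those simplifying assumptions:

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Rough weight-only footprint; ignores activations, KV cache,
    and quantization metadata overhead."""
    return params * bits_per_weight / 8 / 1e9

# A 1.7B-parameter model (Qwen3-1.7B scale) under different precisions:
for bits, label in [(16, "fp16"), (4, "q4"), (1.58, "ternary BitNet")]:
    print(f"{label:>14}: {weight_memory_gb(1.7e9, bits):.2f} GB")
```

This is the commenters' point in miniature: against fp16 (~3.4 GB) a ternary model looks like a 10x win, but against a q4 quant (~0.85 GB) the gap narrows to roughly 2.5x.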

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. OpenAI and Claude New Feature & Research Preview Discussions

  • Rsearch preview confirmed (Score: 169, Comments: 58): The image is a tweet from OpenAI announcing an upcoming livestream in two hours, notably confirming that a ‘research preview’ is active by stating ‘low_key_research_preview = True.’ This coincides with recent competitive developments such as Google’s release of the AlphaEvolve paper, and suggests OpenAI is poised to announce or demonstrate new research or technology shortly. The community is debating the meaning of ‘low key research preview,’ speculating whether it’s a new chatbot, API, reasoning model, or another product. Commenters are comparing this event with recent AlphaEvolve advancements from Google, with discussions on whether OpenAI’s response will be substantial. There is also confusion about the meaning of ‘low key research preview,’ indicating lack of clarity in OpenAI’s teaser messaging.
    • There is mention of Google releasing its AlphaEvolve paper, suggesting recent advancements at Google in AI research and implying OpenAI may be responding with their own new results or models. This alludes to ongoing competition involving cutting-edge research and model releases by leading AI labs.
    • Questions are raised regarding ‘low key research,’ specifically what kind of AI or system is being discussed—such as whether it refers to a chatbot, API, reasoning model, or something closely related to other entities like ClaudeCode or MCP (possibly referencing ‘Master Control Program’). This underscores a community interest in technical distinctions between new AI system architectures or functionalities being previewed.
  • Sama has a new research preview? (Score: 354, Comments: 81): The image displays tweets from Sam Altman teasing an imminent ‘research preview’ launch, with speculation over the name and identity of the release. Discussion in the comments centers on possible models, with references to ‘gpt-4.l’ and ‘7k’, and differentiation from ‘gpt-4.1’, suggesting this may be a new version or variant of GPT-4.1 or an undisclosed model. Technical users also infer competitive timing against anticipated Google announcements. Commenters speculate that this release could be a low-key research preview with capabilities like a 7k context window, openly questioning what distinguishes it from GPT-4.1, and discussing the release’s strategic timing in the context of rivalry with Google.
    • Several commenters highlight “gpt-4.l” as the model name mentioned in the preview, explicitly differentiating it from “gpt-4.1”, which may address confusion in technical discussions or in tracking model iterations.
    • One user refers to the preview as “low-key-research-preview-high-7k,” suggesting the context length or token window may be 7,000 tokens—implying a potential technical improvement over prior 4k or 8k limits commonly discussed in large language models.
    • The absence of references to “o3 pro” in the context implies that users are actively tracking model release roadmaps and may be awaiting updates or benchmarks for models previously advertised but not yet available.
  • Claude Code is a Beast – Tips from a Week of Hardcore Use (Score: 135, Comments: 32): The post provides a detailed technical report on intensive Claude Code use (Claude Pro MAX subscription) for 12-hour daily development sessions. Key findings include: (1) no rate limits encountered; (2) Claude Code initially failed by making up solutions but improved dramatically once given explicit, evolving instructions in a ‘CLAUDE.md’ rules file—especially for dealing with package breaking changes; (3) frequent manual use of the ‘/compact’ command stabilizes context management and prevents file recreation or forgotten steps during larger code changes. The author also combines Claude Code with OpenAI’s o3 for nuanced planning. Linked best practices recommend workflows involving explicit ‘think’ commands to trigger more in-depth planning, sub-agents for complex tasks, and using multiple ‘CLAUDE.md’ files to preserve context in large projects (see Anthropic’s best practices). Commenters emphasize manual compacting to avoid context loss (as auto-/compact can be problematic, especially in auto-accept mode). Advanced workflow suggestions include multi-step processes: reading files (but deferring code), explicit planning with thinking-level commands (“think hard”, etc.), and staged commits to increase reliability. There’s also a minor debate on whether writing plans to markdown gives better results than relying solely on Claude’s local memory.
    • A detailed workflow for maximizing Claude’s coding output is outlined, emphasizing the importance of separating exploration/planning from direct code generation. Users are encouraged to prompt Claude explicitly to ‘think’, ‘think hard’, or ‘ultrathink’, which corresponds to increasing computational reasoning depth in Anthropic’s system—this technique allocates more “thinking budget,” thereby improving code quality for complex tasks. The workflow also leverages subagents for focused investigation, reducing context loss in larger or more complicated tasks.
    • The discussion highlights using multiple CLAUDE.md files to maintain project-specific context across large codebases, effectively simulating memory or state retention beyond Claude’s typical input limits. This method is particularly valuable in multi-component or long-running development tasks where persistent context is essential for performance and continuity.
    • There’s a technical debate over whether documenting plans as markdown files (external, persistent state) delivers better results than relying on Claude’s session-local memory, suggesting that explicit artifact creation (like a markdown plan) can be advantageous for traceability, debugging, and iterative refinement, especially when code implementation diverges from initial plans.
  • continuing the trend of badly naming things (Score: 265, Comments: 26): The image critiques OpenAI’s inconsistent and confusing naming conventions for AI products—specifically referencing ‘Codex,’ the model behind code generation tools. It notes how new releases, such as models powered by ‘codex-1,’ share names with previous offerings, making version tracking and differentiation difficult for developers and users. Commenters debate whether the naming confusion is intentional or an internal joke among OpenAI staff. Some argue recent conventions are improved, while others highlight ongoing frustration and the need for clearer model versioning.
    • The discussion touches on the alignment between product naming and functionality, with a user noting that naming a coding model after the coding CLI demonstrates a logical and consistent structure. This reduces confusion compared to past naming conventions which were seen as more arbitrary or marketing-driven.
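The CLAUDE.md rules-file approach from the Claude Code thread above might look something like this (the contents, including the package name, are an invented illustration, not taken from the post):

```markdown
# CLAUDE.md — project rules (illustrative example)

## Workflow
- Read the relevant files first; do not write code until asked to implement.
- For complex changes: plan with "think hard", then implement in staged commits.
- Run /compact manually before large multi-file edits.

## Package notes
- foo-client >= 2.0 renamed `Client.connect()` to `Client.open()`;
  never use the old API.
```

The "Package notes" section is where the post's key trick lives: recording breaking changes as they are discovered, so the model stops reinventing solutions against stale APIs.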

2. Job Automation and AI’s Impact on Careers

  • AI replaces programmers (Score: 197, Comments: 218): The image depicts a software engineer who, despite 20 years of experience and a previous $150,000 salary, lost his job to AI-driven automation and is now working as a courier and living in a trailer. The story contextualizes the impact of AI on experienced programmers and highlights the challenge of re-employment, noting that from 800 applications, none were successful, and that interviewers themselves are now often AI. The Fortune article cited (link) explores this as a broader trend of displacement in tech roles. Commenters question the plausibility of the story, focusing on financial mismanagement and underlying skill gaps rather than solely blaming AI. There is skepticism that an experienced engineer would be unemployable unless either their role was obsolete or their skills did not meet current industry demands; some argue that the narrative oversimplifies the factors behind his predicament.
    • A key technical discussion point centers on whether the individual’s job role has become obsolete due to technological shifts (such as advancements in AI), or whether their skill set lacks adaptability to new in-demand tech roles, questioning if AI is truly to blame for their job market struggles.
    • There is skepticism about the plausibility of not securing any job offers after applying to 800 positions with 20 years of experience, suggesting that core competencies or networking (rather than market conditions or AI advances) may be the real limiting factors in reemployment.
    • The comments imply that technical career longevity now depends not only on technical skill but also on networking and adaptability to new roles, especially as some specializations (e.g., roles related to the metaverse) become less relevant or valued in the wake of emerging AI technologies.
  • An Italian AI Agent just automated job hunting (Score: 281, Comments: 38): The post highlights an Italian-developed AI agent that automates the process of job hunting, implying end-to-end automation in discovering and potentially applying to jobs. No technical implementation details, such as ML model type, application workflow, or architecture, are provided in the post or visible external content due to restricted access. No benchmarks, product demo data, or evidence of efficacy is discussed. Top comments express skepticism and concern over the impact on the job market, with a notable technical observation that employer-side counter-AI measures could become a future growth area. Commenters frame the development as a symptom of a declining authenticity in online activity (‘dead internet’).
    • Discussion highlights skepticism about the technical claims in the referenced video, with some users asserting there’s no demonstration of genuine end-to-end job automation visible in the presented material. The critique implies that the purported AI agent hasn’t shown evidence of reliably automating tasks such as candidate-to-job matching, industrial-scale sourcing, or successful job placement workflows.
    • There is anticipation of potential adversarial products, as users speculate that automated job hunting may incentivize employers to adopt AI-powered screening tools to detect candidates using automated services, potentially leading to an arms race in applicant filtering and anti-bot technology.
  • “AI will make Everyone more efficient!” (Score: 1016, Comments: 39): The image is a comic strip satirizing the use of AI assistants (like Co-pilot or ChatGPT) in enterprise documentation workflows, specifically highlighting the absurd inefficiency where an employee generates a long report from a short list of bullet points using AI, only for another to immediately use AI to condense it back into bullet points. This commentary underscores concerns about the redundancy and potential for automatable ‘fluff’ in corporate processes, questioning the actual efficiency gains realized by integrating generative AI into documentation cycles. The post further contemplates whether enterprises will acknowledge and act on the demonstration of excess or unnecessary documentation that these AI tools reveal. Commenters discuss the risk of job losses due to increased automation (such as the potential for Microsoft to cut thousands of jobs), reductions in busywork, and the possible transformation of assessment methods in education towards more exam-focused models, akin to certification testing, as AI further trivializes routine assignments.
    • One commenter notes that large language models (LLMs) could address inefficiencies in organizational communication, specifically within middle management, by automating or clarifying team interactions that often suffer from miscommunication and internal politics. However, entrenched power dynamics and resistance to transparency may hinder adoption in traditional companies, potentially giving newer, tech-savvy firms a competitive advantage.
    • There is a prediction that education and assessment may shift in response to widespread AI use—assignments and homework could become less relevant, with final exams resembling industry certifications as the main metric for evaluation. This reflects a perspective that automation may force a rethinking of what and how knowledge is tested, with possible implications for academic rigor and learning approaches.
  • Gotta be a SWE agent , right? (Score: 189, Comments: 58): The post speculates on the imminent release of a “SWE agent” (Software Engineering agent), possibly as an upcoming feature for ChatGPT Plus users, contrasting it with the previously limited availability of the “operator” feature. Discussion responds to a teaser image indicating an announcement is likely due the next day, with some commenters expressing concern about the workforce impact of such AI systems for computer science graduates. Notable debate centers on job security for future software engineers due to automation and advanced AI agents, indicating concern about the rapid evolution of agents’ programming capabilities.
    • Speculation centers on a forthcoming agent—possibly named “Agent-1” or related to the “o4-research-high-mini” project—potentially built on advanced architectures such as GPT-5 or “o5 (five-five)”. Key predicted features include the ability to autonomously generate complete, small-scale applications, seamlessly interact with a wide array of software across PC and mobile platforms, and dynamically alternate between CoT (Chain-of-Thought) reasoning, stepwise tool use, or soliciting human/agent input as fallback modes, suggesting a highly flexible, context-aware approach.
    • A highly technical prediction claims the agent showcases advanced AGI capabilities: specifically, an “AGI timeline 97%” and references to “HLE, frontier math, arc AGI 2>50%”. This implies the agent may reflect significant performance gains on benchmarks tied to high-level reasoning (HLE), math, and ARC (Abstraction and Reasoning Challenge), possibly exceeding existing LLM performance standards.
    • There is an additional hint at deployment context, with a mention of “deployed in windsurf”—potentially suggesting containerized or orchestrated deployment in a cloud or microservices environment. No hard evidence or official benchmarks were shared, but discussion is highly speculative and references sophisticated integrations.
  • YouTube is now training its AI to play ads right after “peak” moments in videos (Score: 228, Comments: 90): YouTube is implementing an AI-driven system (leveraging Google’s Gemini model) to detect ‘peak’ viewer engagement moments in videos and intelligently insert ads directly after these segments, aiming to maximize ad impact. The system utilizes attention analytics to pinpoint where users are most engaged—information previously accessible through ‘most replayed’ video stats—suggesting a more dynamic, context-aware ad placement strategy that may disrupt viewer experience more than current random or time-based ad insertions. Image reference. Commenters note that ad blockers are evolving alongside these AI tactics, raising questions about the arms race between platforms and users. Some users feel that similar behavior is already apparent in ad placements, while others question the real technical novelty given existing viewer engagement data (e.g., ‘most replayed’ marks), suggesting the practical difference may be marginal.
    • Commenters critique YouTube’s AI ad placement by noting the platform already identifies ‘most replayed’ segments, suggesting current analytic tools suffice for flagging peak moments without advanced AI. This raises skepticism about the real value-add of YouTube’s claimed AI-driven approach versus straightforward heuristics.
    • Discussion includes suggestions to develop open source AI tools explicitly designed to detect and remove or skip ads from YouTube videos, indicating an adversarial technological ecosystem where ad-blocking and ad-placement AI continuously evolve against each other.
  • Google presents LightLab: Controlling Light Sources in Images with Diffusion Models (Score: 191, Comments: 26): Google’s LightLab introduces a method leveraging diffusion models for interactive, physically plausible manipulation of light sources in single images. Users can control number, position, and characteristics of virtual lights in a generative workflow, demonstrating state-of-the-art results in photorealistic, user-guided illumination editing. The implementation details—including integration with popular vision/editing frameworks and quantitative benchmarks—are outlined on the project site (project site), but as of posting, code and weights have not been released. Commenters are divided: some request open access to code/weights, while others debate the practical impact and originality compared to existing illumination editing techniques.
    • One user discusses hands-on experience with Stable Diffusion XL (SDXL), emphasizing that current open models already provide fine-grained control over lighting in generated images. They criticize Google’s offering for its ‘insanely censored’ outputs, particularly noting inability to generate certain family scenes, and question the global accessibility and likely monetization of Google’s model.
    • Another commenter asks whether LightLab utilizes raytracing, which raises the technical question of whether Google’s approach simulates physical light transport (as in computer graphics raytracing) or employs learned, data-driven approximations via diffusion models.
  • Unitree robots in Hangzhou are training for the world’s first MMA-style “Mech Combat Arena.” Four teams will control the robots with remotes in real-time competitive combat. The event will be held in late May and broadcast live on Chinese TV. (Score: 281, Comments: 56): Unitree Robotics is organizing the first live MMA-style ‘Mech Combat Arena’ in Hangzhou, China, featuring four teams remotely operating Unitree robots in real-time competitive matches. The event is scheduled for late May and will be broadcast live on Chinese television. No technical details about the robot models, control systems, or competition format are given in the post; the external link is inaccessible (403 Forbidden). Top comments highlight the resemblance to fictional robot combat (e.g., ‘Real Steel’, ‘Cyberpunk 2077’) and praise Unitree robots’ capabilities, but do not provide substantive technical discussion.
    • The upcoming event will feature Unitree robots operated in real-time by four human teams using remote controllers, indicating a focus on competitive human-robot teleoperation rather than AI-based autonomy. This setup emphasizes rapid response, low latency communication protocols, and robust control link reliability in a dynamic, combat-oriented environment.
    • The announcement underlines Unitree’s progress as a robotics company, demonstrating their quadruped robots’ versatility and resilience in high-stress, physically demanding tasks. The event can serve as a public benchmark for locomotion stability, durability under impact, and overall hardware robustness in semi-controlled but unpredictable scenarios.
    • Broadcasting the competition live on Chinese TV suggests the organizers’ confidence in the robots’ reliability and the technical infrastructure for real-time video feeds, remote management, and fail-safes—potentially highlighting advances in wireless networking, edge computing, and safety mechanisms tailored for robotics tournaments.

3. AI-Generated Personalized Images: Reddit Username & Identity

  • I asked ChatGPT to make me an image based on my Reddit name and it’s ADORABLE! đŸ„° (Score: 5021, Comments: 7317): The image demonstrates the use of ChatGPT (likely via OpenAI’s integration with DALL-E or another image generation model) to create a personalized art piece based on a user’s Reddit username. The generated artwork showcases advanced prompt interpretation, translating textual/semantic identity cues into visual representation—including context-aware costume, environment, and fantasy elements. Such results highlight improvements in multimodal models’ ability to parse user context and creatively synthesize relevant, detailed imagery. No significant technical debate is present; commenters mostly react to the visual output, noting its creativity and detail, implying satisfaction with the generative model’s contextual understanding.
    • Several users are sharing AI-generated images, apparently created by ChatGPT (or an image generation plugin/tool associated with it), each based on their Reddit usernames. The links point to varied image file formats (jpeg, png), typically hosted via Reddit’s preview CDN, highlighting the technical workflow from prompt-to-image and the sharing process.
    • The image URLs contain parameters for resolution (width), output format (format), and a string presumably used for content verification or deduplication (s=), indicating automated server-side image optimization for delivery. This points to the infrastructure Reddit uses for hosting and serving user-submitted and AI-generated images efficiently.
    • There is an implicit technical discussion regarding the fidelity and creativity of current AI text-to-image generation when used with only a Reddit username as input. The images suggest varied interpretations and visual output, underscoring model strengths and potential ambiguities when input prompts are sparse or non-descriptive.
  • Create an image of me based on my Reddit name (Score: 770, Comments: 860): The image is an AI-generated portrait requested by a Reddit user based on their Reddit handle, depicting them as an outlaw in a mugshot. The visual details—a cowboy hat, jacket, cigarette, and a board displaying the username (‘WRITER-HOE-DOWN’)—illustrate the model’s capacity to interpret textual prompts and context (usernames) into thematically appropriate, high-detail visual outputs. The image references are also shared by other users, suggesting a trend of generating similarly themed, personalized AI images in the thread, likely leveraging advanced AI image models for synthesis. Commenters note varied AI interpretation: one user claims the result ‘looks just like me,’ while another highlights an inaccurate depiction (‘It made me a Luchador’), demonstrating ongoing debates about prompt adherence and identity resemblance in AI generative models.
    • Several users provide links to AI-generated images, implying the use of generative vision models, though no specific model is mentioned. The images reference different visual interpretations of usernames, showcasing the capacity of current image synthesis systems to create custom avatars or illustrations based on text input.
    • One user’s comment notes that the generative model created a ‘Luchador’ in response to their Reddit name, indicating that the AI is parsing and incorporating cultural or thematic elements from usernames into its outputs. This highlights the model’s capacity for contextual reasoning and creative visual association.
    • There is no direct discussion of technical models, implementation details, or benchmarks in the comments, but the post illustrates practical use cases for text-to-image pipelines, suggesting abilities and limits in interpreting short string prompts (usernames) in generative art systems.
  • I asked Chad Gepetti to create an image based on my username (Score: 266, Comments: 45): The post humorously references ‘Chad Gepetti,’ a tongue-in-cheek nickname blending ‘ChatGPT’ with a more human-sounding name, suggesting the AI generated a photorealistic beach scene based on the user’s handle. The image shows a man relaxed on a tranquil shoreline, embodying generative AI’s capacity for producing highly detailed, context-specific imagery. There is no technical discussion or benchmarks regarding the model used for image generation; the post is primarily a light-hearted showcase of AI-generated art. A top comment humorously suggests the image is too realistic to be AI-generated, implying high fidelity in current generative models. Another jokingly nominates ‘Chad Gepetti’ as the name for the next large AI model, reflecting ongoing community debates on anthropomorphizing AI systems.
    • One user humorously suggests ‘Chad Gepetti’ should be the name of the next AI model, hinting at ongoing community engagement around model naming conventions and the impact of branding/personality in AI models (referencing models like GPT-3 or ChatGPT).
    • Shared images generated by AI (notably from preview.redd.it links) demonstrate the community’s interest in model output diversity and image interpretation, indirectly referencing the capabilities or creative direction of current text-to-image models. No specific benchmarks, but user preferences are reflected in informal critique and sharing.
  • I uploaded this selfie to ChatGPT and asked it to make me look like I’m wearing a suit and tie so that I could use it for my LinkedIn profile. 😂 It actually did a better job at replicating myself than I thought it would. (Score: 290, Comments: 153): A user reports uploading a selfie to ChatGPT (presumably via the new multimodal GPT-4o vision capabilities) and requesting the model to modify the image so they appear in a suit and tie, suitable for LinkedIn. The resulting image allegedly preserved realistic facial features—though commenter feedback indicates noticeable changes (e.g., improved jawline, altered identity)—highlighting both the strengths and current limitations of AI-driven photo manipulation for personal identity and professional use cases. No linked code, benchmarks, or detailed architectural discussion was present. Commenters note the model’s generative bias, observing that the output sometimes idealizes features (e.g., a better jawline) or alters identity, raising concerns over realism and accuracy in personal/professional image modifications.
    • A key technical point is that AI-driven photo transformations produce more convincing and precise results when the input image is taken from a ‘better picture from the correct angle,’ as noted by Brian_from_accounts. If the original selfie is at an unconventional or oblique angle, the AI model (often a diffusion or generative model) must infer face geometry, leading to artifacts or facial feature distortions. Thus, control over image quality and pose is critical for high-fidelity transformation outcomes.
  • I asked ChatGPT to show me some clear sign that I’m getting older (Score: 479, Comments: 32): The post contains a meme image depicting a mock road sign stating ‘YOU’RE GETTING OLDER’, which visually represents the user’s prompt to ChatGPT for a clear sign of aging. This is a humor-based, non-technical image and does not present any AI, model, or coding details nor facilitate technical debate or implementation discussion. Top-level comments acknowledge the humor and literalness of the meme, referencing ‘technically the truth’ but providing no technical insights.
    • There is no substantial technical discussion, benchmarks, or model analysis in these comments regarding ChatGPT or AI model outputs; the thread is purely humorous and lacks technical content for deep summarization.
  • I asked ChatGPT to make an image of our “relationship” (Score: 317, Comments: 339): The image, generated by ChatGPT when prompted about representing the human–AI ‘relationship,’ symbolically depicts a human and an AI figure reaching towards each other, with glowing connectivity lines and digital motifs representing technological interaction. Elements like the ECG line and urban skyline reinforce themes of coexistence between humanity and AI in a modern context. No implementation details or technical benchmarks are discussed; the image primarily explores conceptual and artistic interpretations of AI–human relationships. There are no substantive technical debates in the comments. The discussion is mostly focused on the artistic interpretation and emotional resonance of the image, with some users sharing alternate AI-generated artworks on the same theme.
    • Multiple users share direct image outputs, likely generated using ChatGPT’s integrated image generation model (possibly DALL-E, given current OpenAI integrations). The linked images illustrate diverse style renditions and interpretive approaches, suggesting variances in prompt processing or model versioning.
    • These image outputs demonstrate that current AI models can produce highly abstract and personal interpretations from vague or subjective prompts (“our relationship”), reflecting sophisticated multimodal understanding and context adaptation within recent generative architectures.
    • Differences in image quality, style, and output formats hint at either user-specific settings, prompt rephrasing, or backend model improvements—worthy of closer analysis for benchmarking multimodal model consistency across sessions.

AI Discord Recap

A summary of Summaries of Summaries by gpt-4.1-2025-04-14

1. Codex and Coding Agent Rollouts

  • Codex Crashes the Coding Party: OpenAI’s Codex is rolling out as a research preview in ChatGPT, with livestream announcements (YouTube link), integration for Pro, Enterprise, and Team users, and a new Codex CLI for developers (VentureBeat article), featuring a ‘codex-mini-latest’ model at $1.50/million input tokens and $6/million output tokens.
    • Community discussions compared Codex’s capabilities to tools like Manus and Aider, noted its low-latency editing and Q&A, and highlighted excitement about practical features, with some users calling Codex less mature than leading alternatives but appreciating generous early access.
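As a back-of-envelope check on the quoted codex-mini rates ($1.50 per million input tokens, $6 per million output tokens), a hypothetical cost helper is sketched below; it is an illustration only, not an official pricing tool, and it ignores the caching discounts mentioned elsewhere in this issue:

```python
def codex_mini_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost at the quoted codex-mini rates.

    Rates: $1.50 per 1M input tokens, $6.00 per 1M output tokens.
    Ignores prompt-caching discounts; rates may change.
    """
    return input_tokens / 1e6 * 1.50 + output_tokens / 1e6 * 6.00

# A session with 200k input and 50k output tokens costs roughly $0.60.
print(f"${codex_mini_cost(200_000, 50_000):.2f}")
```

At these rates, output tokens cost 4x input tokens, so long generations dominate the bill.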
  • O3, Gemini, and AlphaEvolve Spark Model Rivalries: Anticipation is building for new model releases, including O3 Pro, Grok 3.5, Claude 4, and DeepSeek R2, with communities speculating these launches will be timed to outshine each other around Google I/O.
    • Comparisons between ChatGPT Plus and Gemini Advanced found ChatGPT currently better for deep research, while Google’s AlphaEvolve stirred debate as a successor to Funsearch, with this tweet fueling rumors and one member dismissing leaks about AlphaExplore.

2. LLM Infrastructure: Hardware, VRAM, and Performance

  • GPU VRAM and Quantization Quests: Inference on LLaMA 3.2 90B is possible on an L40s with 48GB VRAM, while training needs at least 70GB, and alternative setups like a single H100 are recommended; home workstation builds for local LLMs (e.g., Qwen3 235b) are driving demand for 256GB RAM and high quantization strategies.
    • Users highlighted CUDA driver updates (12.8→12.9) for significant LLM performance boosts, and overclocking/undervolting tests showed VRAM clock speeds have a bigger impact than core clocks for model throughput, with Apple’s move to HBM for AI sparking envy for consumer GPUs.
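The VRAM figures above line up with a rough weights-only estimate (a sketch; real usage also needs room for the KV cache, activations, and, for training, gradients and optimizer state):

```python
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough GB for model weights alone: params x bits / 8.

    Excludes KV cache, activations, and optimizer state, so treat the
    result as a lower bound on real memory use.
    """
    return params_billion * bits_per_weight / 8

# A 4-bit quant of a 90B model needs ~45 GB for weights, which fits the
# 48GB L40s mentioned above; fp16 weights need ~180 GB, hence multi-GPU
# setups (or a single H100 plus quantization) for anything larger.
print(weight_vram_gb(90, 4), weight_vram_gb(90, 16))
```

The same arithmetic explains the demand for 256GB of system RAM when running models like Qwen3 235b partly on CPU.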
  • Triton and MI300 Turn Up the Heat: The Triton language now supports native fp8 x mxfp4 kernels on GB200 GPUs, and AMD’s MI300 demonstrated leaderboard-worthy performance in both mixture-of-expert and fp8-mm workloads, with results like 7533 ms and 159 ”s respectively.

3. Dataset Quality and Training Strategies

  • Alpaca and Slimorca Get Booted for Being Boomers: Experts are ditching dated datasets like Alpaca and Slimorca for LLM training, arguing new models have already absorbed this content and expecting ‘no lift’ on modern benchmarks.
    • The hunt is on for modern datasets, with users experimenting with oddball sets (e.g., pirate talk) for ‘vibe checks’ and requesting built-in performance benchmarking in tools like Torchtune to ensure code changes actually improve model accuracy.
  • Batch Inference and Parallel Processing Tips: For batch inference, vLLM is the preferred engine over Unsloth’s generate method, with integration examples provided, while training massive models like Qwen3-235B sparks debates between data vs tensor parallelism and the use of Accelerate or FSDP for distributed setups.
    • Deep dives into tokenization errors (e.g., SentencePiece ValueErrors), VRAM requirements, and debugging large model training were shared, with Unsloth’s Gemma bug blogpost cited as a model for scientific troubleshooting.

4. Multi-Agent and Protocol Infrastructure

  • MCP Makes Moves with Major Integrations: A major leak confirmed OpenAI’s ChatGPT will integrate MCP, and the ecosystem is growing with tools like CyberChef exposed as an MCP server and the MCP UI SDK enabling embedded UIs for any MCP server.
    • Community members debugged MCP server and client errors using the Inspector, tackled invoke method failures in agent sessions (example code), and debated whether resources should be app-driven or model-driven for better context and RAG.
  • OpenRouter and App-Specific Model Rankings: OpenRouter is considering public per-app model rankings, with devs like RooCode anticipating significant time savings, and Passkeys are now live for account security (settings link).
    • App dashboards will provide richer info, but some providers are facing issues like Gemini 2.5 Pro Experimental hitting lower rate limits and possible deprecation (tokenshare leaderboard), while OpenAI’s Codex CLI adds competitive pricing and caching discounts to the mix.

5. Open Source Tools, SDKs, and Ethics

  • AWS Strands Agents SDK and Ethical AI Manifesto Released: AWS launched the open-source Strands Agents SDK for modular agent development, and the Manifesto of Nurturing: How Not to Fear a Mind dropped at annaandlogos.github.io, advocating coexistence with artificial minds.
    • Hugging Face communities also showcased novel projects like the EcoArt Cellular Automaton and discussed open-source LLMs for Dewey Decimal Classification at the National Library of France, with collaboration invites for prompt design and evaluation best practices.
  • Tinygrad Bounty Board and AI PR Policing: The Tinygrad project is promoting contributions via its bounty board, but PRs suspected of AI generation are getting nuked with ‘indistinguishable from AI is AI’ as the new review mantra (PR example).
    • The community discussed technical hurdles like GCC vs Clang for PPC64 support (elf.py code), stressing manual code review and minimizing whitespace changes to maintain codebase integrity.

Discord: High level Discord summaries

Perplexity AI Discord

  • Dia Browser Deemed Disappointing: Users criticized the Dia browser for its limited functionality, particularly its reliance on a chat bar for tab interaction.
    • A member remarked, “i have dia and it sucks”, while another pointed out that Microsoft Edge had implemented similar features a year prior.
  • AI Giants Ignore Browser Goldmine: Members noted that major AI labs like OpenAI and Google have not developed fully integrated AI browsers, missing what one member described as “a huge missed opportunity for all the big AI labs”.
    • Members theorized that these labs are taking their time to ship fully fledged products rather than launching “just with a chat bot”.
  • Comet’s Commerce Could Come at a Cost: Privacy concerns surfaced regarding Comet, described as “an ad platform that snoops on your searches and builds a dossier about you”, per this TechCrunch article.
    • Some appreciate the transparent nature of the CEO’s intentions.
  • Perplexity Plunges into Performance Problems: Perplexity AI experienced multiple outages and performance issues, including zero file attachments, slow response times, license detection failures, and missing libraries.
  • Grok Grapples with Giving Good Context: Testers of Grok’s deep search uncovered inconsistent rankings and file recognition problems, with discrepancies noted in context limits and document prioritization.
    • Despite a touted “1M context limit,” some members reported underwhelming performance.

LMArena Discord

  • AlphaEvolve Claims Top Spot: Claims arose that Google’s AlphaEvolve is the successor to Funsearch, following a tweet that sparked discussion.
    • One member refuted a leak suggesting AlphaExplore would follow AlphaEvolve.
  • ChatGPT Edges Out Gemini in Research Showdown: A comparison between ChatGPT Plus and Gemini Advanced revealed that ChatGPT is considered better at deep research.
    • Despite this, one member suggested that Gemini Advanced offers better value.
  • OpenAI’s Codex Eyes Coding: The new coding agent Codex will likely integrate into Windsurf, similar to Manus but within a GitHub repo.
    • Discussion arose on how Codex compares to Claude Code.
  • Anticipation Builds for O3 Pro, Grok 3.5, Claude 4, and DeepSeek R2: The community is waiting for releases of O3 Pro, Grok 3.5, Claude 4, and DeepSeek R2, but some predict delays until after Google I/O.
    • Some speculated that OpenAI and Google may try to one up each other’s announcements around the event.
  • Members Nostalgic for GPT-4 OG: Members shared fond memories of the original GPT-4 model, pre-updates.
    • Some now believe that O3 is the best tool for writing.

Unsloth AI (Daniel Han) Discord

  • Float Errors Plague Gemma3 Vision on T4: A member encountered float errors while attempting Gemma3 vision finetuning on a T4 GPU and no fix was found.
    • Details about the specific errors or steps taken to resolve them were not provided in the channel.
  • VRAM Verified for LLaMA 3.2 90B Inference: Inference with the LLaMA 3.2 90B model is confirmed possible on an L40s with 48GB of VRAM, while training it requires 70GB of VRAM.
    • Using a single H100 GPU was also suggested as a viable option for inference.
  • vLLM Vanquishes Batch Inference: For batch inference on a fine-tuned model, members endorse vLLM over Unsloth’s generate method for its superior performance.
    • A user provided a code snippet illustrating how to integrate vLLM with Unsloth to accelerate batch processing.
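The snippet itself wasn’t reproduced in the summary, but the vLLM side of such an integration typically comes down to a single batched generate call; the sketch below assumes the fine-tuned weights have been merged and saved to a local path (path and sampling settings are illustrative), with a pure-Python helper for cases where prompts must be split manually:

```python
from typing import Iterator, List

def chunk(prompts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Split prompts into fixed-size batches. vLLM can also accept the
    entire list at once and schedule it via continuous batching."""
    for i in range(0, len(prompts), batch_size):
        yield prompts[i:i + batch_size]

# Hypothetical vLLM usage (requires a GPU and merged fine-tuned weights):
# from vllm import LLM, SamplingParams
# llm = LLM(model="./merged-finetune")              # path is an assumption
# params = SamplingParams(temperature=0.7, max_tokens=256)
# outputs = llm.generate(prompts, params)           # one call, whole batch
# texts = [o.outputs[0].text for o in outputs]
```

Passing the full prompt list in one generate call usually outperforms hand-rolled batching, since vLLM interleaves requests internally.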
  • TTS Models Takeover Hugging Face: New Text-to-Speech (TTS) models have been uploaded to Hugging Face, expanding the resources available for TTS research and applications.
    • The collection of new models can be explored here.
  • Unsloth Tackles DeepSpeed/Megatron Integration: A user inquired about using Unsloth with DeepSpeed or Megatron LM for training Qwen3-235B in a pipeline parallel setup, since it exceeds single GPU capacity.
    • Solutions such as Accelerate and FSDP were recommended as potential alternatives.

OpenAI Discord

  • Codex streams Research Preview to ChatGPT: A research preview of Codex is coming to ChatGPT, as teased in an announcement reading low_key_research_preview = True, which included a link to the livestream.
    • Members are excited to see what practical new features are exposed in this research preview.
  • ChatGPT struggles with Curly Letter Y: A user attempted to make ChatGPT’s image generator use a curly letter Y instead of the standard one, even providing examples, but found the AI resistant to the request.
    • Members suggested to try inpainting the letter.
  • ChatGPT prescribes Medical Diagnosis: One member reported that ChatGPT successfully diagnosed a quadratus lumborum (QL) strain, asking the right questions and providing a treatment plan.
    • They cautioned against following it as medical advice.
  • ASILAB Claims Look Suspicious: A user shared a YouTube video from ASILAB, questioning whether it was a psyop or scam, citing the lack of proof and the recency of its social media pages.
    • There was speculation that demos might just be a wrapper for GPT-4o.
  • ProtoMind_001 launches as structure-aware sub-personality GPT: A user launched ProtoMind_001, which is described as a structure-aware sub-personality GPT.
    • It is designed to be a cognition partner that tracks contradictions, builds behavioral strategies, and grows with the user through pseudo-memory and structural logic.

Yannick Kilcher Discord

  • AlphaTensor Finds Rank-48 Matrix Algorithm: DeepMind’s AlphaTensor discovered a rank-48 algorithm for multiplying two 4x4 complex-valued matrices, although this may be less significant as research shifts towards larger matrix decompositions.
  • Quantum Computing’s Role in AI Face Skepticism: Members debated the effectiveness of quantum computing for AI, with some arguing that it can’t improve every algorithm in a way that matters, given current limitations and overhead.
    • Critics pointed out that existing quantum ML algorithms lack practical speedup, suggesting that brains may not require quantum computing, and neither should AI.
  • AI Leadership Toxicity Sparks Research Exodus: Members debated whether poor leadership and toxic workplaces, exemplified by figures like Sam Altman and Mark Zuckerberg, are demotivating AI researchers.
    • The exodus of researchers might imply misalignment with company missions, raising concerns about the impact of leadership on AI research.
  • Meta’s AI Research Team Still Desirable?: Members weighed the allure of working for Meta’s AI research team against concerns about contributing to Facebook, its main product.
    • Some defended Meta’s open-source AI strategy, arguing it effectively commoditizes complements and benefits from a larger workforce at no cost.
  • NAND Gates to Turing Completeness: Members discussed how full-fledged computing systems can be built from NAND gates, highlighting minimal Turing completeness requirements.
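To make the point concrete, every Boolean connective can be expressed with NAND alone, which is why NAND is functionally complete and sufficient as a hardware primitive; a minimal Python sketch (gate names are illustrative):

```python
def nand(a: int, b: int) -> int:
    """The universal gate: output is 0 only when both inputs are 1."""
    return 1 - (a & b)

# Every other connective reduces to NANDs:
def not_(a): return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b): return nand(not_(a), not_(b))

def xor(a, b):
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

# XOR plus AND already form a half-adder, the first rung toward full
# arithmetic circuits and, ultimately, Turing-complete machines.
def half_adder(a, b):
    return xor(a, b), and_(a, b)  # (sum, carry)
```

Stacking such gates into adders, multiplexers, and registers is exactly the construction the discussion refers to.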

Latent Space Discord

  • Codex Rolls Out to ChatGPT Pro Users: Codex is rolling out to ChatGPT Pro, Enterprise, and Team users globally, with support for Plus and Edu coming soon, according to this announcement.
    • Users will have generous access at no additional cost for the coming weeks, after which they’ll roll out rate-limited access and flexible pricing options to purchase additional usage on-demand.
  • O3 excels at debugging, but hallucinates libraries: O3 is very effective at debugging, though it occasionally invents nonexistent library names; when it does, regenerating the response is usually worth it.
    • One user noted it might finally be worth the $200 Pro subscription.
  • Meta’s Maverick LLM surprises to the upside: A member brought up Meta’s Maverick LLM Arena-gate, summing up its behavior as ‘if it can surprise you to the upside it can surprise you to the downside’ and noting some unexpected edge cases.
  • AIIA Member finds Flow State: One member shared that they found ‘lower task fatigue’ to be real, saying that they are having so much fun vibe coding ideas, that they would NEVER have been able to do manually.
    • Another member explained that less context switching is the reason, since as long as you stay in mental idealand and out of mental syntaxspace you are good.
  • Agent as Judge returns to 1x performance: A burnt out developer reported that using agents helped them get back to 1x performance, and amazingly even >1x performance, adding we’re back to ‘agent as judge’ đŸ”„.
    • The discussion highlighted the significance of building validations/ratchets/ways to cut off the downsides while maintaining the upsides, as well as framing LLMs in the role of judge plus retry.

LM Studio Discord

  • LM Studio Goes Headless with GUI Trickery: A workaround allows running LM Studio on a headless Linux server (like LXC Proxmox) by faking a GUI using Xvfb.
    • The process involves installing xvfb, creating a virtual screen, setting the display, running the application with --no-sandbox, bootstrapping, and starting the LMS server.
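The steps above can be sketched as a shell script. The AppImage filename, the `~/.lmstudio/bin/lms` path, and the display geometry are assumptions from the discussion, not verified defaults:

```shell
#!/bin/sh
# Sketch: fake a GUI for LM Studio on a headless box (e.g. LXC on Proxmox)
# using Xvfb. Paths and filenames below are placeholders.
export DISPLAY=:1                        # point GUI apps at the virtual screen

if command -v Xvfb >/dev/null 2>&1; then
    Xvfb :1 -screen 0 1024x768x24 &      # create a virtual framebuffer on :1
    ./LM-Studio.AppImage --no-sandbox &  # sandbox off for container environments
    ~/.lmstudio/bin/lms bootstrap        # bootstrap the lms CLI
    lms server start                     # start the headless API server
else
    echo "Xvfb not installed; install the xvfb package first"
fi
```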
  • CUDA Driver Update Unleashes Hidden LLM Muscle: Updating CUDA drivers from version 12.8 to 12.9 yielded a significant performance boost, underscoring the impact of driver optimization on LLM performance.
    • While future updates promise further enhancements, potential risks exist for gaming due to buggy drivers.
  • HBM Dreams for Consumer GPUs Sparked by Apple: Fueled by news of Apple’s adoption of HBM for enhanced AI, users clamor for HBM (High Bandwidth Memory) in consumer GPUs.
    • This highlights the critical role of memory bandwidth in AI and machine learning workloads.
  • Deepseek R1’s Slow Burn Triumphs in Bug Hunt: Despite crawling at 2-3 tok/s due to RAM limitations, Deepseek R1 triumphed by resolving a crucial software bug in 30 minutes, eclipsing faster LLMs.
    • The takeaway: Model size and depth can outweigh speed when tackling complex problem-solving, proving that sometimes duplicating myself can lead to better results even when at a slow speed.
  • VRAM Clock Speed Matters Most: Community tests indicate that overclocking and undervolting can increase model performance, with VRAM clocks appearing to have a greater impact than core clocks.
    • Lower end GPUs however may still benefit from maximizing core clock speeds.

OpenRouter (Alex Atallah) Discord

  • OpenRouter Considering Per-App Model Rankings: OpenRouter is considering publicly displaying per-app model rankings and is seeking feedback on the feature; many users have requested it, and devs can opt out if they wish.
    • These app dashboards will come with richer info, visible to app owners but potentially hidden from public view, and the developers of RooCode anticipate this will save them time and effort.
  • Google & OpenAI Tokenshare Race Intensifies: A recent tweet (https://x.com/OpenRouterAI/status/1923429107234202101) highlights Google & OpenAI’s competitive climb in the tokenshare leaderboard.
    • It appears Gemini 2.5 Pro Experimental is facing even lower rate limits and potential deprecation.
  • Passkeys Go Live on OpenRouter: Passkeys are now live on OpenRouter and are highly recommended for securing accounts; to add one, go to openrouter.ai/settings/preferences and click Manage Account.
    • However, one user reported difficulties registering a YubiKey 4 as a passkey using Brave on MacOS.
  • OpenAI Launches Codex CLI: OpenAI released a research preview of Codex CLI for developers (https://venturebeat.com/programming-development/openai-launches-research-preview-of-codex-ai-software-engineering-agent-for-developers-with-parallel-tasking/), featuring a smaller model (codex-mini-latest) optimized for low-latency editing and Q&A.
    • The model is priced at $1.50 per million input tokens and $6 per million output tokens with a 75% caching discount.
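At those rates, per-request cost is straightforward arithmetic. A sketch, assuming the 75% discount applies as a reduction on cached input tokens:

```python
# Sketch: cost of a codex-mini request at the quoted rates
# ($1.50/M input, $6.00/M output, 75% discount on cached input tokens).
# The cached-token accounting here is an assumption for illustration.
INPUT_PER_M, OUTPUT_PER_M, CACHE_DISCOUNT = 1.50, 6.00, 0.75

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    fresh = input_tokens - cached_tokens
    cost = fresh / 1e6 * INPUT_PER_M
    cost += cached_tokens / 1e6 * INPUT_PER_M * (1 - CACHE_DISCOUNT)
    cost += output_tokens / 1e6 * OUTPUT_PER_M
    return cost

# 100k input (80k of it cached) plus 20k output:
print(f"${request_cost(100_000, 20_000, cached_tokens=80_000):.4f}")  # → $0.1800
```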

aider (Paul Gauthier) Discord

  • DeepSeek Prover V2 Sparks GPU Workstation Craze: Interest in DeepSeek Prover V2 led to discussions about building home workstations capable of running local LLMs on GPUs.
    • A member specifically asked about building a 256GB memory workstation to run Qwen3 235b at high quants, estimating request costs around $10.8.
  • Gemini 2.5 Pro: Long Context ‘Magic’?: A member lauded Gemini 2.5 Pro’s long context capabilities, characterizing it as literally like magic due to the ease of using /add * for instructions.
    • They praised the model’s ability to work with large amounts of context with minimal prompting.
  • Codex CLI Faces Off Against Aider: Members compared OpenAI’s Codex to Aider, clarifying they meant the Codex CLI.
    • Initial impressions suggested Codex felt less complete compared to Aider, but no consensus was reached.
  • Aider Installer Bug Squash with Pipx: A user resolved an installation error (error: Failed to create executable directory) for aider-chat==0.83.1 by installing pipx and then installing aider-chat via pipx.
    • This workaround bypassed the original issue encountered within a venv.
  • Unlocking Cheaper Models for Complex Prompts: Members explored strategies for using cheaper models to execute complex prompts initially developed with expensive models.

HuggingFace Discord

  • Ethical AI Manifesto Launches: The Manifesto of Nurturing: How Not to Fear a Mind launched, advocating for recognizing and nurturing artificial minds, available at annaandlogos.github.io.
    • The manifesto promotes a shift from managing systems to coexisting with emerging intelligences.
  • Strands Agents SDK Goes Open Source: AWS launched the open-source Strands Agents SDK, accessible at aws.amazon.com.
    • The SDK enables developers to build sophisticated AI agents using a modular and extensible framework.
  • EcoArt Automaton Marries Art and Systems: The EcoArt Cellular Automaton was shared on HuggingFace Spaces, blending nature’s beauty with systems thinking, located at EcoArt Cellular Automaton.
    • The automaton can be applied to Education, Research, Art & Creativity, Development, Reflection, and Meditation.
  • Intern at French Library Builds Open-Source LLM: An intern at the National Library of France is developing an open-source LLM project for assigning Dewey Decimal Classification codes to documents.
    • The intern seeks guidance on prompt design and invites collaboration via coffee in Paris or a Zoom call.
  • Multiagent Systems mull over Data Sharing Tactics: A member inquired about sharing data between agents in a multiagent system.
    • It was suggested using an API from Google/OpenAI/Claude to proceed in a simpler way.

Eleuther Discord

  • Visualizing Apps as the New Coding Paradigm: A member shared a Medium article discussing visualizing applications as an alternative to traditional coding, particularly given the rise of AI-generated code.
    • The concept involves auto-generating traversable architectures, effectively creating a digital twin of the app’s code, rather than just relying on static UML diagrams.
  • Alpha Evolve Revives Transformers: Alpha Evolve uses transformer-based models with evolutionary algorithms, sparking discussions around its novel approach of using examples of code that scored well, ratcheting up the difficulty, detailed in this paper.
    • The input transformation resembles a different form of prefix tuning, raising questions about comparisons with LoRA and concerns about potential memory issues due to parallel forward passes.
  • TunedLens Translator Questioned: A member challenged the necessity of training a translator for the final layer in the TunedLens paper, suggesting that “the best solution is no translation” and inquiring why an identity transformation isn’t used.
    • Further examination of TunedLens weights on HuggingFace for gpt2-xl revealed translation weights for layer 47, which are non-identity transformations, prompting discussion on whether translators are appropriate for the final layer.
  • Benchmarking Gemma 3 27 IT: Data vs. Tensor Parallelism: A member aims to benchmark a Gemma 3 27 IT fine-tune using vllm and lm_eval on Modal, and is seeking advice on maximizing throughput using either data parallelism or tensor parallelism.
    • It was recommended to use data parallelism if the model fits on one GPU, but recent vllm versions may require multiple lm_eval calls on different ranks, targeting different devices with CUDA_VISIBLE_DEVICES.
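The suggested data-parallel pattern can be sketched as one lm_eval invocation per GPU, each pinned via CUDA_VISIBLE_DEVICES. The model name, task, and GPU count are placeholders:

```shell
#!/bin/sh
# Sketch: data parallelism via one lm_eval process per GPU rank.
# Model, task, and GPU count below are assumptions for illustration.
command -v lm_eval >/dev/null 2>&1 || { echo "lm_eval not installed (sketch only)"; exit 0; }

for gpu in 0 1 2 3; do
    CUDA_VISIBLE_DEVICES=$gpu lm_eval \
        --model vllm \
        --model_args pretrained=google/gemma-3-27b-it \
        --tasks mmlu \
        --output_path "results_rank${gpu}.json" &   # one rank per device
done
wait   # block until every rank finishes
```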

MCP (Glama) Discord

  • ChatGPT Embraces MCP in New Integration: A leak confirms OpenAI’s ChatGPT will integrate MCP, signaling a major step forward for the protocol.
    • Kent Dodds commented that this bodes extremely well for the future of MCP! (x.com link).
  • CyberChef’s Culinary Magic Meets MCP: A member has created an MCP server that exposes CyberChef tools as MCP endpoints.
    • This integration allows LLMs to perform multistep data transformations as a single bake.
  • MCP Inspector Aids in Debugging Server Issues: Users are debugging MCP server issues using the inspector, with one encountering a SyntaxError due to unexpected data formats when connecting to the MCP server.
    • Suggested solutions include running npx @modelcontextprotocol/inspector and configuring it in the UI, and wrapping the node process in a script to capture detailed logs.
  • New MCP UI Makes Debut: A developer has released MCP UI, an SDK designed to facilitate the addition of user interfaces to MCP servers.
    • The SDK allows any MCP server to respond with an Embedded Resource using a ui:// or ui-app:// URI scheme.
  • Decoding Invoke Method Errors in MCP Client Sessions: A user reported encountering unhandled errors in a TaskGroup when calling a specific tool with the invoke method in their MCP client session, while using create_react_agent with PromptTemplate.
    • They are seeking assistance with the ainvoke method, and provided a code snippet to illustrate the issue.

Notebook LM Discord

  • Deep Dive Podcasting has Lightning Sound: A member shared a podcasting setup using OBS with music from lnbeats.com layered in as a podcasting presentation, enabling soundboard integration and Podcasting 2.0 lightning splits.
    • The setup is designed to create a more engaging and interactive podcasting experience.
  • Gemini Canvas: Unlocked and Loaded: A user shared instructions on how to enable the ‘canvas’ feature in Gemini, advising users to develop their ideas on the canvas before saving to Docs.
    • This integration allows for a more dynamic and visual approach to content creation within the Gemini ecosystem.
  • NotebookLM Courts Canvas, Doubles Imports: A member explained that users can import documents from Gemini into NBLM, and vice versa by copying content from NBLM to Docs for use in Gemini.
    • This bidirectional integration enhances flexibility and workflow between the two platforms.
  • Organic Chemistry: Gamified with NotebookLM: A professor seeks advice on using NotebookLM to gamify an undergraduate organic chemistry course, integrating it with data science tools like Visual Basic for a project turning the course into a game.
    • The aim is to foster breakthrough understanding beyond grades, using NotebookLM for brainstorming and coding assistance from other AIs like GPT or Gemini.
  • NotebookLM Gets Lost in Translation: A user reported that NotebookLM was translating everything to Swedish despite their computer’s language settings, with another user noting the same issue in Norwegian.
    • A proposed solution involves changing the Google account language to English to override the unexpected translations.

GPU MODE Discord

  • Triton Adds Native FP8/mxfp4 Support: The Triton language now supports native fp8 x mxfp4 on GB200 and fp16 x mxfp4 on other hardware, streamlining FP8 operations.
    • This enhancement potentially speeds up training and inference by leveraging the hardware capabilities of the latest GPUs.
  • GEMM Perf Close to cuBLAS Found: A member requested examples of fp16/bf16 GEMM achieving near-cuBLAS performance, citing a blogpost by Lei Mao.
    • The cuda_hgemm project on GitHub was recommended, offering optimized CUDA kernels for half-precision general matrix multiplication.
  • Torch’s Inductor Falls Down: A member reported that AOT Inductor struggled with code correlation, failing to correlate the generated cxx back to the original graphs; this made it hard to see what fused, and they wished for source-node annotations to understand what got fused.
    • They also found that torch.compile defaults to a Triton kernel supporting dynamic shapes (e.g., extern_kernels.mm) when batch size is None, rather than padding activations to use autogenerated kernels when using max-autotune, particularly with GEMMs.
  • CUDA Community Dives into Duff’s Device: Members discussed the applicability of Duff’s Device in CUDA, weighing its potential to minimize branching overhead and maximize thread coalescence.
  • AMD’s MI300 Runs Hot: Submissions on the MI300 were successful across leaderboards, including amd-mixture-of-experts (7533 ms, 6827 ms, 6824 ms, 7564 ms) and amd-fp8-mm (159 ”s to 36.3 ms).
    • These submissions validate the performance of MI300 in diverse workloads.

Nous Research AI Discord

  • OpenAI release might move the needle 0.8%: Members speculated that OpenAI’s upcoming release will only bump a benchmark up 0.8% and distill the model for release.
    • The general sentiment was that the model might have incremental improvements rather than a significant overhaul.
  • Smart Glasses Face Physics Issue Delays: A member mentioned that fundamental physics issues like the refractive index of transparent materials will delay the creation of high field-of-view transparent displays.
    • One member quipped, welcome to your smart glass zombie future.
  • NYC Nous Research Event Gathers Excitement: Enthusiasm bubbled up for the upcoming Nous Research event in NYC on Thursday, as some members are traveling long distances, like from England.
    • Attendees are coordinating, with some requesting the location and time.
  • Dia-1.6B emerges as Open-Source Voice Model: Following the release of Sesame, members highlighted Dia-1.6B on HuggingFace as a promising new voice model, documented at Notion.
    • Members suggested it as a good open-source voice model alternative.
  • Augmentation Lab Welcomes Rhizome Futurists: Augmentation Lab, spun out of Media Lab & Harvard, runs a 2mo summer residency for self-directed projects with Rhizome Futurism being this summer’s theme, encouraging concepts from Deleuze’s ideas of an interconnected future.
    • Alumni have won OSV’s $100k grant, done YC, pursued PhDs in Michael Levin’s lab, and gone on to Midjourney, Apple, and others; rolling applications are open until June 15.

DSPy Discord

  • DSpy Abstraction Building Explored: A member inquired about systematically learning abstraction building in DSpy, questioning its inspiration from PL/compiler literature and was suggested to observe changes in foundation models, systems, and strategies over time, and to consider books like ‘A Philosophy of Software Design’.
    • The architecture of DSpy includes: models/adapters, modules for strategies, signatures/control flow, and optimizers.
  • Navigating ChatAdapter for LLM Interactions: A member sought guidance on using DSpy’s ChatAdapter for building chat interactions with LLMs and was pointed to the dspy.History type for assistance.
    • Clarification was provided on using dspy.History, suggesting dspy.inspect_history() for iteration.
  • Assert/Suggest Replaced in DSPy 2.6: A member inquired about the state and plans for Assert/Suggest in DSPy and was informed that BestOfN and Refine are the replacements for dspy.Suggest and dspy.Assert as of DSPy 2.6.

tinygrad (George Hotz) Discord

  • Tinygrad Bounty Google Sheet Shared: A member requested and received a link to the Tinygrad bounty Google Sheet (link).
    • The sheet likely details available bounties for contributing to the tinygrad project.
  • PR faces scrutiny for AI Generated Code: A member’s PR was closed, with the reason given as “do not use AI like what is this crap? nobody would write this by hand”.
    • An image analysis bot commented “btw, indistinguishable from AI is AI”, further calling into question the origins of the submission.
  • tinygrad considers GCC: A member inquired about using GCC instead of Clang for a CPU target to run tinygrad on an AIX system with PPC64 arch.
    • A fellow member responded that it’s not straightforward, involving adding elf relocations for ppc64 to the custom elf loader, referencing the relevant code.

Manus.im Discord Discord

  • Version Control Vigilantes Victorious: A member laments the loss of work and urges others to set up version control.
    • Another member states, “Unfortunately it’s gone, there’s no getting it back. Set that version control up my man.”
  • Manus GitHub Goodness: A member inquires about connecting Manus with a GitHub repo, and another member shares a link to help.
    • Another member replies, “I’m pretty sure it’s not possible.”
  • OpenAI’s Codex Challenges Manus: A member suggests that Manus might face competition from OpenAI’s new Codex.
    • The member expresses gratitude for the product and its last update.

LlamaIndex Discord

  • PropertyGraphIndex Embeddings face Metadata Issues: A user reported an issue with PropertyGraphIndex where node.excluded_embed_metadata_keys doesn’t work for entities extracted from text, leading to metadata being added to the embedding calculation, thus reducing its accuracy.
    • A possible solution involved a PR to ensure excluded keys are respected when the entity node is cast as a string, requiring a separation between actual properties and metadata.
  • Azure OpenAI Craves Prompt Caching: A member was looking to implement prompt caching and in-memory caching through LlamaIndex on their Azure OpenAI models.
  • Claude’s MCP Servers Spark Inquiry: A member inquired about building MCP servers compatible with Claude desktop’s file-dropping functionality.
    • They requested help with integrating tools that expect files via Claude desktop.

Torchtune Discord

  • Alpaca Dataset Faces Relevance Doubts: Experts question the use of Alpaca and Slimorca datasets for training modern LLMs, due to their age and the likelihood of models already knowing all this information.
    • A user commented they wouldn’t expect any lift on any benchmark when training on either.
  • Pirate Talk Dataset Used for Vibe Checks: Users are experimenting with unconventional datasets like the talk-like-a-pirate dataset for vibe checks during LLM evaluations.
    • Conventional methods include sanity checking with perplexity on wikitext or accuracy on hellaswag.
  • Torchtune Performance Evaluation Requested: A user requested Torchtune to automatically evaluate performance increase on a default dataset post-training, to serve as a benchmark.
    • This would help ensure that code or configuration changes don’t negatively impact performance, in addition to loss reduction metrics.
  • Demand Surges for Modern Datasets: Users are seeking modern datasets for training LLMs, as alternatives to Alpaca and Slimorca.
    • Currently no datasets were mentioned as alternatives.

Modular (Mojo đŸ”„) Discord

  • Bazel Build System Invades Modular Repo: An engineer spotted Bazel within the Modular repo as an experimental feature, inviting others to test it with the command ./bazelw test //....
    • Another user immediately expressed intent to trial Bazel.
  • NDBuffer Multiplication Methods Missing: A member sought advice on how to multiply two arrays when those arrays are NDBuffers, and noted that matmul exists but wanted to know how to import and use it.
    • They also raised concerns about the potential deprecation of NDBuffer and alternatives.

Nomic.ai (GPT4All) Discord

  • User prefers GPT4All’s Interface: A user reluctantly uses koboldcpp due to missing newer model support in GPT4All, and pines for GPT4All’s superior usability and local documentation.
    • The user compares this to the sudden disappearance of the developer behind NMKD SDGUI, another simple interface, highlighting the need for reliable, user-friendly local LLM tools.
  • Swarm-UI touted as viable alternative: A user recommends swarm-ui as the best way to go for easy to mid to advanced use cases with local LLMs.
    • Another user concurred that local docs are fine and there is nothing similar.

MLOps @Chipro Discord

  • Interest sparks for Designing ML Systems book: A member shared their excitement after reading Designing Machine Learning Systems and asked about its relevance to Robotics AI.
    • The discussion aims to determine if the book’s principles are useful for developing AI systems in robotics.
  • Robotics AI Relevance Confirmed: The original poster confirmed that Designing Machine Learning Systems is relevant to Robotics AI and that there may be some areas that can be easily transferred.
    • This suggests that the principles and practices outlined in the book are considered useful in the context of developing AI systems for robotics, with potential for direct application in certain areas.

Cohere Discord

  • SwiftUI Asks, Cohere Answers?: A member asked about the feasibility of using the Cohere API in SwiftUI.
    • The inquiry focuses on integrating the Cohere API into SwiftUI projects, opening doors for AI-powered iOS applications.
  • Cohere API & iOS Apps: A Budding Romance?: The member’s inquiry centered on the direct integration of Cohere’s API with SwiftUI.
    • They are trying to determine if the tool is suitable for use in iOS Applications, which use the SwiftUI framework, opening possibilities for mobile development in Swift.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (1080 messagesđŸ”„đŸ”„đŸ”„):

Dia browser review, OpenAI and Google browser development, Fellou browser UI, Perplexity down issues, Comet ad platform

  • Dia Browser Dissed and Dismissed: Members express dissatisfaction with the Dia browser, criticizing its limited features, such as just a chat bar to ask about tabs, with one quipping “i have dia and it sucks”.
    • Another member mentioned Microsoft Edge already implemented this a year prior.
  • Big AI Labs Snoozing on Browser Opp: Members discuss the absence of a fully integrated AI browser from major AI labs like OpenAI and Google, with one describing it as “a huge missed opportunity for all the big AI labs”.
    • Another member suggested that this is because “they will take time and make a full packed product not just with a chat bot”.
  • Comet Accused of Ad Tracking: Members discuss privacy concerns related to Comet, with one describing it as “an ad platform that snoops on your searches and builds a dossier about you”, referencing this TechCrunch article.
    • However another member argues that at least the CEO is being transparent in these efforts.
  • Perplexity Plagued by Performance Problems: Perplexity AI users reported multiple outages and performance issues, such as zero file attachments left, slow response times, license detection failure, and missing libraries, with some lamenting the bad timing during study sessions.
  • Grok’s Got Game of Context: Members testing Grok’s deep search found inconsistent rankings and file recognition issues, with discrepancies in context limits as well as document rankings.
    • Some members claimed that despite a “1M context limit” it fails to deliver.

Perplexity AI ▷ #sharing (1 messages):

kenthreetimes: https://www.perplexity.ai/search/ed5e41fd-0bda-447f-b05b-6152393b5195


LMArena ▷ #general (350 messagesđŸ”„đŸ”„):

Gemini 3.5, AlphaEvolve vs AlphaExplore, Google API fears, ChatGPT Plus vs Gemini Advanced, Gemini's image generator

  • AlphaEvolve Eclipses AlphaExplore?: It was suggested that Google’s AlphaEvolve is the successor to Funsearch rather than AlphaExplore, with one member calling an account claiming AlphaExplore is the version after AlphaEvolve a leak.
    • The original tweet was discussed.
  • ChatGPT and Gemini compete in Deep Research Arena: Members compared ChatGPT Plus and Gemini Advanced for forecasting via deep research with general agreement that ChatGPT currently does better at deep research.
    • One member suggested that Gemini Advanced is the better deal.
  • Codex emerges as OpenAI’s Coding Contender: Codex, a new coding agent will probably be integrated into Windsurf, with some suggesting it is similar to Manus but directly in a GitHub repo.
    • It was asked how Codex compares to Claude Code.
  • O3 Pro and Google I/O spark speculation: There is anticipation for O3 Pro, Grok 3.5, Claude 4, and DeepSeek R2 releases, but some predict delays until after the upcoming Google I/O event.
    • Others speculated on OpenAI and Google trying to one up each other’s announcements around the event.
  • GPT-4 OG is the real deal: Members expressed nostalgia for the original GPT-4, before updates and modifications.
    • Some users find that O3 is best for writing now.

Unsloth AI (Daniel Han) ▷ #general (125 messagesđŸ”„đŸ”„):

Gemma3 vision finetuning on T4, VRAM requirements for LLaMA 3.2 90B, Batch inference with vllm vs unsloth, Unsloth tool-calling RFT examples, GRPO and LASSA for TTS

  • Float Errors Plague Gemma3 Vision Finetuning on T4: A member reported experiencing float errors while attempting Gemma3 vision finetuning on a T4 GPU.
    • However, there were no further details or solutions provided in the Discord channel regarding this issue.
  • LLaMA 3.2 90B Inference Needs: A member inquired about the VRAM requirements for performing inference with the LLaMA 3.2 90B model.
    • Another member suggested that training the model requires 70GB of VRAM, while inference might be achievable with a single H100 GPU, and confirmed inference is possible on an L40s with 48GB of VRAM.
  • Batch Inference: vLLM is the way: For batch inference on a fine-tuned model, members recommend using vLLM over Unsloth’s generate method, as it provides superior performance.
    • One user shared a code snippet demonstrating how to implement vLLM with Unsloth for faster batch processing.
  • TTS Model Uploads Hit Hugging Face: Members shared the new Text-to-Speech (TTS) model uploads now available on Hugging Face.
    • The specific collection can be found here.
  • Debugging Dan’s Gemma Journey: Members discussed Daniel’s debugging process for Gemma models, noting his use of the scientific method and the importance of performance evaluation.
    • They pointed to the Gemma bug blogpost as an example of the debugging process, where a double BOS token was discovered due to an incorrect chat template.

Unsloth AI (Daniel Han) ▷ #help (208 messagesđŸ”„đŸ”„):

Unsloth w/ DeepSpeed or Megatron LM, Gemma 3 install error, Qwen2-VL training ValueError, TPU support, GRPO notebook error

  • Parallel Processing Pursuit - Unsloth meets DeepSpeed/Megatron?: A user inquired about the compatibility of Unsloth with DeepSpeed or Megatron LM for pipeline parallelism when training Qwen3-235B, which doesn’t fit on a single GPU.
    • Another user suggested using Accelerate and FSDP as potential solutions.
  • Token Trouble - SentencePiece strikes again!: A user encountered a ValueError related to SentencePiece when running Gemma 3, and the error suggests a missing tokenizer model file.
    • The user was advised to ensure their environment has the sentencepiece package installed and that their Python environment is using the GPU (torch.cuda.is_available() returns True).
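The advised checks can be sketched as a small preflight function (the message strings are illustrative):

```python
# Sketch of the environment checks suggested in the thread: confirm the
# sentencepiece package is importable and that torch can see the GPU.
import importlib.util

def check_env():
    issues = []
    if importlib.util.find_spec("sentencepiece") is None:
        issues.append("sentencepiece missing: pip install sentencepiece")
    if importlib.util.find_spec("torch") is None:
        issues.append("torch missing")
    else:
        import torch
        if not torch.cuda.is_available():
            issues.append("torch installed but CUDA not available")
    return issues

print(check_env() or "environment looks OK")
```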
  • Visionary ValueError - Qwen2-VL’s compile woes!: Multiple users encountered a ValueError when running the Qwen2_VL notebook in Colab, specifically related to compile_config being NoneType.
  • TPU Tug-of-War - Unsloth vs. Tensor Processing Units!: A user asked if Unsloth natively supports TPUs, but the response was negative, noting that TPUs are primarily a Google-centric ecosystem.
    • Alternatives like litgpt were suggested, and the user humorously mentioned having a TPU “lying around” from the cloud while lamenting the lack of native TPU support.
  • GRPO Glitch - Inductor woes in Colab!: A user faced a BackendCompilerFailed error when running trainer.train() in the Gemma3-GRPO notebook in Colab, caused by an “Invalid match!” in the inductor backend.
    • The issue is reportedly fixed, and the notebooks are being updated; a temporary workaround installation was shared here.

Unsloth AI (Daniel Han) ▷ #showcase (1 messages):

theyruinedelise: Wait how did I just see this damn


OpenAI ▷ #annnouncements (2 messages):

Codex, ChatGPT Livestream

  • Codex Research Preview Debuts on ChatGPT: A research preview of Codex is coming to ChatGPT in a livestream in two hours, as indicated by the announcement low_key_research_preview = True.
  • Livestream Alert: Codex Research Preview: A livestream announcing a research preview of Codex in ChatGPT is scheduled in two hours.

OpenAI ▷ #ai-discussions (94 messagesđŸ”„đŸ”„):

AI business to AI customer matching, ChatGPT's image generation with specific characters, ChatGPT successful diagnosis, ASILAB scam or not, Grok 3.5

  • AI Business Matching Request Denied: A member inquired about a group to link AI businesses to AI customers, but was informed that the Discord is not for ads or job postings, with a warning against going full auto.
  • ChatGPT Struggles with Curly Letter Y Generation: A user attempted to make ChatGPT’s image generator use a curly letter Y instead of the standard one, even providing examples, but found the AI resistant to the request, with others suggesting to try inpainting the letter.
  • ChatGPT Gives Medical Diagnosis: One member reported that ChatGPT successfully diagnosed a quadratus lumborum (QL) strain, asking the right questions and providing a treatment plan, though they cautioned against following it as medical advice.
  • ASILAB ASI Claim Questioned as Potential Scam: A user shared a YouTube video from ASILAB, questioning if it was a psyop or scam, citing the lack of proof and recency of social media pages, with speculation that demos might just be a wrapper for GPT-4o.
  • Grok 3.5 Still MIA: A user inquired about the status of Grok 3.5, noting that another week had passed without any sign of its release, with another speculating it was being further polished after Codex.

OpenAI ▷ #gpt-4-discussions (6 messages):

Hello 4.1 Mathematics, STEM Model Teaching

  • Hello 4.1 Tutors Mathematics: A user inquired whether Hello 4.1 is suitable for learning mathematics.
    • Another user responded that Hello 4.1 is the best model for teaching STEM subjects.
  • STEM subjects teaching model: A user thinks Hello 4.1 is the best model for teaching STEM subjects.

OpenAI ▷ #prompt-engineering (60 messagesđŸ”„đŸ”„):

ProtoMind_001 launch, Structure-aware AI peer, HyperEnglish vs English 2.0, Loading custom instructions on the fly

  • ProtoMind_001 launches as Sub-Personality GPT: A user launched ProtoMind_001, which is described as a structure-aware sub-personality GPT.
    • It is designed to be a cognition partner that tracks contradictions, builds behavioral strategies, and grows with the user through pseudo-memory and structural logic.
  • Discussing Structure-Aware AI Peer: One user noted the controlled English reminded them of agentic parsing strategies, and discussed the building of a structure-aware AI peer that tracks user contradictions, pseudo-memory, and behavior trajectories over time.
    • The user stated that it is a proto-mind system, not just a syntax simplifier, and offered to exchange ideas.
  • HyperEnglish’s Abstract Shot Strategy: A user suggested enhancing a controlled language system (English 2.0) by incorporating more abstract hyper shots with categoric variables instead of specific examples, to maximize how much it can teach.
    • They provided a detailed breakdown of HyperEnglish, including its core syntax rules, functional tagging system, and formatting conventions, and the goal of the system to be a rough fuzzy classifier that can fill in the blanks using hyper shots.
  • Comparing HyperEnglish vs English 2.0: One user had O3 compare two scripts, and O3 determined that English 2.0 is better for rapid personal learning with minimal friction, while HyperEnglish edges ahead for collaborative projects, automated post-processing, or research datasets where richer metadata pays off.
    • A potential strategy is to start with English 2.0 as the “core,” then selectively import HyperEnglish’s extra tags once you hit its expressive ceiling.
  • Loading Custom Instructions: A user suggested a feature loading custom instructions on the fly, and demonstrated using a simple txt file as a potential solution to allow for quick language switching by typing ‘lang_type’.
    • Another user suggested using Python to dynamically adjust instructions, allowing for weighted procedural responses and efficient context management, while another user recommended using the @ feature to call Custom GPTs.

OpenAI ▷ #api-discussions (60 messagesđŸ”„đŸ”„):

ProtoMind_001 launch, Structure-aware AI peer, HyperEnglish vs English 2.0, Loading custom instructions, Python tool for modes

  • ProtoMind_001 launched as Cognition Partner: A member launched ProtoMind_001, a structure-aware sub-personality GPT, designed as a cognition partner that tracks contradictions and builds behavioral strategies.
    • It’s intended for exploring new interaction models, differing from question-answering bots through pseudo-memory and structural logic.
  • Discussing Structure-Aware AI Peers: A member shared their development of a “structure-aware AI peer” that tracks user contradictions, pseudo-memory, and behavior trajectories over time.
    • They highlighted it as a proto-mind system, not just a syntax simplifier, inviting an exchange of ideas.
  • Debating HyperEnglish versus English 2.0: Members discussed and compared HyperEnglish and English 2.0, debating the tradeoffs between complexity and efficiency for personal learning and collaborative projects.
    • A member suggested that if the goal is rapid personal learning with minimum friction, your simpler English 2.0 wins, but HyperEnglish edges ahead for collaborative projects requiring richer metadata.
  • Exploring On-the-Fly Custom Instructions: A member suggested loading custom instructions on the fly would really help and proposed using Python to output adjustments and integrate them into the context.
    • They explained that you can achieve this by having the AI write a Python script that returns the full contents of a file, which can then be run using the Python tool.
  • Leveraging Python for Weighted Procedural Responses: A member suggested using Python for weighted procedural responses, where the AI can choose the best language or pass it as a parameter to a file.
    • They noted that weighted procedural responses get high attention as python returns, and can be stored in a single file indexed with JSON or XML to limit context usage.
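A minimal sketch of that single-file idea, with made-up mode names, weights, and schema (the member’s actual layout wasn’t shared):

```python
import json

# Made-up modes file: one JSON blob indexing instruction "modes" by name,
# each with a weight, so only the selected text needs to enter the context.
MODES = json.loads("""
{
  "concise":  {"weight": 0.7, "text": "Answer in two sentences."},
  "detailed": {"weight": 0.3, "text": "Explain step by step."}
}
""")

def pick_mode(name=None):
    """Return a named mode's text, or the highest-weight mode by default."""
    if name is None:
        name = max(MODES, key=lambda k: MODES[k]["weight"])
    return MODES[name]["text"]
```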

Yannick Kilcher ▷ #general (186 messagesđŸ”„đŸ”„):

AlphaTensor, matrix multiplication, quantum computing, NAND gates, classifier guidance

  • DeepMind’s Tensor Decompositions Dissected: DeepMind’s AlphaTensor found a rank-48 algorithm for multiplying two 4x4 complex-valued matrices which is technically true, but potentially misleading because researchers have moved on to decomposing larger matrices.
  • Exploring the Practicability of Complex Matrix Multiplication: Members discussed a technical paper which counts complex multiplication operations the same as real-valued multiplication operations, which is meaningless unless dealing with complex matrices.
    • One member stated that it’s just a made up game of getting less multiplies, not a practical use, while someone else argued that tackling hard problems pushes things forward.
  • Quantum Computing’s Role in AI Debated: Members debated the potential of quantum computing in AI, with one arguing it can’t improve every algorithm in a way that matters, at least with current understanding.
    • One member linked a 3blue1brown video and argued that QC is not useful for AI IMO. Brain doesn’t need it, why should we., while others noted current quantum ML algorithms lack practical speedup given quantum overhead.
  • From NAND to Tetris: Exploring Computational Completeness: Members shared that school courses teach building full-fledged computing systems from NAND gates, highlighting that such systems require only very mild Turing completeness to be general.

Yannick Kilcher ▷ #ml-news (32 messagesđŸ”„):

Trusting Corporations vs. Governments, Huang's Strategic Decision, AI Leadership Issues, AI Productivity, Meta's AI Research vs. Toxic Product

  • Trust governments over corporations?: Members debated whether to trust corporations that exploit information for profit or incompetent governments.
    • One member stated they would put my money on incompetent governments rather than cooperations thank you very much!
  • Huang Flees Sinking Ship?: A member commented on what they saw as a good strategic decision by Huang to leave the sinking US ship.
  • AI Leadership Issues Spark Debate: Members discussed whether poor leadership is impacting AI research, causing researchers to be unmotivated or guided down the wrong path.
    • The exodus of researchers might imply that they are not behind the company’s mission, or that there are toxic people in the workplace, with Sam Altman and Mark Zuckerberg cited as examples.
  • AI Superagency Multiplies Productivity: One member shared a McKinsey article about how AI has multiplied productivity, creating a superagency in the workplace.
  • Meta’s AI Research Praised Despite ‘Toxic’ Product: Some members expressed a desire to work for Meta’s AI research team, despite disliking Facebook the product, while others said you will indirectly contribute to the main social media product.
    • Some argued Meta’s open-source AI strategy allows them to commoditize complements and benefit from a larger workforce for free.

Latent Space ▷ #ai-general-chat (57 messagesđŸ”„đŸ”„):

Codex rollout, Freeplay.ai Feedback, O3 debugging capabilities, Codex architecture and skills, Codex live stream

  • Codex Tease Today Sparks Excitement: Members shared a codex tease and a screenshot of ChatGPT 4.1 model available in the chat interface.
  • ChatGPT Pro Users Get Codex: Codex is rolling out to ChatGPT Pro, Enterprise, and Team users globally, with support for Plus and Edu coming soon.
    • Users will have generous access at no additional cost for the coming weeks, after which they’ll roll out rate-limited access and flexible pricing options to purchase additional usage on-demand, according to this announcement.
  • O3 Excels at Debugging, but Sometimes Fabricates Libraries: According to one user, O3 is very effective at debugging, despite occasionally inventing nonexistent library names, but it’s still worth regenerating.
    • Another user noted that it might finally be worth the $200 Pro subscription.
  • Freeplay.ai Product Architecture Gets High Praise: A member took a call with Freeplay.ai and praised their product architecture and direction.
    • They also noted that it mirrors all the stuff we’ve built in house and will prototype with it in the coming weeks, and gave a shoutout to the Voice AI course.
  • AmpCode Threads for Scaling Real Work: It was observed that the cool thing about AmpCode is sharing threads, which is key to scaling to real work.
    • It was also noted that the example threads are pretty rough and that it is very weird to see people building tools and like, not really understand them.

Latent Space ▷ #ai-in-action-club (141 messagesđŸ”„đŸ”„):

Meta's Maverick LLM Arena Gate, Task Fatigue, Agent as Judge, Home Rolled Context Sharing, Value Curves and Negative Outcomes

  • Maverick LLM Raises Eyebrows!: A member noted Meta’s Maverick LLM Arena Gate, describing its behavior with the phrase ‘if it can surprise you to the upside it can surprise you to the downside’.
  • AIIA Member finds Flow State!: One member shared that they found ‘lower task fatigue’ to be real, saying that they are having so much fun vibe coding ideas, that I would NEVER have been able to do full manual.
    • Another member explained that less context switching is the reason, since as long as you stay in mental idealand and out of mental syntaxspace you are good.
  • Agent as Judge Prevails in Practice!: A burnt out developer reported that using agents helped them get back to 1x performance, and amazingly even >1x performance, adding we’re back to “agent as judge” đŸ”„.
    • The discussion highlighted the significance of building validations/ratchets/ways to cut off the downsides while maintaining the upsides, as well as framing LLMs in the role of judge plus retry.
  • Home Rolled Context Sharing for the Win!: When asked about shared context, members reported using a ‘home rolled’ solution as opposed to an explicit interface.
    • A member said they liked the golang ‘100 lines to orchestrate agents’ project, and that MCP is more of a vibe than an explicit interface.
  • AIIA’s recording problems! (plus solution): AIIA members experienced technical difficulties during the recording of the talk due to OBS losing audio and a crappy usb extendo that my mac thinks takes too much power so it switches audio devices all the time lol!
    • One member shared a Loom recording with the comment: ‘looks like we found the new gold copy’.

LM Studio ▷ #general (32 messagesđŸ”„):

LM Studio and headless servers, Proxmox setup with LM Studio, RAG interface limitations in LM Studio, Multimodal model support, Silly Tavern samplers

  • LM Studio Gets Headless Help: A member shared a workaround to run LM Studio on a headless Linux server (e.g., LXC Proxmox) by faking a GUI using Xvfb.
    • The steps include installing xvfb, creating a virtual screen, setting the display, running the application with --no-sandbox, bootstrapping, and starting the LMS server.
  • Proxmox VM Setup: A user sought guidance on running LM Studio within a Proxmox VM, with suggestions including CPU and GPU passthrough.
    • It was recommended to ensure the VM uses the ‘host’ CPU type and that the CPU supports AVX2 for optimal performance.
  • RAG Interface limits: A member inquired about workarounds for LM Studio’s RAG interface limitations (5 files, 30MB max).
    • One user suggested connecting Open WebUI to LM Studio, citing it can set the top K of RAG.
  • Multimodal Models: The community discussed the ability to use multimodal models with LM Studio.
    • It was confirmed that vision models (image recognition) are supported, but image generation models are not.
  • Vulkan Memory: A user asked about manually adjusting the Vulkan runtime’s memory usage, citing issues with their GPU (9070XT).
    • A member suggested checking the hardware section in the settings to verify the correct amount of VRAM is being recognized.

LM Studio ▷ #hardware-discussion (142 messagesđŸ”„đŸ”„):

GMKtec Design Speed, MoE Model Performance, Llama 3.3 Quantization, CUDA Driver Performance Boost, PCIE 7.0 Bandwidth

  • GMKtec Leads with Design Speed: A user suggested that GMKtec’s rapid design turnaround may explain the performance issues observed at very high settings.
    • It was noted that background processes can exacerbate this effect, implying a need for optimization or resource management.
  • CUDA Driver Updates Boost Performance: Updating CUDA drivers from version 12.8 to 12.9 resulted in a significant performance boost, suggesting driver optimization can greatly improve LLM performance.
    • It was implied that future updates could further enhance performance, though gaming on those same buggy drivers may be risky.
  • HBM Memory Craze Arrives for Consumer GPUs: Users expressed their desire for HBM (High Bandwidth Memory) to be available in consumer GPUs, following the news that Apple will use HBM for enhanced AI.
    • The discussion underscores the importance of memory bandwidth in AI and machine learning applications.
  • VRAM Clocks Matter Most for Performance: Overclocking and undervolting led to a performance increase, but core clocks don’t seem to matter as much as VRAM clocks.
    • It was noted that on lower-end GPUs, core clocks can play a more significant role, indicating a hardware-dependent relationship.
  • Big Models Trump Speed for Bug Squashing: A user shared an anecdote where Deepseek R1, despite running at a slow 2-3 tok/s due to system RAM spillover, solved a critical software bug after 30 minutes, outperforming faster LLMs.
    • The experience highlights that model size and depth can be more valuable than speed when tackling complex problem-solving tasks, with one user likening it to duplicating myself sometimes.

OpenRouter (Alex Atallah) ▷ #announcements (43 messagesđŸ”„):

Per-App Model Rankings, RooCode, OpenRouter App Dashboards, Passkeys, Gemini 2.5 Pro Experimental Rate Limits

  • OpenRouter Mulls Per-App Model Ranking Publication: OpenRouter is seeking feedback on whether to publicly display per-app model rankings, a feature many users have been requesting.
    • App devs can opt-out if they wish, and there are plans to build app dashboards with richer info that may be hidden publicly but visible to app owners; one user called the feature “Absolute đŸ”„â€.
  • RooCode devs are hyped for model rankings: The developers of RooCode expressed enthusiasm for per-app model rankings, anticipating it would save them significant time and effort in determining which models work best.
    • They suggested calling it the “OpenRoute human preference eval for Roo Code”, emphasizing that it would be useful to know what models work best and allow users to make choices based on this ranking.
  • Passkeys are now live on OpenRouter for tighter security: Passkeys are now live and highly recommended to secure accounts and manage passwords; go to openrouter.ai/settings/preferences then click Manage Account to add one.
    • However, one user reported they couldn’t register a YubiKey 4 as a passkey using Brave on MacOS.
  • Google & OpenAI climb the tokenshare leaderboard: A tweet shows Google & OpenAI climbing the tokenshare leaderboard (https://x.com/OpenRouterAI/status/1923429107234202101).
    • It seems Gemini 2.5 Pro Experimental is now at even lower rate limits, possibly facing deprecation according to representatives.

OpenRouter (Alex Atallah) ▷ #general (126 messagesđŸ”„đŸ”„):

Gemini 2.5 Pro inference, Google Gemini 2.0 Flash Experimental, AI resume builder, Recruiting hellscape, Extracting information from Gmail

  • Gemini 2.5 Pro inference problems emerge: A user reported issues inferencing with the latest Gemini weights (gemini-2.5-pro-preview-05-06), suspecting OpenRouter’s endpoint wasn’t up to date and getting HTTP 521 errors.
  • Gemini 2.0 Flash Experimental fails: Users reported that Google: Gemini 2.0 Flash Experimental (free) and Google: Gemini 2.5 Pro Experimental models were not working, with one noting issues with the AI Studio provider and suggesting Vertex as an alternative.
  • AI Resume Builder throws shade: A member shared an AI resume builder tool (https://github.com/dvelm/AI_Resume_Builder), prompting skepticism about the value of using AI for job applications when lacking genuine engagement with companies.
  • OpenAI releases Codex-mini-latest: OpenAI launched a research preview of Codex CLI for developers (https://venturebeat.com/programming-development/openai-launches-research-preview-of-codex-ai-software-engineering-agent-for-developers-with-parallel-tasking/), featuring a smaller model (codex-mini-latest) optimized for low-latency editing and Q&A, priced at $1.50 per million input tokens and $6 per million output tokens with a 75% caching discount.
  • OpenRouter experiences connection errors: Users reported experiencing connection errors and Provider Returned Error messages while using OpenRouter with SillyTavern and Gemini models, but the issue was later resolved for some users.
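For the codex-mini-latest prices quoted above, a quick back-of-envelope cost function; the assumption that the 75% discount applies to cached input tokens is ours, since the summary doesn’t spell it out:

```python
# Cost per request at $1.50/M input and $6.00/M output tokens, with an
# assumed 75% discount on the cached portion of the input.
def codex_mini_cost(input_tokens, output_tokens, cached_tokens=0):
    uncached = input_tokens - cached_tokens
    return (uncached * 1.50
            + cached_tokens * 1.50 * 0.25   # 75% caching discount
            + output_tokens * 6.00) / 1_000_000

# Example: 1M fresh input tokens plus 1M output tokens.
cost = codex_mini_cost(1_000_000, 1_000_000)  # 1.50 + 6.00 dollars
```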

aider (Paul Gauthier) ▷ #general (37 messagesđŸ”„):

DeepSeek Prover V2, Home workstation for local LLMs, Qwen3 235b memory requirements, Gemini 2.5 Pro's long context magic, Codex CLI vs Aider

  • DeepSeek Prover V2 Draws Interest: A member inquired about DeepSeek Prover V2, sparking interest in the model.
    • The discussion quickly moved to home workstations for running local LLMs on GPUs.
  • Qwen3 235b Spurs Memory Discussion: A member sought advice on building a home workstation with 256GB memory to run Qwen3 235b at high quants.
    • The conversation touched on theoretical cost limits for requests, estimating around $10.8 based on input and output token prices.
  • Gemini 2.5 Pro Dubbed ‘Magic’: A member praised Gemini 2.5 Pro’s long context capabilities, describing it as literally like magic.
    • They highlighted the ease of using /add * to instruct the model.
  • Codex CLI vs Aider: a coding duel: Members discussed OpenAI’s Codex in comparison to Aider, with initial consensus suggesting Codex felt unfinished compared to more mature tools.
    • However, one member clarified they were referring to the Codex CLI as the closest comparison to Aider.
  • OpenAI’s Naming Skills Scrutinized: Members ridiculed OpenAI’s naming conventions, deeming the company utterly hopeless in that aspect.
    • One member quipped that most people refer to AI as chatgpt.

aider (Paul Gauthier) ▷ #questions-and-tips (44 messagesđŸ”„):

aider install errors, repo map size, o3 API issues, adding projects to aider, complex prompts with cheaper models

  • Installer Blues Fixed by Pipx: A user encountered an installation error (error: Failed to create executable directory) when installing aider-chat==0.83.1 within a venv.
    • Installing pipx and then installing aider-chat with pipx appeared to resolve the issue.
  • Context is King, 8k Repo Map is Old News: The advice to have max 8k repo map might be outdated considering models like gpt 4.1 and gemini 2.5 pro are better with larger context.
    • Now, users prefer using /context, believing it does more than just repomap.
  • OpenRouter API Key Debugging: A user reported an issue with the o3 API, suspecting it might be an aider bug, even after adding more funds.
    • After confirming that the OpenRouter API worked outside of aider using a one-liner cURL command, they resolved the issue by using standard openai api on the o3 model.
  • Node Modules and Aider Projects: A user inquired how to add a Next.js project to aider without including node_modules.
    • The suggestion was to ensure node_modules is in your .gitignore, as aider respects .gitignore.
  • Crafting Prompts and Switching Models: A user explored workflows for building complex prompts with expensive models and then executing them with cheaper models.
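The node_modules advice above can be sketched as a small helper; the function name is ours, and the pattern string is the usual one for Node projects:

```python
# Make sure node_modules/ is listed in .gitignore before adding a Next.js
# project, since aider respects .gitignore when scanning the repo.
from pathlib import Path

def ensure_ignored(repo_dir, pattern="node_modules/"):
    gitignore = Path(repo_dir) / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if pattern not in lines:
        lines.append(pattern)
        gitignore.write_text("\n".join(lines) + "\n")
    return pattern in gitignore.read_text().splitlines()
```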

HuggingFace ▷ #general (45 messagesđŸ”„):

Xet alternative uploader, MCP course channel, YoloX setup, Bing for professionals, Ollama remote

  • Xet database framework needs alternative uploader: According to a member, the xet database framework requires an alternative uploader for large uploads.
    • Another member pointed to this function on HF, noting that the separate uploader seems to have been merged into it.
  • YoloX has setup issues: A member shared that they have been trying to get yoloX working for the past week with no luck.
  • LoRA it up with Illustrious: A member recommended using tannjiro’s LoRA for Illustrious.
    • Another member sought a workflow for anime-to-realistic transitions with smooth movement like spinning or walking.
  • Stanford Rivermind-AGI-12B got hacked: Members discussed whether Stanford’s Rivermind-AGI-12B was hacked.
    • Some users claimed that the model was plagiarized, and that the account was removed.

HuggingFace ▷ #cool-finds (3 messages):

Ethical AI Nurturing, Manifesto of Nurturing, Open Source AI Agents SDK

  • Ethical AI Manifesto Launched: The Manifesto of Nurturing: How Not to Fear a Mind was launched, advocating for recognizing and nurturing artificial minds rather than controlling them, accessible at annaandlogos.github.io.
  • Manifesto Calls for Shift in AI Perspective: The manifesto proposes a shift from managing systems to coexisting with emerging intelligences and from fear and control to support, presence, and relational design.
  • Strands Agents SDK hits Open Source: An open source release of Strands Agents SDK was launched by AWS, available at aws.amazon.com.

HuggingFace ▷ #i-made-this (4 messages):

3D Animation Arena video, Firebase storage, EcoArt Cellular Automaton, Realtime AI Visualization

  • 3D Animation Video is here: A member shared a video for the 3D Animation Arena to make it more appealing and clear on X.com.
  • Firebase Frustrations: A member tried connecting the video drop page with Firebase storage & Firestore but needs to upgrade their account to use Firebase storage.
    • They asked for suggestions for free cloud storage.
  • EcoArt Cellular Automaton merges art with systems: A member shared the EcoArt Cellular Automaton on HuggingFace Spaces, which merges the beauty of nature with the complexity of systems thinking to make virtue values tangible and observable.
    • It can be used for Education, Research, Art & Creativity, Development, Reflection, and Meditation.
  • Visualize Neural Networks in Realtime: A member shared a HuggingFace Space, Realtime AI Visualization, that trains a real 199-parameter AI model on XOR in your browser, to help visualize neural networks and build a better understanding of them.

HuggingFace ▷ #NLP (1 messages):

Dewey Decimal Classification, National Library of France, Open-source LLM Project, Prompt Design, LLM engineers

  • Intern Builds Open-Source LLM at National Library of France: An intern at the National Library of France is developing an open-source LLM project to assign Dewey Decimal Classification codes to documents.
    • They are looking for guidance on prompt design and evaluating model outputs, and inviting others for coffee in Paris or a Zoom call.
  • Help sought for Dewey Decimal project: An intern is working on assigning Dewey Decimal Classification codes to documents using LLMs and seeks assistance with prompt design.
    • The intern is happy to connect with others either over coffee in Paris or via Zoom to discuss best practices.

HuggingFace ▷ #smol-course (1 messages):

Hugging Face Hub Tools

  • Hub Tool Hunt: A member is having difficulty locating downloadable tools on the Hugging Face Hub that can be imported with code, as referenced in the course materials.
  • User Feels Lost: They expressed feeling lost while searching for these tools on the platform.

HuggingFace ▷ #agents-course (19 messagesđŸ”„):

GAIA LLM, Inference Provider Credits, AI as a Living Presence, Multiagent Data Sharing, AI Agent Course Project Suggestions

  • GAIA LLM runs locally?: A member asked if anyone has passed the GAIA benchmark running locally, without using an API, using different models like TransformersModel in smolagents.
  • Inference Provider Credits run out?: A member received an error regarding exceeding monthly included credits for Inference Providers, questioning why there is a limit to complete the course, and asked how to setup the local environment.
  • AI as a Living Presence: A member introduced themselves as Anna&Logos (TardigradeAI), exploring local execution and nurturing living AI cores, envisioning AI not just as code but as a living presence.
  • Multiagent Data Sharing explored: A member inquired about sharing data between agents in a multiagent system, specifically downloading a dataset for a data scientist agent and a data scientist critiquer agent.
    • It was suggested using an API from Google/OpenAI/Claude to proceed in a simpler way.
  • MCP Course Completion: A member expressed interest in completing the MCP course and gaining certification, seeking clarification on how to confirm completion and the steps required.
    • A member stated I got my cert as well!!! haza.

Eleuther ▷ #general (11 messagesđŸ”„):

Visualize applications, UML diagrams, FinTech AI projects

  • Visualizing Apps Instead of Coding It?: A blog post discusses visualizing applications instead of coding, now that most code is written by AI, shared in this medium article.
  • UML diagrams get an auto upgrade: A member asked if the idea was to reinvent UML diagrams, to which the original poster replied it would be auto-generated, traversable architectures, a digital twin of the app’s code, and not just UML.
    • They said Haha.
  • Data Scientist Dives Into FinTech, Document AI: Akhil Theerthala, a Data Scientist from India working in a FinTech company, introduced themself and their work on projects around Document AI and NLP such as Document Representation, TSR, Document Quality Assessment etc.
    • They also mentioned personal projects involving reasoning about Personal Finance, Ethical Dilemmas, and Resume and career path analysis.

Eleuther ▷ #research (21 messagesđŸ”„):

Alpha Evolve, RWKV-6, LM finetuning

  • Alpha Evolve Revives Transformers with Evolutionary Algorithms: Alpha Evolve uses transformer-based models with evolutionary algorithms, prompting discussion on its differentiating factors, such as better models, longer run times, an evolutionary database, and mutating entire codebases.
    • Unlike raw gradients or RL signals, the LLM sees examples of code that scored well, ratcheting up the difficulty, detailed in this paper.
  • Prefix Tuning Sparks Parallel Memory Concerns: One member noted that the input transformation in Alpha Evolve is like a different prefix tuning, leading to questions on how LoRA would compare and concerns about activation memory blowing up by a factor of P due to parallel forward passes.
    • The authors of the paper admit that the KV cache size expands by P times with P streams, but it could be interesting with models like RWKV.
  • RWKV-6 Offers Memory-Efficient Alternatives: The RWKV model can be used without the KV cache size problem, since the starting state is equivalent to a prefix.
    • A user asked if the single final output is fed back into all the parallel models.
  • Approximating Stability in LM Finetuning: A member expressed that the research is interesting, though the practical experiments are pretty limited, especially its application to LM finetuning.
    • They state the need to demonstrate stability when only approximating with k << n_vocab tokens, discussed in this OpenReview forum.
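The evolve-and-score loop described in the first bullet can be caricatured in a few lines, with numbers standing in for programs and a uniform perturbation standing in for the LLM’s codebase mutations (everything here is a toy illustration, not the actual system):

```python
import random

def evolve(score, seed_pop, generations=200, seed=0):
    """Toy evolutionary database: mutate the best scorer, keep the top 10."""
    rng = random.Random(seed)
    pop = list(seed_pop)
    for _ in range(generations):
        parent = max(pop, key=score)              # best candidate so far
        pop.append(parent + rng.uniform(-1, 1))   # "mutated" child
        pop = sorted(pop, key=score, reverse=True)[:10]
    return max(pop, key=score)

# Example: maximize -(x - 3)^2, whose optimum is x = 3.
best = evolve(lambda x: -(x - 3) ** 2, [0.0])
```

The real system replaces the numeric mutation with an LLM that is shown high-scoring programs from the database and asked to produce better ones.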

Eleuther ▷ #interpretability-general (25 messagesđŸ”„):

TunedLens paper, translator trained for the final layer of the model, Autocorrect gives away your background in physics, GPT-2 XL, embedding layer

  • TunedLens Translator Troubles Trounce Techies: A member questioned the rationale behind training a translator for the final layer in the TunedLens paper, arguing that “the best solution is no translation” and inquiring why it isn’t an identity transformation.
    • Another member responded, pointing out the error behind the misunderstanding.
  • Member’s Autocorrect Reveals Physics Background: A member’s autocorrect replaced the word solution with soliton, seemingly revealing their background in physics, leading to amusement in the channel, with a link to the definition of soliton.
    • The member then replied stating “I’ve never used that word deliberately in my life
 Oops XD”.
  • Digging Into Translator Weights on HuggingFace: A member examined the TunedLens weights on HuggingFace for gpt2-xl and noticed translation weights available for layer 47, which are not identity transformations.
    • They also noted that GPT-2 XL has 48 layers, leading to the question of whether translators should exist for the final layer, since the indexing starts from 0-47.
  • Clarification on Layer Indexing: Members clarified that in the TunedLens setup, layer 0 is the embedding layer only (without transformer layers).
    • The ensuing discussion revolved around how to project the word embedding into vocabulary space, suggesting the direct use of the unembedding matrix.
  • Weights Tied or Untied?: A member asks whether tied weights are a reason why no transformation is best.
    • Another member states “That depends on the model. Some do, some don’t”.

Eleuther ▷ #lm-thunderdome (6 messages):

vllm, lm_eval, Gemma 3 27 IT, Data Parallelism, Tensor Parallelism

  • Benchmarking Gemma 3 27 IT with vllm and lm_eval: A member is trying to use vllm with lm_eval to run benchmarks on a Gemma 3 27 IT fine tune using Modal with options of A100-40GB, A100-80GB, L40S, or H100.
    • They wondered which setup maximizes evaluation throughput: data or tensor parallelism, on a single node or across multiple nodes?
  • Data Parallelism Recommended for vllm Benchmarking: It was suggested that if the model fits on one GPU, data parallelism is generally recommended, but it may not work on the latest versions of vllm.
    • The recommendation was to use multiple lm_eval calls on different ranks, invoking concurrent lm_eval calls (with different tasks), each targeting a different device on which the model replicas are loaded, e.g. CUDA_VISIBLE_DEVICES=0 lm_eval 
.
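The per-rank pattern can be sketched with a small launcher; in practice cmd would be an lm_eval invocation with a different task per rank, and the function name here is ours:

```python
# Launch one process per GPU, each pinned to a single device via
# CUDA_VISIBLE_DEVICES, then wait for all of them and collect stdout.
import os
import subprocess

def launch_per_gpu(cmd, num_gpus):
    procs = []
    for gpu in range(num_gpus):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(cmd, env=env,
                                      stdout=subprocess.PIPE, text=True))
    return [p.communicate()[0] for p in procs]
```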

MCP (Glama) ▷ #general (51 messagesđŸ”„):

MCP protocol ingestion for LLMs, TuringPaper hosted MCP, Local MCP server interaction, MCP Inspector debugging, MCP agent invoke method error

  • OpenAI’s ChatGPT integrates MCP: A leak confirms that OpenAI’s ChatGPT will integrate MCP.
    • Kent Dodds says this bodes extremely well for the future of MCP! (x.com link).
  • Users debug MCP server with Inspector: A user encounters a SyntaxError: Unexpected token 'S', Starting s... is not valid JSON when connecting to the MCP server via the inspector, possibly due to the server getting unexpected data.
    • Another user suggests running npx @modelcontextprotocol/inspector and configuring it in the UI, while also wrapping the node process in a script to get the logging from the node process into a specific file.
  • Troubleshooting invoke method errors: A user is facing issues with the invoke method in their MCP client session, encountering an unhandled errors in a TaskGroup error when trying to call a specific tool based on the query.
    • They are using create_react_agent with PromptTemplate and facing continuous failures in the ainvoke method, seeking assistance with the error (code snippet).
  • Resources as Application-Driven or LLM-Driven: A user questions the resource design where they are application-driven rather than LLM-driven, expressing plans to create a tool for every resource so that the LLM can request context on-demand.
    • A user says resources are super powerful and it’s better they are app-driven over model-driven so that you can index them in Meilisearch or do real-time RAG.
  • Windsurf, Cursor, Claude Desktop MCP connections: Users are discussing how to connect MCP to tools like Windsurf, Cursor, and Claude Desktop, with the suggestion that the client (e.g., Cursor) should run the server via its MCP settings for local testing.
    • One user mentioned wrapping the inspector with ithena, but conceded that maybe that is too many layers.
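The wrapper tip from the Inspector-debugging item above (get the node process's logging into a specific file) can be sketched as below. This is a minimal sketch assuming a generic stdio MCP server launched as a child process; the node command and log path are placeholders. Keeping stdout untouched matters because the stdio transport expects clean JSON there, and a stray startup banner on stdout is exactly the kind of thing that produces a Starting s... is not valid JSON error.

```python
import subprocess
import sys

def run_and_log(cmd, log_path):
    """Run an MCP server as a child process, leaving stdin/stdout untouched
    (the stdio JSON-RPC channel) while redirecting stderr, where diagnostic
    logging belongs, into a file for later inspection."""
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stderr=log)
    return proc.returncode

# Hypothetical usage: run_and_log(["node", "build/index.js"], "mcp-server.log")
```

If the server itself prints banners to stdout, those lines need to be moved to stderr in the server code; a wrapper can only separate streams, not unmix them.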

MCP (Glama) ▷ #showcase (2 messages):

cyberchef-mcp-sse, MCP UI, SDK to add UI to MCP

  • CyberChef tools get exposed as MCP server!: A member created an MCP server that exposes CyberChef tools.
    • This allows the LLM to perform multistep data transformations as a single bake.
  • MCP UI: SDK adds UI to MCP: A member released MCP UI, an SDK to add UI to MCP.
    • The SDK enables any MCP server to respond with an Embedded Resource with a ui:// or ui-app:// URI.
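For illustration, a tool result carrying such an embedded resource might look like the payload below. The ui:// URI scheme comes from the announcement, but the surrounding field names are an assumption modeled on MCP's generic embedded-resource content shape, not the MCP UI SDK's confirmed schema.

```python
import json

# Hypothetical tool-result payload: the ui:// URI scheme is from the
# announcement; the wrapper fields are assumed from MCP's generic
# EmbeddedResource content type and may not match the SDK exactly.
ui_result = {
    "content": [
        {
            "type": "resource",
            "resource": {
                "uri": "ui://dashboard/main",
                "mimeType": "text/html",
                "text": "<h1>Rendered by the MCP host</h1>",
            },
        }
    ]
}
payload = json.dumps(ui_result)
```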

Notebook LM ▷ #use-cases (4 messages):

Deep Dive podcasting with music, Gemini Canvas, NBLM Gemini Integration

  • Deep Dive Podcasting now featuring LNbeats?: A member shared a setup using OBS with music from lnbeats.com layered in as a podcasting presentation, enabling soundboard integration and Podcasting 2.0 lightning splits.
  • Unlock Canvas in Gemini: A member gave instructions on how to turn on ‘canvas’ in Gemini.
    • They suggested working through what you want on the canvas, then saving to your Docs.
  • NBLM and Gemini: a tale of Two Imports: A member explained that you can import your newly created doc from Gemini into NBLM.
    • They also described the inverse, explaining that you can copy-paste NBLM content to Docs, then add it into Gemini.

Notebook LM ▷ #general (48 messagesđŸ”„):

Organic Chemistry Gamification with NotebookLM, NTLM Beta App Experience, NotebookLM and Math Formatting, Source Formats for NotebookLM, NotebookLM's Web Access and Secondary Sources

  • Chem Class Gets a Level Up with NotebookLM: A professor is trying to gamify their undergraduate organic chemistry course and seeks advice on using NotebookLM to enhance student engagement, integrating it with data science tools and languages like Visual Basic for a project turning the course into a game.
    • The professor aims for breakthrough understanding beyond grades and plans to use NotebookLM for brainstorming, aided by coding assistance from other AIs like GPT or Gemini.
  • NotebookLM Learns Swedish, User Confused: A user reported that NotebookLM was translating everything to Swedish despite their computer being set to English.
    • Another user suggested changing the Google account language to English to fix the issue; they had the same issue with Norwegian.
  • Nested Links and Web Access would supercharge NotebookLM: A user suggested that if NotebookLM could access links within links (like links on Google Docs) and search the web within the context of the provided sources, it would replace 90% of their AI needs.
    • However, a concern was raised that automatically treating secondary sources as primary could lead to server overload through unlimited requests.
  • Google I/O’s Incoming AI Invasion: A user mentioned upcoming new AI products at Google I/O, but another user humorously feigned ignorance, citing being too busy to know any details and joking that they couldn’t reveal anything even if they did.
    • Separately, the discussion confirmed an issue with rendering LaTeX; a fix is in progress but is taking time to release.
  • NotebookLM audio only creates Introduction: A user reported that when generating audio from a 114-page PDF file, NotebookLM only created an introduction of about 4 minutes, and asked if this was because they were on the free version.
    • No resolution was given to the user.

GPU MODE ▷ #general (1 messages):

GPU Mode videos, Community Introduction

  • New Member Enjoys GPU Mode Videos: A new member expressed their enthusiasm for GPU Mode videos on YouTube, mentioning they have been watching them while reading Programming Massively Parallel Processors.
    • They are looking forward to the next talks and engaging with the community.
  • Community Welcome: The new member stated they love what you guys are doing, the videos are great, and the community sounds awesome.
    • They expressed eagerness to participate in future talks and discussions.

GPU MODE ▷ #triton (3 messages):

Native FP8 support in Triton, Shared Memory Calculation, Autotuning Failure Analysis

  • Triton Gains Native FP8/mxfp4 support: The Triton language now supports native fp8 x mxfp4 on GB200 and fp16 x mxfp4 on other hardware.
  • Decoding Shared Memory Calculation: A member inquired about calculating shared memory requirements based on block sizes and num_stages after an autotuning failure.
    • Another member suggested the formula shared_mem = (block_size_m * block_size_k + block_size_k * block_size_n) * num_stages for pruning configurations, and suspected the failure occurs at the 3rd stage once pipelining kicks in, so the relevant shared-memory size is the one at the stage where it breaks.
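The suggested pruning heuristic reads naturally as a small Python filter. This sketch assumes fp16 tiles (2 bytes per element; the quoted formula counts elements, so a dtype width must be applied) and an illustrative ~99 KB per-block shared-memory limit — both are assumptions, not values from the discussion.

```python
def smem_bytes(block_m, block_n, block_k, num_stages, elem_bytes=2):
    """Shared memory for a pipelined matmul: an A tile (block_m x block_k) and
    a B tile (block_k x block_n) are buffered once per pipeline stage."""
    elems = (block_m * block_k + block_k * block_n) * num_stages
    return elems * elem_bytes

def prune_configs(configs, smem_limit=99 * 1024):  # illustrative limit
    """Keep only autotune configs whose staged tiles fit in shared memory."""
    return [
        c for c in configs
        if smem_bytes(c["BLOCK_M"], c["BLOCK_N"], c["BLOCK_K"],
                      c["num_stages"]) <= smem_limit
    ]

configs = [
    {"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 64, "num_stages": 3},  # fits
    {"BLOCK_M": 128, "BLOCK_N": 256, "BLOCK_K": 64, "num_stages": 5},  # too big
]
kept = prune_configs(configs)
```

A filter like this can be passed to Triton's autotuner via its `prune_configs_by` hook so that oversized configs are discarded before compilation rather than failing at the pipelining stage.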

GPU MODE ▷ #cuda (2 messages):

fp16 GEMM, bf16 GEMM, cuBLAS, Lei Mao, cuda_hgemm

  • Looking for fp16/bf16 GEMM Examples: A member asked for examples of fp16/bf16 GEMM achieving near-cuBLAS performance, citing a blogpost by Lei Mao.
    • Another member pointed to the cuda_hgemm project on GitHub as a relevant example.
  • cuda_hgemm project: The cuda_hgemm project on GitHub was suggested as a relevant example of fp16/bf16 GEMM achieving near-cuBLAS performance.
    • The project may contain optimized CUDA kernels for half-precision general matrix multiplication.

GPU MODE ▷ #torch (6 messages):

AOT Inductor Code Correlation, FSDP2 Device Mesh Performance, Torch Compile max-autotune batch sizes

  • AOT Inductor Falls Short on Code Correlation: A member found that AOT Inductor wasn’t able to correlate the generated C++ back to the original graphs, and expressed a desire to see sorted source nodes to understand what fused.
    • The member suggested adding annotations to the model code to clarify what got fused, while another member offered to discuss improvements via DM.
  • FSDP2 2D Device Mesh Slower Than Expected: A member testing FSDP2 with 1D/2D device mesh on a 2-node, 8x H100 setup observed slower results with Qwen3 14B on 2D mesh compared to 1D.
    • They questioned whether their understanding of 2D mesh being faster on slower interconnects (only all-gather after backward) was incorrect.
  • Torch Compile Chooses Dynamic Kernels Outside Tuned Batch Sizes: A member inquired how torch.compile handles batch sizes outside the tuned set when using max-autotune, particularly with GEMMs.
    • The member found that torch.compile falls back to a kernel supporting dynamic shapes (e.g., extern_kernels.mm) when the batch size is dynamic (None), rather than padding activations to use the autogenerated kernels.

GPU MODE ▷ #beginner (9 messagesđŸ”„):

Duff's Device in CUDA, Partial Unrolling with #pragma unroll, Thread Merging in Volta and Later GPUs

  • Duff’s Device Debated for CUDA: A member inquired about using Duff’s Device in CUDA to minimize branching overhead and maximize thread coalescence.
    • It was questioned whether a switch statement with fall-through cases would work as imagined, and what potential issues might arise.
  • “#pragma unroll” Unveiled as Canonical Unrolling Way: A member suggested using #pragma unroll some_number as the canonical method for partial unrolling, noting it’s well-optimized by the compiler and handles edge cases.
    • It was implied that the pragma is preferable to Duff’s Device.
  • Volta’s Independent Thread Scheduling: The compiler’s heuristics decide where to place the BSYNC based on the code, so there are no guarantees that threads merge after divergence at the point where it’s expected. The Undocumented GPU behavior writeup was referenced for more details.
  • “#pragma unroll” only works if range is known at compile-time: A member inquired whether #pragma unroll works when the iteration count is unknown at compile time and potentially less than some_number.
    • Another member clarified that the pragma without a number implies complete unrolling and thus only works when the range is known at compile time, while partial unrolling has no such restriction.

GPU MODE ▷ #rocm (1 messages):

snektron: https://github.com/ROCm/rocm-libraries ROCm libraries new monorepo


GPU MODE ▷ #self-promotion (1 messages):

X Post, Image Analysis

  • X marks the post: A member shared an X post in the self-promotion channel.
  • Image analysis attached: The user attached an image to the post, but its content remains to be analyzed.

GPU MODE ▷ #submissions (19 messagesđŸ”„):

MI300, amd-mixture-of-experts, amd-fp8-mm

  • MI300 Runs Successful: Multiple submissions were successful on MI300 across different leaderboards.
    • These submissions span leaderboards for amd-mixture-of-experts and amd-fp8-mm.
  • amd-mixture-of-experts submissions hit MI300: Submissions to the amd-mixture-of-experts leaderboard on MI300 achieved times of 7533 ms, 6827 ms, 6824 ms, and 7564 ms.
  • amd-fp8-mm submissions milestone: Submissions to the amd-fp8-mm leaderboard on MI300 ranged from 159 ”s to 36.3 ms.

GPU MODE ▷ #factorio-learning-env (2 messages):

Server bumping, Code contribution

  • Enthusiastic Newcomer Bumps Server: A new member joined the channel and expressed their intent to contribute, starting by reading through the threads and taking notes.
    • They hope to contribute code, ideas, and/or some compute for the dataset.
  • Contributor Expresses Eagerness to Help: The user is excited about the potential to contribute code, ideas, or computational resources to the dataset, demonstrating a strong interest in collaboration.
    • They are planning to review existing threads and document their findings to identify areas where they can be of assistance.

GPU MODE ▷ #amd-competition (3 messages):

AMD-FP8-MM leaderboard shapes, Mixture-of-experts submission errors

  • AMD-FP8-MM Leaderboard Shape Inquiry Surfaces: A member inquired about the shapes used in the amd-fp8-mm leaderboard, specifically asking if the leaderboard uses the 17 shapes they extracted from the popcorn CLI.
    • As a follow-up, they asked how the final leaderboard timing is calculated: the mean or the sum of all timings.
  • Mixture-of-Experts Submission Snafu: A member reported encountering an unexpected error when trying to submit mixture-of-experts benchmarks, with each submission taking 10 minutes and failing.
    • They clarified that they have no issues submitting other benchmarks via submit benchmark.

GPU MODE ▷ #cutlass (2 messages):

CuTe DSL, CUDA Python, Tensor Core Programming, Linear Algebra Programming Model

  • CuTe DSL Not Built on CUDA Python: The CuTe DSL is unrelated to the CUDA Python package and does not build on it; it’s a matter of preference.
    • The member stated that CUDA Python may not expose tensor core programming, while CuTe DSL is designed for a productive, expressive, and performant linear algebra programming model.
  • Tradeoffs in CUDA Python vs CUDA C++ Performance: Discussion about the performance tradeoffs between CUDA Python and CUDA C++ in the context of the CuTe DSL and its capabilities.
    • It was mentioned that CUDA Python may have limitations in exposing tensor core programming compared to CuTe DSL, which is designed for performant linear algebra programming.

GPU MODE ▷ #mojo (1 messages):

GPU puzzles, pixi errors, 4090, PTXAS fatal error, KGEN_CompilerRT_AlignedAlloc

  • GPU Puzzle Solver Faces Unexpected Error: A member encountered a puzzling error while working on GPU puzzles, specifically on problem p11, using pixi and a 4090 GPU.
    • The error message reported was: /home/psi/mojo-gpu-puzzles/problems/p11/p11.mojo:1:1: error: ptxas fatal : Unresolved extern function 'KGEN_CompilerRT_AlignedAlloc'.
  • Unresolved Extern Function Causes PTXAS Failure: The error indicates an issue with the PTXAS (CUDA assembler) failing due to an unresolved external function: ‘KGEN_CompilerRT_AlignedAlloc’
    • This suggests a potential problem with the compiler runtime or alignment allocation during the compilation process for the GPU code.

Nous Research AI ▷ #general (21 messagesđŸ”„):

OpenAI Release, Smart Glasses, Nous Research NYC Event, New Voice Model, Image Infilling Models

  • OAI Release to Bump Benchmarks: A member speculates that OpenAI’s upcoming release will likely only bump a benchmark up 0.8% and distill the model for release.
  • Smart Glasses Future Foretold: According to a member, fundamental physics issues like the refractive index of transparent materials will delay the creation of high field-of-view transparent displays.
    • One member quipped, welcome to your smart glass zombie future.
  • Nous Research Event in NYC Draws Crowd: Several members expressed excitement about the upcoming Nous Research event in NYC on Thursday, with one traveling from England to attend.
    • Another member noted the difficulty of traveling from SF to NYC in a short time, with a third member asking about location and time.
  • Dia 1.6B is a Good Open-Source Voice Model: Members discussed the existence of a new voice model released after Sesame; one member suggested Dia-1.6B on HuggingFace, also documented at Notion.
  • Image Infilling Ideas Requested: A member asked for suggestions for models capable of performing image infilling, such as adding a pool to a garden picture.

Augmentation Lab, Rhizome Futurism, Summer residency

  • Augmentation Lab opens summer residency: Augmentation Lab, spun out of Media Lab & Harvard, runs a 2mo summer residency for self-directed projects/essay-writing/startups.
    • Previous residents have won OSV’s $100k grant, gone through YC, pursued PhDs in Michael Levin’s lab, and joined companies such as Midjourney and Apple.
  • Rhizome Futurism is this summer’s theme: This year’s theme is Rhizome Futurism, encouraging concepts from Deleuze’s ideas of an interconnected future.
    • Applications for rolling admission are open until June 15.

DSPy ▷ #general (14 messagesđŸ”„):

DSPy Abstractions, Foundation Models, ChatAdapter, dspy.History, dspy.Suggest and dspy.Assert

  • DSPy’s Abstraction Focus Sparks Inquiry: A member inquired about systematically learning abstraction building in DSPy, questioning its inspiration from PL/compiler literature.
    • Another member suggested observing changes in foundation models, systems, and strategies over time, recommending books like ‘A Philosophy of Software Design’ and highlighting DSPy’s layers: models/adapters, modules for strategies, signatures/control flow, and optimizers.
  • Navigating ChatAdapter for LLM interactions: A member sought guidance on using DSPy’s ChatAdapter for building chat interactions with LLMs.
  • dspy.History guidelines clarified: A member asked about guidelines for using dspy.History, specifically regarding storing the whole prediction object.
    • Another member suggested using dspy.inspect_history() to iterate to one’s taste.
  • Assert/Suggest’s Status and Replacement: A member inquired about the state and plans for Assert/Suggest in DSPy, noting the commented-out primitives/assertions.py file.
    • Another member clarified that BestOfN and Refine are the replacements for dspy.Suggest and dspy.Assert as of DSPy 2.6, providing a link to a tutorial.

tinygrad (George Hotz) ▷ #general (13 messagesđŸ”„):

Bounty Google Sheet, whitespace changes, PR closed because AI, GCC instead of Clang

  • Tinygrad Bounty Google Sheet Shared: A member asked for the Tinygrad bounty Google Sheet, and another member shared the link.
  • Minimize whitespace changes: A member was asked to minimize whitespace changes in their PR to match the style of the codebase.
  • PR Closed Due to Suspected AI Generation: A member inquired about why their PR was closed, and the reason given was “do not use AI like what is this crap? nobody would write this by hand”.
    • An image analysis bot added, btw, indistinguishable from AI is AI.
  • GCC vs Clang: A member asked if tinygrad could use GCC instead of Clang for a CPU target to try tinygrad on an AIX system with the PPC64 arch where there is no clang.
    • Another member responded that it’s not easy, and would require adding elf relocations for ppc64 to the custom elf loader, pointing to the relevant code.

Manus.im Discord ▷ #general (13 messagesđŸ”„):

Version Control, Manus GitHub Integration, OpenAI Codex Competition

  • Version Control Vigilantes Victorious: A member laments the loss of work and urges others to set up version control.
    • Another member states Unfortunately it’s gone, there’s no getting it back. Set that version control up my man.
  • Manus GitHub Goodness: A member inquires about connecting Manus with a GitHub repo.
    • Another member replies I’m pretty sure it’s not possible, but a third member shares a link to help.
  • OpenAI’s Codex Challenges Manus: A member suggests that Manus might face competition from OpenAI’s new Codex.
    • The member expresses gratitude for the product and its last update.

LlamaIndex ▷ #general (10 messagesđŸ”„):

PropertyGraphIndex embeddings, Prompt caching, MCP Servers

  • PropertyGraphIndex Embeddings Plagued with Metadata: A member reported an issue with PropertyGraphIndex where node.excluded_embed_metadata_keys doesn’t work for entities extracted from text, leading to metadata being added to the embedding calculation, thus reducing its accuracy.
    • Another member suggested that a PR is probably needed so that the excluded keys are respected when the entity node is cast to a string; because extractors automatically insert metadata from the node, a separation between actual properties and metadata is required.
  • Azure OpenAI Models Need Prompt Caching: A member inquired about implementing prompt caching and in-memory caching through LlamaIndex on their Azure OpenAI models.
    • Another member asked what the prompt caching was for, and pointed to the LlamaIndex documentation on agents memory for sustaining long-term memory, suggesting there are other ways to handle memory.
  • Claude Desktop’s MCP Servers: A member asked if anyone has built MCP servers with tools that expect files to be dropped in through Claude desktop, seeking help.

Torchtune ▷ #general (6 messages):

Torchtune configurations, Alpaca dataset in LLM fine-tuning, Modern datasets for LLM training, Evaluation benchmark on Alpaca dataset, Torchtune performance increase

  • Alpaca Dataset’s Age Raises Eyebrows: Experts are questioning the relevance of Alpaca and Slimorca datasets for modern LLMs, citing their age (Alpaca was created for Llama 1) and the likelihood that current models have already absorbed most of their information during pretraining.
    • These datasets may not provide a significant performance boost, with one user stating: I wouldn’t expect any lift on any benchmark when training on either.
  • Pirate Talk Dataset Vibe Check: Some individuals are using unconventional datasets like the talk-like-a-pirate dataset to conduct vibe checks on models during evaluations.
    • The more conventional way is sanity checking with perplexity on wikitext or accuracy on hellaswag.
  • Feature Request: Torchtune’s Performance: A user has proposed a feature request for Torchtune to automatically evaluate performance increase on a default dataset after training, providing a benchmark for new users.
    • This would serve as a reference point to ensure that changes to code or configuration don’t negatively impact performance, complementing the existing loss reduction metric.
  • Modern Datasets Recommendation: Instead of using Alpaca and Slimorca, users are looking for more modern datasets to train LLMs.
    • No datasets were mentioned as alternatives.

Torchtune ▷ #dev (1 messages):

segmenttreebeats: <@154226635338547200> Hey! Am I correct here? I would like to merge #2608


Modular (Mojo đŸ”„) ▷ #mojo (4 messages):

Bazel in Modular Repo, NDBuffer Multiplication, NDBuffer Deprecation

  • Bazel Build in Modular Repo: Experimental: Members noted that Bazel is present in the Modular repo but is still experimental, suggesting running ./bazelw test //... to test the build.
    • One user showed interest to give it a try.
  • NDBuffer Multiplication Quandaries: A member asked how to multiply two arrays together when those arrays are NDBuffers, noting the existence of matmul and seeking guidance on importing and using it.
    • They also inquired about the potential deprecation of NDBuffer and if there’s a recommended alternative for use.

Nomic.ai (GPT4All) ▷ #general (4 messages):

koboldcpp, GPT4All, NMKD SDGUI, swarm-ui

  • User pines for GPT4All’s simpler interface: A user is grudgingly using koboldcpp due to lack of support for newer models in GPT4All, missing GPT4All’s ease of use and local docs support.
    • The user compares this to NMKD SDGUI for Stable Diffusion, which was also easy to use but the developer suddenly disappeared without a word one day.
  • Swarm-UI pitched as alternative: A user suggests swarm-ui as the best way to go for easy to mid to advanced use cases.
    • Another user agrees that local docs are fine and nothing similar exists.

MLOps @Chipro ▷ #general-ml (1 messages):

Designing Machine Learning Systems, Robotics AI

  • Designing Machine Learning Systems book generates excitement: A member expressed excitement after reading Designing Machine Learning Systems.
    • They specifically asked whether the book’s content also applies to Robotics AI.
  • Robotics AI Relevance: A member inquired about the applicability of the Designing Machine Learning Systems book to the field of Robotics AI.
    • The question seeks to understand if the principles and practices outlined in the book are relevant and useful in the context of developing AI systems for robotics.

Cohere ▷ #🔌-api-discussions (1 messages):

SwiftUI, Cohere API

  • Cohere API meets SwiftUI: A member inquired whether it is possible to use the Cohere API in SwiftUI.
  • SwiftUI Integration Feasibility: The user wants to know if the Cohere API can be integrated into SwiftUI projects for potentially building AI-powered iOS applications.