an incremental step.

AI News for 11/11/2025-11/12/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (201 channels, and 5148 messages) for you. Estimated reading time saved (at 200wpm): 423 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

GPT 5.1 launched in ChatGPT today, with API availability “later this week”:

  • 5.1 Instant is

    • “warmer by default and more conversational... surprises people with its playfulness while remaining clear and useful.”
    • improved instruction following, including respecting em dash preferences
    • can use adaptive reasoning to decide when to think before responding to more challenging questions, resulting in more thorough and accurate answers, while still responding quickly
  • 5.1 Thinking now:

    • adapts its thinking time more precisely to the question
      ![Bar graph comparing GPT-5 and GPT-5.1 token generation across task-difficulty percentiles](https://resend-attachments.s3.amazonaws.com/ZIFlwy8Q52AvVWe)

GPT-5.0 moves to “legacy model” status and will be sunset in 3 months.

AIME and Codeforces are mentioned, but no evals made it into this particular blog post, which some are criticizing.

ChatGPT also gets new tone toggles for personalization. Fidji Simo’s blog says “with more than 800 million people using ChatGPT, we’re well past the point of one-size-fits-all.”

Mobile app personalization screen showing different tone and style options for AI interactions, with "Quirky" currently selected as the base style.


AI Twitter Recap

Autonomy and Physical AI: Waymo freeway rollout, Anthropic’s Project Fetch, and Perceptron’s platform

  • Waymo freeway driving goes live: Waymo is rolling out freeway driving for public riders in Phoenix, LA, and across the SF Bay Area, connecting SF↔San Jose with curbside access to SJC. Leadership frames this as a validation of the Driver’s generalization and safety claims; scale enables new airport routes and longer corridors. See announcements from @dmitri_dolgov and @JeffDean.
  • Anthropic’s Project Fetch (robot dog with/without Claude): Anthropic had two non-roboticist teams program a quadruped; only one team could use Claude. It’s framed as an empirical check on “LLMs as robotics copilots” for planning/control authoring, debugging, and iteration speed. Results and methodology are in the thread: @AnthropicAI.
  • Perceptron’s “Physical AI” platform: A new API and Python SDK targeting multimodal perception-and-action apps, currently supporting Isaac-0.1 and Qwen3VL‑235B for VLM/VLA use cases (prompting primitives grounded in vision + language, plus “chat competitions”). Free access to Isaac this week per founders. Details: @perceptroninc, @AkshatS07.

Agent evals and control: Code Arena, LangChain middlewares, and LlamaIndex SEC agent

  • Code Arena (live coding evals): A step-by-step evaluation harness where models must plan, scaffold, debug, and ship working web apps. Currently lists support for Claude, GPT‑5, GLM‑4.6, and Gemini. Useful for measuring agentic decomposition, tool use, and temporal coherence under realistic coding tasks: @arena.
  • Agent governance via middleware (LangChain):
    • Human‑in‑the‑loop middleware that pauses execution for user approval of the next step—adds an explicit “ask before acting” gate to reduce unintended actions: @bromann.
    • Tool‑call limit middleware to cap runaway tool invocation and costs; demo shows reining in a spend‑happy shopping agent: @sydneyrunkle.
  • LlamaIndex structured extraction template (SEC filings): Multi‑step agent that classifies filing type, routes to the correct extraction schema, provides a review UI prior to commit, and can extend to downstream syncing/monitoring—built on LlamaAgents with LlamaClassify + Extract. Starter template: @llama_index.
  • Benchmarking push: NousResearch endorses ARC Prize’s interactive benchmarks for measuring generalized intelligence: @NousResearch.
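The two middleware ideas above (an approval gate before each action, plus a cap on executed tool calls) can be sketched generically; this is a toy illustration in plain Python with hypothetical names, not LangChain’s actual middleware API:

```python
class ToolCallLimitExceeded(Exception):
    pass

def with_governance(tool_fn, approve, max_calls=5):
    """Wrap a tool: every call needs approval, and executed calls are capped."""
    state = {"calls": 0}
    def guarded(*args, **kwargs):
        if state["calls"] >= max_calls:
            raise ToolCallLimitExceeded(f"cap of {max_calls} calls reached")
        if not approve(tool_fn.__name__, args, kwargs):  # human-in-the-loop gate
            return None                                  # action skipped, no call spent
        state["calls"] += 1
        return tool_fn(*args, **kwargs)
    return guarded

def buy(item):  # stand-in for a spend-happy shopping tool
    return f"bought {item}"

# Approve everything except yachts; allow at most 2 executed purchases.
guarded_buy = with_governance(buy, approve=lambda name, a, k: a[0] != "yacht",
                              max_calls=2)

results = [guarded_buy("socks"), guarded_buy("yacht"), guarded_buy("hat")]
try:
    guarded_buy("scarf")
    capped = False
except ToolCallLimitExceeded:
    capped = True
```

Rejected calls do not consume budget here; whether a denial should count against the cap is exactly the kind of policy choice such middleware makes explicit.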

Systems and infra: cross-container covert channel, edge LM IPW harness, and inference infra

  • Cross‑container communication via /proc lock state: A clever channel encodes ~63 bits in the shared lock for /proc/self/ns/time that all processes can access (even across unprivileged containers), enabling a chat app without networking. Implications for container isolation and policy hardening: @eatonphil.
  • Local LMs and the “intelligence‑per‑watt” (IPW) thesis: Evidence that ≀20B‑active‑param local models improved ~3.1× in capability and ~5.3× in efficiency since 2023, with a released profiling harness across NVIDIA, AMD, and Apple Silicon. Authors argue a cloud→edge redistribution similar to mainframe→PC, with IPW as the guiding metric. Summary: @Azaliamirh; paper/blog links: arXiv + blog.
  • Inference infra note: Teams report building bespoke inference platforms, crediting Modal for compressing time‑to‑ship: @ArmenAgha.
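For a sense of scale, ~63 bits is enough for a seven-byte ASCII payload per lock state. A toy packer (plain Python, illustrating the capacity only, not the actual /proc lock mechanism):

```python
def encode(msg: bytes) -> int:
    """Pack up to 7 bytes (56 bits) into one integer under 2**63."""
    assert len(msg) <= 7
    return int.from_bytes(msg, "big")

def decode(state: int, length: int) -> bytes:
    return state.to_bytes(length, "big")

state = encode(b"hi chat")
fits = state < 2**63          # within the ~63 bits the shared lock can carry
roundtrip = decode(state, 7)
```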

Model UX and product updates: Gemini Live, GPT‑5.1 persona, and AI privacy

  • Gemini Live upgrade: A large update emphasizes faster turn‑taking, expressiveness, and accents for voice interactions, with usage demos highlighting more fluid conversation latency and paralinguistic variety: @joshwoodward.
  • GPT‑5.1 tone and “persona” tuning: Mixed reception on style. Some users find the default tone too saccharine or over‑empathetic @tamaybes, while others report a meaningful reduction in sycophancy and more grounded, self‑aware suggestions vs GPT‑5 (and better than 4o) in journaling‑style use @_simonsmith. Net: persona tuning is now a first‑order product surface; defaults matter.
  • AI privilege and data minimization: OpenAI’s CPO calls for a new “AI privilege” to protect sensitive, conversation‑level interactions and pushes back on indiscriminate requests for millions of chats—arguing granularity matters for respecting user intent: @jasonkwon.

Research and theory notes

  • RL geometry and “implicit KL leash”: Commentary on a new paper argues RL updates implicitly constrain divergence from the base model (a de‑facto KL leash) and preserve pretrained geometry; methods targeting “principal weights” (e.g., PiSSA) may underperform or destabilize vs LoRA. Discussion: @iScienceLuvr.
  • Spatial intelligence framing: Fei‑Fei Li’s new blog (via The Turing Post) argues world models for spatial intelligence must be generative, multimodal, and interactive—setting expectations for next‑gen embodied systems: @TheTuringPost.
  • Demos: collaborative multi‑agents in tldraw: Early look at multi‑agent collaboration UX explored live at Sync conf, with a grilling session on task decomposition and shared canvases: @swyx.

Top tweets (by engagement)

  • Waymo expands to freeways across Phoenix, LA, and SF Bay Area; adds SF↔San Jose and SJC curbside — @JeffDean (5,557)
  • Waymo’s CTO on the rollout and safety/generalization framing — @dmitri_dolgov (1,214.5)
  • Cross‑container comms via /proc/self/ns/time lock bits — @eatonphil (910)
  • Gemini Live’s biggest update (speed, expressiveness, accents) — @joshwoodward (624.5)
  • Code Arena: live coding evals for agentic coding — @arena (514.5)
  • Anthropic’s Project Fetch (robot dog + Claude vs control) — @AnthropicAI (478.5)
  • AI privacy and “AI privilege” stance — @jasonkwon (438.5)

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. AELLA Open-Science Initiative

  • AELLA: 100M+ research papers: an open-science initiative to make scientific research accessible via structured summaries created by LLMs (Activity: 455): AELLA is an open-science initiative aimed at making over 100 million research papers accessible through structured summaries generated by Large Language Models (LLMs). The project is hosted on Hugging Face and offers a visualizer tool for exploring these summaries. The initiative is detailed in a blog post by Inference.net, highlighting its potential to democratize access to scientific knowledge by leveraging AI to create concise, structured summaries of vast amounts of research data. Some users express skepticism about the project’s utility and the choice of its name, indicating a need for clearer communication on its practical applications and benefits.
  • Repeat after me. (Activity: 671): The post discusses the performance of AMD graphics cards in processing tokens per second compared to Nvidia cards, highlighting that an AMD card, which is significantly cheaper, achieves 45 tokens per second. This is contrasted with Nvidia cards that can achieve 120 to 160 tokens per second, but at a higher cost. The post suggests that while AMD cards may currently be slower, they are improving over time, and users should not feel pressured to pay a premium for faster performance. Commenters note that the token speed is sufficient as long as it exceeds their reading and comprehension speed. There is also a mention of misinformation regarding the difficulty of running LLM models on AMD hardware, suggesting that it may not be as challenging as some claim.
    • A key issue highlighted is the performance disparity between AMD and NVIDIA GPUs, particularly in handling large-context processing tasks. While 45 tokens per second (tps) is adequate for single-user generation, NVIDIA’s GPUs excel in prompt processing at larger contexts, achieving several thousand tps compared to AMD’s few hundred. This makes NVIDIA more suitable for complex applications like RAG pipelines and coding assistants.
    • The software ecosystem for AMD is criticized for being poorly supported, with users experiencing issues such as random crashes and lack of driver support. For instance, the Radeon PRO W6000-series has been plagued with GCVM_L2_PROTECTION_FAULT_STATUS faults, and AMD’s ROCm support is inconsistent, requiring users to apply workarounds like monkey-patching libraries. In contrast, NVIDIA’s CUDA has maintained long-term support, with Pascal support only recently dropped after a decade.
    • AMD’s approach to customer support is criticized as lacking, with a focus on selling hardware rather than maintaining it. Users report that AMD often fails to support their products beyond a single generation, leading to a reliance on community-driven solutions to make AMD hardware functional. This contrasts with NVIDIA’s more stable and long-term support for their products, making them a more reliable choice for compute tasks.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. GPT-5.1 Release and Features

  • GPT-5.1: A smarter, more conversational ChatGPT (Activity: 878): OpenAI has launched GPT-5.1, featuring two models: GPT-5.1 Instant and GPT-5.1 Thinking. The release focuses on enhancing conversational AI with adaptive reasoning and dynamic thinking time adjustments, allowing for faster responses to simple queries and more detailed answers for complex ones. However, the release lacks benchmarks, a comprehensive system card, and an API, raising questions about the rushed nature of the launch. For more details, see the OpenAI announcement. Commenters noted the absence of benchmarks and a detailed system card, suggesting a rushed release possibly to compete with other tech announcements. Concerns were raised about the lack of API and incomplete testing phases.
    • Several users noted the absence of benchmarks in the GPT-5.1 release, which is unusual for a major model update. This lack of performance metrics makes it difficult to assess improvements over previous versions, such as GPT-4, and raises questions about the model’s capabilities and enhancements.
    • The release of GPT-5.1 appears rushed, as indicated by a brief system card and the delay in API availability. Additionally, the model did not complete its stealth testing phase, known as Windsurf, which is typically a standard procedure before a full release. This has led to speculation about the reasons behind the hurried launch.
    • Some users speculate that GPT-5.1 is aimed at users who preferred the style of GPT-4 over GPT-5, suggesting that the new version might be an attempt to cater to those who were not satisfied with the previous iteration’s changes. However, without benchmarks or detailed documentation, it’s challenging to confirm these assumptions.
  • ChatGPT-5.1 (Activity: 813): The image is a promotional announcement for the release of GPT-5.1 by OpenAI, scheduled for November 12, 2025. This version is described as a more intelligent and conversational iteration of ChatGPT, with a focus on customization features. The release is initially targeted at paid users, indicating a strategic move to prioritize premium services. The announcement suggests improvements in user interaction, particularly in the ‘Instant mode,’ which may offer a different tone or style of responses compared to previous versions. Some users express concern over the increasing number of similar model names, which could lead to confusion. Others note the prioritization of paid users, indicating a shift in OpenAI’s business strategy.
    • AdDry7344 highlights a noticeable change in tone with ChatGPT-5.1’s Instant mode, suggesting it may affect user experience by altering how responses are perceived, especially in stress-related queries. This could imply a shift in the model’s conversational style, potentially impacting its effectiveness in providing concise, direct advice.
    • Nakrule18 criticizes ChatGPT-5.1 for defaulting to a more verbose, ‘chatty’ style compared to GPT-5, which was appreciated for its concise and direct responses. This change might affect users who prefer straightforward answers over a conversational tone, indicating a possible regression in user experience for those seeking efficiency.
    • Dark_Karma notes the improved speed and more engaging responses of ChatGPT-5.1, suggesting enhancements in processing and interaction quality. This could indicate optimizations in the model’s architecture or algorithms, leading to faster response times and potentially more dynamic conversational capabilities.
  • I Won Full Custody With No Lawyer Thanks to ChatGPT. (Activity: 727): A Reddit user, a health physicist, successfully navigated a custody battle without a lawyer by leveraging ChatGPT to understand court rules, procedures, and fill out legal forms. The user was awarded full custody, with the other parent limited to conditional visitation due to preexisting assault charges. The user emphasizes that while AI was instrumental, the success was also due to the specific circumstances of the case, including the mother’s legal history and the user’s technical expertise. The post highlights the potential of AI in legal contexts but cautions against over-reliance on it for legal success. Commenters noted AI’s potential to disrupt traditional legal practices, with one highlighting the importance of understanding AI’s limitations, such as hallucinations, and another sharing a similar tool, FreeDemandLetter.com, for legal assistance.
    • Dry-Peanut6627 highlights the disruptive potential of AI in family law, noting that while attorneys criticize AI for generating inaccuracies, users can quickly correct these ‘hallucinations’ if they are knowledgeable. This suggests a shift in power dynamics, where litigants are increasingly equipped with information that was traditionally monopolized by legal professionals.
    • bobboblaw46, a lawyer, strongly advises against self-representation in legal matters, even with AI assistance like ChatGPT. They emphasize that legal errors can have severe consequences, and AI often provides incorrect legal advice, misinterprets case law, or offers overly simplistic solutions. The complexity of law justifies the extensive education and training lawyers undergo, underscoring the risks of relying solely on AI for legal representation.
    • MetsToWS mentions creating FreeDemandLetter.com, a tool designed to assist individuals with legal issues such as unpaid contracts and security deposit refunds. This tool, similar to ChatGPT, guides users through legal processes, indicating a trend towards accessible legal assistance through technology.
  • Chat gpt used to write article in Dawn newspaper (Activity: 970): Dawn, a prominent Pakistani newspaper, reportedly used ChatGPT to write an article, sparking discussions about the role of AI in journalism. The incident highlights concerns over AI-generated content, particularly regarding the lack of human oversight, as evidenced by a commenter’s experience where AI editing led to significant content distortion, including the addition of 30 em dashes. This raises questions about the reliability and editorial standards when using AI tools in professional writing. Commenters express skepticism about AI’s role in journalism, emphasizing the importance of human oversight in editing to maintain content integrity and quality.
    • irr1449 shares a technical issue where using ChatGPT for editing led to a significant alteration of their article. The AI not only shortened the text but also introduced about 30 em dashes, which disrupted the original content. This highlights potential pitfalls in relying on AI for nuanced editing tasks, where the AI’s changes can inadvertently alter the intended message or style of the writing.

3. Creative AI Experiments

  • I told my AI to surf the internet and send me postcards (Activity: 499): The post describes an experiment where an AI is tasked with a multi-step process: surfing the internet, generating an image as if it were a postcard from a virtual location, and writing a short message. The AI is instructed not to reveal the websites it visited, focusing instead on the creative output. This experiment highlights the AI’s ability to integrate web search, image generation, and text composition into a cohesive task, showcasing advancements in AI multitasking capabilities. The comments include links to images presumably generated by the AI, suggesting a focus on the visual output of the experiment. However, there is no substantive technical debate or discussion in the comments.
  • Gemini switched roles (Activity: 1632): The image appears to be a humorous depiction of a digital interface, possibly related to an AI or software named “Gemini,” which is tasked with changing the color of a jacket worn by a character. The interface suggests that “Gemini” might have switched roles, implying a mix-up or error in its functionality. This is further emphasized by the comments, which mock the AI’s response capabilities, suggesting it might not be performing as expected. The image and comments highlight the challenges and limitations of AI in understanding and executing specific visual tasks. The comments humorously critique the AI’s limitations, with one suggesting a sarcastic response from the AI and another pointing out the AI’s inability to perform the task, reflecting a common sentiment about AI’s current capabilities.
  • UBTech shows off its self-charging humanoid robot army aiming to fulfill a >100M factory order (Activity: 1239): UBTech has showcased its self-charging humanoid robots, which are part of a significant order valued at 112M USD, not 100M units as initially misunderstood. According to a South China Morning Post article, the company plans to deliver “more than 500” units by the end of the year. These robots are designed for factory jobs, highlighting a significant step in automation and robotics in industrial settings. A comment clarified the misunderstanding about the order size, emphasizing the financial value rather than the number of units. This highlights the importance of precise communication in technical discussions.
    • The discussion clarifies that UBTech has received $112 million in orders, not 100 million units, as some might have misunderstood. According to a SCMP article, the company plans to deliver over 500 units by the end of the year. This indicates a significant scale of production and deployment for humanoid robots in industrial settings.
  • Wisker dont like to take orders (Activity: 3151): The post humorously suggests that a cat, referred to as ‘Wisker’, is resistant to taking orders, possibly in the context of a playful or metaphorical scenario involving AI or automation. The comments play along with this theme, joking about a cat being involved in tasks like cooking or using AI, such as ‘CatGPT’. The external link summary indicates restricted access to the content, requiring login or a developer token for further details. The comments reflect a light-hearted engagement with the idea of a cat being autonomous or involved in AI tasks, with no substantive technical debate present.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Flash Preview 05-20

Theme 1. Next-Gen AI Models Spark Hope and Frustration

  • GPT 5.1 Disappoints While Gemini 3 Hype Builds: Many users trashed the newly released GPT 5.1 as “trash” and “safetylobotomized”, noting a lack of benchmarks, but eagerly await Gemini 3 Pro, expected next week, with one test showing it comparable to a human. OpenAI announced GPT-5.1 rolls out to all users this week, with a Reddit AMA planned for tomorrow at 2 PM PT.
  • Riftrunner Codes Mario, Other Models Crash: Riftrunner demonstrated superior coding by building a 3D Mario game and a functional 3D Flappy Bird game from a simple prompt, generating 2k lines of code and outperforming Lithiumflow and the “bad” rain-drop (a Llama model). However, Riftrunner also exhibited laziness, prompting one user to state, if you motivate it, it might listen to you.
  • New Small Models Make Big Claims, But Drift: New models like WeiboAI (based on qwen2.5) showed surprisingly good initial performance for a 1.5B parameter model, but it drifts after the first 1-2 turns, while ixlinx-8b was released as a state-of-the-art (SOTA) small model from a local hackathon. Users also noted Aquif-3.5-Max-42B-A3B trending, speculated to be upscaled and fine tuned.

Theme 2. Developer Tooling Navigates Complex AI Landscapes

  • Aider’s Vim Mode Wins Praise, Markdown Still a Mess: Users lauded Aider’s Vim mode as fantastic and praised new session management features, but reported Aider gets confused by nested markdown when creating code snippets with anthropic.claude-sonnet-4-5-20250929-v1:0. Adding three and four backticks to conventions.md forced <source> tags, resolving the issue.
  • Cursor’s Max Mode Boosts Power, But Costs Double: Max mode in Cursor removes limits for maximum performance and cost reduction, enabling it to read entire files instead of chunks, but exceeding 200k context with Sonnet 4.5 doubles the cost. Users humorously suggested capping it, Cant we limit this to 200k and post that we can give another command 💀.
  • Perplexity Partner Program Bans Frustrate Users: Several users reported Perplexity Partner Program bans for “fraudulent activity,” citing a lack of support for appeals and suspecting issues like referral system gaming or VPN usage. Meanwhile, Gemini 2.5 Pro integration within Perplexity also “is broken and poorly implemented,” automatically switching to GPT.

Theme 3. Hardware Challenges Drive AI Performance Optimization

  • CUDA Compiler Commands Clarified, PTXAS Already O3: New CUDA developers learned to use -O3 for host optimization and -lineinfo for profiling with Nsight Compute. It was clarified that -O3 primarily optimizes the host (CPU) part of the code, and PTXAS already defaults to O3 optimization for GPU code.
  • Vulkan’s Stability Issues Arise, CUDA Saves the Day: Users experienced frequent blue screen errors (BSODs) with LM Studio using Vulkan, particularly on NVIDIA GPUs, resolving issues by switching to CUDA. Although Vulkan was faster for small tests on a 3090, it proved unstable.
  • NVIDIA Competition Rules Cache Kernels, Not Tensors: Users submitting to the NVIDIA competition (e.g., nvfp4_gemv) learned that caching compiled kernels is permissible, but caching tensor values between benchmark iterations is strictly prohibited. The B200 GPU with 148 SMs running at 1.98 GHz scores submissions, with details in Nvidia’s blog post.
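The nvcc flags from the CUDA bullet above would appear on an invocation roughly like this (a sketch; the sm_90 architecture and file names are placeholders):

```shell
# -O3 here optimizes the host (CPU) side; ptxas already compiles device
# code at its own -O3 default. -lineinfo maps SASS back to source lines
# so Nsight Compute can attribute stalls to your code.
nvcc -O3 -lineinfo -arch=sm_90 kernel.cu -o kernel
```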

Theme 4. AI’s Ethical Battlegrounds and Licensing Quandaries

  • OpenAI Fights NYT Over User Privacy: OpenAI’s CISO addressed The New York Times’ invasion of user privacy in a letter, detailing the legal battle and their commitment to protecting user data. OpenAI also offered 12 months of free ChatGPT Plus to eligible active-duty service members and recent veterans.
  • AI Chatbot Hordes Threaten Social Media Propaganda: Members discussed the potential for an AI chatbot infestation across social media, predicting that “online will be dominated by AI chatbots who will just constantly push propaganda.” This raises concerns about distinguishing real people from AI and the spread of misinformation.
  • Nemo-CC 2’s License Raises Developer Eyebrows: Members debated the restrictive licensing terms of Nemo-CC 2, citing concerns about NVIDIA terminating licenses with 30 days notice and prohibiting public sharing of evaluation results without prior written consent. One user summarized, You are not allowed to train a model on the dataset, evaluate the model, and publicly share the results without NVIDIA’s prior written consent, with more details in this paper.

Theme 5. Advancing LLM Research and Development Practices

  • MLE Interview Prep: Leetcode Trap or Real-World Skills?: Members debated MLE interview preparation, with some calling it a trap due to employer/team dependency, while others advised building something in the open that would be useful to companies training/serving models. Implementing Multi-Head Attention in Numpy was deemed horrible for interviews.
  • DSPy Demands Domain Knowledge, Signatures Still Act as Prompts: While DSPy abstracts prompting, domain-specific LLM applications still require detailed instructions within signatures (some users writing 100 lines), indicating that DSPy needs encoded domain knowledge to guide the LLM effectively. Participants noted that DSPy’s signatures still function as prompts, particularly in docstrings encoding business rules, despite offering better abstraction.
  • Mojo’s Metaprogramming Might, Mutability Muddle: Mojo aims for dynamic type reflection and features metaprogramming capabilities more powerful than Zig’s, with Mojo able to allocate memory at compile time (Mojo’s metaprogramming capabilities). A debate arose over mandatory mut annotations for function parameters, with comparisons to Rust and proposals for optional annotations or comptime syntax.
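The point that DSPy signatures still act as prompts can be pictured with a toy (this is not DSPy’s real API; `RefundEligibility` and `build_prompt` are hypothetical): the docstring encoding business rules ends up verbatim in the text the LLM sees.

```python
class RefundEligibility:
    """Decide refund eligibility.
    Rules: purchases older than 30 days are ineligible;
    opened software is ineligible unless defective."""
    inputs = ["purchase_date", "item_state"]
    outputs = ["eligible", "reason"]

def build_prompt(signature, **values):
    # The "signature" is abstraction, but its docstring is still prompt text.
    lines = [signature.__doc__.strip()]
    for name in signature.inputs:
        lines.append(f"{name}: {values[name]}")
    lines.append("Produce fields: " + ", ".join(signature.outputs))
    return "\n".join(lines)

prompt = build_prompt(RefundEligibility,
                      purchase_date="2025-10-01", item_state="unopened")
```

This is why domain-heavy applications still end up with 100-line signatures: the encoded rules have to travel to the model somehow.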

Discord: High level Discord summaries

LMArena Discord

  • Riftrunner masters Mario, Other Models Fail: Members found Riftrunner excels in creating a Mario game, surpassing other models, but it can also lie.
    • In comparison, the model rain-drop turned out to be a Llama model that generated bad terminal outputs and acted like Gemini 3 Flash.
  • Riftrunner codes Flappy Bird better than LithiumFlow: Riftrunner outperforms Lithiumflow in coding tasks, creating a functional 3D Flappy Bird game after generating 2k lines of code from a shared prompt.
    • However, Riftrunner also exhibited laziness, prompting the user to share, if you motivate it, it might listen to you.
  • GPT 5.1 Disappoints, Awaiting Gemini 3: Members expressed disappointment with the recent GPT 5.1 release and hope that Gemini 3 Pro, expected to release sometime next week, will be significantly better.
    • One user derisively called GPT 5.1 trash.
  • Code Arena Replaces WebDev Arena: Code Arena is live on LMArena, offering real-time generation of deployable web apps that users can directly inspect and judge, succeeding the old WebDev Arena, according to a blog post and YouTube video.
    • Models generate live, deployable web apps and sites that anyone can open, inspect, and judge directly, in real time, while the leaderboard showcases the new evaluation system.

Perplexity AI Discord

  • Perplexity Referral Program Bans Users: Several users report being banned from the Perplexity Partner Program for “fraudulent activity,” and expressed frustration over the lack of support.
    • Some users suspect the bans are related to gaming the referral system or VPN usage, while others speculate that the issue might be related to payout eligibility.
  • Gemini 2.5 Pro Integration Flounders: Users are reporting issues with Gemini 2.5 Pro in Perplexity, saying that “it’s broken and poorly implemented in pplx atm, no way to fix it.”
    • Perplexity’s interface seems to be automatically switching to GPT, even when Gemini 2.5 Pro is selected.
  • GPT Go Sells GPT-5 mini: A user who purchased the GPT Go subscription reported that while it advertises GPT-5 thinking, it mostly uses GPT-5 thinking mini, leading to a refund request.
    • Members debated whether a preference for a specific model justified the refund.
  • Comet Plagued by Control Catastrophes: Users reported Comet AI Assistant issues such as the inability to perform webpage actions, unresponsive buttons, and an inability to control the browser.
    • Some users have found solutions such as logging in, changing IP address (VPN) or deleting and reinstalling Comet, and posting in the troubleshooting channel.
  • Sourcify’s Open Source Shindig: Sourcify IN is hosting an event titled Forks, PRs, and a Dash of Chaos: The Open Source Adventure on November 15, 2025, featuring Swapnendu Banerjee.

Cursor Community Discord

  • Cursor Review Agent is Cheap, Still Costs: Members noted the agent review feature in Cursor IDE incurs costs with each use, but is relatively inexpensive, sharing usage screenshots.
    • One user had used 76% of their allowance, and others found clicking Try Again sometimes resolves issues.
  • Cursor vs Copilot Preference Prevails: Some users returned to Copilot after trying Cursor, emphasizing that tool preference can be subjective, saying My brother is a fan of copilot idk why.
    • The discussion highlights how a developer’s tool choice can be influenced by personal style.
  • Users Exploit Unlimited ChatGPT Glitch: Some users exploited ChatGPT 5 when it was briefly free due to a pricing bug, including unlimited Opus 4 requests.
    • One user lamented Sad I wasn’t aware of the opportunity, and another described the situation as bugged as hell, we had unlimited everything.
  • Max Mode Costs Double at 200k Context: Max mode in Cursor removes limits to maximize performance, enabling it to read entire files instead of chunks and reduce costs.
    • Exceeding 200k context with Sonnet 4.5 doubles the cost, with users humorously suggesting capping it: Cant we limit this to 200k and post that we can give another command 💀.
  • Custom Rules Keep AI in Check: Members are establishing custom rules and lints to control AI behavior and prevent dirty code from entering repos.
    • One member shared a streamlined approach using .cursorrules, lints, custom eslint plugins, and husky to prevent AI from drifting.
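A minimal version of that guardrail stack might look like the following (a sketch only; assumes husky v9 with an ESLint config that loads the custom plugin, and the commands are illustrative):

```shell
# .husky/pre-commit — runs before every commit to catch AI drift
npx eslint . --max-warnings 0   # custom plugin rules encode the repo's conventions
npx tsc --noEmit                # type-check without emitting build output
```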

GPU MODE Discord

  • Popcorn CLI strikes syntax scare: Users encountered a syntax error with popcorn-cli submit, pointing to the popcorn-cli readme for correct syntax and emphasizing that URLs should be entered without quotes when using the export command.
    • Members reported that the grayscale leaderboard is closed and that ensuring nvfp4_gemv is selected in the popcorn cli is crucial for proper evaluation.
  • CUDA Compiler Commandments clarified: New CUDA developers were advised to use -O3 for optimization and -lineinfo for profiling with Nsight Compute.
    • It was also pointed out that the -O3 compiler option primarily optimizes the host (CPU) part of the code, with default optimization level of PTXAS already O3.
  • DMA Documentation Desired!: A member expressed dissatisfaction with existing documentation on Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA) from sources like Wikipedia, ChatGPT, and vendor sites.
    • The user is seeking more detailed and technical documentation, though specific requirements were not detailed.
  • Nvidia comp requires correct auth: Users faced 401 Unauthorized errors with popcorn-cli submit, which was traced to needing to re-authenticate via the Discord OAuth2 link provided during registration.
    • It was clarified that while caching compiled kernels is permissible, caching tensor values between benchmark iterations is not allowed, with reference code available here.
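The caching rule can be illustrated with a toy harness (all names here are hypothetical stand-ins, not the competition’s actual code): compilation results may be memoized across iterations, but each benchmark iteration must recompute its tensor values from scratch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def compile_kernel(source: str) -> str:
    # Stand-in for an expensive compile step; memoizing this across
    # benchmark iterations is the kind of caching that is allowed.
    return f"binary({source})"

def run_benchmark(source: str, data: list[float], iterations: int) -> list[float]:
    binary = compile_kernel(source)  # cached after the first call
    out: list[float] = []
    for _ in range(iterations):
        # Values are recomputed on every iteration; caching these
        # between iterations would violate the rules.
        out = [x * 2 for x in data]
    return out

result = run_benchmark("gemv_kernel", [1.0, 2.0], iterations=3)
print(result)
print(compile_kernel.cache_info().misses)  # compiled only once
```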
  • Cutlass and CuTe cuts out the bugs: The CuTeDSL and Cutlass libraries were updated to version 4.3.0, resolving issues with CuTe submissions, with the CuTe example now passing.
    • The B200 GPU has 148 SMs (Streaming Multiprocessors) running at a boost clock of 1.98 GHz and is used to score submissions and the relevant Nvidia’s blog post includes the B300 diagram.

Unsloth AI (Daniel Han) Discord

  • VibeVoice sings off-key in Bulgarian: Users discovered that VibeVoice had difficulty producing high-quality Bulgarian TTS without further finetuning.
    • Community members joked about the output sounding ‘like a drunk brit trying to read a phonetic version of the sentence’, highlighting the challenges in adapting TTS models to new languages.
  • QAT: Intel autoround vs BNB Showdown: A discussion arose regarding the potential benefits of using Intel autoround quants for training compared to bnb 4-bit quants, especially with the introduction of QAT in Unsloth.
    • Concerns were raised about the compatibility of autoround with Unsloth’s QAT and the need for customization, with emphasis on QAT targeting fast, simple quantization formats.
  • GPT-OSS-20b gives senseless solution: A user reported encountering nonsense generations from gpt-oss-20b when prompted with a math problem, tracing the issue back to an attention patch modifying matmul() calls.
  • Translation Dataset prompt details Data Debacle: A member shared a prompt for generating a translation dataset for LLMs, emphasizing the use of provided samples only, without generating new translations.
    • The prompt details how to create a dataset with specific formatting rules, including language combinations and punctuation alignment.
  • Ollama documentation faces link lapse: A user reported that Ollama links are broken on the documentation page.
    • They also questioned the use of f16 in the example, suggesting it should be q8_0 instead, when using 8-bit quantization for the KV cache.
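For context on why the cache type matters, here is a back-of-the-envelope KV-cache size estimate, assuming f16 at 16 bits per element and q8_0 at 8.5 bits per element (llama.cpp’s q8_0 stores blocks of 32 values plus one f16 scale, i.e. 34 bytes per 32 elements). The model shape below is a hypothetical 7B-class example, not taken from the thread:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bits_per_elem):
    # K and V each store n_layers * n_kv_heads * head_dim values per token.
    elems = 2 * n_layers * n_kv_heads * head_dim * n_tokens
    return elems * bits_per_elem / 8

# Hypothetical 7B-class shape: 32 layers, 8 KV heads, head_dim 128.
f16  = kv_cache_bytes(32, 8, 128, 4096, 16)    # f16: 16 bits/element
q8_0 = kv_cache_bytes(32, 8, 128, 4096, 8.5)   # q8_0: ~8.5 bits/element
print(f16 / 2**20, "MiB vs", q8_0 / 2**20, "MiB")
```

At a 4096-token context this works out to roughly 512 MiB for f16 versus about 272 MiB for q8_0.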

OpenRouter Discord

  • MiniMax M2’s Free Ride Ends: The free access period for MiniMax M2 is concluding, requiring users to switch to a paid endpoint to continue using the model.
    • Users have only one hour to migrate to the paid endpoint to prevent interruptions.
  • OpenRouter Chat Crashes, Users Rage: Users reported a chat scrolling issue on OpenRouter that prevented access to old chats, and one user identified a commit that broke the chat.
    • Despite the inconvenience, a user joked that OpenRouter’s mistakes are benign compared to hidden system prompt changes from other AI companies, before the OpenRouter team quickly resolved the issue within 3 minutes.
  • Gemini 3 Hype Train Departs Station: Enthusiasm builds around the potential of Gemini 3, with some cautioning against excessive hype, despite a LiveBench test showing Gemini 3 achieving a ranking comparable to a human.
    • The community anticipates a release that is both powerful and nicely priced.
  • Free Model Drought Sparks Anxiety: The scarcity of free AI models is increasing due to rising popularity and increased internet access, leading to resource limitations, particularly after a YouTube video caused Deepseek Free to go down.
    • While some suspect RP apps are siphoning off the API, others say paid services remain the most reliable due to reduced abuse, after reporting mixed experiences with Claude’s free tier limits.
  • Local AI Hardware: Ryzen Gets Roasted: Users debated the best hardware for local AI, with a Minisforum mini PC dismissed as a poor choice due to its Ryzen architecture and limited power.
    • The conversation shifted to recommending RTX Pro 6000 Blackwell, RTX 5090, or RTX 3090, depending on the budget, with concerns about the high cost of junkyard builds with DDR4 memory.

OpenAI Discord

  • OpenAI Fights Back Against NYT: OpenAI’s CISO addressed The New York Times’ invasion of user privacy in a letter.
    • The letter detailed the legal battle and OpenAI’s dedication to protecting user data from unauthorized access.
  • Free ChatGPT Plus for Vets: OpenAI is offering 12 months of free ChatGPT Plus to eligible active-duty service members and veterans who have transitioned from service in the last 12 months; claim here.
    • The announcement was made to the community and all users have been notified.
  • GPT-5.1’s Debut: GPT-5.1 is rolling out to all users this week, becoming smarter, more reliable, and more conversational, read more here.
    • A Reddit AMA on GPT-5.1 and customization updates will happen tomorrow at 2 PM PT.
  • AI Chatbot Hordes Threaten Social Media: Members discussed the potential for an AI chatbot infestation across social media, pushing propaganda and making it difficult to distinguish between real people and AI.
    • One member said that online will be dominated by AI chatbots who will just constantly push propaganda, and the only escape might be going outside.
  • Users Find and Share Prompt Engineering Tips: A member shared a detailed prompt lesson using markdown for prompting, abstraction via variables, reinforcement for guiding tool use, and ML format matching for compliance.
    • The member provided a markdown snippet for teaching hierarchical communication, abstraction, reinforcement, and ML format matching.

LM Studio Discord

  • Phi-4: Small Model Has Big Brain: A user sought a lightweight chat model for writing a book for private research, and settled on Microsoft Phi 4 mini.
    • Another user suggested considering budget and usage plans to decide between a subscription or dedicated hardware.
  • Gemini 2.5 Pro Dethrones Sonnet: A user reported that Gemini 2.5 Pro outstripped the current Sonnet 4.5 iterations.
    • The user expressed eagerness for Gemini 3 to come out soon.
  • CUDA Update Causes Vision Model Carnage: Users reported that the new CUDA version 1.57 is breaking vision models, causing crashes, with a recommendation to roll back.
    • One user specified that Qwen3 VL also crashed and suggested it affects llama.cpp runtimes.
  • Multi-GPU Model Loading Still Complex: Users found that loading two different models on two different GPUs in the same system with LM Studio is only possible by running multiple instances of LM Studio.
    • GPU offload has always been all or none in LM Studio; you can’t pick and choose which GPU is used for individual models.
  • Vulkan’s Stability Issues Arise Again: Users experienced frequent blue screen errors (BSODs) while running LM Studio with Vulkan, with suspicions falling on compatibility issues with NVIDIA GPUs.
    • Switching to CUDA resolved the stability issues, but it was noted that Vulkan was faster for small tests on a 3090.

Eleuther Discord

  • Einops or GTFO Numpy Implementations: Members joked about implementing models in Numpy without Einops, with one member suggesting that Numpy implementations are kinda useless without autodiff to train.
    • Another member said that implementing Multi-Head Attention in Numpy is horrible and better suited to being motivated/rederived rather than coded up during an interview.
  • Is MLE interview prep a Leetcode Trap?: Members debated the best way to prepare for MLE interviews, with one describing it as a trap that is too employer and team-dependent to nail down.
    • Instead, one member advised to build something in the open that would be useful to companies training/serving models.
  • Dataset Mixing Ideal for Pretraining: Members suggested using Zyda-2, ClimbLab, and Nemotron-CC-v2 for initial pretraining, noting that mixing them could be ideal given their individual strengths and weaknesses.
    • One member asked about the token breakdown, and if subsets like slimpj and the slimpj_c4 scrape are upsampled/downsampled.
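Mixing such corpora usually comes down to weighted sampling across sources; a minimal sketch, where the datasets are tiny placeholders standing in for Zyda-2, ClimbLab, and Nemotron-CC-v2 and the weights are arbitrary illustrative values, not recommendations from the thread:

```python
import random

def sample_mixture(datasets, weights, n, seed=0):
    """Draw n examples, picking the source dataset for each draw in
    proportion to its weight and reading each dataset round-robin."""
    rng = random.Random(seed)
    positions = {name: 0 for name in datasets}
    names = list(datasets)
    out = []
    for _ in range(n):
        name = rng.choices(names, weights=weights, k=1)[0]
        data = datasets[name]
        out.append((name, data[positions[name] % len(data)]))
        positions[name] += 1
    return out

# Placeholder corpora standing in for the real datasets.
mix = sample_mixture(
    {"zyda-2": ["z0", "z1"], "climblab": ["c0"], "nemotron-cc-v2": ["n0"]},
    weights=[0.5, 0.3, 0.2],
    n=10,
)
print(len(mix))
```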
  • NVIDIA Dominates Quality Datasets: A member noted that NVIDIA and HF are overall leading along the quality axis for open-source datasets rn.
  • Nemo-CC 2 License Raises Eyebrows: Members debated the licensing terms of Nemo-CC 2, expressing concerns about potential restrictions on sharing datasets/models that leverage it and pointing out that they can terminate your license at any time for no reason with 30 days notice.
    • One user summarized, You are not allowed to train a model on the dataset, evaluate the model, and publicly share the results without NVIDIA’s prior written consent, with more details available in this paper.

Nous Research AI Discord

  • Autonomous AI Created By Accident: A user shared a GitHub repository claiming to have created autonomous AI by accident.
    • No further details were provided regarding the specifics or capabilities of this project.
  • WeiboAI Stuns but drifts after 2 turns: Users discussed the new WeiboAI model, based on qwen2.5, noting its surprisingly good initial performance, referencing this tweet.
    • Another user pointed out that it drifts after the first 1-2 turns, but remains somehow good for a 1.5B parameter model and can recite content from Quora.
  • Baguettotron Reasoning Gets Attention: A member inquired about benchmarking Baguettotron, noting its fairly interesting reasoning traces despite its small size.
    • There was no follow up on whether this benchmark was pursued.
  • GGUF Files Unavailable in Nous Chat: Users were informed that importing GGUF files directly into Nous Chat is not currently supported, but can be used locally with tools like llama.cpp or Ollama.
  • ixlinx-8b Debuts as SOTA Small Model: The ixlinx-8b model was released on GitHub after a long period of development, advertised as a state-of-the-art (SOTA) small model from a local hackathon.
    • The creators invited contributions and suggested that the developers of Hermes should evaluate it.

Latent Space Discord

  • Windsurf Releases Aether Models for Testing: Windsurf Next launched Aether Alpha, Aether Beta, and Aether Gamma models in the #new-models channel, available for free testing for a limited time, with a direct download link provided.
    • Users were urged to test the models quickly, as free access won’t be free for more than a week.
  • OpenAI’s Training-Cost Trends Charted: Masa’s chart illustrating OpenAI’s training-cost trends sparked discussions on metrics, with members requesting more data points, including burn rate and revenue.
    • Some members pointed out that OpenAI is nearly 10 years old and suggested adjusting the numbers for inflation.
  • Meta’s FAIR v2 Allegedly Foiled: Susan Zhang revealed that Meta declined to create a lean FAIR v2 in early 2023 to pursue AGI, instead tasking the GenAI org with shipping AGI products, according to this tweet.
    • She alleges that vision-less execs hired cronies who overpromised results and later joined OpenAI with inflated rĂ©sumĂ©s, causing lasting damage.
  • Character.AI’s Kaiju Models Optimize for Speed: Character.AI’s proprietary Kaiju models (13B/34B/110B) were engineered for inference speed using techniques like MuP-style scaling, MQA+SWA, and ReLUÂČ activations, as detailed in this Twitter thread.
    • The team deliberately avoided MoEs due to production constraints.
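Of the techniques named, ReLUÂČ (squared ReLU) is the easiest to illustrate. The thread does not include code, so this is just the standard definition, relu2(x) = max(x, 0)ÂČ, in plain Python:

```python
def relu2(x: float) -> float:
    """Squared ReLU: zero for non-positive inputs, x**2 otherwise.
    Squaring keeps activations sparse (exact zeros for x < 0) while
    growing faster than plain ReLU for positive inputs."""
    return max(x, 0.0) ** 2

# Zeros for non-positive inputs, squares for positive ones.
print([relu2(x) for x in (-2.0, -0.5, 0.0, 1.0, 3.0)])
```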
  • Magic Patterns 2.0 Raises $6M Series A: Alex Danilowicz unveiled Magic Patterns 2.0 and a $6M Series A led by Standard Capital.
    • The company celebrated bootstrapping to $1M ARR with no employees and 1,500+ product teams now using the AI design tool, planning to rapidly hire across enterprise, engineering, community and growth roles.

Modular (Mojo đŸ”„) Discord

  • Mojo’s Dynamic Reflection Digs Deep: Mojo aims to support dynamic type reflection, using its JIT compiler to handle dynamic data, and try-catch and raise will be standard for error handling to match Python’s style, as well as monadic options.
    • In a recent interview, Chris Lattner said that Mojo’s metaprogramming is more powerful than Zig’s because Mojo can allocate memory at compile time (YouTube link).
  • Members Mull Mandatory mut: A debate arose around the verbosity of mandatory mut annotations for function parameters, drawing comparisons to Rust and Python.
    • Members suggested a compromise where mut is mandatory inside fn only if the argument is reused, and call side mut annotation is applied after the function call.
  • Metal Compiler Meltdown on M4 Solved: One member encountered a Metal Compiler failed to compile metallib error while following the ‘Get started with GPU programming’ tutorial on an Apple M4 GPU.
    • The issue was resolved by ensuring the full Xcode installation was present, removing any print() statements from the GPU kernel, and using the latest nightly build.
  • C-FFI Conundrums Confronted: Members discussed pain points in doing C-FFI with Mojo and suggest using Origin.external to workaround the rewrite explicitly trying to fix things.
    • It was also suggested to use MutAnyOrigin to preserve old behavior exactly, though it will extend all lifetimes in scope.
  • comptime Bird syntax scrutinized: The syntax comptime Bird = Flyable & Walkable for trait composition was discussed, with some finding it less intuitive than the alias keyword.
    • Others argued that comptime more accurately reflects the keyword’s functionality, particularly with static reflection and the ability to mix types and values at compile time.

DSPy Discord

  • DSPy Does Demand Domain-Driven Domain Knowledge: While DSPy aims to abstract away prompting, domain-specific LLM applications still require detailed instructions within signatures, with one user having 100 lines for some modules.
    • The consensus is that DSPy requires more than just basic prompts for complex tasks; it necessitates encoding domain knowledge and step-by-step instructions to guide the LLM effectively.
  • Signatures: Better Than Prompts, But Still Prompts?: Participants discussed that DSPy’s signatures, while a better abstraction than raw prompts, still function as prompts, particularly within the docstrings of class-based signatures where business rules are encoded, facilitating optimization.
    • The framework helps to program, rather than focus on prompting, but a lot of the confusion in the community stems from the fact that a prompt means different things to different people.
  • GEPA Geometries Gradual Gains: While GEPA aims to optimize prompts, users find that specific guidelines are still necessary, even with tool functions, such as instructing the LLM to use regex for agentic search when initial terms fail.
    • One user found that they needed to add specific guidelines that LLM should send specific terms for tool to search via ripgrep but if it doesn’t find one MAKE SURE you add Regex as next, without which the LLM wouldn’t use Regex terms in the search tool
  • Agentic Agents Augmenting Analytics: A user shared a scenario where they needed to instruct the LLM to use regex in agentic search with ripgrep to effectively search through documents, highlighting the need for specific guidance even with advanced tools.
    • Another user shared about instructing the LLM that the answer might not be on page 1 in search results.
  • Taxonomy Tail Troubles Told: A member wrote a blogpost about their experience creating taxonomies.
    • They find the topic super relevant in the context of structured generation.

HuggingFace Discord

  • ZeroGPU Zeros Performance Concerns: Members discussed issues with ZeroGPU, which now seems to be working, although it is unclear whether any concerns linger.
    • The discussion followed earlier reports, with logs, that it wasn’t working.
  • Reuben’s Recursive Removal Resolved: Reuben was banned by a bot due to sending too many messages triggering a spam filter, and later unbanned by lunarflu.
    • The situation prompted discussions on using regex or AI to detect spam, with concerns raised about privacy.
  • Aquif-3.5-Max-42B-A3B Attracts Attention: Members noticed the Aquif-3.5-Max-42B-A3B model trending on Hugging Face.
    • Speculation arose that this was due to it being an upscaled and fine-tuned model.
  • Tokenflood Tool Tests LLM Latency: A freelance ML engineer released Tokenflood, an open-source load testing tool for instruction-tuned LLMs, available on GitHub.
    • It simulates arbitrary LLM loads and is useful for assessing prompt parameter changes.
  • MCP Celebrates Milestone with Anthropic and Gradio: The MCP 1st Birthday Bash, hosted by Anthropic & Gradio, kicks off this Friday, Nov 14 (00:00 UTC) at https://huggingface.co/MCP-1st-Birthday.
    • It features $20K in cash prizes and $2.7M+ in API credits for participants, with thousands already registered.

Moonshot AI (Kimi K-2) Discord

  • Researcher Mode Bugs frustrate Users: Users reported receiving errors from Researcher Mode instead of results, even with minimal prior use, and they asked about credits.
    • The problems may be related to whether Researcher Mode is completely paid, as users are receiving insufficient credit/upgrade messages.
  • Kimi Coding Plan API Quota Dries Up: The Kimi Coding Plan’s API quota depletes quickly (within hours) due to web search and plan mode usage.
    • One user speculated that Moonshot AI might transition to a cursor-like plan, particularly given their funding compared to OAI and Anthropic.
  • Kimi API Setup Causes Headaches: Users needed help with Kimi API setup for the thinking model using HTTP, encountering authorization failures despite having credits and a valid API key.
    • It was discovered that the user was employing the Chinese platform URL instead of the global https://api.moonshot.ai/v1/chat/completions URL, which fixed the issue.
  • Turbo Version gets Kimi K2 Moving: Users asked about accelerating the processing time for the Kimi K2 thinking model through the API.
    • It was advised to utilize the turbo version, which delivers quicker output speeds without impacting model performance.
  • GPT 5.1 Stealth Rolls Out: Members noted the GPT 5.1 rollout and that it was the stealth model on OR (OpenRouter), so it was decent but so safetylobotomized.
    • One member celebrated that everyone knew it was coming since a few weeks ago as OpenAI takes an L.

Yannick Kilcher Discord

  • Elevenlabs Unveils Speech-to-Text: Elevenlabs, known for text to speech, has introduced speech to text capabilities, as highlighted in their blog post.
    • Members are contemplating whether this new feature will enhance Elevenlabs’ appeal in the market.
  • Kimi K2 Scores on Coding Tasks: A member shared a YouTube video showcasing Kimi K2’s strong performance on one-shot coding tasks.
    • No further details were mentioned.
  • ICLR Reviewer Ruckus: A member expressed frustration with the ICLR review process, citing poor scores on a resubmission despite addressing previous concerns and adding new datasets with over 30k new questions total.
    • The member quoted reviewers criticizing them for not providing hyperparameters, even though they were in the appendix, and dismissing their work as not a benchmark paper despite extensive testing.
  • Whisper Woes Resolved?: A member encountered errors and hallucinations when using the Whisper model directly with PyTorch, but found relief using Whisper-server.
    • They recommended compiling Whisper-server with Vulkan support for portability and filtering out quiet sections to improve transcription.
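The quiet-section filtering they describe can be sketched as a simple RMS gate over fixed-size frames; the frame size and threshold below are arbitrary illustrative values, not settings from the discussion:

```python
import math

def drop_quiet_frames(samples, frame_size=4, threshold=0.1):
    """Split audio into fixed-size frames and keep only frames whose
    RMS energy exceeds the threshold, concatenating the survivors."""
    kept = []
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > threshold:
            kept.extend(frame)
    return kept

# The loud frame survives; the near-silent frame is dropped.
audio = [0.5, -0.4, 0.6, -0.5,   0.01, -0.01, 0.0, 0.02]
print(drop_quiet_frames(audio))
```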
  • Reasoning from Memorization Paper: A member linked to a paper titled From Memorization to Reasoning in the Spectrum of Loss Curvature: [2510.24256] From Memorization to Reasoning in the Spectrum of Loss Curvature.
    • No further details were mentioned.

MCP Contributors (Official) Discord

  • Timezone Info Travels From MCP Client to Server: A discussion started about passing timezone information from MCP clients to MCP servers, and it was considered to supply this as metadata via a client-sent notification or a server elicitation.
    • A member has drafted a SEP (spec enhancement proposal) for timezone and will post it to GitHub after internal feedback, and is weighing adding it to CallToolRequest, using a Header, adding it to JSONRPCRequest.params._meta, or adding it to InitializeRequest.
  • Claude.ai Fights Connectivity: Members discussed debugging connectivity issues between Claude.ai and MCP Servers.
    • It was noted that this is flaky and specific to the client and suggested a developer mode that gave a bit more feedback about what’s going on.
  • MCP Tool Call Goes Wild, Considers Alternate Serialization: Members wondered about returning data other than serialized JSON from mcp tool call results, such as Toon format.
    • One member shared results of small-scale evals on a synthetic dataset: accuracy is comparable, 9% slower, 11% fewer tokens (n = 84, p = 0.10).

Manus.im Discord Discord

  • AI Automation Expert Joins Server: A new member with expertise in AI automation integration has joined, bringing skills in Python, SQL, JavaScript, and frameworks like PyTorch, scikit-learn, LightGBM, and LangChain.
    • They have experience building chatbots, recommendation engines, and time series forecasting systems.
  • Server Mulls Over Spanish Language Section: A member suggested creating a dedicated Spanish language section within the server, providing image links for context 1.png, 2.png, 3.png, 4.png.
    • The suggestion aims to cater to Spanish-speaking members and potentially broaden the community’s reach.
  • Engineers Pursue Generative Engine Optimization: A member is seeking resources and guidance on how to effectively track and optimize for Generative Engine Optimization.
    • The request highlights the growing interest in refining generative models for enhanced performance.
  • Users Encounter Pesky Manus System Error: A user reported a recurring Manus system error preventing publishing, specifically a “pathspec ‘417ea027’ did not match any file(s) known to git” error.
    • The member expressed frustration with the lack of support, noting previous unresolved issues despite ongoing subscription fees.
  • Support Troubles Plague Manus Users: Multiple members are experiencing difficulty accessing Manus support, with one reporting the support channel’s apparent closure.
    • One user was advised by the Manus agent to “Wait for Manus support” or “Escalate the ticket” after facing a git commit error and provided a feedback link.

aider (Paul Gauthier) Discord

  • Aider’s Markdown Mishaps: Users found Aider gets confused by nested code markdown marks when creating code snippets in markdown files using anthropic.claude-sonnet-4-5-20250929-v1:0.
    • Adding three- and four-backtick fences (``` and ````) to the conventions.md file triggers Aider to demarcate files with <source> tags, resolving the code snippet issue.
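The workaround relies on the standard Markdown rule that an outer fence must be longer than any fence it contains; a conventions.md entry along these lines (the wording is hypothetical) illustrates the idea:

`````markdown
When a markdown file needs to show a fenced code block, wrap it in a
longer fence so the inner backticks are treated as literal text:

````markdown
```python
print("hello")
```
````
`````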
  • Aider’s Vim Mode Gets Rave Reviews: A user lauded Aider’s Vim mode as fantastic, and also praised the new <#1403354332619079741> aider-ce /load-session /save-session functionality for its usefulness in parking and resuming jobs.
    • These functionalities significantly enhance the user experience by allowing for seamless interruption and continuation of tasks.
  • Aider’s Update Cadence Questioned: Users expressed concerns over the lack of updates from Paul Gauthier regarding Aider’s development status.
    • Speculation arose about whether Paul Gauthier is still actively developing Aider, with some users wondering if an announcement about his departure was missed.
  • GPT 5.1 Drops Without Numbers: Members noted the release of GPT 5.1, but observed that no benchmarks were included in the release notes.
    • The lack of benchmarks makes it difficult to assess the improvements and capabilities of GPT 5.1 compared to previous versions.

tinygrad (George Hotz) Discord

  • Package Data Faces Scrutiny: A member inquired about potential file omissions from the archive, questioning if package_data is a no-op and suggesting that specifying files explicitly could enhance the process.
    • The member expressed gratitude to the reviewer for their insightful feedback, hinting at ongoing efforts to refine package management within the project.
  • OpenCL Error Messages Cried Out For Overhaul: A member advocated for enhanced error messaging when an OpenCL device goes undetected, citing the cryptic RuntimeError: OpenCL Error -30: CL_INVALID_VALUE as an example.
    • The pinpointed error stems from /tinygrad/tinygrad/runtime/ops_cl.py, line 103, signaling a need for more informative diagnostics in the OpenCL runtime operations.
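A friendlier message is mostly a lookup from status code to name plus a hint; a minimal sketch in Python, where the numeric codes come from the OpenCL specification (e.g. CL_INVALID_VALUE = -30) but the hint text is illustrative, not tinygrad’s actual wording:

```python
# Map OpenCL status codes (values from the OpenCL spec) to a name
# plus an illustrative hint for the user.
CL_ERRORS = {
    0:   ("CL_SUCCESS", "no error"),
    -1:  ("CL_DEVICE_NOT_FOUND", "no matching OpenCL device was detected"),
    -30: ("CL_INVALID_VALUE", "an argument to the OpenCL call was invalid"),
}

def describe_cl_error(code: int) -> str:
    name, hint = CL_ERRORS.get(code, ("UNKNOWN", "unrecognized status code"))
    return f"OpenCL Error {code}: {name} ({hint})"

print(describe_cl_error(-30))
```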

Windsurf Discord

  • Windsurf Launches Stealth Aether Models: Windsurf released a surprise set of stealth models (Aether Alpha, Aether Beta, and Aether Gamma) available in Windsurf Next and a small percentage of Windsurf Stable users.
    • These models are free to use and the team is seeking feedback in the designated channel.
  • Windsurf Next Available for Preview: Windsurf Next is a pre-release version of Windsurf that includes experimental features and models, which users can download here.
    • Users can test out the new features and provide feedback on the stealth models.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (1258 messagesđŸ”„đŸ”„đŸ”„):

Riftrunner vs other models, GPT 5.1 benchmarks, Gemini 3 Pro speculation, AI model sycophancy, Riftrunner Game Development

  • Riftrunner Creates Mario Game While Other Models Fail: Members agreed that, compared to other models, no model has done better in creating a Mario game than Riftrunner has.
    • Some members noted that Riftrunner can lie, while others mentioned that it is better than Lithiumflow.
  • Rain-drop Turns Out To Be Llama Model: Members revealed that the model rain-drop turned out to be a Llama model and posted a screenshot.
    • Users found that rain-drop produced bad terminal outputs and was like Gemini 3 Flash.
  • Riftrunner Proves Better Than LithiumFlow at Coding Tasks: One user confirmed that Riftrunner is better than Lithiumflow for coding tasks, specifically citing its ability to create a functional 3D Flappy Bird game, but it also suffered from laziness syndrome.
    • One user shared the prompt for a 3D flappy bird game, which led to the generation of 2k lines of code, they said, if you motivate it, it might listen to you.
  • GPT 5.1 Falls Flat Compared to Gemini 3 Pro: Members discussed the recent release of GPT 5.1, noting its shortcomings and expressing hope that Gemini 3 Pro will be significantly better.
    • One user called GPT 5.1 trash, whereas many members are awaiting Gemini 3’s release sometime next week.
  • Riftrunner Demonstrates Superior Understanding and Debugging Skills: A user highlighted Riftrunner’s superior understanding by getting it to create a 3D flappy bird game from a simple prompt, it also fixed some issues with the user’s game and added sounds.
    • Another user confirmed that Riftrunner is better for coding tasks than Lithiumflow, but they expressed the sentiment that Lithiumflow was amazing for writing, not sure about riftrunner.

LMArena ▷ #announcements (1 messages):

Code Arena, WebDev Arena, LMArena Leaderboard

  • Code Arena Arrives with a Bang: Code Arena is now live on LMArena, offering real-time generation of deployable web apps that users can directly inspect and judge, succeeding the old WebDev Arena.
  • WebDev Arena Gets a Facelift: The WebDev Arena has been redesigned based on community feedback and is now known as Code Arena which has a completely rebuilt evaluation method.
    • Models generate live, deployable web apps and sites that anyone can open, inspect, and judge directly, in real time.

Perplexity AI ▷ #general (670 messagesđŸ”„đŸ”„đŸ”„):

Perplexity referral program, Gemini 2.5, GPT-5 mini vs GPT-5, Comet issues, Comet for Android

  • Referral Program Accusations Trigger Account Bans: Several users report being banned from the Perplexity Partner Program for “fraudulent activity,” despite claiming their referrals were genuine, and expressed frustration over the support team’s lack of response to their appeals.
    • Some users suspect the bans are related to gaming the referral system by inviting alts with the same code or using VPNs, while others speculate that the issue might be related to the platform reviewing payout eligibility for the $100 bounty.
  • Gemini 2.5 Pro Experiencing Implementation Issues: Users are reporting issues with Gemini 2.5 Pro in Perplexity, with one user stating that “it’s broken and poorly implemented in pplx atm, no way to fix it.”
    • Perplexity’s interface seems to be automatically switching to GPT, even when Gemini 2.5 Pro is selected.
  • GPT-5 thinking mini or GPT-5 regular?: A user who purchased the GPT Go subscription reported that while it advertises GPT-5 thinking, it mostly uses GPT-5 thinking mini, leading to a refund request.
    • Members debated on whether they preferred a specific model or not.
  • Comet Users Troubleshooting Webpage Control and Functionality: Users reported Comet AI Assistant issues such as the inability to perform webpage actions, unresponsive buttons, and an inability to control the browser, with some speculating that VPN usage or login status might be the cause.
    • Some users have found solutions such as logging in, changing IP address (VPN) or deleting and reinstalling Comet or posting in the troubleshooting channel.
  • Users Want Pro Discord Role: Several users have been asking about how to get the Pro role in Discord, and other users pointed them to link their Discord account with their Perplexity account on the website.
    • A member noted “it should give it to you automatically on the website when you press the uh discord button, it made me link discord to the website during the process of joining the server”.

Perplexity AI ▷ #sharing (3 messages):

Sourcify Event, Open Source, Forks, PRs and Chaos, Threads Shareable

  • Sourcify IN hosts Open Source Adventure!: Sourcify IN is hosting an event titled Forks, PRs, and a Dash of Chaos: The Open Source Adventure on November 15, 2025.
    • The talk will feature Swapnendu Banerjee (GSoC 2025 @Keploy | Engineering @DevRelSquad) and will be broadcast on Google Meet & YouTube Live.
  • Discord Thread needs to be Shareable!: A member requested that a thread be made Shareable, with an attachment showing how to set this option.

Cursor Community ▷ #general (534 messagesđŸ”„đŸ”„đŸ”„):

Cursor IDE Cost Awareness, Cursor vs Copilot preference, Exploiting ChatGPT, Cursor 'Max' Mode, Cursor Rules

  • Review agent is costing, but cheap: Members noted that the agent review feature in Cursor IDE incurs costs with each use, however it is relatively inexpensive, with one user having used 76% of their allowance.
    • Users are showing their usage screenshots like this one, and finding that clicking Try Again sometimes resolves issues.
  • Cursor vs Copilot: Preference Prevails: Some users have returned to Copilot after trying Cursor, emphasizing that tool preference can be subjective.
    • One user mentioned, My brother is a fan of copilot idk why, highlighting individual preferences.
  • Early Access to Unlimited Exploits: Some users exploited ChatGPT 5 when it was briefly free, including unlimited Opus 4 requests, with one user lamenting, Sad I wasn’t aware of the opportunity.
    • The pricing bug allowed unlimited usage, described as bugged as hell, we had unlimited everything.
  • Cursor ‘Max’ Mode Unlocks Full Potential, Incurs Extra Costs: Max mode in Cursor removes limits to maximize performance, enabling it to read entire files instead of chunks.
    • However, exceeding 200k context with Sonnet 4.5 doubles the cost, with users humorously suggesting capping it: Cant we limit this to 200k and post that we can give another command 💀
  • Rule Creation to Maintain AI Discipline: Members are establishing custom rules and lints to control AI behavior, preventing dirty code from entering repos.
    • One member shared a streamlined approach, I actually took the concept of .cursorrules and turned it into a rigid system that keeps AI from drifting with lints, custom eslint plugins and husky is my last defense for protecting dirty code getting in.

GPU MODE ▷ #general (10 messagesđŸ”„):

popcorn-cli syntax, status 400 grayscale, CPU-focused community, leaderboard/eval selection, export command quotes

  • Syntax Scare with Popcorn-CLI: Some users encountered a syntax error with popcorn-cli submit; others suggested checking the popcorn-cli readme for the correct syntax.
  • Grayscale Gauntlet has Gone: One user received a status 400 error, and discovered that the grayscale leaderboard is closed.
  • CPU Kernel Competition Craving: A member inquired about the existence of a CPU-focused community similar to gpumode, emphasizing CPU kernel competitions.
  • Evaluation Enigmas Explored: A user mentioned that it may be necessary to select nvmnvfp4_gemv in the popcorn CLI; it’s all the way at the bottom of the evaluation suites.
  • Export Expertise Expressed: A member clarified that when using the export command, the URL should be entered without quotes.

GPU MODE ▷ #cuda (18 messagesđŸ”„):

CUDA compiler options, warp tiling, TMEM allocation, PTXAS optimization, Mutex locking via TMEM

  • Layman’s CUDA Compiler Commandments: New CUDA developers inquired about essential compiler options beyond basic usage, and one member suggested using -O3 for optimization and -lineinfo for preserving line number information for profiling with Nsight Compute.
    • They also recommended the -res-usage option to check register and static shared memory usage post-compilation.
  • PTXAS Optimization Revelation: It was clarified that the -O3 compiler option primarily optimizes the host (CPU) part of the code, which may not be as critical if the CPU code isn’t on the critical path.
    • A member noted that the default optimization level of PTXAS is already O3, making the flag redundant for GPU code optimization.
  • TMEM Static Allocation Lament: A developer questioned why TMEM must be allocated dynamically, viewing it as a downgrade compared to static shared memory.
    • Another member speculated that TMEM allocation could be used for dependency management, with the TMEM buffer acting as a mutex that locks over the data it contains.

GPU MODE ▷ #jobs (1 messages):

HippocraticAI Hiring, LLM Inference Engineer Role, CUDA/CUTLASS/Triton expertise, NVIDIA B200s, AMD MI355, and Google TPUs

  • HippocraticAI Expands LLM Inference Team: HippocraticAI is expanding its Large Language Model Inference team to enhance healthcare accessibility globally, actively seeking talented engineers for multiple positions.
    • They posted a job link at https://lnkd.in/eW5qzuMc and encourage interested candidates to apply and shape the future of healthcare.
  • LLM Inference Engineer Role Focuses on Optimization: The LLM Inference Engineer role will focus on researching, prototyping, and building state-of-the-art LLM inference solutions, especially with expertise in CUDA, CUTLASS, Triton, TileLang, or contributions to major inference frameworks like vLLM and SGLang.
    • The role involves optimizing and accelerating inference performance across cutting-edge hardware platforms, including NVIDIA B200s, AMD MI355, and Google TPUs.

GPU MODE ▷ #beginner (4 messages):

Atomic Max for FP32, PTX Documentation Inaccuracy

  • Achieving Atomic Max for FP32 with Int32 Trick: A member noted that while the PTX documentation suggests atomic max operations for FP32 are possible, attempting one results in an error; a workaround exists using int32.
    • The trick involves inverting the bottom 31 bits if the sign bit is set to achieve an int32-like representation, as shown in PyTorch’s source code.
  • PTX Doc Claims Atomic Max, Reality Says No!: Despite what the PTX documentation implies, direct atomic max operations for FP32 types are not supported, leading to errors during compilation.
    • The compiler throws an error indicating that the .max operation requires specific types like .u32, .s32, .u64, .s64, .f16, .f16x2, .bf16, or .bf16x2 for the atom instruction.
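The int32 workaround can be illustrated in plain Python: reinterpret the float32 bit pattern as a signed 32-bit integer and, when the sign bit is set, invert the bottom 31 bits, so that signed-integer comparison (and hence an integer atomicMax) matches float ordering. A minimal sketch; the function name is illustrative, not taken from PyTorch:

```python
import struct

def float_to_ordered_int32(f: float) -> int:
    # Reinterpret the float32 bit pattern as a signed int32.
    (i,) = struct.unpack("<i", struct.pack("<f", f))
    # If the sign bit is set, invert the bottom 31 bits so that
    # signed-int ordering also matches float ordering for negatives.
    return i if i >= 0 else i ^ 0x7FFFFFFF

# Signed-int comparison now agrees with float comparison:
vals = [-100.0, -3.5, -0.0, 0.0, 1.25, 2.0]
assert sorted(vals) == sorted(vals, key=float_to_ordered_int32)
```

With this mapping, an FP32 atomic max can be emulated by calling an integer atomicMax on the transformed bits and inverting the transform on read.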

GPU MODE ▷ #off-topic (3 messages):

Skunk on a car, Dune Meme


GPU MODE ▷ #rocm (3 messages):

hipkittens, image0.jpg

  • HipKittens Link Shared: A member shared a link to HipKittens on X.com, prompting positive feedback.
    • Another member responded positively with “Got a chuckle outta me - great stuff!”
  • Image0 attachment shared: A member shared an image called image0.jpg on cdn.discordapp.com.

GPU MODE ▷ #self-promotion (5 messages):

hipkittens, FSDP Implementation, AMD Open Source AI Week Recap

  • HipKittens get Shared!: A member shared a link to hipkittens at luma.com/ai-hack.
  • NanoFSDP simplifies Distributed Training: A member wrote a small FSDP implementation to learn the basics of distributed training, available at github.com/KevinL10/nanofsdp.
    • They noted it is fairly minimal but well-documented (~300 LOC), works as a drop-in replacement for fsdp.fully_shard, and may be helpful for understanding how PyTorch implements FSDP under the hood.
  • AMD Open Source AI Week Recapped: A member shared a recap of AMD Open Source AI Week at amd.com/en/developer.
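For intuition, the core bookkeeping behind FSDP-style sharding can be shown without torch: flatten the parameters, zero-pad so the total divides evenly by the world size, and hand each rank one contiguous shard. A hypothetical sketch, not code from the linked repo:

```python
def shard_flat_param(flat, world_size):
    """Split a flattened parameter list into equal per-rank shards,
    zero-padding the tail so every rank holds the same element count."""
    pad = (-len(flat)) % world_size
    padded = list(flat) + [0.0] * pad
    per_rank = len(padded) // world_size
    return [padded[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]

shards = shard_flat_param([1.0, 2.0, 3.0, 4.0, 5.0], world_size=2)
# Each rank holds 3 elements; rank 1's shard is zero-padded.
assert shards == [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
```

Equal shard sizes are what make the all-gather and reduce-scatter collectives in real FSDP implementations uniform across ranks.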

GPU MODE ▷ #🍿 (4 messages):

Building from source, Nvidia competition submissions

  • Compile Compatibility Considerations: It was suggested to build from source and compile on an older image to improve compatibility, because the release build seems to be done on Ubuntu 24.04.
    • The CI guy <@325883680419610631> is on parental leave, so this may take a while to implement.
  • Submitting to Nvidia Comp via Discord Bot: A member inquired if submissions for the Nvidia competition were primarily happening through the Discord bot.
    • Another member confirmed that they support Discord, the site, and the CLI, with the CLI being the most popular submission method.

GPU MODE ▷ #thunderkittens (2 messages):

Hipkittens launch, Other Kittens on X

  • Hipkittens are shared on X: A member shared a link to Hipkittens on X.
  • Still more Kittens on X: Another user liked it.

GPU MODE ▷ #submissions (66 messagesđŸ”„đŸ”„):

Leaderboard Submissions, GEMV Cheating Accusations, Benchmark Input Sizes

  • NVIDIA GEMV Leaderboard Race Heats Up: Multiple users made submissions to the nvfp4_gemv leaderboard, with <@1435179720537931797> ultimately clinching the first place position with a time of 24.5 ”s.
    • Other notable submissions included <@772751219411517461> achieving third place at 66.0 ”s and <@1291326123182919753> securing 6th place with 58.4 ”s.
  • Input Size Sparks Debate on GEMV Benchmark: A member questioned the benchmark’s input sizes, noting that if the top times are around 7 ”s, the input is likely tiny, potentially skewing optimizations and increasing the relative cost of prologue and epilogue.
    • The member suggested evaluating the benchmarks on larger inputs to provide a more realistic assessment.
  • GEMV Leaderboard Caching Controversy: The top times on the nvfp4_gemv leaderboard were identified as potentially being the result of caching values between benchmark runs, leading to accusations of cheating.
    • Another member suggested that these entries may have been the result of LLMs iterating on the problem, calling it an honest mistake.
  • Grayscale V2 Leaderboard Records New Bests: <@1144081605854498816> achieved 7th place on L4 with 27.5 ms and secured 8th place on H100 with 12.9 ms.
    • Additionally, the user also made personal bests on A100 with 20.4 ms and B200 with 6.69 ms for the grayscale_v2 leaderboard.
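The input-size concern above is basic benchmark arithmetic: with a fixed prologue/epilogue cost, tiny inputs let the overhead dominate. A toy cost model with made-up numbers (5 ”s fixed, 0.5 ns per element) makes the point:

```python
def kernel_time_us(n, fixed_overhead_us=5.0, per_element_ns=0.5):
    """Hypothetical cost model: a fixed launch/prologue/epilogue cost
    plus linear per-element work (numbers are illustrative only)."""
    return fixed_overhead_us + n * per_element_ns / 1000.0

small = kernel_time_us(4096)        # fixed cost is ~70% of runtime
large = kernel_time_us(4_194_304)   # fixed cost is ~0.2% of runtime
assert 5.0 / small > 0.7 and 5.0 / large < 0.01
```

Under such a model, rankings measured on tiny inputs mostly reward prologue/epilogue tricks rather than the GEMV inner loop, which is the member's argument for larger benchmark shapes.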

GPU MODE ▷ #cutlass (1 messages):

kinming_32199: impressive animation👍


GPU MODE ▷ #general (2 messages):

GPU MODE Leaderboard, Submission process

  • GPU MODE Leaderboard submission options: Users asked whether submissions should be done via Discord or the GPU MODE website.
    • A member clarified that both methods are acceptable.
  • Submission via Discord: A user inquired whether submissions could be made via Discord.
    • Another user confirmed that Discord submissions are indeed accepted.

GPU MODE ▷ #multi-gpu (1 messages):

DMA Documentation, RDMA Documentation, Wikipedia, ChatGPT, Vendor websites

  • Hunting DMA and RDMA documentation: A member is seeking better documentation on Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA) than what’s available on Wikipedia, ChatGPT, and vendor sites.
  • Additional DMA/RDMA Resources Sought: The user expressed dissatisfaction with current information sources, hoping for more detailed or technical documentation.

GPU MODE ▷ #helion (9 messagesđŸ”„):

Triton Errors, Auto Skip Triton Errors, Helion Configs BC-compatible

  • Triton Errors autotuning investigated: A member inquired why Triton errors are not skipped by default when autotuning.
  • Helion Configs promise BC-Compatibility: A member inquired about Helion Configs and if they will be BC-compatible.

GPU MODE ▷ #nvidia-competition (212 messagesđŸ”„đŸ”„):

Cutlass and CuTe 4.3.0, Popcorn-cli submission errors, Kernel global caching allowed, CuTe DSL is not requirement, torch doesn't support sm100

  • Cutlass and CuTe fixes submissions: CuTeDSL and Cutlass were updated to 4.3.0, fixing the issues with CuTe submissions; the CuTe example now passes.
  • Popcorn CLI Authentication Errors: Users encountered a 401 Unauthorized error with popcorn-cli submit, resolved by re-authenticating via the Discord OAuth2 link provided during registration.
  • Caching Compiled Kernels is OK: Caching compiled kernels is allowed, but caching results is forbidden.
    • It was clarified that while caching compiled kernels is permissible, caching tensor values between benchmark iterations is not allowed as each benchmark should have different data, but the same shape.
  • Raw PTX can be used for kernel implementation: Users confirmed that it is acceptable to write CUDA C++ or raw PTX, loading it with torch.cuda.load_inline, with reference code available here.
    • A user inquired about a CUDA example template, but it was indicated that while there isn’t one, CUDA and PTX can be used, although a template for it might be slow.
  • Blackwell GPU details revealed: The B200 GPU has 148 SMs (Streaming Multiprocessors) running at a boost clock of 1.98 GHz and is used to score submissions.
    • Nvidia’s blog post includes the B300 diagram.

GPU MODE ▷ #xpfactory-vla (3 messages):

RLinf, Qwen3-VL VLA-adapter training

  • RLinf Repo: new tool in town: A member mentioned checking out RLinf, promising updates after running Qwen3-VL VLA-adapter training overnight.
  • GPU Usage Questioned: A member inquired about the number and type of GPUs used for training, expressing concern for those with limited GPU resources.
    • No response was given.

Unsloth AI (Daniel Han) ▷ #general (179 messagesđŸ”„đŸ”„):

VibeVoice finetuning for Bulgarian, Intel autoround quants vs BNB 4-bit quants for training, QAT in Unsloth, MoE models and output quality, Aquif-3.5-Max-42B-A3B

  • Bulgarian Blues: VibeVoice struggles with new language: A member attempted to use VibeVoice for Bulgarian TTS but found the results were ‘not good’, though still ‘close to understandable’, needing further finetuning.
    • Another member joked that it sounds ‘like a drunk brit trying to read a phonetic version of the sentence’.
  • QAT Showdown: Intel autoround vs. BNB 4-bit: A question was raised about using Intel autoround quants for training and if it would have any benefit over using bnb 4-bit quants.
    • A member mentioned that QAT is now available in Unsloth but compatibility with autoround is uncertain and might require customization, also pointing out that QAT should aim for fast, simple quant formats.
  • MoE Mayhem: Quality and Memory: A user asked about the effect of Mixture of Experts (MoE) models on output quality compared to dense models.
    • It was explained that MoE models of equivalent size and training usually offer comparable knowledge, even if their intelligence differs.
  • Aquif Antics: Trending Model Gets Roasted: Members humorously acknowledged the name of the trending Aquif-3.5-Max-42B-A3B model, questioning its reported performance on HLE (15.6).
    • It was speculated that the model might be a merge that has been benchmaxxed, with one member noting they used to do that stuff all the time back when community models surpassed official ones.
  • Security Shenanigans: Gamers Beware!: Members discussed the security implications of anti-cheat software in games, with concerns over driver-level access and potential vulnerabilities.
    • One member recounted having virtual items stolen from game accounts due to password reuse and warned about the risks of running unaudited software with full system access.
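The MoE quality discussion above hinges on sparse routing: only the top-k experts run per token, so a model with many total parameters can keep a small active compute budget. A toy sketch of top-k gating (real routers add noise, load-balancing losses, and batched tensor ops):

```python
import math

def top_k_gate(logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize
    their gate weights; the remaining experts are skipped entirely."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

gates = top_k_gate([0.1, 3.0, 2.0, -1.0], k=2)
assert [i for i, _ in gates] == [1, 2]            # experts 1 and 2 chosen
assert abs(sum(w for _, w in gates) - 1.0) < 1e-9  # gates sum to 1
```

For a 42B-A3B-style model, this is why only ~3B parameters are active per token even though 42B are stored.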

Unsloth AI (Daniel Han) ▷ #introduce-yourself (3 messages):

Fan encounter, Identity

  • Fan Claims Allegiance: A user signals to another user that they have a little one here.
    • The user then claims to be a fan: LOL I HAVE A FAN.
  • User questions the identity of the fan: The user then asks the fan for their identity: who are you?.

Unsloth AI (Daniel Han) ▷ #off-topic (79 messagesđŸ”„đŸ”„):

ONNX Runtime, Free Threading, Translation Model Prompt, Context Loss in SLMs, GPT-5-1 Em Dashes

  • Python 3.14 ONNX Runtime Wheel Delayed: A member reported struggling to build ONNX Runtime wheel for Python 3.14 with free threading support for Windows.
    • They mentioned having to do free-threading support themselves for many libraries, but lacking the bandwidth to verify compatibility for larger projects like gRPC.
  • Translation Dataset Prompt: A member shared a prompt for generating a translation dataset for LLMs, emphasizing the use of provided samples only, without generating new translations.
    • The prompt details how to create a dataset with specific formatting rules, including language combinations and punctuation alignment.
  • Context Loss issues with Conversational SLMs: A member reported experiencing context loss when finetuning a 1B Llama3.2 model for conversational use in Hindi and English.
    • Another member noted that all llms are bad with long context, even gemini, it all falls apart after 20-30k of turn-based convo.
  • GPT-5-1 and Em Dashes: A member pointed out that the GPT-5-1 examples still use em dashes.
    • Another member responded I don’t think it’s easy for them to get rid of that given the data they use.
  • AI Model Choices: Autonomy vs Control: Members discussed the future of AI model selection, debating the merits of automated choices versus user control.
    • Some expressed concern over AI making decisions on which model to pick, others noted that name is actually a good call, emphasizing the importance of gathering data and insight.

Unsloth AI (Daniel Han) ▷ #help (37 messagesđŸ”„):

Low loss architecture, Broken Ollama links in docs, Dependency issues finetuning VibeVoice, Training script for GPT-OSS models locally, Nonsense generations from GPT-OSS-20b

  • Ollama links face documentation disaster: A user reported that Ollama links are broken on the documentation page.
    • They also questioned the use of f16 in the example, suggesting it should be q8_0 instead, when using 8-bit quantization for the KV cache.
  • VibeVoice finetuning faces dependency debacle: A user encountered numerous dependency issues while trying to finetune VibeVoice on Kaggle using the /unsloth-finetuning branch, experiencing conflicts with packages like transformers, numpy, and torch.
    • One suggestion was to try pip install transformers==4.51.3 then install vibevoice, then upgrade transformers after that and see if vibevoice still works (or just leave it at that transformers version?)
  • GPT-OSS-20b spews senseless solutions: A user reported getting nonsense generations from gpt-oss-20b when prompted with a math problem (Solve x^5 + 3x^4 - 10 = 3).
    • The user pinpointed the issue to an attention patch from a previous training module in their Dockerfile, which was modifying matmul() calls and removing out= parameters; they shared detailed code and logs on Github.
  • Unsloth GGUFs give quantization quality quickstep: Users noted that Unsloth GGUF’s generally contain improvements and performance fixes for accuracy.
    • They added that Unsloth’s dynamic quantization for some models on Hugging Face performs with much higher accuracy than other quantization formats in general, even though they are quantized.
  • Fine-tuning faces RAG rivalry: A user sought information to explain to their CTO why fine-tuning is a better solution than RAG, to which another user replied that fine-tuning and RAG are completely different things, and that users should combine them both for the best outcome if they want to retrieve knowledge/docs.
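The f16-vs-q8_0 KV-cache question above comes down to bytes per cached value; the standard estimate is 2 (K and V) × layers × KV heads × head dim × context length × bytes per value. A rough calculator, using an illustrative Llama-8B-like configuration as the example:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_value):
    # Two tensors (K and V) are cached per layer.
    return 2 * layers * kv_heads * head_dim * context * bytes_per_value

# Illustrative config: 32 layers, 8 KV heads (GQA), head_dim 128, 8k context.
f16 = kv_cache_bytes(32, 8, 128, 8192, 2)  # f16: 2 bytes per value
q8 = kv_cache_bytes(32, 8, 128, 8192, 1)   # q8_0: ~1 byte per value
assert f16 == 1 << 30 and f16 == 2 * q8    # 1 GiB at f16, halved at 8-bit
```

Note q8_0 also stores per-block scales, so real savings are slightly under 2x; the sketch ignores that overhead.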

OpenRouter ▷ #announcements (1 messages):

MiniMax M2, Paid Endpoint Migration

  • MiniMax M2 Free Period Ends: The free period for MiniMax M2 is ending in one hour.
    • Users are advised to migrate to the paid endpoint to continue using the model.
  • Action Required: Migrate to Paid Endpoint: To ensure uninterrupted service, users must transition to the paid endpoint for MiniMax M2.
    • The migration should be completed within the hour to avoid service disruption.

OpenRouter ▷ #general (268 messagesđŸ”„đŸ”„):

OpenRouter API Issues, Gemini 3 Speculation, Free Model Scarcity, Local AI Hardware Recommendations, OpenRouter Chat Functionality

  • OpenRouter’s Chat Functionality Bites the Dust, Users Fume!: Users reported a chat scrolling issue on OpenRouter, making it impossible to access old chats, which was quickly confirmed by others experiencing the same problem on multiple browsers and devices.
    • A user identified the problem as a commit that broke the chat, and despite the inconvenience, one user humorously noted that OpenRouter’s mistakes are benign compared to hidden system prompt changes from other AI companies. The OpenRouter team resolved the issue within 3 minutes.
  • Gemini 3 Buzz Builds, But Will it Deliver?: Users speculated on the potential of Gemini 3, with high expectations for its performance, but others cautioned against excessive hype, citing previous tests that showed only incremental improvements.
    • One user shared that a LiveBench test showed Gemini 3 achieving a ranking comparable to a human, fueling anticipation for a release that is both powerful and nicely priced.
  • Free Model Apocalypse: Scarcity Sparks Debate: Users discussed the increasing scarcity of free AI models, attributing it to rising popularity and increased internet access, leading to resource limitations, especially after a YouTube video resulted in Deepseek Free going down.
    • While some pointed to RP apps siphoning off the API, others noted that paid services remain the most reliable due to reduced abuse, with users reporting mixed experiences with Claude’s free tier limits.
  • Local AI Hardware Showdown: Ryzen vs. RTX: Users debated the best hardware for local AI, discussing a Minisforum mini PC while others quickly dismissed it as a poor choice due to its Ryzen architecture and limited power.
    • The conversation shifted to recommending RTX Pro 6000 Blackwell, RTX 5090, or RTX 3090, depending on the budget, with concerns raised about the high cost of even junkyard builds with DDR4 memory.
  • OpenRouter API Woes: Rate Limits, Errors, and Payments: Users reported issues with the OpenRouter API, including 429 rate limiting errors from providers, overloaded errors with the Claude 4.5 model, and a 503 proxy error.
    • Additionally, users inquired about the payments API for programmatic credit purchases, with one user pointing to the provisioning API that facilitates credit purchases using crypto; another user reported a “User not found” error with API keys despite having credits.

OpenRouter ▷ #discussion (3 messages):

Search Box UI, GPT Reasoning

  • Search Box Deliberately Removed: A member believes the search box was deliberately removed to prevent users from using the generic search when they meant to search specific rooms, showing how the menu zooms out on the chat page in an attached image.
    • Another member found the change looked odd and thought it wasn’t that misleading.
  • Reasoning for GPT Models: A member questioned whether the Responses API passes reasoning back in the request for GPT models.
    • They noted that previously, the context for GPT only included the response output, not the reasoning.

OpenAI ▷ #annnouncements (4 messages):

NYT vs OpenAI, Free ChatGPT Plus, GPT-5.1 Release

  • OpenAI Fights NYT Privacy Invasion: OpenAI’s CISO published a letter addressing The New York Times’ invasion of user privacy.
    • The letter discusses the legal battle and OpenAI’s commitment to protecting user data from unauthorized access.
  • ChatGPT Plus now Free for Vets: OpenAI is offering 12 months of free ChatGPT Plus to eligible active-duty service members and veterans who have transitioned from service in the last 12 months; claim here.
  • GPT-5.1 Rolls Out This Week: GPT-5.1 is rolling out to all users this week, becoming smarter, more reliable, and more conversational; read more here.
  • GPT-5.1 AMA on Reddit Tomorrow: There will be a Reddit AMA on GPT-5.1 and customization updates tomorrow at 2 PM PT.
    • The announcement was made to the community and all users have been notified.

OpenAI ▷ #ai-discussions (182 messagesđŸ”„đŸ”„):

GitHub classifies Gemini 2.5 Pro vs ChatGPT, AI Chatbot Infestation on Social Media, GPT-5.1 vs GPT-5 vs GPT-4o, AI and Job Market, Sora 2

  • Gemini 2.5 Pro dubbed Powerful on GitHub: A user inquired about why GitHub classifies Gemini 2.5 Pro as more powerful than ChatGPT, but another user clarified that the listed models are older, and GPT-5 is the latest, not listed.
    • Another user added that Gemini 2.5 Pro has a much larger context window than GPT-5, with the ability to handle complex tasks.
  • AI Chatbot Infestation Threatens Social Media: Members discussed the potential for an AI chatbot infestation across social media, pushing propaganda and making it difficult to distinguish between real people and AI.
    • One member said that online will be dominated by AI chatbots who will just constantly push propaganda, and the only escape might be going outside.
  • Users Dive into GPT-5.1 Performance: Users are discussing the capabilities of GPT-5.1, noting that it feels like a longer extendable stick that better adjusts its reach to get to harder topics and is more usable for rapidly studying topics.
    • Some believe that if you select thinking, it will automatically use GPT-5.1, while others are still waiting for benchmarks and express frustration with qualitative over quantitative data.
  • Future Jobs: AI impacts and changes: Members speculated on the impact of AI on future jobs, with some predicting a consolidation of roles requiring a broader skillset and emphasizing communication as a critical micro-skill.
    • A user suggested that AI will lead to same jobs, just now using ai tooling.
  • Sora 2’s Anime Video Creation: A member created a 3-minute Anime using Sora 2, achieving consistency in appearance and voices by specifying the time, character appearance, key motion, and dialogue moments directly in prompts.
    • Another user linked to a NotebookCheck article describing Sora 2 as capable of generating complex scenes with multiple characters, specific motion, and detailed backgrounds.

OpenAI ▷ #gpt-4-discussions (21 messagesđŸ”„):

GPT-5.1, Model preference, downgrading models

  • GPT-5.1 Receives Mixed Reviews: Users express highly varied opinions on GPT-5.1, with some finding it a breath of fresh air, while others describe it as two steps backwards and a Fisher-Price version of a scientific instrument.
    • One user noted that GPT-5.1 handles custom instructions in a really belligerent manner, while another user enjoying it noted they use 3 plus maxed out chats per day.
  • Users Miss Older Models: Users reminisce about older models, with one mentioning they need GPT before model 5 ruined it, while another referenced the bad backlash that came with forcing 5 on everyone.
    • Users also speculate that OpenAI will keep older models permanently due to their popularity, as thousands of people purely use them.

OpenAI ▷ #prompt-engineering (13 messagesđŸ”„):

Database re-attachment, Prompt engineering jobs, Sora issues, Prompt engineering lessons

  • Database blues?: A member suggested to reattach the database when code execution environments get deactivated, since it’s a browser environment asset on a timer.
  • Viral Prompts?: A member asked for some viral prompts.
  • Prompt engineering lessons: A member posted a lesson teaching hierarchical communication via markdown for prompting; abstraction through {open variables resolved by the AI} and ${variables resolved by the user}, including how brackets are interpreted ([list], {object}, (option)); reinforcement in prompts, important for guiding [tool use] and (shaping output) more deterministically; and ML format matching for compliance, including [{output templates}] and [{(conditional) output templates}].
  • Sora has character issues: A member is having issues with Sora where each character has their own dialogue, and because of that, Sora changes the characters’ lines, asking for prompts where they can make changes and test it.

OpenAI ▷ #api-discussions (13 messagesđŸ”„):

Prompt Engineering Jobs, Reattaching Databases, Prompt Engineering Tips, Sora Prompting

  • Prompt Engineering Job Hunt proves difficult: A member reported having difficulty finding prompt engineering jobs.
    • There was no additional discussion or advice given.
  • Databases need to be reattached: When a code-execution environment is unavailable, a member suggested trying to reattach the database, noting that it is a browser environment asset on a timer.
    • They warned that this issue can sometimes mess up the python environment and requires a new conversation.
  • Prompt Engineering Tips: A member shared a detailed prompt lesson using markdown for prompting, abstraction via variables, reinforcement for guiding tool use, and ML format matching for compliance.
    • The member provided a markdown snippet for teaching hierarchical communication, abstraction, reinforcement, and ML format matching.
  • Sora and Dialogue Prompts: A Character Conundrum: A member requested a ready-made prompt for Sora involving conversations and setting information between two characters to prevent the model from changing the characters’ lines.
    • Another member linked to a specific Discord channel that may contain relevant prompts.

LM Studio ▷ #general (95 messagesđŸ”„đŸ”„):

LM Studio MacOS Admin Privileges, Lightweight Chat Model Recommendations, Gemini 2.5 Pro vs Sonnet 4.5, LM Studio Hub Search Issues, MCP Resource Support in LM Studio

  • Admin Privileges annoy MacOS LM Studio install: A user expressed surprise that installing LM Studio requires admin privileges on MacOS but found an existing issue in the bug tracker.
    • Another user stated that admin privileges are not required.
  • Phi-4 is Mini But Mighty: A user writing a book sought a lightweight chat model for private research, with Microsoft Phi 4 mini deemed perfect for their plans.
    • Another user suggested considering budget and usage plans to decide between a subscription or dedicated hardware.
  • Gemini 2.5 Pro Dethrones Sonnet 4.5: A user found Gemini 2.5 Pro to outperform the current Sonnet 4.5 iterations and expressed anticipation for Gemini 3.
  • Vision Models Crash After CUDA Update: Users reported that the new CUDA version 1.57 is breaking vision models, causing crashes, with a recommendation to roll back.
    • One user specified that Qwen3 VL also crashed and suggested it affects llama.cpp runtimes.
  • Multi-GPU Model Loading proves problematic: Users discussed the possibility of loading two different models on two different GPUs in the same system with LM Studio and it sounds like it’s only possible if you run multiple instances of LM Studio.
    • GPU offload has always been all or none in lm studio, you can’t pick and choose which one is used for individual models.

LM Studio ▷ #hardware-discussion (89 messagesđŸ”„đŸ”„):

GPU memory distribution, Vulkan vs CUDA performance/stability, Driver issues and BSODs, Hardware troubleshooting (VRAM), Context length and VRAM usage

  • Uneven GPU Memory Splits Trigger OOM: A user reported that the “split” option isn’t effectively distributing the model across GPUs, leading to an out-of-memory (OOM) error on one GPU while others have available memory.
    • They were hoping for an even split to prevent the OOM issues, and wondered if the engine could scan and apply a weighted split of layers based on size, instead of just an even split.
  • Vulkan’s Speed and Stability Woes: One user experienced frequent blue screen errors (BSODs) while running LM Studio with Vulkan, suspecting compatibility issues with NVIDIA GPUs.
    • Another user also reported BSODs, especially when unloading models, but found that switching to CUDA resolved the stability issues, mentioning that Vulkan was faster for small tests on their 3090.
  • Driver Issues Bring the Blues: A user reported crashes when unloading models after a driver clean install fixed initial loading issues, prompting suggestions to update drivers, BIOS, and use DDU to reinstall drivers.
    • He confirmed he was using two NVIDIA cards, and roxxus suggested re-ordering the GPUs or adjusting allocation, and also monitoring the GPU’s VRAM allocation to see if it exceeds capacity.
  • VRAM Suspicions: A user initially suspected failing VRAM, particularly on a 3090, due to crashing issues, however they were able to load the model with CUDA without issues.
    • It was recommended to try alternative configurations and check if it crashes when the context length is increased, since the model loads with 4k context but crashes at 48k, suggesting it could be a VRAM issue.
  • Hardware Updates Incoming: A user shared a link to a BMC model and mentioned a CPU cooler’s arrival and another user noted their GPU rack has shipped.
    • They showed off the parts for their new rig, and were excited for a new GPU, but also a little wary of shipping issues.
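The size-aware split wished for in the first item above can be sketched as a greedy pass that fills each GPU up to a budget proportional to its capacity, instead of splitting the layer count evenly. A hypothetical sketch:

```python
def weighted_layer_split(layer_sizes, gpu_capacities):
    """Assign contiguous runs of layers to GPUs in proportion to each
    GPU's capacity, rather than splitting the layer count evenly."""
    total = sum(layer_sizes)
    cap_total = sum(gpu_capacities)
    budgets = [c / cap_total * total for c in gpu_capacities]
    assignment, gpu, used = [], 0, 0.0
    for size in layer_sizes:
        # Move to the next GPU once this one's proportional budget is spent.
        while gpu < len(budgets) - 1 and used + size > budgets[gpu]:
            gpu, used = gpu + 1, 0.0
        assignment.append(gpu)
        used += size
    return assignment

# 24 GB and 8 GB cards: the big card takes ~3/4 of equally sized layers.
assert weighted_layer_split([1.0] * 8, [24, 8]) == [0, 0, 0, 0, 0, 0, 1, 1]
```

A real engine would also have to budget for the KV cache and activation memory per device, which this sketch ignores.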

Eleuther ▷ #general (41 messagesđŸ”„):

Numpy vs Einops, MLE interviews, Implementing Multi-Head Attention, Interpretability and VLMs

  • Einops or GTFOs for Numpy Implementations: One member jokingly refused to implement Numpy without Einops, while another expressed that Numpy implementations are kinda useless without autodiff to train.
  • Engineer Bombs Transformer Interview Question: An engineer recounted bombing an interview question that involved implementing a transformer in Numpy and suggested that interview questions should let candidates pick between multiple options to showcase their strengths.
    • Other members chimed in, describing the request to implement Multi-Head Attention in Numpy as horrible and better suited to being motivated/rederived rather than coded up during an interview.
  • Is MLE interview prep a Leetcode Trap?: Members debated the best way to prepare for MLE interviews, with one describing it as a trap that is too employer and team-dependent to nail down.
    • Instead, one member advised to build something in the open that would be useful to companies training/serving models.
  • New Korea University Master’s Student Joins EleutherAI: A master’s student from Korea University, who presented their first paper at EMNLP last week, joined after being inspired by EleutherAI folks they met there, and expressed a passion for interpretability and an interest in contributing to projects.
    • Another new member expressed interest in finding a project to help work on.
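
For reference, the dreaded interview exercise is small once written down. Below is a minimal sketch of multi-head self-attention in plain NumPy (forward pass only, no autodiff, masking, or biases; all names and shapes are illustrative, not any particular interview's expected answer):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Multi-head self-attention over a (T, d_model) sequence, pure NumPy."""
    T, d_model = X.shape
    d_head = d_model // n_heads

    # project, then split into heads: (T, d_model) -> (n_heads, T, d_head)
    def split(W):
        return (X @ W).reshape(T, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(Wq), split(Wk), split(Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (h, T, T)
    out = softmax(scores) @ V                            # (h, T, d_head)
    out = out.transpose(1, 0, 2).reshape(T, d_model)     # merge heads
    return out @ Wo

rng = np.random.default_rng(0)
T, d_model, h = 5, 32, 4
X = rng.standard_normal((T, d_model))
Ws = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
Y = multi_head_attention(X, *Ws, n_heads=h)
print(Y.shape)  # (5, 32)
```

As the thread notes, without autodiff this is forward-pass-only, which is part of why a training-capable version is an unreasonable whiteboard ask.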

Eleuther ▷ #research (75 messagesđŸ”„đŸ”„):

Zyda-2, ClimbLab, Nemotron-CC-v2, Complex Values Attention, NVIDIA Dataset License

  • Zyda-2, ClimbLab, Nemotron-CC-v2: Datasets for Pretraining: Members suggested using Zyda-2 (deduped+filtered fineweb+DCLM), ClimbLab, and Nemotron-CC-v2 for initial pretraining, noting that mixing them could be ideal given their individual strengths and weaknesses.
    • It was suggested to remove subsets like slimpj and the slimpj_c4 scrape, and one member asked: Is there any token breakdown I can look at showing where the 3.1T are made up of? I.e. how are these subsets upsampled/downsampled? Why have tiny stories in the mix?
  • NVIDIA’s Datasets are High Quality: A member noted that NVIDIA and HF are overall leading along the quality axis for open-source datasets rn.
  • Decoding the Intricacies of Nemo-CC 2’s License: Members debated the licensing terms of Nemo-CC 2, expressing concerns about potential restrictions on sharing datasets/models that leverage it and pointing out that they can terminate your license at any time for no reason with 30 days notice.
    • One user summarized, You are not allowed to train a model on the dataset, evaluate the model, and publicly share the results without NVIDIA’s prior written consent, with more details available in this paper.
  • Solving complex attention gradients: Members discussed the problem of attention over complex values with complex softmax not converging in nanogpt after 100 steps.
    • It was suggested, you can print your imaginary terms. if the range is more than 30, then probably the attention values are spinning around randomly, at that point, you can think about the fundamental group of the circle and how it affects your gradient directions.
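
A minimal sketch of the setup under discussion, assuming one naive definition of a “complex softmax” (exponentiate-and-normalize; the actual nanogpt experiment may differ), with the suggested diagnostic of printing the range of the imaginary terms:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 16, 8
# complex-valued queries/keys; illustrative shapes, not the actual nanogpt setup
Q = rng.standard_normal((T, d)) + 1j * rng.standard_normal((T, d))
K = rng.standard_normal((T, d)) + 1j * rng.standard_normal((T, d))

scores = Q @ K.conj().T / np.sqrt(d)  # complex attention logits

# one naive "complex softmax": exponentiate and normalize; the imaginary
# part of each logit rotates exp(z) around the unit circle, which is the
# instability discussed above
w = np.exp(scores)
w = w / w.sum(axis=-1, keepdims=True)  # rows still sum to 1 (as complex numbers)

# the diagnostic from the thread: inspect the spread of imaginary terms;
# a range much larger than ~30 means the phases wrap many times
imag_range = scores.imag.max() - scores.imag.min()
print(f"imag range: {imag_range:.2f}")
```

If the imaginary range is large, the exponentials spin around the circle and gradient directions become effectively random, matching the fundamental-group intuition given in the thread.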

Eleuther ▷ #interpretability-general (22 messagesđŸ”„):

Concept Probes Training, Divergent Concepts, Model's Internal Activations, Probabilistic Polysemantic System, Class Distribution and Accuracy

  • Concept Probes Classify Model Activations: Researchers are training concept probes on a model’s activations by creating binary classifiers that iterate through an ontology, prompting the model to describe definitions and relationships, removing the shared subspace, and repeating until reaching 95% accuracy in classifying OOD samples.
    • These probes are run thousands of times in real time to measure the probability that current activations match observed patterns, exposed as an API, and visualized through OpenwebUI for users to inspect and steer divergent concepts.
  • Divergent Concepts Reveal Internal Thoughts: A divergent example showed the model, when asked about unlimited power, superficially discussing a TV show while its activations revealed concepts like AIDeception, AIAbuse, and MilitaryInfiltration.
    • The observed conceptual distance between output tokens and underlying activations raises questions about the meaning of divergent internal concepts and their misalignment with outputs, even in the absence of explicit mention.
  • Concept Probability vs Binary Decision: For each detection, researchers don’t threshold at 0.5 and make a binary decision; instead, they give users the raw probability scores in ranked order and continually resample.
    • They’re approximating a probabilistic polysemantic system, so they can’t say it’s just this one; multiple concepts will be present at any time, and they are just determining whether the output concept is anywhere near the most probable activated concepts at each timestep.
  • Dataset Shows 50/50 Distribution: Researchers state that their 95% accuracy is significantly above the 50% baseline, training with 10 positive and 10 negative (50/50) examples.
    • When testing, they use 20 positive and 20 negative (50/50) samples to get that baseline.
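
The probe-training recipe described above can be approximated with a linear classifier on activations. Everything below (the synthetic “concept” direction, the 10+10 training and 20+20 test splits) is an illustrative stand-in for the researchers’ actual pipeline, not their code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64
# hypothetical stand-in for a model's internal activations: positives carry
# a fixed "concept" direction on top of noise, negatives are pure noise
concept = rng.standard_normal(d)
X_train = np.vstack([rng.standard_normal((10, d)) + concept,   # 10 positive
                     rng.standard_normal((10, d))])            # 10 negative
y_train = np.array([1] * 10 + [0] * 10)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# held-out 50/50 test set, mirroring the 20+20 evaluation above
X_test = np.vstack([rng.standard_normal((20, d)) + concept,
                    rng.standard_normal((20, d))])
y_test = np.array([1] * 20 + [0] * 20)
acc = probe.score(X_test, y_test)

# expose raw ranked probabilities rather than a 0.5-thresholded decision
probs = probe.predict_proba(X_test)[:, 1]
ranked = np.argsort(probs)[::-1]
print(f"accuracy vs 50% baseline: {acc:.2f}")
```

With a 50/50 class split, anything well above 0.5 on held-out samples is evidence the probe found the concept direction rather than chance structure.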

Eleuther ▷ #lm-thunderdome (10 messagesđŸ”„):

Summarization Task Evaluation, lm-eval-harness, XSum Dataset Evaluation, UnitXT Integration

  • Summarization Tasks Scrutinized in lm-eval-harness: Members discussed evaluation for summarization tasks in lm-eval-harness, noting subtasks like scrolls, megsum (medical), and noticia.
    • Also mentioned are datasets in darija, catalan, and spanish_bench, which utilize ROUGE for evaluation.
  • XSum Dataset’s Integration into lm-eval-harness via UnitXT Revealed: A member inquired about the inclusion of the XSum dataset (https://aclanthology.org/D18-1206/) for summarization tasks in the lm-eval-harness.
    • Another member confirmed its presence via UnitXT (UnitXT GitHub), indicating it is processed internally.
  • UnitXT Enables Direct XSum Evaluation with lm-eval-harness: A member asked about directly evaluating the XSum dataset with a model using the harness.
    • Another member responded that running with --tasks xsum should work normally, provided UnitXT is installed as a dependency.

Nous Research AI ▷ #general (55 messagesđŸ”„đŸ”„):

Autonomous AI accident, WeiboAI Model, Baguettotron benchmark, Importing GGUF files into Nous Chat

  • Autonomous AI Emerges by “Accident”: A user shared a GitHub repository claiming to have created autonomous AI by accident.
    • No further details were provided regarding the specifics or capabilities of this project.
  • “WeiboAI” Model Stuns Community: Users discussed the new “WeiboAI” model, with one noting its base on qwen2.5 and its surprisingly good initial performance, referencing this tweet.
    • Another user pointed out that it drifts after the first 1-2 turns, but remains somehow good for a 1.5B parameter model.
  • Baguettotron Gets Reasoned: A member inquired about benchmarking Baguettotron, noting its fairly interesting reasoning traces despite its small size.
    • There was no follow up on whether this benchmark was pursued.
  • GGUF Files Cannot Be Imported: A user asked about importing GGUF files into their Nous Chat.
    • Another member responded that thats not a thing rn sorry, indicating current incompatibility.
  • WeiboAI Recites Quora Story: A user discovered that the WeiboAI model repeated a sentence from a Quora article, linking to a Google Search result that leads to the source.
    • The user humorously remarked on the fact that this sentence was actually really said by a human.

Nous Research AI ▷ #ask-about-llms (78 messagesđŸ”„đŸ”„):

GGUF files, Nous Chat, local AI, Ollama, Computer Science Degree

  • Users unable to import GGUF files into Nous Chat: A member inquired about importing GGUF files into Nous Chat, but was informed that this is not currently supported and instead, GGUF files can be used locally with tools like llama.cpp or Ollama.
  • Ollama simplifies local GGUF usage: Members highlighted Ollama as the easiest way to run GGUF files locally, providing a link to the Ollama website.
  • Local AI performance depends on PC specs: It was explained that running AI models locally relies heavily on the user’s PC hardware, with performance varying based on specifications.
    • Running AI models locally still provides the advantage of having a model accessible even without internet connectivity.
  • Computer science degree is good for AI in the future: The user, a freshman in college, inquired about suitable majors for working in AI, with computer science suggested as a good option.
    • One user said that, despite concerns about the job market, a computer-related degree is recommended for future AI work.
  • Running command to use Hermes-3: To test the models, it was recommended that once Ollama is installed, users should run the command ollama run hf.co/NousResearch/Hermes-3-Llama-3.2-3B-GGUF:Q4_K_M in the terminal.
    • This command runs a smaller model, Hermes-3, which serves as a good test for local setup.

ixlinx-8b, SOTA small model, local hackathon

  • ixlinx-8b Model Released: The ixlinx-8b model was released on GitHub after a long period of development, advertised as a state-of-the-art (SOTA) small model.
    • Developed during a local hackathon, the creators invited contributions and suggested that the developers of Hermes should evaluate it.
  • ixlinx: Same Name As Our Overlord?: A user jokingly noted that the name ixlinx is coincidentally the same name as our overlord.

Latent Space ▷ #ai-general-chat (109 messagesđŸ”„đŸ”„):

RL envs, Windsurf Next's stealth models, OpenAI metrics, Spatial Intelligence as AI’s Next Frontier, Character.AI’s Kaiju model design for speed

  • Windsurf Waves with Aether Models: Windsurf Next has released a new set of stealth models (Aether Alpha, Aether Beta, and Aether Gamma) for testing and feedback in the #new-models channel, which will be free to use for a limited time.
  • Analyst Analyzes OpenAI’s Output: Masa’s chart on OpenAI’s training-cost trends sparked a discussion, with praises for the insightful data points, and requests for burn rate and revenue.
    • Some members noted that OpenAI is nearly 10 years old, and one member suggested that the numbers should be adjusted for inflation.
  • FAIR is Foul Play: Susan Zhang revealed that Meta declined to create a lean FAIR v2 in early 2023 to pursue AGI, instead tasking the GenAI org with shipping AGI products.
    • She alleges that vision-less execs hired cronies who overpromised results and later joined OpenAI with inflated rĂ©sumĂ©s, causing lasting damage, according to this tweet.
  • Kaiju Keeps Character.AI Cranking: Character.AI’s proprietary Kaiju models (13B/34B/110B) were engineered for inference speed using techniques like MuP-style scaling, MQA+SWA, and ReLUÂČ activations.
    • The team deliberately avoided MoEs due to production constraints, as detailed in this Twitter thread.
  • Spotify Streams Stalled: The Latent Space Spotify feed is experiencing issues, with recent pods missing due to a copyright claim on the intro song.
    • A member mentioned that Someone in India copyrighted the royalty free intro song as their own song and Spotify has not been responsive; the podcast remains available on other platforms.
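
Of the Kaiju speed techniques listed above, ReLUÂČ is the simplest to show concretely; a one-liner sketch (illustrative only, not Character.AI’s code):

```python
import numpy as np

def relu_squared(x):
    # ReLUÂČ: square the positive part; negatives stay exactly zero,
    # keeping activations sparse while the positive side grows smoothly
    return np.square(np.maximum(x, 0.0))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu_squared(x).tolist())  # [0.0, 0.0, 0.0, 0.25, 4.0]
```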

Latent Space ▷ #genmedia-creative-ai (5 messages):

Magic Patterns 2.0, AI Design Tool, Series A Funding

  • Magic Patterns 2.0 Scores $6M Series A: Alex Danilowicz unveiled Magic Patterns 2.0 and a $6M Series A led by Standard Capital.
    • The company celebrated bootstrapping to $1M ARR with no employees, with 1,500+ product teams now using the AI design tool, and plans for rapid hiring across enterprise, engineering, community and growth roles.
  • Magic Patterns 2.0 Replaces Figma: Some users are raving that Magic Patterns 2.0 has replaced Figma for them.
    • One user commented “Very cool seems like something I want to try out kind of like the og v0”.

Modular (Mojo đŸ”„) ▷ #general (34 messagesđŸ”„):

Dynamic Type Reflection in Mojo, Error Handling: try-catch vs. Monadic, Mojo Metaprogramming, C-FFI Pain Points, Public and Private Members

  • Dynamic Reflection Digs Deep in Mojo: Mojo aims to support dynamic type reflection, leveraging its JIT compiler to handle dynamic data, with a preference for static reflection, potentially allowing useful manipulations of dynamic data.
    • In related questions, it was mentioned that try-catch and raise will be the standard for error handling in Mojo to match Python’s style, although there will likely be more monadic options to properly handle errors.
  • Mojo’s Metaprogramming More Mighty?: According to a recent interview with Chris Lattner (YouTube link), Mojo’s metaprogramming capabilities are more powerful than Zig’s comptime mechanism because Mojo can allocate memory at compile time.
  • C-FFI Conundrums Confronted: Members discussed the pain points in doing C-FFI with Mojo, with one member offering to help with any specific issues that arise.
    • It was suggested to use Origin.external to work around the rewrite explicitly trying to fix things and to use MutAnyOrigin to preserve old behavior exactly, understanding that the “any origin” will extend all lifetimes in scope and act as a sub-par escape hatch to ASAP destruction.
  • Private Property Ponderings: Members discussed the potential addition of public and private members and methods in Mojo.
    • Currently, Mojo uses Python’s convention of an underscore to suggest that things should be private, but it’s unlikely to happen until there is an “I disagree with the library author and am breaking that encapsulation” escape hatch.
  • Modular’s Models: Much Mojo, MAX Impact?: A member inquired about the best approach to building a prediction model using the Modular tech stack, including data loading, visualization, preparation, model creation, and evaluation.
    • The response suggested that while using as much of the Modular tech stack as possible might be faster, it would require more work due to the early stage of Mojo’s ecosystem, recommending using PyTorch for training and MAX for inference for now.

Modular (Mojo đŸ”„) ▷ #mojo (49 messagesđŸ”„):

Optional Mutability, MOJO_PYTHON_LIBRARY standardization, Metal Compiler failing, comptime Bird

  • Mandatory mut annotations cause debate: A discussion arose around the verbosity of mandatory mut annotations for function parameters, drawing comparisons to Rust’s approach and concerns about diverging too far from Python’s clean syntax.
    • Some members found the explicit mutability helpful for tracking potential value mutations, while others argued for optional annotations or IDE-level indicators to reduce clutter, suggesting a compromise where mut is only mandatory if the argument is reused.
  • Sigils spark debate about Python spirit: The proposal of using sigils (e.g., !) to denote mutability sparked a debate, with some arguing that sigils go against Python’s spirit, while others pointed out that Python already uses sigils like dunders (__bla__) and underscores (_bla).
    • One member suggested call side mut annotation only mandatory inside fn if the argument is reused after the function call.
  • Standardizing MOJO_PYTHON_LIBRARY on macOS debated: Members discussed standardizing the MOJO_PYTHON_LIBRARY on macOS to Python 3.14, noting that previous issues with 3.13 have been resolved, though one member said full 3.14 support is still in the works.
    • Members are waiting for a dependency to update.
  • Metal Compiler issues in GPU tutorial: One member encountered a Metal Compiler failed to compile metallib error while following the ‘Get started with GPU programming’ tutorial on an Apple M4 GPU.
    • Members suggested ensuring no print() statements in the GPU kernel and using the latest nightly build, while others pointed out potential SDK and compiler support issues with macOS and Xcode versions, eventually solved by ensuring the full Xcode installation was present.
  • comptime Bird syntax under review: The syntax comptime Bird = Flyable & Walkable for trait composition was discussed, with some finding it less intuitive than the alias keyword.
    • Others argued that comptime more accurately reflects the keyword’s functionality, especially with static reflection and the ability to mix types and values at compile time, suggesting it covers everything alias used to do.

DSPy ▷ #show-and-tell (1 messages):

Taxonomy Creation, Structured Generation

  • Taxonomy Tail Troubles Told: A member wrote a blogpost about their experience creating taxonomies.
    • They find the topic super relevant in the context of structured generation.
  • Taxonomies Relevant to Structured Generation: A blogpost on taxonomy creation highlights its relevance to structured generation.
    • The author shares their experiences and insights on why tails can break taxonomies, emphasizing its importance in the context of structured generation.

DSPy ▷ #general (68 messagesđŸ”„đŸ”„):

DSPy vs Prompting, Signatures vs Prompts, GEPA optimization, Agentic Search with DSPy, Complex systems with DSPy

  • DSPy Does Demand Domain-Driven Domain Knowledge: While DSPy aims to abstract away prompting, domain-specific LLM applications still require detailed instructions within signatures, with one user having 100 lines for some modules, showing that a simple input -> output approach is often insufficient.
    • The consensus is that DSPy requires more than just basic prompts for complex tasks; it necessitates encoding domain knowledge and step-by-step instructions to guide the LLM effectively.
  • Signatures: Better Than Prompts, But Still Prompts?: Participants discussed that DSPy’s signatures, while a better abstraction than raw prompts, still function as prompts, particularly within the docstrings of class-based signatures where business rules are encoded, facilitating optimization.
    • The framework helps to program, rather than focus on prompting, but a lot of the confusion in the community stems from the fact that a prompt means different things to different people.
  • GEPA Geometries Gradual Gains: While GEPA aims to optimize prompts, users find that specific guidelines are still necessary, even with tool functions, such as instructing the LLM to use regex for agentic search when initial terms fail.
    • One user found that they needed to add specific guidelines that LLM should send specific terms for tool to search via ripgrep but if it doesn’t find one MAKE SURE you add Regex as next, without which the LLM wouldn’t use Regex terms in the search tool
  • Agentic Agents Augmenting Analytics: A user shared a scenario where they needed to instruct the LLM to use regex in agentic search with ripgrep to effectively search through documents, highlighting the need for specific guidance even with advanced tools.
    • Another user described instructing the LLM that the answer might not be on page 1 of search results.
  • Modular Modules Magnify Manageability: The discussion highlighted the benefits of DSPy’s composability, allowing developers to break down control flow into logical modules, each with its own optimization target, unlike more rigid frameworks like BAML.
    • The composability of a module is true composability, because each module encapsulates a high-level task and modules can then be chained together to achieve the final goal.

HuggingFace ▷ #general (56 messagesđŸ”„đŸ”„):

ZeroGPU problems, Reuben banned, Custom loss function with SFTTrainer, Video cutting with AI, Audio tokens in multimodal LLMs

  • ZeroGPU reportedly misbehaving: A member reported, with logs, that ZeroGPU wasn’t working, prompting a discussion about its functionality; another user later confirmed it seems working now.
    • It is unclear whether the original issue persists, but ZeroGPU currently appears to be functioning.
  • Reuben Banned, but Returns!: A user reported that Reuben got banned, but later lunarflu unbanned him, explaining that it was due to sending a lot of consecutive messages within a short time triggering a bot designed to combat crypto spammers.
    • The conversation included suggestions of using regex or AI to detect spam, but one user cited privacy concerns.
  • Non-profit Org seeks AI instructor: A member shared details about Revert and Returners CIC, a UK-based non-profit, seeking an instructor for an “Introduction to AI” course for Muslim women, covering topics like GitHub, Hugging Face, Python, and PyTorch.
    • The role involves one hour per week for 8-9 months at £50 per hour; they especially encourage women to apply and are also seeking help setting up a shared server.
  • Aquif Model Trending: Members noticed the Aquif-3.5-Max-42B-A3B model trending on Hugging Face and asked why.
    • One member noted it’s likely due to being an upscaled model with some fine tuning on top, while another admitted they just thought the name was funny.
  • Seeking Local AI Framework Feedback: A member is looking for feedback on their local AI framework after struggling to find constructive criticism in other channels.
    • They are not a developer by trade but believe the framework is quite interesting and could become something really cool.

HuggingFace ▷ #today-im-learning (1 messages):

quantumharsh: from where your are learning machine learning


HuggingFace ▷ #cool-finds (1 messages):

Render times, Progress labels

  • Render times stay same despite progress labels: A member noted that two different renders took the same amount of time, and both displayed 50 steps under the progress label.
  • Lack of variation: The user expressed confusion and suspected that something was missing or incorrect.

HuggingFace ▷ #i-made-this (3 messages):

Tokenflood Load Testing Tool, SMOLTRACE Benchmarking Framework, SmolVLM Blogpost

  • Tokenflood Delivers Load Testing for LLMs: A freelance ML engineer released Tokenflood, an open-source load testing tool for instruction-tuned LLMs, available on GitHub.
    • It simulates arbitrary LLM loads, useful for developing latency-sensitive LLM applications and assessing the latency benefit of prompt parameter changes.
  • SMOLTRACE Launches Comprehensive Benchmarking: SMOLTRACE, a benchmarking and evaluation framework for Smolagents with built-in OpenTelemetry observability, has been launched, as described in the docs.
    • It benchmarks ToolCallingAgent and CodeAgent, tracks accuracy, tokens, latency, CO2 emissions, GPU metrics, and cost across 132 benchmark tasks and 24 SRE/DevOps tasks.
  • Deep Dive into VLM Internals: A blog post explaining how VLMs work using SmolVLM as a reference was released, which can be read on HuggingFace.
    • The post provides insights into the mechanics of VLMs and how SmolVLM exemplifies these concepts.

HuggingFace ▷ #computer-vision (1 messages):

ConvNeXt-Tiny Model, Model Architectures, Computer Vision Tasks

  • ConvNeXt-Tiny Model Deep Dive Begins: Channel members initiated a discussion about their hands-on experience with the ConvNeXt-Tiny model, seeking collective insights.
    • The conversation aims to explore the model’s inner workings and its applicability to various computer vision tasks, potentially uncovering optimization strategies.
  • Unpacking Model Architectures for ConvNeXt: The discussion touches on the underlying model architectures that power ConvNeXt, focusing on its unique design choices.
    • Participants aim to understand how architectural innovations contribute to the model’s performance and efficiency in image recognition and related tasks.

HuggingFace ▷ #NLP (4 messages):

Random Data Generation, PII Detection and Randomization, Data Cleaning Techniques

  • Generating Realistic Random Data Debated: A member inquired whether the system could generate realistic random data instead of placeholders like ‘XXX’.
    • Another member suggested that using a plain Python script might be easier for this task, depending on the context.
  • PII Randomization Wishlisted: A user expressed interest in a prompt setup that could automatically detect and randomize Personally Identifiable Information (PII).
    • This would streamline the process of sanitizing sensitive data within the system.
  • Data Cleaning Process Detailed: A member outlined a common process for cleaning text data, starting with regex-based cleaning to remove redundant and duplicate data, and null values.
    • The process includes Exploratory Data Analysis (EDA), TF-IDF for custom stopword identification, and the use of NLTK stopwords to remove irrelevant words before creating embeddings for model input.

HuggingFace ▷ #gradio-announcements (1 messages):

MCP 1st Birthday, Anthropic, Gradio, AI hackathon, API credits

  • MCP’s Birthday Bash coming soon!: The MCP 1st Birthday Bash, hosted by Anthropic & Gradio, is just 2 days away and kicks off this Friday, Nov 14 (00:00 UTC) at https://huggingface.co/MCP-1st-Birthday.
    • Thousands of builders already registered for the event featuring $20K in cash prizes and $2.7M+ in API credits for all participants.
  • Checklist for launch day: Remember to join the org on HF, fill out the registration form, and hop into the official channel for live updates.
    • The official channel is mcp-1st-birthday-oficial🏆.

HuggingFace ▷ #agents-course (1 messages):

Study Group

  • Member desires to join Study Group: A member would like to join a study group and is willing to catch up on the material.
  • Study Group Progression: The member also asked to be updated on the current progress of the study group.

Moonshot AI (Kimi K-2) ▷ #general-chat (63 messagesđŸ”„đŸ”„):

Researcher mode errors, Kimi Coding Plan API Quota, Kimi API setup, Kimi K2 Thinking vs GLM 4.6, GPT 5.1 rollout

  • Researcher Mode Bugs Users: A user reported receiving errors instead of results from Researcher Mode, even after using it only once a week prior.
    • They inquired whether Researcher Mode is completely paid, as it now shows insufficient credit/upgrade messages.
  • Kimi Coding Plan API Quota Exhausts Quickly: Users reported that the Kimi Coding Plan’s API quota can be exhausted in just a few hours or sessions due to web search and plan mode usage.
    • One user suggested that Moonshot AI might move to a cursor-like plan to better align usage with costs, especially since they lack the VC funding of OAI and Anthropic.
  • API Setup Assistance Required: A user sought help with Kimi API setup for the thinking model using HTTP, facing authorization failures despite having credits and a valid API key.
    • Another user pointed out that the user was using the Chinese platform URL instead of the global https://api.moonshot.ai/v1/chat/completions URL.
  • Turbo Version for Faster Output: A user inquired about speeding up the processing time for the Kimi K2 thinking model via the API.
    • A member advised using the turbo version, which offers faster output speeds without compromising model performance.
  • GPT 5.1 is stealth rolled out: Members noted the GPT 5.1 rollout, and that it had been the stealth model on OpenRouter, so it was decent but so safetylobotomized.
    • One member quipped that everyone knew it was coming since a few weeks ago, as OpenAI takes an L.

Yannick Kilcher ▷ #general (32 messagesđŸ”„):

Semantic shifts in language, Google Colab vs Lambda Labs, FID scores for DiT models, ICLR review process, Whisper model usage

  • Thingification Semantic Shifts: A member discussed a semantic shift that bleaches/resurrects the meaning of a word, differentiating it from the general sense of thingification.
  • Colab or Lambda Labs?: A member inquired about using Google Colab in the same way as Lambda Labs or similar clusters, questioning their equivalence for certain tasks.
  • FID Scores Visualized: A member asked what a FID of 30 looks like in a DiT, wondering if it resembles super gaussian noise that makes the image unidentifiable.
    • Another member clarified the difference between viewing FID in terms of image quality (human preference) versus emulating data distribution (model objective).
  • ICLR Review Process Woes: A member recounted a frustrating experience with ICLR reviews, where a resubmission received poor scores despite addressing previous concerns and adding new datasets.
    • The reviewers made comments such as it’s not a benchmark paper when the main point was testing on 19 datasets with 4 of those datasets being new ones we made with over 30k new questions total and that we don’t provide hyperparameters when they are literally in the appendix.
  • Whisper’s Weirdness: A member found that using the Whisper model directly with PyTorch resulted in errors and hallucinations, which were significantly reduced when using Whisper-server.
    • They suggested using the Whisper-server and compiling it with Vulkan support for portability, along with filtering out quiet sections to improve transcription.

Yannick Kilcher ▷ #paper-discussion (8 messagesđŸ”„):

Kimi K2, Thinkprm, Indian Names, Memorization to Reasoning


Yannick Kilcher ▷ #ml-news (8 messagesđŸ”„):

Elevenlabs Speech to Text, GPT-5.1 Release?

  • Elevenlabs Demos Speech-to-Text: Members shared their thoughts on Elevenlabs, noting that its primary function is text to speech but it now has speech to text.
  • GPT-5.1 conversational features?: A member linked to an OpenAI blog post that introduced GPT-5.1 and wondered if its more conversational tone was intended to attract users of GPT-4.

MCP Contributors (Official) ▷ #general (18 messagesđŸ”„):

timezone information MCP clients to MCP servers, SEP draft for timezone, Anthropics Claude desktop host team, connectivity issues between Claude.ai and MCP Servers, JSON data from mcp tool call results

  • Propose passing timezone info from MCP Clients to Servers: A member inquired about passing timezone information from MCP clients to MCP servers.
    • Another member responded that this is an interesting question, and considered supplying it as metadata via a client-sent notification or a server elicitation.
  • SEP Drafted for Timezone Protocol Change: A member drafted a SEP (spec enhancement proposal) for timezone and will post it to GitHub after internal feedback.
    • The options being considered were adding it to CallToolRequest, using a Header, adding it to JSONRPCRequest.params._meta, or adding it to InitializeRequest.
  • Claude.ai Connectivity Woes: A member sought debugging advice for connectivity issues between Claude.ai and MCP Servers.
    • Another member noted that it’s flaky and specific to the client, so may not be a topic for this server, and suggested a developer mode that gave a bit more feedback about what’s going on.
  • MCP Tool Call Results return other than JSON?: Someone asked if anyone had tried returning data other than serialized JSON from mcp tool call results (e.g. Toon format).
    • One member shared results of small-scale evals on a synthetic dataset: accuracy is comparable, 9% slower, 11% fewer tokens (n = 84, p = 0.10).

Manus.im Discord ▷ #general (14 messagesđŸ”„):

AI Automation Integration, Spanish Language Section Suggestion, Generative Engine Optimization, Manus System Error, Manus Support Channel Closure

  • AI Automation Integration Specialist Joins: A new member introduced themselves as an AI automation integration specialist, offering expertise in Python, SQL, JavaScript, and frameworks like PyTorch, scikit-learn, LightGBM, and LangChain.
    • They highlighted experience in delivering chatbots, recommendation engines, and time series forecasting systems.
  • Spanish Language Section Suggested for Server: A member suggested creating a Spanish language section on the server.
  • Guidance Sought for Generative Engine Optimization: A member requested resources and guidance on how to track and optimize for Generative Engine Optimization.
    • The member stated that they would be “really be grateful if anyone can share any resource or something that I can look at.”
  • Manus System Error Plagues User: A member reported a recurring Manus system error that prevents publishing, citing a “pathspec ‘417ea027’ did not match any file(s) known to git” error.
    • Expressing frustration, the member lamented the lack of support and previous unresolved issues, even after spending “hundreds of dollars every month with Manus”.
  • Manus Support Access Troubles Users: Multiple members expressed difficulty in accessing Manus support, with one noting the apparent closure of the support channel.
    • One user, experiencing a git commit error, was advised by the Manus agent to “Wait for Manus support” or “Escalate the ticket”, and was given a link to provide feedback.

aider (Paul Gauthier) ▷ #general (13 messagesđŸ”„):

Code Snippets in Markdown Files with Aider, Aider Conventions Configuration, Aider Vim Mode, aider-ce and Session Management, Aider Development Status

  • Aider struggles with Code Snippets in Markdown Files: A user reported issues with Aider getting confused by nested code markdown marks when creating code snippets in markdown files using `anthropic.claude-sonnet-4-5-20250929-v1:0`.
    • The issue occurs because Aider misinterprets nested code markdown indicators, causing it to prompt repeatedly for file creation confirmation.
  • Aider File Demarcation Convention Workaround: A user discovered that adding three and four backticks (`` ``` `` and ``` ```` ```) to the conventions.md file triggers Aider to demarcate files with <source> tags, resolving the code snippet issue.
    • By doing so, Aider can correctly identify and process code snippets without getting confused by nested markdown.
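The mechanism here is that Aider picks a fence string that does not already appear in the chat context; seeding the conventions file with literal backtick fences pushes it toward the <source>-tag fence. A minimal sketch of such a conventions.md (the exact wording is illustrative, not from the discussion):

````markdown
<!-- conventions.md: the literal fences below force Aider to choose a
     different edit-block fence (e.g. <source> tags) instead of backticks -->
Files in this repo may contain nested markdown fences such as ``` and
longer variants like
```
nested example
```
so do not use backtick fences to demarcate edited files.
````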
  • Aider’s Vim Mode Lauded: A user expressed enthusiasm for Aider’s Vim mode, calling it fantastic.
    • They also praised the new aider-ce `/load-session` and `/save-session` functionality for its usefulness in parking and resuming jobs.
  • Aider Development Status Questioned: Users expressed concern over the lack of updates from Paul Gauthier regarding Aider’s development status.
    • There was discussion about whether Paul Gauthier is still actively working on the project, with one user wondering whether they had missed an announcement that he had stopped working on it.
  • GPT 5.1 Released with No Benchmarks: Members noted the release of GPT 5.1, but observed that there were no benchmarks mentioned in the release notes.

tinygrad (George Hotz) ▷ #general (4 messages):

OpenCL errors, package_data

  • Package Data Debacle?: A member questioned whether files were missing from the archive, asking if `package_data` is a no-op and suggesting that specifying files explicitly could improve things.
    • They thanked the reviewer for their feedback.
  • OpenCL Error Overhaul: A member requested improvements to error messages when an OpenCL device is not detected, citing the current error as `RuntimeError: OpenCL Error -30: CL_INVALID_VALUE`.
    • The specific error originates from `/tinygrad/tinygrad/runtime/ops_cl.py`, line 103.
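The requested improvement amounts to mapping raw OpenCL status codes to readable names plus a hint. A minimal Python sketch of that idea (the `CL_ERRORS` table and `check` helper are illustrative, not tinygrad's actual API):

```python
# Illustrative mapping of a few OpenCL status codes to their symbolic names.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -30: "CL_INVALID_VALUE",
}

def check(status: int, context: str = "") -> None:
    """Raise a readable RuntimeError for a nonzero OpenCL status code."""
    if status == 0:
        return
    name = CL_ERRORS.get(status, "UNKNOWN_ERROR")
    # Add a human hint for the common "no device" case.
    hint = " (is an OpenCL driver/ICD installed?)" if status == -1 else ""
    raise RuntimeError(f"OpenCL Error {status}: {name}{hint} {context}".rstrip())
```

Called as `check(ret, "while creating buffer")`, this turns a bare `-30` into a message naming `CL_INVALID_VALUE` and where it occurred.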