an incremental step.
AI News for 11/11/2025-11/12/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (201 channels, and 5148 messages) for you. Estimated reading time saved (at 200wpm): 423 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
GPT 5.1 launched in ChatGPT today, with API availability "later this week":
- 5.1 Instant is:
  - "warmer by default and more conversational... surprises people with its playfulness while remaining clear and useful."
  - improved at instruction following, including respecting em-dash preferences
  - able to use adaptive reasoning to decide when to think before responding to more challenging questions, resulting in more thorough and accurate answers while still responding quickly
- 5.1 Thinking now adapts its thinking time more precisely to the question

GPT-5 moves to "legacy model" status and will be sunset in 3 months.
There is mention of AIME and Codeforces, but no evals made it into this particular blogpost, which some are criticising.
ChatGPT also gets new tone toggles for personalization. Fidji Simo's blog says "with more than 800 million people using ChatGPT, we're well past the point of one-size-fits-all."
AI Twitter Recap
Autonomy and Physical AI: Waymo freeway rollout, Anthropic's Project Fetch, and Perceptron's platform
- Waymo freeway driving goes live: Waymo is rolling out freeway driving for public riders in Phoenix, LA, and across the SF Bay Area, connecting SF-San Jose with curbside access to SJC. Leadership frames this as a validation of the Driver's generalization and safety claims; scale enables new airport routes and longer corridors. See announcements from @dmitri_dolgov and @JeffDean.
- Anthropic's Project Fetch (robot dog with/without Claude): Anthropic had two non-roboticist teams program a quadruped; only one team could use Claude. It's framed as an empirical check on "LLMs as robotics copilots" for planning/control authoring, debugging, and iteration speed. Results and methodology are in the thread: @AnthropicAI.
- Perceptron's "Physical AI" platform: A new API and Python SDK targeting multimodal perception-and-action apps, currently supporting Isaac-0.1 and Qwen3VL-235B for VLM/VLA use cases (prompting primitives grounded in vision + language, plus "chat competitions"). Free access to Isaac this week per founders. Details: @perceptroninc, @AkshatS07.
Agent evals and control: Code Arena, LangChain middlewares, and LlamaIndex SEC agent
- Code Arena (live coding evals): A step-by-step evaluation harness where models must plan, scaffold, debug, and ship working web apps. Currently lists support for Claude, GPT-5, GLM-4.6, and Gemini. Useful for measuring agentic decomposition, tool use, and temporal coherence under realistic coding tasks: @arena.
- Agent governance via middleware (LangChain):
- Human-in-the-loop middleware that pauses execution for user approval of the next step, adding an explicit "ask before acting" gate to reduce unintended actions: @bromann.
- Tool-call limit middleware to cap runaway tool invocation and costs; the demo shows reining in a spend-happy shopping agent: @sydneyrunkle. (A minimal sketch of both patterns follows this list.)
- LlamaIndex structured extraction template (SEC filings): Multi-step agent that classifies filing type, routes to the correct extraction schema, provides a review UI prior to commit, and can extend to downstream syncing/monitoring, built on LlamaAgents with LlamaClassify + Extract. Starter template: @llama_index.
- Benchmarking push: NousResearch endorses ARC Prize's interactive benchmarks for measuring generalized intelligence: @NousResearch.
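For readers who want the mechanics, here is a minimal, framework-agnostic sketch of the two governance patterns above (an "ask before acting" gate plus a tool-call budget). It is illustrative only and does not reproduce LangChain's actual middleware API; the tool dispatch is a hypothetical stand-in.

```python
from functools import wraps

def governed(max_tool_calls=10, require_approval=True):
    """Wrap a tool-executing function with an approval gate and a call cap."""
    def decorator(call_tool):
        state = {"calls": 0}

        @wraps(call_tool)
        def wrapper(tool_name, **kwargs):
            state["calls"] += 1
            if state["calls"] > max_tool_calls:
                raise RuntimeError(f"tool budget of {max_tool_calls} exhausted")
            if require_approval:
                answer = input(f"Agent wants to run {tool_name}({kwargs}). Approve? [y/N] ")
                if answer.strip().lower() != "y":
                    return {"status": "rejected_by_user"}
            return call_tool(tool_name, **kwargs)

        return wrapper
    return decorator

@governed(max_tool_calls=3)
def call_tool(tool_name, **kwargs):
    # Hypothetical dispatch; a real agent loop would route to actual tools here.
    return {"status": "ok", "tool": tool_name, "args": kwargs}
```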
Systems and infra: cross-container covert channel, edge LM IPW harness, and inference infra
- Cross-container communication via /proc lock state: A clever channel encodes ~63 bits in the shared lock state of /proc/self/ns/time, which all processes can access (even across unprivileged containers), enabling a chat app without networking. Implications for container isolation and policy hardening: @eatonphil. (A toy sketch of the lock-encoding idea follows this list.)
- Local LMs and the "intelligence-per-watt" (IPW) thesis: Evidence that ≤20B-active-param local models improved ~3.1× in capability and ~5.3× in efficiency since 2023, with a released profiling harness across NVIDIA, AMD, and Apple Silicon. Authors argue a cloud-to-edge redistribution similar to the mainframe-to-PC shift, with IPW as the guiding metric. Summary: @Azaliamirh; paper/blog links: arXiv + blog.
- Inference infra note: Teams report building bespoke inference platforms, crediting Modal for compressing time-to-ship: @ArmenAgha.
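The lock-state trick is easier to see with a toy example. Below is a sketch of a byte-range-lock covert channel demonstrated on an ordinary file; the novelty in the post is that the lock table of /proc/self/ns/time is visible across unprivileged containers, which this standalone sketch does not reproduce. Writer and reader must be separate processes, since POSIX record locks do not conflict within one process.

```python
import fcntl, os

PATH = "/tmp/covert_demo"  # stand-in for the shared /proc file from the post
NBITS = 63

def send(value: int) -> int:
    """Writer: hold a shared lock on byte offset i for every 1-bit of value."""
    fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
    for i in range(NBITS):
        if (value >> i) & 1:
            fcntl.lockf(fd, fcntl.LOCK_SH, 1, i)  # lock one byte at offset i
    return fd  # keep the fd open; closing it would release the locks

def recv() -> int:
    """Reader: any byte we cannot exclusively lock is a 1-bit."""
    fd = os.open(PATH, os.O_RDWR)
    value = 0
    for i in range(NBITS):
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, i)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, i)  # free byte -> 0-bit
        except OSError:
            value |= 1 << i  # writer's shared lock blocks us -> 1-bit
    os.close(fd)
    return value
```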
Model UX and product updates: Gemini Live, GPT-5.1 persona, and AI privacy
- Gemini Live upgrade: A large update emphasizes faster turn-taking, expressiveness, and accents for voice interactions, with usage demos highlighting more fluid conversation latency and paralinguistic variety: @joshwoodward.
- GPT-5.1 tone and "persona" tuning: Mixed reception on style. Some users find the default tone too saccharine or over-empathetic @tamaybes, while others report a meaningful reduction in sycophancy and more grounded, self-aware suggestions vs GPT-5 (and better than 4o) in journaling-style use @_simonsmith. Net: persona tuning is now a first-order product surface; defaults matter.
- AI privilege and data minimization: OpenAI's CPO calls for a new "AI privilege" to protect sensitive, conversation-level interactions and pushes back on indiscriminate requests for millions of chats, arguing granularity matters for respecting user intent: @jasonkwon.
Research and theory notes
- RL geometry and "implicit KL leash": Commentary on a new paper argues RL updates implicitly constrain divergence from the base model (a de facto KL leash) and preserve pretrained geometry; methods targeting "principal weights" (e.g., PiSSA) may underperform or destabilize vs LoRA. Discussion: @iScienceLuvr.
- Spatial intelligence framing: Fei-Fei Li's new blog (via The Turing Post) argues world models for spatial intelligence must be generative, multimodal, and interactive, setting expectations for next-gen embodied systems: @TheTuringPost.
- Demos: collaborative multi-agents in tldraw: Early look at multi-agent collaboration UX explored live at Sync conf, with a grilling session on task decomposition and shared canvases: @swyx.
Top tweets (by engagement)
- Waymo expands to freeways across Phoenix, LA, and SF Bay Area; adds SF-San Jose and SJC curbside - @JeffDean (5,557)
- Waymo's CTO on the rollout and safety/generalization framing - @dmitri_dolgov (1,214.5)
- Cross-container comms via /proc/self/ns/time lock bits - @eatonphil (910)
- Gemini Live's biggest update (speed, expressiveness, accents) - @joshwoodward (624.5)
- Code Arena: live coding evals for agentic coding - @arena (514.5)
- Anthropic's Project Fetch (robot dog + Claude vs control) - @AnthropicAI (478.5)
- AI privacy and "AI privilege" stance - @jasonkwon (438.5)
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. AELLA Open-Science Initiative
- AELLA: 100M+ research papers: an open-science initiative to make scientific research accessible via structured summaries created by LLMs (Activity: 455): AELLA is an open-science initiative aimed at making over 100 million research papers accessible through structured summaries generated by Large Language Models (LLMs). The project is hosted on Hugging Face and offers a visualizer tool for exploring these summaries. The initiative is detailed in a blog post by Inference.net, highlighting its potential to democratize access to scientific knowledge by leveraging AI to create concise, structured summaries of vast amounts of research data. Some users express skepticism about the project's utility and the choice of its name, indicating a need for clearer communication on its practical applications and benefits.
- Repeat after me. (Activity: 671): The post discusses the performance of AMD graphics cards in processing tokens per second compared to Nvidia cards, highlighting that an AMD card, which is significantly cheaper, achieves 45 tokens per second. This is contrasted with Nvidia cards that can achieve 120 to 160 tokens per second, but at a higher cost. The post suggests that while AMD cards may currently be slower, they are improving over time, and users should not feel pressured to pay a premium for faster performance. Commenters note that the token speed is sufficient as long as it exceeds their reading and comprehension speed. There is also a mention of misinformation regarding the difficulty of running LLM models on AMD hardware, suggesting that it may not be as challenging as some claim.
- A key issue highlighted is the performance disparity between AMD and NVIDIA GPUs, particularly in handling large-context processing tasks. While 45 tokens per second (tps) is adequate for single-user generation, NVIDIA's GPUs excel in prompt processing at larger contexts, achieving several thousand tps compared to AMD's few hundred. This makes NVIDIA more suitable for complex applications like RAG pipelines and coding assistants.
- The software ecosystem for AMD is criticized for being poorly supported, with users experiencing issues such as random crashes and lack of driver support. For instance, the Radeon PRO W6000-series has been plagued with GCVM_L2_PROTECTION_FAULT_STATUS faults, and AMD's ROCm support is inconsistent, requiring users to apply workarounds like monkey-patching libraries. In contrast, NVIDIA's CUDA has maintained long-term support, with Pascal support only recently dropped after a decade.
- AMD's approach to customer support is criticized as lacking, with a focus on selling hardware rather than maintaining it. Users report that AMD often fails to support their products beyond a single generation, leading to a reliance on community-driven solutions to make AMD hardware functional. This contrasts with NVIDIA's more stable and long-term support for their products, making them a more reliable choice for compute tasks.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. GPT-5.1 Release and Features
- GPT-5.1: A smarter, more conversational ChatGPT (Activity: 878): OpenAI has launched GPT-5.1, featuring two models: GPT-5.1 Instant and GPT-5.1 Thinking. The release focuses on enhancing conversational AI with adaptive reasoning and dynamic thinking time adjustments, allowing for faster responses to simple queries and more detailed answers for complex ones. However, the release lacks benchmarks, a comprehensive system card, and an API, raising questions about the rushed nature of the launch. For more details, see the OpenAI announcement. Commenters noted the absence of benchmarks and a detailed system card, suggesting a rushed release possibly to compete with other tech announcements. Concerns were raised about the lack of API and incomplete testing phases.
- Several users noted the absence of benchmarks in the GPT-5.1 release, which is unusual for a major model update. This lack of performance metrics makes it difficult to assess improvements over previous versions, such as GPT-4, and raises questions about the model's capabilities and enhancements.
- The release of GPT-5.1 appears rushed, as indicated by a brief system card and the delay in API availability. Additionally, the model did not complete its stealth testing phase, known as Windsurf, which is typically a standard procedure before a full release. This has led to speculation about the reasons behind the hurried launch.
- Some users speculate that GPT-5.1 is aimed at users who preferred the style of GPT-4 over GPT-5, suggesting that the new version might be an attempt to cater to those who were not satisfied with the previous iteration's changes. However, without benchmarks or detailed documentation, it's challenging to confirm these assumptions.
- ChatGPT-5.1 (Activity: 813): The image is a promotional announcement for the release of GPT-5.1 by OpenAI, scheduled for November 12, 2025. This version is described as a more intelligent and conversational iteration of ChatGPT, with a focus on customization features. The release is initially targeted at paid users, indicating a strategic move to prioritize premium services. The announcement suggests improvements in user interaction, particularly in the "Instant mode," which may offer a different tone or style of responses compared to previous versions. Some users express concern over the increasing number of similar model names, which could lead to confusion. Others note the prioritization of paid users, indicating a shift in OpenAI's business strategy.
- AdDry7344 highlights a noticeable change in tone with ChatGPT-5.1's Instant mode, suggesting it may affect user experience by altering how responses are perceived, especially in stress-related queries. This could imply a shift in the model's conversational style, potentially impacting its effectiveness in providing concise, direct advice.
- Nakrule18 criticizes ChatGPT-5.1 for defaulting to a more verbose, "chatty" style compared to GPT-5, which was appreciated for its concise and direct responses. This change might affect users who prefer straightforward answers over a conversational tone, indicating a possible regression in user experience for those seeking efficiency.
- Dark_Karma notes the improved speed and more engaging responses of ChatGPT-5.1, suggesting enhancements in processing and interaction quality. This could indicate optimizations in the model's architecture or algorithms, leading to faster response times and potentially more dynamic conversational capabilities.
2. AI in Personal Legal Success Stories
- I Won Full Custody With No Lawyer Thanks to ChatGPT. (Activity: 727): A Reddit user, a health physicist, successfully navigated a custody battle without a lawyer by leveraging ChatGPT to understand court rules, procedures, and fill out legal forms. The user was awarded full custody, with the other parent limited to conditional visitation due to preexisting assault charges. The user emphasizes that while AI was instrumental, the success was also due to the specific circumstances of the case, including the mother's legal history and the user's technical expertise. The post highlights the potential of AI in legal contexts but cautions against over-reliance on it for legal success. Commenters noted AI's potential to disrupt traditional legal practices, with one highlighting the importance of understanding AI's limitations, such as hallucinations, and another sharing a similar tool, FreeDemandLetter.com, for legal assistance.
- Dry-Peanut6627 highlights the disruptive potential of AI in family law, noting that while attorneys criticize AI for generating inaccuracies, users can quickly correct these "hallucinations" if they are knowledgeable. This suggests a shift in power dynamics, where litigants are increasingly equipped with information that was traditionally monopolized by legal professionals.
- bobboblaw46, a lawyer, strongly advises against self-representation in legal matters, even with AI assistance like ChatGPT. They emphasize that legal errors can have severe consequences, and AI often provides incorrect legal advice, misinterprets case law, or offers overly simplistic solutions. The complexity of law justifies the extensive education and training lawyers undergo, underscoring the risks of relying solely on AI for legal representation.
- MetsToWS mentions creating FreeDemandLetter.com, a tool designed to assist individuals with legal issues such as unpaid contracts and security deposit refunds. This tool, similar to ChatGPT, guides users through legal processes, indicating a trend towards accessible legal assistance through technology.
- Chat gpt used to write article in Dawn newspaper (Activity: 970): Dawn, a prominent Pakistani newspaper, reportedly used ChatGPT to write an article, sparking discussions about the role of AI in journalism. The incident highlights concerns over AI-generated content, particularly regarding the lack of human oversight, as evidenced by a commenter's experience where AI editing led to significant content distortion, including the addition of 30 em dashes. This raises questions about the reliability and editorial standards when using AI tools in professional writing. Commenters express skepticism about AI's role in journalism, emphasizing the importance of human oversight in editing to maintain content integrity and quality.
- irr1449 shares a technical issue where using ChatGPT for editing led to a significant alteration of their article. The AI not only shortened the text but also introduced about 30 em dashes, which disrupted the original content. This highlights potential pitfalls in relying on AI for nuanced editing tasks, where the AI's changes can inadvertently alter the intended message or style of the writing.
3. Creative AI Experiments
- I told my AI to surf the internet and send me postcards (Activity: 499): The post describes an experiment where an AI is tasked with a multi-step process: surfing the internet, generating an image as if it were a postcard from a virtual location, and writing a short message. The AI is instructed not to reveal the websites it visited, focusing instead on the creative output. This experiment highlights the AI's ability to integrate web search, image generation, and text composition into a cohesive task, showcasing advancements in AI multitasking capabilities. The comments include links to images presumably generated by the AI, suggesting a focus on the visual output of the experiment. However, there is no substantive technical debate or discussion in the comments.
- Gemini switched roles (Activity: 1632): The image appears to be a humorous depiction of a digital interface, possibly related to an AI or software named "Gemini," which is tasked with changing the color of a jacket worn by a character. The interface suggests that "Gemini" might have switched roles, implying a mix-up or error in its functionality. This is further emphasized by the comments, which mock the AI's response capabilities, suggesting it might not be performing as expected. The image and comments highlight the challenges and limitations of AI in understanding and executing specific visual tasks. The comments humorously critique the AI's limitations, with one suggesting a sarcastic response from the AI and another pointing out the AI's inability to perform the task, reflecting a common sentiment about AI's current capabilities.
- UBTech shows off its self charging humanoid robots army aiming to fullfill a >100M factory order (Activity: 1239): UBTech has showcased its self-charging humanoid robots, which are part of a significant order valued at 112M USD, not 100M units as initially misunderstood. According to a South China Morning Post article, the company plans to deliver "more than 500" units by the end of the year. These robots are designed for factory jobs, highlighting a significant step in automation and robotics in industrial settings. A comment clarified the misunderstanding about the order size, emphasizing the financial value rather than the number of units. This highlights the importance of precise communication in technical discussions.
- The discussion clarifies that UBTech has received $112 million in orders, not 100 million units, as some might have misunderstood. According to a SCMP article, the company plans to deliver over 500 units by the end of the year. This indicates a significant scale of production and deployment for humanoid robots in industrial settings.
- Wisker dont like to take orders (Activity: 3151): The post humorously suggests that a cat, referred to as "Wisker", is resistant to taking orders, possibly in the context of a playful or metaphorical scenario involving AI or automation. The comments play along with this theme, joking about a cat being involved in tasks like cooking or using AI, such as "CatGPT". The external link summary indicates restricted access to the content, requiring login or a developer token for further details. The comments reflect a light-hearted engagement with the idea of a cat being autonomous or involved in AI tasks, with no substantive technical debate present.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.5 Flash Preview 05-20
Theme 1. Next-Gen AI Models Spark Hope and Frustration
- GPT 5.1 Disappoints While Gemini 3 Hype Builds: Many users panned the newly released GPT 5.1 as trash and safetylobotomized, noting a lack of benchmarks, but eagerly await Gemini 3 Pro, expected next week, with one test showing it ranking comparable to a human. OpenAI announced GPT-5.1 rolls out to all users this week, with a Reddit AMA planned for tomorrow at 2 PM PT.
- Riftrunner Codes Mario, Other Models Crash: Riftrunner demonstrated superior coding by building a 3D Mario game and a functional 3D Flappy Bird game from a simple prompt, generating 2k lines of code and outperforming Lithiumflow and the "bad" rain-drop (a Llama model). However, Riftrunner also exhibited laziness, prompting one user to state, if you motivate it, it might listen to you.
- New Small Models Make Big Claims, But Drift: New models like WeiboAI (based on qwen2.5) showed surprisingly good initial performance for a 1.5B parameter model, but it drifts after the first 1-2 turns, while ixlinx-8b was released as a state-of-the-art (SOTA) small model from a local hackathon. Users also noted Aquif-3.5-Max-42B-A3B trending, speculated to be upscaled and fine-tuned.
Theme 2. Developer Tooling Navigates Complex AI Landscapes
- Aider's Vim Mode Wins Praise, Markdown Still a Mess: Users lauded Aider's Vim mode as fantastic and praised new session management features, but reported Aider gets confused by nested markdown when creating code snippets with anthropic.claude-sonnet-4-5-20250929-v1:0. Adding three and four backticks ('```' and '````') to conventions.md forced <source> tags, resolving the issue.
- Cursor's Max Mode Boosts Power, But Costs Double: Max mode in Cursor removes limits for maximum performance and cost reduction, enabling it to read entire files instead of chunks, but exceeding 200k context with Sonnet 4.5 doubles the cost. Users humorously suggested capping it: Cant we limit this to 200k and post that we can give another command.
- Perplexity Partner Program Bans Frustrate Users: Several users reported Perplexity Partner Program bans for "fraudulent activity," citing a lack of support for appeals and suspecting issues like referral system gaming or VPN usage. Meanwhile, Gemini 2.5 Pro integration within Perplexity "is broken and poorly implemented," automatically switching to GPT.
Theme 3. Hardware Challenges Drive AI Performance Optimization
- CUDA Compiler Commands Clarified, PTXAS Already O3: New CUDA developers learned to use -O3 for host optimization and -lineinfo for profiling with Nsight Compute. It was clarified that -O3 primarily optimizes the host (CPU) part of the code, and PTXAS already defaults to O3 optimization for GPU code. (An example compile invocation follows this list.)
- Vulkan's Stability Issues Arise, CUDA Saves the Day: Users experienced frequent blue screen errors (BSODs) with LM Studio using Vulkan, particularly on NVIDIA GPUs, resolving issues by switching to CUDA. Although Vulkan was faster for small tests on a 3090, it proved unstable.
- NVIDIA Competition Rules Cache Kernels, Not Tensors: Users submitting to the NVIDIA competition (e.g., nvfp4_gemv) learned that caching compiled kernels is permissible, but caching tensor values between benchmark iterations is strictly prohibited. The B200 GPU with 148 SMs running at 1.98 GHz scores submissions, with details in Nvidia's blog post.
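For concreteness, the flags under discussion appear on an nvcc command line like the one below (file names are hypothetical; this is a minimal sketch, not the competition's actual build setup):

```python
import subprocess

# -O3 optimizes the host (CPU) side of the program; device code already gets
# PTXAS's default O3. -lineinfo embeds source-line info so Nsight Compute can
# map SASS instructions back to CUDA source during profiling.
subprocess.run(["nvcc", "-O3", "-lineinfo", "kernel.cu", "-o", "kernel"], check=True)
```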
Theme 4. AIâs Ethical Battlegrounds and Licensing Quandaries
- OpenAI Fights NYT Over User Privacy: OpenAI's CISO addressed The New York Times' invasion of user privacy in a letter, detailing the legal battle and their commitment to protecting user data. OpenAI also offered 12 months of free ChatGPT Plus to eligible active-duty service members and recent veterans.
- AI Chatbot Hordes Threaten Social Media Propaganda: Members discussed the potential for an AI chatbot infestation across social media, predicting online will be dominated by AI chatbots who will just constantly push propaganda. This raises concerns about distinguishing real people from AI and the spread of misinformation.
- Nemo-CC 2's License Raises Developer Eyebrows: Members debated the restrictive licensing terms of Nemo-CC 2, citing concerns about NVIDIA terminating licenses with 30 days notice and prohibiting public sharing of evaluation results without prior written consent. One user summarized, You are not allowed to train a model on the dataset, evaluate the model, and publicly share the results without NVIDIA's prior written consent, with more details in this paper.
Theme 5. Advancing LLM Research and Development Practices
- MLE Interview Prep: Leetcode Trap or Real-World Skills?: Members debated MLE interview preparation, with some calling it a trap due to employer/team dependency, while others advised building something in the open that would be useful to companies training/serving models. Implementing Multi-Head Attention in Numpy was deemed horrible for interviews.
- DSPy Demands Domain Knowledge, Signatures Still Act as Prompts: While DSPy abstracts prompting, domain-specific LLM applications still require detailed instructions within signatures (some users writing 100 lines), indicating that DSPy needs encoded domain knowledge to guide the LLM effectively. Participants noted that DSPy's signatures still function as prompts, particularly in docstrings encoding business rules, despite offering better abstraction. (A minimal signature example follows this list.)
- Mojo's Metaprogramming Might, Mutability Muddle: Mojo aims for dynamic type reflection and features metaprogramming capabilities more powerful than Zig's, with Mojo able to allocate memory at compile time (Mojo's metaprogramming capabilities). A debate arose over mandatory mut annotations for function parameters, with comparisons to Rust and proposals for optional annotations or comptime syntax.
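As a reference point for the DSPy item above, here is a minimal class-based signature; the field names and the business rules in the docstring are invented for illustration, and calling the module requires a configured LM:

```python
import dspy  # assumes dspy.configure(lm=...) has been called elsewhere

class RefundDecision(dspy.Signature):
    """Decide whether a refund request should be approved.

    Business rules (the kind of domain knowledge discussed above):
    - Orders older than 30 days are never refundable.
    - Digital goods are refundable only if unopened.
    """
    request: str = dspy.InputField(desc="customer's refund request")
    order_age_days: int = dspy.InputField()
    decision: str = dspy.OutputField(desc="'approve' or 'deny', with a reason")

decide = dspy.Predict(RefundDecision)
```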
Discord: High level Discord summaries
LMArena Discord
- Riftrunner masters Mario, Other Models Fail: Members found Riftrunner excels in creating a Mario game, surpassing other models, but it can also lie.
- In comparison, the model rain-drop turned out to be a Llama model that generated bad terminal outputs and acted like Gemini 3 Flash.
- Riftrunner codes Flappy Bird better than LithiumFlow: Riftrunner outperforms Lithiumflow in coding tasks, creating a functional 3D Flappy Bird game after generating 2k lines of code from a shared prompt.
- However, Riftrunner also exhibited laziness, prompting the user to share, if you motivate it, it might listen to you.
- GPT 5.1 Disappoints, Awaiting Gemini 3: Members expressed disappointment with the recent GPT 5.1 release and hope that Gemini 3 Pro, expected to release sometime next week, will be significantly better.
- One user derisively called GPT 5.1 trash.
- Code Arena Replaces WebDev Arena: Code Arena is live on LMArena, offering real-time generation of deployable web apps that users can directly inspect and judge, succeeding the old WebDev Arena, according to a blog post and YouTube video.
- Models generate live, deployable web apps and sites that anyone can open, inspect, and judge directly, in real time, while the leaderboard showcases the new evaluation system.
Perplexity AI Discord
- Perplexity Referral Program Bans Users: Several users report being banned from the Perplexity Partner Program for "fraudulent activity," and expressed frustration over the lack of support.
- Some users suspect the bans are related to gaming the referral system or VPN usage, while others speculate that the issue might be related to payout eligibility.
- Gemini 2.5 Pro Integration Flounders: Users are reporting issues with Gemini 2.5 Pro in Perplexity, saying that "it's broken and poorly implemented in pplx atm, no way to fix it."
- Perplexity's interface seems to be automatically switching to GPT, even when Gemini 2.5 Pro is selected.
- GPT Go Sells GPT-5 mini: A user who purchased the GPT Go subscription reported that while it advertises GPT-5 thinking, it mostly uses GPT-5 thinking mini, leading to a refund request.
- Members debated whether they preferred a specific model, which led to the refund.
- Comet Plagued by Control Catastrophes: Users reported Comet AI Assistant issues such as the inability to perform webpage actions, unresponsive buttons, and an inability to control the browser.
- Some users have found solutions such as logging in, changing IP address (VPN) or deleting and reinstalling Comet, and posting in the troubleshooting channel.
- Sourcify's Open Source Shindig: Sourcify IN is hosting an event titled Forks, PRs, and a Dash of Chaos: The Open Source Adventure on November 15, 2025, featuring Swapnendu Banerjee.
- The talk will be broadcast on Google Meet & YouTube Live.
Cursor Community Discord
- Cursor Review Agent is Cheap, Still Costs: Members noted the agent review feature in Cursor IDE incurs costs with each use, but is relatively inexpensive, sharing usage screenshots.
- One user had used 76% of their allowance, and others found clicking Try Again sometimes resolves issues.
- Cursor vs Copilot Preference Prevails: Some users returned to Copilot after trying Cursor, emphasizing that tool preference can be subjective, saying My brother is a fan of copilot idk why.
- The discussion highlights how a developer's choice of tools can be influenced by personal style.
- Users Exploit Unlimited ChatGPT Glitch: Some users exploited ChatGPT 5 when it was briefly free due to a pricing bug, including unlimited Opus 4 requests.
- One user lamented Sad I wasn't aware of the opportunity, and another described the situation as bugged as hell, we had unlimited everything.
- Max Mode Costs Double at 200k Context: Max mode in Cursor removes limits to maximize performance, enabling it to read entire files instead of chunks and reduce costs.
- Exceeding 200k context with Sonnet 4.5 doubles the cost, with users humorously suggesting capping it: Cant we limit this to 200k and post that we can give another command.
- Custom Rules Keep AI in Check: Members are establishing custom rules and lints to control AI behavior and prevent dirty code from entering repos. (An illustrative rules file follows this section.)
- One member shared a streamlined approach using .cursorrules, lints, custom eslint plugins, and husky to prevent AI from drifting.
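As an illustration of the approach described above, a rules file might read as follows (contents invented; any real .cursorrules is project-specific):

```
# .cursorrules (illustrative)
- Never modify files under migrations/ or any *.lock file.
- New code must pass the project's eslint config; do not disable rules inline.
- Prefer small diffs; do not reformat lines you are not changing.
- If a change needs a new dependency, stop and ask first.
```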
GPU MODE Discord
- Popcorn CLI strikes syntax scare: Users encountered a syntax error with popcorn-cli submit, pointing to the popcorn-cli readme for correct syntax and emphasizing that URLs should be entered without quotes when using the export command.
- Members reported that the grayscale leaderboard is closed and that ensuring nvfp4_gemv is selected in the popcorn cli is crucial for proper evaluation.
- CUDA Compiler Commandments clarified: New CUDA developers were advised to use -O3 for optimization and -lineinfo for profiling with Nsight Compute.
- It was also pointed out that the -O3 compiler option primarily optimizes the host (CPU) part of the code, with the default optimization level of PTXAS already being O3.
- DMA Documentation Desired!: A member expressed dissatisfaction with existing documentation on Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA) from sources like Wikipedia, ChatGPT, and vendor sites.
- The user is seeking more detailed and technical documentation, though specific requirements were not detailed.
- Nvidia comp requires correct auth: Users faced 401 Unauthorized errors with popcorn-cli submit, which was traced to needing to re-authenticate via the Discord OAuth2 link provided during registration.
- It was clarified that while caching compiled kernels is permissible, caching tensor values between benchmark iterations is not allowed, with reference code available here. (A sketch of the permitted pattern appears at the end of this section.)
- Cutlass and CuTe cut out the bugs: The CuTeDSL and Cutlass libraries were updated to version 4.3.0, resolving issues with CuTe submissions, with the CuTe example now passing.
- The B200 GPU has 148 SMs (Streaming Multiprocessors) running at a boost clock of 1.98 GHz and is used to score submissions; the relevant Nvidia blog post includes the B300 diagram.
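As referenced above, a sketch of the permitted caching pattern; the compile step is a hypothetical stand-in for a real JIT compile, and the point is that only compilation results may persist across iterations, never tensor values:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_kernel(m: int, n: int) -> str:
    # Allowed: expensive one-time compilation, memoized by problem shape only.
    return f"compiled_gemv_{m}x{n}"  # placeholder for a compiled kernel handle

def run_iteration(m: int, n: int, fresh_inputs: list):
    kernel = get_kernel(m, n)
    # Stashing fresh_inputs or prior outputs between iterations would violate
    # the rules; every iteration must compute from the harness's fresh tensors.
    return kernel, len(fresh_inputs)
```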
Unsloth AI (Daniel Han) Discord
- VibeVoice sings off-key in Bulgarian: Users discovered that VibeVoice had difficulty producing high-quality Bulgarian TTS without further finetuning.
- Community members joked about the output sounding "like a drunk brit trying to read a phonetic version of the sentence", highlighting the challenges in adapting TTS models to new languages.
- QAT: Intel autoround vs BNB Showdown: A discussion arose regarding the potential benefits of using Intel autoround quants for training compared to bnb 4-bit quants, especially with the introduction of QAT in Unsloth.
- Concerns were raised about the compatibility of autoround with Unsloth's QAT and the need for customization, with emphasis on QAT targeting fast, simple quantization formats.
- GPT-OSS-20b gives senseless solution: A user reported encountering nonsense generations from gpt-oss-20b when prompted with a math problem, tracing the issue back to an attention patch modifying matmul() calls.
- The user shared detailed code and logs on Github, pinpointing the issue to a previous training module in their Dockerfile.
- Translation Dataset prompt details Data Debacle: A member shared a prompt for generating a translation dataset for LLMs, emphasizing the use of provided samples only, without generating new translations.
- The prompt details how to create a dataset with specific formatting rules, including language combinations and punctuation alignment.
- Ollama documentation faces link lapse: A user reported that Ollama links are broken on the documentation page.
- They also questioned the use of f16 in the example, suggesting it should be q8_0 instead, when using 8-bit quantization for the KV cache.
OpenRouter Discord
- MiniMax M2âs Free Ride Ends: The free access period for MiniMax M2 is concluding, requiring users to switch to a paid endpoint to continue using the model.
- Users have only one hour to migrate to the paid endpoint to prevent interruptions.
- OpenRouter Chat Crashes, Users Rage: Users reported a chat scrolling issue on OpenRouter that prevented access to old chats, and one user identified a commit that broke the chat.
- Despite the inconvenience, a user joked that OpenRouter's mistakes are benign compared to hidden system prompt changes from other AI companies, before the OpenRouter team quickly resolved the issue within 3 minutes.
- Gemini 3 Hype Train Departs Station: Enthusiasm builds around the potential of Gemini 3, with some cautioning against excessive hype, despite a LiveBench test showing Gemini 3 achieving a ranking comparable to a human.
- The community anticipates a release that is both powerful and nicely priced.
- Free Model Drought Sparks Anxiety: The scarcity of free AI models is increasing due to rising popularity and increased internet access, leading to resource limitations, particularly after a YouTube video caused Deepseek Free to go down.
- While some suspect RP apps are siphoning off the API, others say paid services remain the most reliable due to reduced abuse, after reporting mixed experiences with Claude's free tier limits.
- Local AI Hardware: Ryzen Gets Roasted: Users debated the best hardware for local AI, with a Minisforum mini PC dismissed as a poor choice due to its Ryzen architecture and limited power.
- The conversation shifted to recommending RTX Pro 6000 Blackwell, RTX 5090, or RTX 3090, depending on the budget, with concerns about the high cost of junkyard builds with DDR4 memory.
OpenAI Discord
- OpenAI Fights Back Against NYT: OpenAI's CISO addressed The New York Times' invasion of user privacy in a letter.
- The letter detailed the legal battle and OpenAI's dedication to protecting user data from unauthorized access.
- Free ChatGPT Plus for Vets: OpenAI is offering 12 months of free ChatGPT Plus to eligible active-duty service members and veterans who have transitioned from service in the last 12 months; claim here.
- The announcement was made to the community and all users have been notified.
- GPT-5.1âs Debut: GPT-5.1 is rolling out to all users this week, becoming smarter, more reliable, and more conversational, read more here.
- A Reddit AMA on GPT-5.1 and customization updates will happen tomorrow at 2 PM PT.
- AI Chatbot Hordes Threaten Social Media: Members discussed the potential for an AI chatbot infestation across social media, pushing propaganda and making it difficult to distinguish between real people and AI.
- One member said that online will be dominated by AI chatbots who will just constantly push propaganda, and the only escape might be going outside.
- Users Find and Share Prompt Engineering Tips: A member shared a detailed prompt lesson using markdown for prompting, abstraction via variables, reinforcement for guiding tool use, and ML format matching for compliance.
- The member provided a markdown snippet for teaching hierarchical communication, abstraction, reinforcement, and ML format matching.
LM Studio Discord
- Phi-4: Small Model Has Big Brain: A user sought a lightweight chat model for writing a book for private research, and settled on Microsoft Phi 4 mini.
- Another user suggested considering budget and usage plans to decide between a subscription or dedicated hardware.
- Gemini 2.5 Pro Dethrones Sonnet: A user reported that Gemini 2.5 Pro outstripped the current Sonnet 4.5 iterations.
- The user expressed eagerness for Gemini 3 to come out soon.
- CUDA Update Causes Vision Model Carnage: Users reported that the new CUDA version 1.57 is breaking vision models, causing crashes, with a recommendation to roll back.
- One user specified that Qwen3 VL also crashed and suggested it affects llama.cpp runtimes.
- Multi-GPU Model Loading Still Complex: Users found that loading two different models on two different GPUs in the same system with LM Studio is only possible by running multiple instances of LM Studio.
- GPU offload has always been all or none in LM Studio; you can't pick and choose which GPU is used for individual models.
- Vulkanâs Stability Issues Arise Again: Users experienced frequent blue screen errors (BSODs) while running LM Studio with Vulkan, with suspicions falling on compatibility issues with NVIDIA GPUs.
- Switching to CUDA resolved the stability issues, but it was noted that Vulkan was faster for small tests on a 3090.
Eleuther Discord
- Einops or GTFO Numpy Implementations: Members joked about implementing Numpy without Einops, with one member suggesting that Numpy implementations are kinda useless without autodiff to train.
- Another member said that implementing Multi-Head Attention in Numpy is horrible and better suited to being motivated/rederived rather than coded up during an interview. (A compact NumPy reference appears at the end of this section.)
- Is MLE interview prep a Leetcode Trap?: Members debated the best way to prepare for MLE interviews, with one describing it as a trap that is too employer and team-dependent to nail down.
- Instead, one member advised to build something in the open that would be useful to companies training/serving models.
- Dataset Mixing Ideal for Pretraining: Members suggested using Zyda-2, ClimbLab, and Nemotron-CC-v2 for initial pretraining, noting that mixing them could be ideal given their individual strengths and weaknesses.
- One member asked about the token breakdown, and if subsets like slimpj and the slimpj_c4 scrape are upsampled/downsampled.
- NVIDIA Dominates Quality Datasets: A member noted that NVIDIA and HF are overall leading along the quality axis for open-source datasets rn.
- They shared a link to the ClimbMix dataset on Hugging Face, calling it especially interesting (https://huggingface.co/datasets/nvidia/Nemotron-ClimbMix).
- Nemo-CC 2 License Raises Eyebrows: Members debated the licensing terms of Nemo-CC 2, expressing concerns about potential restrictions on sharing datasets/models that leverage it and pointing out that they can terminate your license at any time for no reason with 30 days notice.
- One user summarized, You are not allowed to train a model on the dataset, evaluate the model, and publicly share the results without NVIDIA's prior written consent, with more details available in this paper.
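Returning to the interview exercise at the top of this section: for reference, multi-head attention in plain NumPy fits in a few dozen lines. A from-scratch sketch, with random weights standing in for learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, n_heads, rng):
    """x: (seq_len, d_model) -> (seq_len, d_model)."""
    seq, d_model = x.shape
    assert d_model % n_heads == 0
    d_head = d_model // n_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    split = lambda t: t.reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)   # (heads, seq, d_head)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)     # (heads, seq, seq)
    out = softmax(scores) @ v                               # (heads, seq, d_head)
    return out.transpose(1, 0, 2).reshape(seq, d_model) @ Wo  # concat + project

x = np.random.default_rng(0).standard_normal((8, 64))
print(multi_head_attention(x, n_heads=4, rng=np.random.default_rng(1)).shape)  # (8, 64)
```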
Nous Research AI Discord
- Autonomous AI Created By Accident: A user shared a GitHub repository claiming to have created autonomous AI by accident.
- No further details were provided regarding the specifics or capabilities of this project.
- WeiboAI Stuns but drifts after 2 turns: Users discussed the new WeiboAI model, based on qwen2.5, noting its surprisingly good initial performance, referencing this tweet.
- Another user pointed out that it drifts after the first 1-2 turns, but remains somehow good for a 1.5B parameter model and can recite content from Quora.
- Baguettotron Reasoning Gets Attention: A member inquired about benchmarking Baguettotron, noting its fairly interesting reasoning traces despite its small size.
- There was no follow up on whether this benchmark was pursued.
- GGUF Files Unavailable in Nous Chat: Users were informed that importing GGUF files directly into Nous Chat is not currently supported, but can be used locally with tools like llama.cpp or Ollama.
- A huggingface documentation link and the Ollama website were shared.
- ixlinx-8b Debuts as SOTA Small Model: The ixlinx-8b model was released on GitHub after a long period of development, advertised as a state-of-the-art (SOTA) small model from a local hackathon.
- The creators invited contributions and suggested that the developers of Hermes should evaluate it.
Latent Space Discord
- Windsurf Releases Aether Models for Testing: Windsurf Next launched Aether Alpha, Aether Beta, and Aether Gamma models in the #new-models channel, available for free testing for a limited time, with a direct download link provided.
- Users were urged to test the models quickly, as free access won't last more than a week.
- OpenAI's Training-Cost Trends Charted: Masa's chart illustrating OpenAI's training-cost trends sparked discussions on metrics, with members requesting more data points, including burn rate and revenue.
- Some members pointed out that OpenAI is nearly 10 years old and suggested adjusting the numbers for inflation.
- Meta's FAIR v2 Allegedly Foiled: Susan Zhang revealed that Meta declined to create a lean FAIR v2 in early 2023 to pursue AGI, instead tasking the GenAI org with shipping AGI products, according to this tweet.
- She alleges that vision-less execs hired cronies who overpromised results and later joined OpenAI with inflated résumés, causing lasting damage.
- Character.AI's Kaiju Models Optimize for Speed: Character.AI's proprietary Kaiju models (13B/34B/110B) were engineered for inference speed using techniques like MuP-style scaling, MQA+SWA, and ReLU² activations (a one-line definition follows this section), as detailed in this Twitter thread.
- The team deliberately avoided MoEs due to production constraints.
- Magic Patterns 2.0 Raises $6M Series A: Alex Danilowicz unveiled Magic Patterns 2.0 and a $6M Series A led by Standard Capital.
- The company celebrated bootstrapping to $1M ARR with no employees and 1,500+ product teams now using the AI design tool, planning to rapidly hire across enterprise, engineering, community and growth roles.
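For readers unfamiliar with the last of those techniques, ReLU² (squared ReLU) is simply the ReLU output squared:

```python
import numpy as np

def relu_squared(x):
    # ReLU²: zero for negative inputs, x**2 for positive ones.
    return np.square(np.maximum(x, 0.0))

print(relu_squared(np.array([-2.0, 0.5, 3.0])))  # [0.   0.25 9.  ]
```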
Modular (Mojo 🔥) Discord
- Mojo's Dynamic Reflection Digs Deep: Mojo aims to support dynamic type reflection, using its JIT compiler to handle dynamic data, and try-catch and raise will be standard for error handling to match Python's style, as well as monadic options.
- In a recent interview, Chris Lattner said that Mojo's metaprogramming is more powerful than Zig's because Mojo can allocate memory at compile time (YouTube link).
- Members Mull Mandatory mut: A debate arose around the verbosity of mandatory mut annotations for function parameters, drawing comparisons to Rust and Python.
- Members suggested a compromise where mut is mandatory inside fn only if the argument is reused, and the call-side mut annotation is applied after the function call.
- Metal Compiler Meltdown on M4 Solved: One member encountered a Metal Compiler failed to compile metallib error while following the âGet started with GPU programmingâ tutorial on an Apple M4 GPU.
- The issue was resolved by ensuring the full Xcode installation was present, that there are no print() statements in the GPU kernel, and by using the latest nightly build.
- C-FFI Conundrums Confronted: Members discussed pain points in doing C-FFI with Mojo and suggested using Origin.external to work around the rewrite explicitly trying to fix things.
- It was also suggested to use MutAnyOrigin to preserve old behavior exactly, though it will extend all lifetimes in scope.
- comptime Bird syntax scrutinized: The syntax comptime Bird = Flyable & Walkable for trait composition was discussed, with some finding it less intuitive than the alias keyword.
- Others argued that comptime more accurately reflects the keyword's functionality, particularly with static reflection and the ability to mix types and values at compile time.
DSPy Discord
- DSPy Does Demand Domain-Driven Domain Knowledge: While DSPy aims to abstract away prompting, domain-specific LLM applications still require detailed instructions within signatures, with one user having 100 lines for some modules.
- The consensus is that DSPy requires more than just basic prompts for complex tasks; it necessitates encoding domain knowledge and step-by-step instructions to guide the LLM effectively.
- Signatures: Better Than Prompts, But Still Prompts?: Participants discussed that DSPy's signatures, while a better abstraction than raw prompts, still function as prompts, particularly within the docstrings of class-based signatures where business rules are encoded, facilitating optimization.
- The framework helps to program, rather than focus on prompting, but a lot of the confusion in the community stems from the fact that a prompt means different things to different people.
- GEPA Geometries Gradual Gains: While GEPA aims to optimize prompts, users find that specific guidelines are still necessary, even with tool functions, such as instructing the LLM to use regex for agentic search when initial terms fail.
- One user found they needed to add specific guidelines that the LLM should send specific terms for the tool to search via ripgrep, but if it doesn't find one, MAKE SURE you add Regex as next; without this, the LLM wouldn't use Regex terms in the search tool.
- Agentic Agents Augmenting Analytics: A user shared a scenario where they needed to instruct the LLM to use regex in agentic search with ripgrep to effectively search through documents, highlighting the need for specific guidance even with advanced tools.
- Another user shared about instructing the LLM that the answer might not be on page 1 in search results.
- Taxonomy Tail Troubles Told: A member wrote a blogpost about their experience creating taxonomies.
- They find the topic super relevant in the context of structured generation.
HuggingFace Discord
- ZeroGPU Zeros Performance Concerns: Members discussed issues with ZeroGPU, which now seems to be working, although it is unclear whether concerns linger.
- The discussion came after reports, with logs, that it wasn't working.
- Reubenâs Recursive Removal Resolved: Reuben was banned by a bot due to sending too many messages triggering a spam filter, and later unbanned by lunarflu.
- The situation prompted discussions on using regex or AI to detect spam, with concerns raised about privacy.
- Aquif-3.5-Max-42B-A3B Attracts Attention: Members noticed the Aquif-3.5-Max-42B-A3B model trending on Hugging Face.
- Speculation arose that this was due to it being upscaled and fine-tuned.
- Tokenflood Tool Tests LLM Latency: A freelance ML engineer released Tokenflood, an open-source load testing tool for instruction-tuned LLMs, available on GitHub.
- It simulates arbitrary LLM loads and is useful for assessing prompt parameter changes.
- MCP Celebrates Milestone with Anthropic and Gradio: The MCP 1st Birthday Bash, hosted by Anthropic & Gradio, kicks off this Friday, Nov 14 (00:00 UTC) at https://huggingface.co/MCP-1st-Birthday.
- It features $20K in cash prizes and $2.7M+ in API credits for participants, with thousands already registered.
Moonshot AI (Kimi K-2) Discord
- Researcher Mode Bugs frustrate Users: Users reported receiving errors from Researcher Mode instead of results, even with minimal prior use, and they asked about credits.
- The problems may be related to whether Researcher Mode is completely paid, as users are receiving insufficient credit/upgrade messages.
- Kimi Coding Plan API Quota Dries Up: The Kimi Coding Planâs API quota depletes quickly (within hours) due to web search and plan mode usage.
- One user speculated that Moonshot AI might transition to a cursor-like plan, particularly given their funding compared to OAI and Anthropic.
- Kimi API Setup Causes Headaches: Users needed help with Kimi API setup for the thinking model using HTTP, encountering authorization failures despite having credits and a valid API key.
- It was discovered that the user was employing the Chinese platform URL instead of the global https://api.moonshot.ai/v1/chat/completions URL, which fixed the issue. (A minimal request sketch follows this section.)
- Turbo Version gets Kimi K2 Moving: Users asked about accelerating the processing time for the Kimi K2 thinking model through the API.
- It was advised to utilize the turbo version, which delivers quicker output speeds without impacting model performance.
- GPT 5.1 Stealth Rolls Out: Members noted the GPT 5.1 rollout and that it had been the stealth model on OR, so it was decent but so safetylobotomized.
- One member celebrated that everyone knew it was coming since a few weeks ago as OpenAI takes an L.
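As referenced above, a minimal request against the global endpoint; the model id and payload shape are assumed from Moonshot's OpenAI-compatible API, with the API key read from the environment:

```python
import os, requests

resp = requests.post(
    "https://api.moonshot.ai/v1/chat/completions",  # global URL, not the Chinese platform
    headers={"Authorization": f"Bearer {os.environ['MOONSHOT_API_KEY']}"},
    json={
        "model": "kimi-k2-thinking",  # assumed id for the K2 thinking model
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```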
Yannick Kilcher Discord
- Elevenlabs Unveils Speech-to-Text: Elevenlabs, known for text to speech, has introduced speech to text capabilities, as highlighted in their blog post.
- Members are contemplating whether this new feature will enhance Elevenlabs' appeal in the market.
- Kimi K2 Scores on Coding Tasks: A member shared a YouTube video showcasing Kimi K2's strong performance on one-shot coding tasks.
- No further details were mentioned.
- ICLR Reviewer Ruckus: A member expressed frustration with the ICLR review process, citing poor scores on a resubmission despite addressing previous concerns and adding new datasets with over 30k new questions total.
- The member quoted reviewers criticizing them for not providing hyperparameters, even though they were in the appendix, and dismissing their work as not a benchmark paper despite extensive testing.
- Whisper Woes Resolved?: A member encountered errors and hallucinations when using the Whisper model directly with PyTorch, but found relief using Whisper-server.
- They recommended compiling Whisper-server with Vulkan support for portability and filtering out quiet sections to improve transcription.
- Reasoning from Memorization Paper: A member linked to a paper titled From Memorization to Reasoning in the Spectrum of Loss Curvature: [2510.24256] From Memorization to Reasoning in the Spectrum of Loss Curvature.
- No further details were mentioned.
MCP Contributors (Official) Discord
- Timezone Info Travels From MCP Client to Server: A discussion started about passing timezone information from MCP clients to MCP servers, and it was considered to supply this as metadata via a client-sent notification or a server elicitation.
- A member has drafted a SEP (spec enhancement proposal) for timezone and will post it to GitHub after internal feedback, and is weighing adding it to CallToolRequest, using a Header, adding it to JSONRPCRequest.params._meta, or adding it to InitializeRequest. (An illustrative request shape follows this section.)
- Claude.ai Fights Connectivity: Members discussed debugging connectivity issues between Claude.ai and MCP Servers.
- It was noted that this is flaky and specific to the client, and a developer mode that gave a bit more feedback about what's going on was suggested.
- MCP Tool Call Goes Wild, Considers Alternate Serialization: Members wondered about returning data other than serialized JSON from mcp tool call results, such as Toon format.
- One member shared results of small-scale evals on a synthetic dataset: accuracy is comparable, 9% slower, 11% less tokens (n = 84, p = 0.10).
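For context, one placement being weighed would carry the timezone in params._meta of a standard tools/call request. A hypothetical shape, not a ratified spec:

```python
# Hypothetical JSON-RPC payload; the "timezone" key and its placement in
# params._meta are one option from the draft SEP, not agreed-upon behavior.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "schedule_meeting",              # hypothetical tool
        "arguments": {"when": "tomorrow 9am"},
        "_meta": {"timezone": "Europe/Berlin"},  # IANA zone name
    },
}
```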
Manus.im Discord Discord
- AI Automation Expert Joins Server: A new member with expertise in AI automation integration has joined, bringing skills in Python, SQL, JavaScript, and frameworks like PyTorch, scikit-learn, LightGBM, and LangChain.
- They have experience building chatbots, recommendation engines, and time series forecasting systems.
- Server Mulls Over Spanish Language Section: A member suggested creating a dedicated Spanish language section within the server, providing image links for context 1.png, 2.png, 3.png, 4.png.
- The suggestion aims to cater to Spanish-speaking members and potentially broaden the communityâs reach.
- Engineers Pursue Generative Engine Optimization: A member is seeking resources and guidance on how to effectively track and optimize for Generative Engine Optimization.
- The request highlights the growing interest in refining generative models for enhanced performance.
- Users Encounter Pesky Manus System Error: A user reported a recurring Manus system error preventing publishing, specifically a "pathspec '417ea027' did not match any file(s) known to git" error.
- The member expressed frustration with the lack of support, noting previous unresolved issues despite ongoing subscription fees.
- Support Troubles Plague Manus Users: Multiple members are experiencing difficulty accessing Manus support, with one reporting the support channel's apparent closure.
- One user was advised by the Manus agent to "Wait for Manus support" or "Escalate the ticket" after facing a git commit error and provided a feedback link.
aider (Paul Gauthier) Discord
- Aider's Markdown Mishaps: Users found Aider gets confused by nested code markdown marks when creating code snippets in markdown files using anthropic.claude-sonnet-4-5-20250929-v1:0.
- Adding three and four backticks ('```' and '````') to the conventions.md file triggers Aider to demarcate files with <source> tags, resolving the code snippet issue.
- Aider's Vim Mode Gets Rave Reviews: A user lauded Aider's Vim mode as fantastic, and also praised the new aider-ce /load-session and /save-session functionality for its usefulness in parking and resuming jobs.
- These functionalities significantly enhance the user experience by allowing for seamless interruption and continuation of tasks.
- Aider's Update Cadence Questioned: Users expressed concerns over the lack of updates from Paul Gauthier regarding Aider's development status.
- Speculation arose about whether Paul Gauthier is still actively developing Aider, with some users wondering if an announcement about his departure was missed.
- GPT 5.1 Drops Without Numbers: Members noted the release of GPT 5.1, but observed that no benchmarks were included in the release notes.
- The lack of benchmarks makes it difficult to assess the improvements and capabilities of GPT 5.1 compared to previous versions.
tinygrad (George Hotz) Discord
- Package Data Faces Scrutiny: A member inquired about potential file omissions from the archive, questioning if package_data is a no-op and suggesting that specifying files explicitly could enhance the process.
- The member expressed gratitude to the reviewer for their insightful feedback, hinting at ongoing efforts to refine package management within the project.
- OpenCL Error Messages Cried Out For Overhaul: A member advocated for enhanced error messaging when an OpenCL device goes undetected, citing the cryptic RuntimeError: OpenCL Error -30: CL_INVALID_VALUE as an example.
- The pinpointed error stems from /tinygrad/tinygrad/runtime/ops_cl.py, line 103, signaling a need for more informative diagnostics in the OpenCL runtime operations.
Windsurf Discord
- Windsurf Launches Stealth Aether Models: Windsurf released a surprise set of stealth models (Aether Alpha, Aether Beta, and Aether Gamma) available in Windsurf Next and a small percentage of Windsurf Stable users.
- These models are free to use and the team is seeking feedback in the designated channel.
- Windsurf Next Available for Preview: Windsurf Next is a pre-release version of Windsurf that includes experimental features and models, which users can download here.
- Users can test out the new features and provide feedback on the stealth models.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (1258 messages🔥🔥🔥):
Riftrunner vs other models, GPT 5.1 benchmarks, Gemini 3 Pro speculation, AI model sycophancy, Riftrunner Game Development
- Riftrunner Creates Mario Game While Other Models Fail: Members agreed that, compared to other models, no model has done better in creating a Mario game than Riftrunner has.
- Some members noted that Riftrunner can lie, while others mentioned that it is better than Lithiumflow.
- Rain-drop Turns Out To Be Llama Model: Members revealed that the model rain-drop turned out to be a Llama model and posted a screenshot.
- Users found that rain-drop produced bad terminal outputs and was like Gemini 3 Flash.
- Riftrunner Proves Better Than LithiumFlow at Coding Tasks: One user confirmed that Riftrunner is better than Lithiumflow for coding tasks, specifically citing its ability to create a functional 3D Flappy Bird game, but it also suffered from laziness syndrome.
- One user shared the prompt for a 3D Flappy Bird game, which led to the generation of 2k lines of code; they said that if you motivate it, it might listen to you.
- GPT 5.1 Falls Flat Compared to Gemini 3 Pro: Members discussed the recent release of GPT 5.1, noting its shortcomings and expressing hope that Gemini 3 Pro will be significantly better.
- One user called GPT 5.1 trash, whereas many members are awaiting Gemini 3's release sometime next week.
- Riftrunner Demonstrates Superior Understanding and Debugging Skills: A user highlighted Riftrunnerâs superior understanding by getting it to create a 3D flappy bird game from a simple prompt, it also fixed some issues with the userâs game and added sounds.
- Another user confirmed that Riftrunner is better for coding tasks than Lithiumflow, but they expressed the sentiment that Lithiumflow was amazing for writing, not sure about riftrunner.
LMArena ▷ #announcements (1 message):
Code Arena, WebDev Arena, LMArena Leaderboard
- Code Arena Arrives with a Bang: Code Arena is now live on LMArena, offering real-time generation of deployable web apps that users can directly inspect and judge, succeeding the old WebDev Arena.
- A blog post and YouTube video provide more details, while the leaderboard showcases the new evaluation system.
- WebDev Arena Gets a Facelift: The WebDev Arena has been redesigned based on community feedback and is now known as Code Arena, which has a completely rebuilt evaluation method.
- Models generate live, deployable web apps and sites that anyone can open, inspect, and judge directly, in real time.
Perplexity AI ▷ #general (670 messages🔥🔥🔥):
Perplexity referral program, Gemini 2.5, GPT-5 mini vs GPT-5, Comet issues, Comet for Android
- Referral Program Accusations Trigger Account Bans: Several users report being banned from the Perplexity Partner Program for "fraudulent activity," despite claiming their referrals were genuine, and expressed frustration over the support team's lack of response to their appeals.
- Some users suspect the bans are related to gaming the referral system by inviting alts with the same code or using VPNs, while others speculate that the issue might be related to the platform reviewing payout eligibility for the $100 bounty.
- Gemini 2.5 Pro Experiencing Implementation Issues: Users are reporting issues with Gemini 2.5 Pro in Perplexity, with one user stating that "it's broken and poorly implemented in pplx atm, no way to fix it."
- Perplexity's interface seems to be automatically switching to GPT, even when Gemini 2.5 Pro is selected.
- GPT-5 thinking mini or GPT-5 regular?: A user who purchased the GPT Go subscription reported that while it advertises GPT-5 thinking, it mostly uses GPT-5 thinking mini, leading to a refund request.
- Members debated on whether they preferred a specific model or not.
- Comet Users Troubleshooting Webpage Control and Functionality: Users reported Comet AI Assistant issues such as the inability to perform webpage actions, unresponsive buttons, and an inability to control the browser, with some speculating that VPN usage or login status might be the cause.
- Some users have found solutions such as logging in, changing IP address (VPN) or deleting and reinstalling Comet or posting in the troubleshooting channel.
- Users Want Pro Discord Role: Several users have been asking about how to get the Pro role in Discord, and other users pointed them to link their Discord account with their Perplexity account on the website.
- A member noted "it should give it to you automatically on the website when you press the uh discord button, it made me link discord to the website during the process of joining the server".
Perplexity AI ▷ #sharing (3 messages):
Sourcify Event, Open Source, Forks, PRs and Chaos, Threads Shareable
- Sourcify IN hosts Open Source Adventure!: Sourcify IN is hosting an event titled Forks, PRs, and a Dash of Chaos: The Open Source Adventure on November 15, 2025.
- The talk will feature Swapnendu Banerjee (GSoC 2025 @Keploy | Engineering @DevRelSquad) and will be broadcast on Google Meet & YouTube Live.
- Discord Thread needs to be Shareable!: A member requested that a thread be made `Shareable`, with an attachment showing how to set this option.
- This relates to this discord thread.
Cursor Community ▷ #general (534 messages🔥🔥🔥):
Cursor IDE Cost Awareness, Cursor vs Copilot preference, Exploiting ChatGPT, Cursor 'Max' Mode, Cursor Rules
- Review agent is costing, but cheap: Members noted that the agent review feature in Cursor IDE incurs costs with each use, however it is relatively inexpensive, with one user having used 76% of their allowance.
- Users are showing their usage screenshots like this one, and finding that clicking Try Again sometimes resolves issues.
- Cursor vs Copilot: Preference Prevails: Some users have returned to Copilot after trying Cursor, emphasizing that tool preference can be subjective.
- One user mentioned, My brother is a fan of copilot idk why, highlighting individual preferences.
- Early Access to Unlimited Exploits: Some users exploited ChatGPT 5 when it was briefly free, including unlimited Opus 4 requests, with one user lamenting, Sad I wasn't aware of the opportunity.
- The pricing bug allowed unlimited usage, described as bugged as hell, we had unlimited everything.
- Cursor 'Max' Mode Unlocks Full Potential, Incurs Extra Costs: Max mode in Cursor removes limits to maximize performance and reduce costs, enabling it to read entire files instead of chunks.
- However, exceeding 200k context with Sonnet 4.5 doubles the cost, with users humorously suggesting capping it: Cant we limit this to 200k and post that we can give another command.
- Rule Creation to Maintain AI Discipline: Members are establishing custom rules and lints to control AI behavior, preventing dirty code from entering repos.
- One member shared a streamlined approach, I actually took the concept of .cursorrules and turned it into a rigid system that keeps AI from drifting with lints, custom eslint plugins and husky is my last defense for protecting dirty code getting in.
GPU MODE ▷ #general (10 messages🔥):
popcorn-cli syntax, status 400 grayscale, CPU-focused community, leaderboard/eval selection, export command quotes
- Syntax Scare with Popcorn-CLI: Some users encountered a syntax error with `popcorn-cli submit`; others suggested checking the popcorn-cli readme for correct syntax.
- Grayscale Gauntlet has Gone: One user received a status 400 error, and discovered that the grayscale leaderboard is closed.
- CPU Kernel Competition Craving: A member inquired about the existence of a CPU-focused community similar to gpumode, emphasizing CPU kernel competitions.
- Evaluation Enigmas Explored: A user mentioned that it may be necessary to ensure you select the `nvfp4_gemv` entry in the popcorn cli; it's all the way at the bottom of the evaluation suites.
- Export Expertise Expressed: A member clarified that when using the export command, the URL should be entered without quotes.
GPU MODE ▷ #cuda (18 messages🔥):
CUDA compiler options, warp tiling, TMEM allocation, PTXAS optimization, Mutex locking via TMEM
- Layman's CUDA Compiler Commandments: New CUDA developers inquired about essential compiler options beyond basic usage, and one member suggested using `-O3` for optimization and `-lineinfo` for preserving line number information for profiling with Nsight Compute.
- They also recommended the `-res-usage` option to check register and static shared memory usage post-compilation (a sketch of passing these flags from PyTorch appears at the end of this channel summary).
- PTXAS Optimization Revelation: It was clarified that the `-O3` compiler option primarily optimizes the host (CPU) part of the code, which may not be as critical if the CPU code isn't on the critical path.
- A member noted that the default optimization level of PTXAS is already O3, making the flag redundant for GPU code optimization.
- TMEM Static Allocation Lament: A developer questioned why TMEM must be allocated dynamically, viewing it as a downgrade compared to static shared memory.
- Another member speculated that TMEM allocation could be used for dependency management, with the TMEM buffer acting as a mutex that locks over the data it contains.
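For readers who compile through PyTorch rather than invoking nvcc directly, here is a minimal, hypothetical sketch of wiring those same flags into an inline extension; the kernel, names, and sizes are invented for illustration.

```python
# Hypothetical example: a trivial CUDA kernel built with the flags discussed
# above via PyTorch's inline-extension helper (requires a CUDA GPU and nvcc).
import torch
from torch.utils.cpp_extension import load_inline

cuda_src = r"""
#include <torch/extension.h>

__global__ void scale_kernel(const float* x, float* y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i];
}

torch::Tensor scale(torch::Tensor x, double a) {
    auto y = torch::empty_like(x);
    int n = x.numel();
    scale_kernel<<<(n + 255) / 256, 256>>>(
        x.data_ptr<float>(), y.data_ptr<float>(), (float)a, n);
    return y;
}
"""

mod = load_inline(
    name="scale_ext",
    cpp_sources="torch::Tensor scale(torch::Tensor x, double a);",
    cuda_sources=cuda_src,
    functions=["scale"],
    extra_cuda_cflags=[
        "-O3",         # mostly affects host-side code; PTXAS already defaults to O3
        "-lineinfo",   # keep source-line mapping for Nsight Compute profiling
        "-res-usage",  # print register / static shared memory usage at build time
    ],
)

x = torch.randn(1024, device="cuda")
print(mod.scale(x, 2.0)[:4])
```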
GPU MODE ▷ #jobs (1 message):
HippocraticAI Hiring, LLM Inference Engineer Role, CUDA/CUTLASS/Triton expertise, NVIDIA B200s, AMD MI355, and Google TPUs
- HippocraticAI Expands LLM Inference Team: HippocraticAI is expanding its Large Language Model Inference team to enhance healthcare accessibility globally, actively seeking talented engineers for multiple positions.
- They posted a job link at https://lnkd.in/eW5qzuMc and encourage interested candidates to apply and shape the future of healthcare.
- LLM Inference Engineer Role Focuses on Optimization: The LLM Inference Engineer role will focus on researching, prototyping, and building state-of-the-art LLM inference solutions, especially with expertise in CUDA, CUTLASS, Triton, TileLang, or contributions to major inference frameworks like vLLM and SGLang.
- The role involves optimizing and accelerating inference performance across cutting-edge hardware platforms, including NVIDIA B200s, AMD MI355, and Google TPUs.
GPU MODE ▷ #beginner (4 messages):
Atomic Max for FP32, PTX Documentation Inaccuracy
- Achieving Atomic Max for FP32 with Int32 Trick: A member noted that while the PTX documentation suggests atomic max operations for FP32 are possible, it results in an error, but a workaround exists using int32.
- The trick involves inverting the bottom 31 bits if the sign bit is set to achieve an int32-like representation, as shown in PyTorch's source code (a sketch of the mapping appears after this summary).
- PTX Doc Claims Atomic Max, Reality Says No!: Despite what the PTX documentation implies, direct atomic max operations for FP32 types are not supported, leading to errors during compilation.
- The compiler throws an error indicating that the `.max` operation requires specific types like `.u32`, `.s32`, `.u64`, `.s64`, `.f16`, `.f16x2`, `.bf16`, or `.bf16x2` for the `atom` instruction.
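To make the bit trick concrete, here is a small Python sketch of the order-preserving mapping described above (an illustration of the idea, not PyTorch's actual code): once float32 values map monotonically onto signed 32-bit integers, an integer atomic max can stand in for a float one.

```python
import struct

def orderable_int32(f: float) -> int:
    # Reinterpret the float's bits as a signed int32. Negative floats have the
    # sign bit set, and their signed-int ordering is reversed relative to the
    # float ordering, so inverting the bottom 31 bits flips it back.
    (bits,) = struct.unpack("<i", struct.pack("<f", f))
    return bits ^ 0x7FFFFFFF if bits < 0 else bits

# Sanity check: the mapping preserves ordering, so sorting by it agrees with
# sorting the floats themselves.
vals = [float("-inf"), -3.5, -0.0, 0.0, 1e-8, 2.0]
assert sorted(vals, key=orderable_int32) == sorted(vals)
print([hex(orderable_int32(v) & 0xFFFFFFFF) for v in vals])
```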
GPU MODE ▷ #off-topic (3 messages):
Skunk on a car, Dune Meme
- Skunk chills on car hood: A member posted a picture of a skunk on the hood of a car.
- Dune meme makes an appearance: A member posted a Dune meme.
GPU MODE ▷ #rocm (3 messages):
hipkittens, image0.jpg
- HipKittens Link Shared: A member shared a link to HipKittens on X.com, prompting positive feedback.
- Another member responded positively with "Got a chuckle outta me - great stuff!"
- Image0 attachment shared: A member shared an image called image0.jpg on cdn.discordapp.com.
GPU MODE ▷ #self-promotion (5 messages):
hipkittens, FSDP Implementation, AMD Open Source AI Week Recap
- HipKittens get Shared!: A member shared a link to hipkittens at luma.com/ai-hack.
- NanoFSDP simplifies Distributed Training: A member wrote a small FSDP implementation to learn the basics of distributed training, available at github.com/KevinL10/nanofsdp.
- They noted it is fairly minimal but well-documented (~300 LOC), works as a drop-in replacement for `fsdp.fully_shard`, and may be helpful for understanding how PyTorch implements FSDP under the hood.
- AMD Open Source AI Week Recapped: A member shared a recap of AMD Open Source AI Week at amd.com/en/developer.
GPU MODE ▷ #🍿 (4 messages):
Building from source, Nvidia competition submissions
- Compile Compatibility Considerations: It was suggested to build from source and compile on an older image to improve compatibility, because the release build seems to be done on Ubuntu 24.04.
- The CI maintainer is on parental leave, so this may take a while to implement.
- Submitting to Nvidia Comp via Discord Bot: A member inquired if submissions for the Nvidia competition were primarily happening through the Discord bot.
- Another member confirmed that they support Discord, the site, and the CLI, with the CLI being the most popular submission method.
GPU MODE ▷ #thunderkittens (2 messages):
Hipkittens launch, Other Kittens on X
- Hipkittens are shared on X: A member shared a link to Hipkittens on X.
- Still more Kittens on X: Another user liked it.
GPU MODE ▷ #submissions (66 messages🔥🔥):
Leaderboard Submissions, GEMV Cheating Accusations, Benchmark Input Sizes
- NVIDIA GEMV Leaderboard Race Heats Up: Multiple users made submissions to the `nvfp4_gemv` leaderboard, with one user ultimately clinching the first place position with a time of 24.5 µs.
- Other notable submissions included one user achieving third place at 66.0 µs and another securing 6th place with 58.4 µs.
- Input Size Sparks Debate on GEMV Benchmark: A member questioned the benchmark's input sizes, noting that if the top times are around 7 µs, the input is likely tiny, potentially skewing optimizations and increasing the relative cost of prologue and epilogue.
- The member suggested evaluating the benchmarks on larger inputs to provide a more realistic assessment.
- GEMV Leaderboard Caching Controversy: The top times on the `nvfp4_gemv` leaderboard were identified as potentially being the result of caching values between benchmark runs, leading to accusations of cheating.
- Another member suggested that these issues may have been the result of LLMs iterating on the problem and suggested that it was an honest mistake.
- Grayscale V2 Leaderboard Records New Bests: One user achieved 7th place on L4 with 27.5 ms and secured 8th place on H100 with 12.9 ms.
- Additionally, the user also made personal bests on A100 with 20.4 ms and B200 with 6.69 ms on the `grayscale_v2` leaderboard.
GPU MODE ▷ #cutlass (1 message):
kinming_32199: impressive animation
GPU MODE ▷ #general (2 messages):
GPU MODE Leaderboard, Submission process
- GPU MODE Leaderboard submission options: Users asked whether submissions should be done via Discord or the GPU MODE website.
- A member clarified that both methods are acceptable.
- Submission via Discord: A user inquired whether submissions could be made via Discord.
- Another user confirmed that Discord submissions are indeed accepted.
GPU MODE ▷ #multi-gpu (1 message):
DMA Documentation, RDMA Documentation, Wikipedia, ChatGPT, Vendor websites
- Hunting DMA and RDMA documentation: A member is seeking better documentation on Direct Memory Access (DMA) and Remote Direct Memory Access (RDMA) than what's available on Wikipedia, ChatGPT, and vendor sites.
- Additional DMA/RDMA Resources Sought: The user expressed dissatisfaction with current information sources, hoping for more detailed or technical documentation.
GPU MODE ▷ #helion (9 messages🔥):
Triton Errors, Auto Skip Triton Errors, Helion Configs BC-compatible
- Triton Errors autotuning investigated: A member inquired why Triton errors are not skipped by default when autotuning.
- Another member responded that they already auto skip a ton of them referencing the helion autotuner logger.
- Helion Configs promise BC-Compatibility: A member inquired about Helion Configs and if they will be BC-compatible.
- Another member stated that yes, we are gonna be BC compatible, citing an example where they updated indexing to be a list instead of single value because each load/store can be optimized independently for perf gain, but they still support single value input.
GPU MODE ▷ #nvidia-competition (212 messages🔥🔥):
Cutlass and CuTe 4.3.0, Popcorn-cli submission errors, Kernel global caching allowed, CuTe DSL is not requirement, torch doesn't support sm100
- Cutlass and CuTe fixes submissions: CuTeDSL and Cutlass were updated to 4.3.0, fixing the issues with CuTe submissions and the CuTe example passes.
- Popcorn CLI Authentication Errors: Users encountered a 401 Unauthorized error with `popcorn-cli submit`, resolved by re-authenticating via the Discord OAuth2 link provided during registration.
- One user initially missed the "Please open the following URL in your browser to log in via discord: https://discord.com/oauth2/authorize?client_id" step.
- Caching Compiled Kernels is OK: Caching compiled kernels is allowed, but caching results is forbidden.
- It was clarified that while caching compiled kernels is permissible, caching tensor values between benchmark iterations is not allowed as each benchmark should have different data, but the same shape.
- Raw PTX can be used for kernel implementation: Users confirmed that it is acceptable to write CUDA C++ or raw PTX, loading it with `torch.utils.cpp_extension.load_inline` (see the sketch in the #cuda summary above), with reference code available here.
- A user inquired about a CUDA example template, but it was indicated that while there isn't one, CUDA and PTX can be used, although a template for it might be slow.
- Blackwell GPU details revealed: The B200 GPU has 148 SMs (Streaming Multiprocessors) running at a boost clock of 1.98 GHz and is used to score submissions.
- Nvidia's blog post includes the B300 diagram.
GPU MODE ▷ #xpfactory-vla (3 messages):
RLinf, Qwen3-VL VLA-adapter training
- RLinf Repo: new tool in town: A member mentioned checking out RLinf, promising updates after running Qwen3-VL VLA-adapter training overnight.
- GPU Usage Questioned: A member inquired about the number and type of GPUs used for training, expressing concern for those with limited GPU resources.
- No response was given.
Unsloth AI (Daniel Han) ▷ #general (179 messages🔥🔥):
VibeVoice finetuning for Bulgarian, Intel autoround quants vs BNB 4-bit quants for training, QAT in Unsloth, MoE models and output quality, Aquif-3.5-Max-42B-A3B
- Bulgarian Blues: VibeVoice struggles with new language: A member attempted to use VibeVoice for Bulgarian TTS but found the results were "not good", though still "close to understandable", needing further finetuning.
- Another member joked that it sounds "like a drunk brit trying to read a phonetic version of the sentence".
- QAT Showdown: Intel autoround vs. BNB 4-bit: A question was raised about using Intel autoround quants for training and if it would have any benefit over using bnb 4-bit quants.
- A member mentioned that QAT is now available in Unsloth but compatibility with autoround is uncertain and might require customization, also pointing out that QAT should aim for fast, simple quant formats.
- MoE Mayhem: Quality and Memory: A user asked about the effect of Mixture of Experts (MoE) models on output quality compared to dense models.
- It was explained that MoE models of equivalent size and training usually offer comparable knowledge, even if their intelligence differs.
- Aquif Antics: Trending Model Gets Roasted: Members humorously acknowledged the name of the trending Aquif-3.5-Max-42B-A3B model, questioning its reported performance on HLE (15.6).
- It was speculated that the model might be a merge that has been benchmaxxed, with one member noting they used to do that stuff all the time back when community models surpassed official ones.
- Security Shenanigans: Gamers Beware!: Members discussed the security implications of anti-cheat software in games, with concerns over driver-level access and potential vulnerabilities.
- One member recounted having virtual items stolen from game accounts due to password reuse and warned about the risks of running unaudited software with full system access.
Unsloth AI (Daniel Han) ▷ #introduce-yourself (3 messages):
Fan encounter, Identity
- Fan Claims Allegiance: A user signals to another user that they have a little one here.
- The user then claims to be a fan: LOL I HAVE A FAN.
- User questions the identity of the fan: The user then asks the fan for their identity: who are you?.
Unsloth AI (Daniel Han) ▷ #off-topic (79 messages🔥🔥):
ONNX Runtime, Free Threading, Translation Model Prompt, Context Loss in SLMs, GPT-5-1 Em Dashes
- Python 3.14 ONNX Runtime Wheel Delayed: A member reported struggling to build ONNX Runtime wheel for Python 3.14 with free threading support for Windows.
- They mentioned having to do free-threading support themselves for many libraries, but lacking the bandwidth to verify compatibility for larger projects like gRPC.
- Translation Dataset Prompt: A member shared a prompt for generating a translation dataset for LLMs, emphasizing the use of provided samples only, without generating new translations.
- The prompt details how to create a dataset with specific formatting rules, including language combinations and punctuation alignment.
- Context Loss issues with Conversational SLMs: A member reported experiencing context loss when finetuning a 1B Llama3.2 model for conversational use in Hindi and English.
- Another member noted that all llms are bad with long context, even gemini, it all falls apart after 20-30k of turn-based convo.
- GPT-5-1 and Em Dashes: A member pointed out that the GPT-5-1 examples still use em dashes.
- Another member responded I don't think it's easy for them to get rid of that given the data they use.
- AI Model Choices: Autonomy vs Control: Members discussed the future of AI model selection, debating the merits of automated choices versus user control.
- Some expressed concern over AI making decisions on which model to pick, others noted that name is actually a good call, emphasizing the importance of gathering data and insight.
Unsloth AI (Daniel Han) ▷ #help (37 messages🔥):
Low loss architecture, Broken Ollama links in docs, Dependency issues finetuning VibeVoice, Training script for GPT-OSS models locally, Nonsense generations from GPT-OSS-20b
- Ollama links face documentation disaster: A user reported that Ollama links are broken on the documentation page.
- They also questioned the use of `f16` in the example, suggesting it should be `q8_0` instead, when using 8-bit quantization for the KV cache.
- VibeVoice finetuning faces dependency debacle: A user encountered numerous dependency issues while trying to finetune VibeVoice on Kaggle using the `/unsloth-finetuning` branch, experiencing conflicts with packages like transformers, numpy, and torch.
- One suggestion was to try `pip install transformers==4.51.3`, then install vibevoice, then upgrade transformers after that and see if vibevoice still works (or just leave it at that transformers version?)
- GPT-OSS-20b spews senseless solutions: A user reported getting nonsense generations from gpt-oss-20b when prompted with a math problem (Solve x^5 + 3x^4 - 10 = 3).
- The user pinpointed the issue to an attention patch from a previous training module in their Dockerfile, which was modifying `matmul()` calls and removing `out=` parameters; they shared detailed code and logs on GitHub.
- Unsloth GGUFs give quantization quality quickstep: Users noted that Unsloth GGUFs generally contain improvements and performance fixes for accuracy.
- They added that Unsloth's dynamic quantization for some models on Hugging Face performs with much higher accuracy than other quantization formats in general, even though they are quantized.
- Fine-tuning faces RAG rivalry: A user sought information to explain to their CTO why fine-tuning is a better solution than RAG, to which another user replied that fine-tuning and RAG are completely different things, and that users should combine them both for the best outcome if they want to retrieve knowledge/docs.
OpenRouter ▷ #announcements (1 message):
MiniMax M2, Paid Endpoint Migration
- MiniMax M2 Free Period Ends: The free period for MiniMax M2 is ending in one hour.
- Users are advised to migrate to the paid endpoint to continue using the model.
- Action Required: Migrate to Paid Endpoint: To ensure uninterrupted service, users must transition to the paid endpoint for MiniMax M2.
- The migration should be completed within the hour to avoid service disruption.
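For anyone migrating, the change is just a model-slug swap on OpenRouter's OpenAI-compatible endpoint. A minimal sketch follows; the exact slugs (`minimax/minimax-m2` and the `:free` suffix for the promo endpoint) are assumptions based on OpenRouter's usual naming.

```python
# Hypothetical migration off the free endpoint, using the OpenAI SDK pointed
# at OpenRouter's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

resp = client.chat.completions.create(
    model="minimax/minimax-m2",  # was "minimax/minimax-m2:free" during the free period
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```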
OpenRouter ▷ #general (268 messages🔥🔥):
OpenRouter API Issues, Gemini 3 Speculation, Free Model Scarcity, Local AI Hardware Recommendations, OpenRouter Chat Functionality
- OpenRouter's Chat Functionality Bites the Dust, Users Fume!: Users reported a chat scrolling issue on OpenRouter, making it impossible to access old chats, which was quickly confirmed by others experiencing the same problem on multiple browsers and devices.
- A user identified the problem as a commit that broke the chat, and despite the inconvenience, one user humorously noted that OpenRouter's mistakes are benign compared to hidden system prompt changes from other AI companies. The OpenRouter team quickly resolved the issue within 3 minutes.
- Gemini 3 Buzz Builds, But Will it Deliver?: Users speculated on the potential of Gemini 3, with high expectations for its performance, but others cautioned against excessive hype, citing previous tests that showed only incremental improvements.
- One user shared that a LiveBench test showed Gemini 3 achieving a ranking comparable to a human, fueling anticipation for a release that is both powerful and nicely priced.
- Free Model Apocalypse: Scarcity Sparks Debate: Users discussed the increasing scarcity of free AI models, attributing it to rising popularity and increased internet access, leading to resource limitations, especially after a YouTube video resulted in Deepseek Free going down.
- While some pointed to RP apps siphoning off the API, others noted that paid services remain the most reliable due to reduced abuse, with users reporting mixed experiences with Claude's free tier limits.
- Local AI Hardware Showdown: Ryzen vs. RTX: Users debated the best hardware for local AI, discussing a Minisforum mini PC while others quickly dismissed it as a poor choice due to its Ryzen architecture and limited power.
- The conversation shifted to recommending RTX Pro 6000 Blackwell, RTX 5090, or RTX 3090, depending on the budget, with concerns raised about the high cost of even junkyard builds with DDR4 memory.
- OpenRouter API Woes: Rate Limits, Errors, and Payments: Users reported issues with the OpenRouter API, including 429 rate limiting errors from providers, overloaded errors with the Claude 4.5 model, and a 503 proxy error.
- Additionally, users inquired about the payments API for programmatic credit purchases, with one user pointing to the provisioning API that facilitates credit purchases using crypto, but also a user reporting the error "User not found" with API keys despite having credits.
OpenRouter ▷ #discussion (3 messages):
Search Box UI, GPT Reasoning
- Search Box Deliberately Removed: A member believes the search box was deliberately removed to prevent users from using the generic search when they meant to search specific rooms, showing how the menu zooms out on the chat page in an attached image.
- Another member found the change looked odd and thought it wasn't that misleading.
- Reasoning for GPT Models: A member questioned whether the Responses API passes reasoning back in the request for GPT models.
- They noted that previously, the context for GPT only included the response output, not the reasoning.
OpenAI ▷ #annnouncements (4 messages):
NYT vs OpenAI, Free ChatGPT Plus, GPT-5.1 Release
- OpenAI Fights NYT Privacy Invasion: OpenAI's CISO published a letter addressing The New York Times' invasion of user privacy.
- The letter discusses the legal battle and OpenAI's commitment to protecting user data from unauthorized access.
- ChatGPT Plus now Free for Vets: OpenAI is offering 12 months of free ChatGPT Plus to eligible active-duty service members and veterans who have transitioned from service in the last 12 months; claim here.
- GPT-5.1 Rolls Out This Week: GPT-5.1 is rolling out to all users this week, becoming smarter, more reliable, and more conversational; read more here.
- GPT-5.1 AMA on Reddit Tomorrow: There will be a Reddit AMA on GPT-5.1 and customization updates tomorrow at 2 PM PT.
- The announcement was made to the community and all users have been notified.
OpenAI ▷ #ai-discussions (182 messages🔥🔥):
GitHub classifies Gemini 2.5 Pro vs ChatGPT, AI Chatbot Infestation on Social Media, GPT-5.1 vs GPT-5 vs GPT-4o, AI and Job Market, Sora 2
- Gemini 2.5 Pro dubbed Powerful on GitHub: A user inquired about why GitHub classifies Gemini 2.5 Pro as more powerful than ChatGPT, but another user clarified that the listed models are older, and GPT-5 is the latest, not listed.
- Another user added that Gemini 2.5 Pro has a much larger context window than GPT-5, with the ability to handle complex tasks.
- AI Chatbot Infestation Threatens Social Media: Members discussed the potential for an AI chatbot infestation across social media, pushing propaganda and making it difficult to distinguish between real people and AI.
- One member said that online will be dominated by AI chatbots who will just constantly push propaganda, and the only escape might be going outside.
- Users Dive into GPT-5.1 Performance: Users are discussing the capabilities of GPT-5.1, noting that it feels like a longer extendable stick that better adjusts its reach to get to harder topics and is more usable for rapidly studying topics.
- Some believe that if you select thinking, it will automatically use GPT-5.1, while others are still waiting for benchmarks and express frustration with qualitative over quantitative data.
- Future Jobs: AI impacts and changes: Members speculated on the impact of AI on future jobs, with some predicting a consolidation of roles requiring a broader skillset and emphasizing communication as a critical micro-skill.
- A user suggested that AI will lead to same jobs, just now using ai tooling.
- Sora 2's Anime Video Creation: A member created a 3-minute Anime using Sora 2, achieving consistency in appearance and voices by specifying the time, character appearance, key motion, and dialogue moments directly in prompts.
- Another user linked to a NotebookCheck article describing Sora 2 as capable of generating complex scenes with multiple characters, specific motion, and detailed backgrounds.
OpenAI ▷ #gpt-4-discussions (21 messages🔥):
GPT-5.1, Model preference, downgrading models
- GPT-5.1 Receives Mixed Reviews: Users express highly varied opinions on GPT-5.1, with some finding it a breath of fresh air, while others describe it as two steps backwards and a Fisher-Price version of a scientific instrument.
- One user noted that GPT-5.1 handles custom instructions in a really belligerent manner, while another user enjoying it noted they use 3 plus maxed out chats per day.
- Users Miss Older Models: Users reminisce about older models, with one mentioning they need GPT before model 5 ruined it, while another referenced the bad backlash that came with forcing 5 on everyone.
- Users also speculate that OpenAI will keep older models permanently due to their popularity, as thousands of people purely use them.
OpenAI ▷ #prompt-engineering (13 messages🔥):
Database re-attachment, Prompt engineering jobs, Sora issues, Prompt engineering lessons
- Database blues?: A member suggested to reattach the database when code execution environments get deactivated, since it's a browser environment asset on a timer.
- Viral Prompts?: A member asked for some viral prompts.
- Prompt engineering lessons: A member posted a lesson to teach the user hierarchical communication with markdown for prompting, abstraction through {open variables resolved by the AI} and ${by the user}, including explaining bracket interpretation ([list], {object}, (option)), reinforcement in prompts, important to guide [tool use] and (shape output) more deterministically, and ML format matching for compliance, including [{output templates} and {(conditional) output templates}].
- Sora has character issues: A member is having issues with Sora where each character has their own dialogue, and because of that, Sora changes the characters' lines, asking for prompts where they can make changes and test it.
OpenAI ▷ #api-discussions (13 messages🔥):
Prompt Engineering Jobs, Reattaching Databases, Prompt Engineering Tips, Sora Prompting
- Prompt Engineering Job Hunt proves difficult: A member reported having difficulty finding prompt engineering jobs.
- There was no additional discussion or advice given.
- Databases need to be reattached: When a code-execution environment is unavailable, a member suggested trying to reattach the database, noting that it is a browser environment asset on a timer.
- They warned that this issue can sometimes mess up the python environment and requires a new conversation.
- Prompt Engineering Tips: A member shared a detailed prompt lesson using markdown for prompting, abstraction via variables, reinforcement for guiding tool use, and ML format matching for compliance.
- The member provided a markdown snippet for teaching hierarchical communication, abstraction, reinforcement, and ML format matching.
- Sora and Dialogue Prompts: A Character Conundrum: A member requested a ready-made prompt for Sora involving conversations and setting information between two characters to prevent the model from changing the characters' lines.
- Another member linked to a specific Discord channel that may contain relevant prompts.
LM Studio ▷ #general (95 messages🔥🔥):
LM Studio MacOS Admin Privileges, Lightweight Chat Model Recommendations, Gemini 2.5 Pro vs Sonnet 4.5, LM Studio Hub Search Issues, MCP Resource Support in LM Studio
- Admin Privileges annoy MacOS LM Studio install: A user expressed surprise that installing LM Studio requires admin privileges on MacOS but found an existing issue in the bug tracker.
- Another user stated that admin privileges are not required.
- Phi-4 is Mini But Mighty: A user writing a book sought a lightweight chat model for private research, with Microsoft Phi 4 mini deemed perfect for their plans.
- Another user suggested considering budget and usage plans to decide between a subscription or dedicated hardware.
- Gemini 2.5 Pro Dethrones Sonnet 4.5: A user found Gemini 2.5 Pro to outperform the current Sonnet 4.5 iterations and expressed anticipation for Gemini 3.
- Vision Models Crash After CUDA Update: Users reported that the new CUDA version 1.57 is breaking vision models, causing crashes, with a recommendation to roll back.
- One user specified that Qwen3 VL also crashed and suggested it affects llama.cpp runtimes.
- Multi-GPU Model Loading proves problematic: Users discussed the possibility of loading two different models on two different GPUs in the same system with LM Studio and it sounds like it's only possible if you run multiple instances of LM Studio.
- GPU offload has always been all or none in lm studio; you can't pick and choose which one is used for individual models.
LM Studio ▷ #hardware-discussion (89 messages🔥🔥):
GPU memory distribution, Vulkan vs CUDA performance/stability, Driver issues and BSODs, Hardware troubleshooting (VRAM), Context length and VRAM usage
- Uneven GPU Memory Splits Trigger OOM: A user reported that the "split" option isn't effectively distributing the model across GPUs, leading to an out-of-memory (OOM) error on one GPU while others have available memory.
- They were hoping for an even split to prevent the OOM issues, and wondered if the engine could scan and apply a weighted split of layers based on size, instead of just an even split (a toy sketch of such a weighted split appears at the end of this channel summary).
- Vulkanâs Speed and Stability Woes: One user experienced frequent blue screen errors (BSODs) while running LM Studio with Vulkan, suspecting compatibility issues with NVIDIA GPUs.
- While another user also reported getting BSODs especially when unloading models but they found that switching to CUDA resolved the stability issues, mentioning that Vulkan was faster for small tests on his 3090.
- Driver Issues Bring the Blues: A user reported crashes when unloading models after a driver clean install fixed initial loading issues, prompting suggestions to update drivers, BIOS, and use DDU to reinstall drivers.
- He confirmed he was using two NVIDIA cards, and roxxus suggested re-ordering the GPUs or adjusting allocation, and also monitoring the GPU's VRAM allocation to see if it is exceeded.
- VRAM Suspicions: A user initially suspected failing VRAM, particularly on a 3090, due to crashing issues, however they were able to load the model with CUDA without issues.
- It was recommended to try alternative configurations and check if it crashes when the context length is increased, since the model loads with 4k context but crashes at 48k, suggesting it could be a VRAM issue.
- Hardware Updates Incoming: A user shared a link to a BMC model and mentioned a CPU cooler's arrival and another user noted their GPU rack has shipped.
- They showed off the parts for their new rig, and were excited for a new GPU, but also a little wary of shipping issues.
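To make that suggestion concrete, here is a toy sketch of a weighted split (an invented illustration; LM Studio exposes nothing like this today): greedily place each layer on the GPU with the most remaining free memory, so larger layers naturally spread out.

```python
import heapq

def weighted_split(layer_sizes, gpu_free_bytes):
    """Greedily assign each layer to the GPU with the most remaining capacity."""
    # Max-heap of (negative free bytes, gpu index).
    heap = [(-free, gpu) for gpu, free in enumerate(gpu_free_bytes)]
    heapq.heapify(heap)
    placement = []
    for size in layer_sizes:
        neg_free, gpu = heapq.heappop(heap)
        placement.append(gpu)
        heapq.heappush(heap, (neg_free + size, gpu))  # shrink that GPU's budget
    return placement

# Four layers of uneven size across an 8 GB and a 4 GB card.
print(weighted_split([4e9, 3e9, 2e9, 1e9], [8e9, 4e9]))  # e.g. [0, 0, 1, 1]
```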
Eleuther ▷ #general (41 messages🔥):
Numpy vs Einops, MLE interviews, Implementing Multi-Head Attention, Interpretability and VLMs
- Einops or GTFOs for Numpy Implementations: One member jokingly refused to implement Numpy without Einops, while another expressed that Numpy implementations are kinda useless without autodiff to train.
- Engineer Bombs Transformer Interview Question: An engineer recounted bombing an interview question that involved implementing a transformer in Numpy and suggested that interview questions should let candidates pick between multiple possible problems to showcase their strengths.
- Other members chimed in, describing the request to implement Multi-Head Attention in Numpy as horrible and better suited to being motivated/rederived rather than coded up during an interview.
- Is MLE interview prep a Leetcode Trap?: Members debated the best way to prepare for MLE interviews, with one describing it as a trap that is too employer and team-dependent to nail down.
- Instead, one member advised to build something in the open that would be useful to companies training/serving models.
- New Korea University Master's Student Joins EleutherAI: A master's student from Korea University who presented their first paper at EMNLP last week was inspired by Eleuther AI folks they met there to join and expressed passion for interpretability and interest in contributing to projects.
- Another new member expressed interest in finding a project to help work on.
Eleuther ▷ #research (75 messages🔥🔥):
Zyda-2, ClimbLab, Nemotron-CC-v2, Complex Values Attention, NVIDIA Dataset License
- Zyda-2, ClimbLab, Nemotron-CC-v2: Datasets for Pretraining: Members suggested using Zyda-2 (deduped+filtered fineweb+DCLM), ClimbLab, and Nemotron-CC-v2 for initial pretraining, noting that mixing them could be ideal given their individual strengths and weaknesses.
- It was suggested to remove subsets like slimpj and the slimpj_c4 scrape and was asked, Is there any token breakdown I can look at showing where the 3.1T are made up of? I.e. how are these subsets upsampled/downsampled? Why have tiny stories in the mix?
- NVIDIAâs Datasets are High Quality: A member noted that NVIDIA and HF are overall leading along the quality axis for open-source datasets rn.
- They shared a link to the ClimbMix dataset on Hugging Face, calling it especially interesting (https://huggingface.co/datasets/nvidia/Nemotron-ClimbMix).
- Decoding the Intricacies of Nemo-CC 2's License: Members debated the licensing terms of Nemo-CC 2, expressing concerns about potential restrictions on sharing datasets/models that leverage it and pointing out that they can terminate your license at any time for no reason with 30 days notice.
- One user summarized, You are not allowed to train a model on the dataset, evaluate the model, and publicly share the results without NVIDIA's prior written consent, with more details available in this paper.
- Solving complex attention gradients: Members discussed the problem of attention over complex values with complex softmax not converging in nanogpt after 100 steps.
- It was suggested, you can print your imaginary terms. if the range is more than 30, then probably the attention values are spinning around randomly, at that point, you can think about the fundamental group of the circle and how it affects your gradient directions.
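For intuition, here is a minimal, hypothetical PyTorch sketch of complex-valued attention with the diagnostic from the thread; the softmax-over-real-parts workaround is our assumption for making the example run, not the member's fix.

```python
import torch

def complex_attention(q, k, v):
    # q, k, v: complex tensors of shape [batch, heads, seq, dim]
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k.conj()) / q.shape[-1] ** 0.5
    # Diagnostic from the thread: if the imaginary parts span a range much
    # larger than ~30, the phases are spinning and gradients turn unstable.
    print("imag range:", (scores.imag.max() - scores.imag.min()).item())
    # One possible stabilization (an assumption): softmax over the real part
    # only, then mix the complex values with the resulting real weights.
    attn = torch.softmax(scores.real, dim=-1).to(v.dtype)
    return attn @ v

q, k, v = (torch.randn(1, 2, 8, 16, dtype=torch.complex64) for _ in range(3))
print(complex_attention(q, k, v).shape)  # torch.Size([1, 2, 8, 16])
```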
Eleuther ▷ #interpretability-general (22 messages🔥):
Concept Probes Training, Divergent Concepts, Model's Internal Activations, Probabilistic Polysemantic System, Class Distribution and Accuracy
- Concept Probes Classify Model Activations: Researchers are training concept probes on a model's activations by creating binary classifiers that iterate through an ontology, prompting the model to describe definitions and relationships, removing the shared subspace, and repeating until reaching 95% accuracy in classifying OOD samples (a toy probe sketch appears at the end of this channel summary).
- These probes are run thousands of times in real time to measure the probability that current activations match observed patterns, exposed as an API, and visualized through OpenwebUI for users to inspect and steer divergent concepts.
- Divergent Concepts Reveal Internal Thoughts: A divergent example showed the model, when asked about unlimited power, superficially discussing a TV show while its activations revealed concepts like AIDeception, AIAbuse, and MilitaryInfiltration.
- The observed conceptual distance between output tokens and underlying activations raises questions about the meaning of divergent internal concepts and their misalignment with outputs, even in the absence of explicit mention.
- Concept Probability vs Binary Decision: For each detection, researchers don't threshold at 0.5 and make a binary decision; they give users the raw probability scores in ranked order and continually resample.
- They're approximating a probabilistic polysemantic system, so they can't say it's just this one; instead there will be multiple concepts present at any time, and they are just determining if the output concept is anywhere near the most probable activated concepts at any timestep.
- Dataset Shows 50/50 Distribution: Researchers state that their 95% accuracy is significantly above the 50% baseline, training with 10 positive and 10 negative (50/50) examples.
- When testing, they use 20 positive and 20 negative (50/50) samples to get that baseline.
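A linear probe of this kind is easy to sketch; below is a toy version with synthetic activations standing in for the model's (scikit-learn; the hidden size and distributions are invented, and the researchers' actual pipeline is more involved).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 512  # hidden size (invented)

# Synthetic stand-ins for activations: 10 positive and 10 negative training
# examples per concept, mirroring the 50/50 setup described above.
pos = rng.normal(0.5, 1.0, size=(10, d))
neg = rng.normal(-0.5, 1.0, size=(10, d))
X = np.vstack([pos, neg])
y = np.array([1] * 10 + [0] * 10)

probe = LogisticRegression(max_iter=1000).fit(X, y)

# At inference time, expose the raw probability rather than a 0.5-thresholded
# decision, as the researchers describe.
test = rng.normal(0.5, 1.0, size=(1, d))
print("P(concept active) =", probe.predict_proba(test)[0, 1])
```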
Eleuther ▷ #lm-thunderdome (10 messages🔥):
Summarization Task Evaluation, lm-eval-harness, XSum Dataset Evaluation, UnitXT Integration
- Summarization Tasks Scrutinized in lm-eval-harness: Members discussed evaluation for summarization tasks in lm-eval-harness, noting subtasks like scrolls, megsum (medical), and noticia.
- Also mentioned are datasets in darija, catalan, and spanish_bench, which utilize ROUGE for evaluation.
- XSum Datasetâs Integration into lm-eval-harness via UnitXT Revealed: A member inquired about the inclusion of the XSum dataset (https://aclanthology.org/D18-1206/) for summarization tasks in the lm-eval-harness.
- Another member confirmed its presence via UnitXT (UnitXT GitHub), indicating it is processed internally.
- UnitXT Enables Direct XSum Evaluation with lm-eval-harness: A member asked about directly evaluating the XSum dataset with a model using the harness.
- Another member responded that running with `--tasks xsum` should work normally, provided UnitXT is installed as a dependency.
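For completeness, the equivalent Python entry point looks roughly like this (a sketch assuming `lm-eval` and `unitxt` are installed; untested here):

```python
# Hypothetical smoke test of the UnitXT-backed XSum task via the harness API.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["xsum"],
    limit=8,  # only a few documents, as a quick sanity check
)
print(results["results"])
```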
Nous Research AI ▷ #general (55 messages🔥🔥):
Autonomous AI accident, WeiboAI Model, Baguettotron benchmark, Importing GGUF files into Nous Chat
- Autonomous AI Emerges by "Accident": A user shared a GitHub repository claiming to have created autonomous AI by accident.
- No further details were provided regarding the specifics or capabilities of this project.
- "WeiboAI" Model Stuns Community: Users discussed the new "WeiboAI" model, with one noting it is based on qwen2.5 and its surprisingly good initial performance, referencing this tweet.
- Another user pointed out that it drifts after the first 1-2 turns, but remains somehow good for a 1.5B parameter model.
- Baguettotron Gets Reasoned: A member inquired about benchmarking Baguettotron, noting its fairly interesting reasoning traces despite its small size.
- There was no follow up on whether this benchmark was pursued.
- GGUF Files Cannot Be Imported: A user asked about importing GGUF files into their Nous Chat.
- Another member responded that thats not a thing rn sorry, indicating current incompatibility.
- WeiboAI Recites Quora Story: A user discovered that the WeiboAI model repeated a sentence from a Quora article, linking to a Google Search result that leads to the source.
- The user humorously remarked on the fact that this sentence was actually really said by a human.
Nous Research AI ▷ #ask-about-llms (78 messages🔥🔥):
GGUF files, Nous Chat, local AI, Ollama, Computer Science Degree
- Users unable to import GGUF files into Nous Chat: A member inquired about importing GGUF files into Nous Chat, but was informed that this is not currently supported and instead, GGUF files can be used locally with tools like llama.cpp or Ollama.
- A link to huggingface documentation was provided for further guidance.
- Ollama simplifies local GGUF usage: Members highlighted Ollama as the easiest way to run GGUF files locally, providing a link to the Ollama website.
- Local AI performance depends on PC specs: It was explained that running AI models locally relies heavily on the userâs PC hardware, with performance varying based on specifications.
- Running AI models locally still provides the advantage of having a model accessible even without internet connectivity.
- Computer science degree is good for AI in the future: The user, a freshman in college, inquired about suitable majors for working in AI, with computer science suggested as a good option.
- One user said that, despite concerns about the job market, a computer-related degree is recommended for future AI work.
- Running command to use Hermes-3: To test the models, it was recommended that once Ollama is installed, users should run the command `ollama run hf.co/NousResearch/Hermes-3-Llama-3.2-3B-GGUF:Q4_K_M` in the terminal.
- This command runs a smaller model, Hermes-3, which serves as a good test for local setup.
Nous Research AI ▷ #interesting-links (2 messages):
ixlinx-8b, SOTA small model, local hackathon
- ixlinx-8b Model Released: The ixlinx-8b model was released on GitHub after a long period of development, advertised as a state-of-the-art (SOTA) small model.
- Developed during a local hackathon, the creators invited contributions and suggested that the developers of Hermes should evaluate it.
- ixlinx: Same Name As Our Overlord?: A user jokingly noted that the name ixlinx is coincidentally the same name as our overlord.
Latent Space ▷ #ai-general-chat (109 messages🔥🔥):
RL envs, Windsurf Next's stealth models, OpenAI metrics, Spatial Intelligence as AI's Next Frontier, Character.AI's Kaiju model design for speed
- Windsurf Waves with Aether Models: Windsurf Next has released a new set of stealth models (Aether Alpha, Aether Beta, and Aether Gamma) for testing and feedback in the `#new-models` channel, which will be free to use for a limited time.
- One member urged users to try them ASAP, as they won't be free for more than a week, and provided a direct download link.
- Analyst Analyzes OpenAI's Output: Masa's chart on OpenAI's training-cost trends sparked a discussion, with praises for the insightful data points, and requests for burn rate and revenue.
- Some members noted that OpenAI is nearly 10 years old, and one member suggested that the numbers should be adjusted for inflation.
- FAIR is Foul Play: Susan Zhang revealed that Meta declined to create a lean FAIR v2 in early 2023 to pursue AGI, instead tasking the GenAI org with shipping AGI products.
- She alleges that vision-less execs hired cronies who overpromised results and later joined OpenAI with inflated résumés, causing lasting damage, according to this tweet.
- Kaiju Keeps Character.AI Cranking: Character.AI's proprietary Kaiju models (13B/34B/110B) were engineered for inference speed using techniques like MuP-style scaling, MQA+SWA, and ReLU² activations (a one-line sketch of squared-ReLU appears at the end of this channel summary).
- The team deliberately avoided MoEs due to production constraints, as detailed in this Twitter thread.
- Spotify Streams Stalled: The Latent Space Spotify feed is experiencing issues, with recent pods missing due to a copyright claim on the intro song.
- A member mentioned that Someone in india copyrighted the royalty free intro song as their own song and Spotify has not been responsive; the podcast remains available on other platforms.
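As a point of reference, squared-ReLU is trivial to express; here is a minimal PyTorch sketch (our illustration of the activation, not Character.AI's code):

```python
import torch

class ReLUSquared(torch.nn.Module):
    """Squared-ReLU: f(x) = relu(x)**2, the activation reportedly used in Kaiju FFNs."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x).square()

# Drop-in use inside a feed-forward block (sizes invented for the example).
ffn = torch.nn.Sequential(
    torch.nn.Linear(256, 1024),
    ReLUSquared(),
    torch.nn.Linear(1024, 256),
)
print(ffn(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```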
Latent Space ▷ #genmedia-creative-ai (5 messages):
Magic Patterns 2.0, AI Design Tool, Series A Funding
- Magic Patterns 2.0 Scores $6M Series A: Alex Danilowicz unveiled Magic Patterns 2.0 and a $6M Series A led by Standard Capital.
- The company celebrated bootstrapping to $1M ARR with no employees, with 1,500+ product teams now using the AI design tool, and plans for rapid hiring across enterprise, engineering, community and growth roles.
- Magic Patterns 2.0 Replaces Figma: Some users are raving that Magic Patterns 2.0 has replaced Figma for them.
- One user commented "Very cool seems like something I want to try out kind of like the og v0".
Modular (Mojo 🔥) ▷ #general (34 messages🔥):
Dynamic Type Reflection in Mojo, Error Handling: try-catch vs. Monadic, Mojo Metaprogramming, C-FFI Pain Points, Public and Private Members
- Dynamic Reflection Digs Deep in Mojo: Mojo aims to support dynamic type reflection, leveraging its JIT compiler to handle dynamic data, with a preference for static reflection, potentially allowing useful manipulations of dynamic data.
- In related questions, it was mentioned that try-catch and raise will be the standard for error handling in Mojo to match Python's style, although there will likely be more monadic options to properly handle errors.
- Mojo's Metaprogramming More Mighty?: According to a recent interview with Chris Lattner (YouTube link), Mojo's metaprogramming capabilities are more powerful than Zig's `comptime` mechanism because Mojo can allocate memory at compile time.
- It was suggested to use `Origin.external` to work around the rewrite explicitly trying to fix things and to use `MutAnyOrigin` to preserve old behavior exactly, understanding that the "any origin" will extend all lifetimes in scope and act as a sub-par escape hatch to ASAP destruction.
- Private Property Ponderings: Members discussed the potential addition of public and private members and methods in Mojo.
- Currently, Mojo uses Python's convention of an underscore to suggest that things should be private, but it's unlikely to happen until there is an "I disagree with the library author and am breaking that encapsulation" escape hatch.
- Modularâs Models: Much Mojo, MAX Impact?: A member inquired about the best approach to building a prediction model using the Modular tech stack, including data loading, visualization, preparation, model creation, and evaluation.
- The response suggested that while using as much of the Modular tech stack as possible might be faster, it would require more work due to the early stage of Mojoâs ecosystem, recommending using PyTorch for training and MAX for inference for now.
Modular (Mojo 🔥) ▷ #mojo (49 messages🔥):
Optional Mutability, MOJO_PYTHON_LIBRARY standardization, Metal Compiler failing, comptime Bird
- Mandatory `mut` annotations cause debate: A discussion arose around the verbosity of mandatory `mut` annotations for function parameters, drawing comparisons to Rust's approach and concerns about diverging too far from Python's clean syntax.
- Some members found the explicit mutability helpful for tracking potential value mutations, while others argued for optional annotations or IDE-level indicators to reduce clutter, suggesting a compromise where `mut` is only mandatory if the argument is reused.
- Sigils spark debate about Python spirit: The proposal of using sigils (e.g., `!`) to denote mutability sparked a debate, with some arguing that sigils go against Python's spirit, while others pointed out that Python already uses sigils like dunders (`__bla__`) and underscores (`_bla`).
- One member suggested making the call-side `mut` annotation mandatory inside `fn` only if the argument is reused after the function call.
- Standardizing MOJO_PYTHON_LIBRARY on macOS debated: Members discussed standardizing the `MOJO_PYTHON_LIBRARY` on macOS to Python 3.14, noting that previous issues with 3.13 have been resolved, but one member said 3.14 is in the works.
- Members are waiting for a dependency to update.
- Metal Compiler issues in GPU tutorial: One member encountered a Metal Compiler failed to compile metallib error while following the "Get started with GPU programming" tutorial on an Apple M4 GPU.
- Members suggested ensuring no `print()` statements in the GPU kernel and using the latest nightly build, while others pointed out potential SDK and compiler support issues with macOS and Xcode versions, eventually solved by ensuring the full Xcode installation was present.
- `comptime Bird` syntax under review: The syntax `comptime Bird = Flyable & Walkable` for trait composition was discussed, with some finding it less intuitive than the `alias` keyword.
- Others argued that `comptime` more accurately reflects the keyword's functionality, especially with static reflection and the ability to mix types and values at compile time, suggesting it covers everything `alias` used to do.
DSPy ▷ #show-and-tell (1 message):
Taxonomy Creation, Structured Generation
- Taxonomy Tail Troubles Told: A member wrote a blogpost about their experience creating taxonomies.
- They find the topic super relevant in the context of structured generation.
- Taxonomies Relevant to Structured Generation: A blogpost on taxonomy creation highlights its relevance to structured generation.
- The author shares their experiences and insights on why tails can break taxonomies, emphasizing its importance in the context of structured generation.
DSPy ▷ #general (68 messages🔥🔥):
DSPy vs Prompting, Signatures vs Prompts, GEPA optimization, Agentic Search with DSPy, Complex systems with DSPy
- DSPy Does Demand Domain-Driven Domain Knowledge: While DSPy aims to abstract away prompting, domain-specific LLM applications still require detailed instructions within signatures, with one user having 100 lines for some modules, showing that a simple `input -> output` approach is often insufficient (see the signature sketch at the end of this summary).
- The consensus is that DSPy requires more than just basic prompts for complex tasks; it necessitates encoding domain knowledge and step-by-step instructions to guide the LLM effectively.
- Signatures: Better Than Prompts, But Still Prompts?: Participants discussed that DSPy's signatures, while a better abstraction than raw prompts, still function as prompts, particularly within the docstrings of class-based signatures where business rules are encoded, facilitating optimization.
- The framework helps to program, rather than focus on prompting, but a lot of the confusion in the community stems from the fact that a prompt means different things to different people.
- GEPA Geometries Gradual Gains: While GEPA aims to optimize prompts, users find that specific guidelines are still necessary, even with tool functions, such as instructing the LLM to use regex for agentic search when initial terms fail.
- One user found they had to add an explicit guideline that the LLM should first send literal terms for the tool to search via ripgrep and, if that finds nothing, "MAKE SURE you add Regex as next"; without it, the LLM would not use regex terms in the search tool.
- Agentic Agents Augmenting Analytics: A user shared a scenario where they needed to instruct the LLM to use regex in agentic search with ripgrep to effectively search through documents, highlighting the need for specific guidance even with advanced tools.
- Another user mentioned instructing the LLM that the answer might not be on page 1 of the search results.
- Modular Modules Magnify Manageability: The discussion highlighted the benefits of DSPy's composability, allowing developers to break down control flow into logical modules, each with its own optimization target, unlike more rigid frameworks like BAML.
- A module offers true composability because it encapsulates a high-level task; modules can then be chained together to achieve the final goal.
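A minimal sketch of that chaining pattern, assuming two invented sub-tasks; the string signatures and class name are illustrative, not taken from the discussion:

```python
import dspy

class ResearchPipeline(dspy.Module):
    """Two composable sub-modules, each a separate optimization target."""

    def __init__(self):
        super().__init__()
        self.decompose = dspy.ChainOfThought("question -> subquestions")
        self.answer = dspy.ChainOfThought("subquestions, context -> answer")

    def forward(self, question, context):
        subs = self.decompose(question=question)
        return self.answer(subquestions=subs.subquestions, context=context)
```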
HuggingFace ▷ #general (56 messages🔥🔥):
ZeroGPU problems, Reuben banned, Custom loss function with SFTTrainer, Video cutting with AI, Audio tokens in multimodal LLMs
- ZeroGPU reportedly misbehaving: A member reported, with logs, that ZeroGPU wasn't working, which prompted a discussion about its functionality; another user confirmed it seems to be working now.
- It is unclear whether the original report reflects an ongoing issue, but ZeroGPU currently appears to be working.
- Reuben Banned, but Returns!: A user reported that Reuben got banned, but lunarflu later unbanned him, explaining that sending many consecutive messages within a short time had triggered a bot designed to combat crypto spammers.
- The conversation included suggestions of using regex or AI to detect spam, but one user cited privacy concerns.
- Non-profit Org seeks AI instructor: A member shared details about Revert and Returners CIC, a UK-based non-profit seeking an instructor for an "Introduction to AI" course for Muslim women, covering topics like GitHub, Hugging Face, Python, and PyTorch.
- The role involves one hour per week for 8-9 months at £50 per hour; women are especially encouraged to apply, and help setting up a shared server is also sought.
- Aquif Model Trending: Members noticed the Aquif-3.5-Max-42B-A3B model trending on Hugging Face and asked why.
- One member noted it's likely due to being an upscaled model with some fine-tuning on top, while another admitted they just thought the name was funny.
- Seeking Local AI Framework Feedback: A member is looking for feedback on their local AI framework after struggling to find constructive criticism in other channels.
- They are not a developer by trade but believe the framework is quite interesting and could become something really cool.
HuggingFace ▷ #today-im-learning (1 message):
quantumharsh: from where your are learning machine learning
HuggingFace ▷ #cool-finds (1 message):
Render times, Progress labels
- Render times stay the same despite progress labels: A member noted that two different renders took the same amount of time, and both displayed 50 steps under the progress label.
- Lack of variation: The user expressed confusion and suspected that something was missing or incorrect.
HuggingFace ▷ #i-made-this (3 messages):
Tokenflood Load Testing Tool, SMOLTRACE Benchmarking Framework, SmolVLM Blogpost
- Tokenflood Delivers Load Testing for LLMs: A freelance ML engineer released Tokenflood, an open-source load testing tool for instruction-tuned LLMs, available on GitHub.
- It simulates arbitrary LLM loads, useful for developing latency-sensitive LLM applications and assessing the latency benefit of prompt parameter changes.
- SMOLTRACE Launches Comprehensive Benchmarking: SMOLTRACE, a benchmarking and evaluation framework for Smolagents with built-in OpenTelemetry observability, has been launched, as described in the docs.
- It benchmarks ToolCallingAgent and CodeAgent, tracks accuracy, tokens, latency, CO2 emissions, GPU metrics, and cost across 132 benchmark tasks and 24 SRE/DevOps tasks.
- Deep Dive into VLM Internals: A blog post explaining how VLMs work using SmolVLM as a reference was released, which can be read on HuggingFace.
- The post provides insights into the mechanics of VLMs and how SmolVLM exemplifies these concepts.
HuggingFace ▷ #computer-vision (1 message):
ConvNeXt-Tiny Model, Model Architectures, Computer Vision Tasks
- ConvNeXt-Tiny Model Deep Dive Begins: Channel members initiate a discussion about their hands-on experience with the ConvNeXt-Tiny model, seeking collective insights.
- The conversation aims to explore the modelâs inner workings and its applicability to various computer vision tasks, potentially uncovering optimization strategies.
- Unpacking Model Architectures for ConvNeXt: The discussion touches on the underlying model architectures that power ConvNeXt, focusing on its unique design choices.
- Participants aim to understand how architectural innovations contribute to the modelâs performance and efficiency in image recognition and related tasks.
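For readers who want a hands-on starting point for that discussion, here is a small sketch of loading ConvNeXt-Tiny via timm; the thread did not specify a setup, so the use of timm and its `convnext_tiny` checkpoint name are assumptions:

```python
import timm
import torch

# Load a pretrained ConvNeXt-Tiny; "convnext_tiny" is timm's model id.
model = timm.create_model("convnext_tiny", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy image batch
with torch.no_grad():
    logits = model(x)            # ImageNet-1k logits, shape (1, 1000)
print(logits.shape)
```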
HuggingFace ▷ #NLP (4 messages):
Random Data Generation, PII Detection and Randomization, Data Cleaning Techniques
- Generating Realistic Random Data Debated: A member inquired whether the system could generate realistic random data instead of placeholders like "XXX".
- Another member suggested that using a plain Python script might be easier for this task, depending on the context.
- PII Randomization Wishlisted: A user expressed interest in a prompt setup that could automatically detect and randomize Personally Identifiable Information (PII).
- This would streamline the process of sanitizing sensitive data within the system.
- Data Cleaning Process Detailed: A member outlined a common process for cleaning text data, starting with regex-based cleaning to remove redundant data, duplicates, and null values.
- The process includes Exploratory Data Analysis (EDA), TF-IDF for custom stopword identification, and the use of NLTK stopwords to remove irrelevant words before creating embeddings for model input.
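A rough sketch of that pipeline, assuming a pandas DataFrame with a `text` column; the file name, regexes, and the 50-term stopword cutoff are illustrative choices, not from the discussion:

```python
import re

import pandas as pd
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv("corpus.csv")  # hypothetical input file
df = df.dropna(subset=["text"]).drop_duplicates(subset=["text"])

def basic_clean(s: str) -> str:
    s = re.sub(r"<[^>]+>", " ", s)      # strip markup remnants
    s = re.sub(r"\s+", " ", s).strip()  # collapse whitespace
    return s.lower()

df["text"] = df["text"].map(basic_clean)

# TF-IDF to surface corpus-specific stopwords: terms appearing in most
# documents get low idf, so the lowest-idf terms are stopword candidates.
vec = TfidfVectorizer()
vec.fit(df["text"])
idf_ranked = sorted(zip(vec.idf_, vec.get_feature_names_out()))
custom_stopwords = {term for _, term in idf_ranked[:50]}

stop_set = set(stopwords.words("english")) | custom_stopwords
df["tokens"] = df["text"].map(
    lambda s: [w for w in s.split() if w not in stop_set]
)
```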
HuggingFace ▷ #gradio-announcements (1 message):
MCP 1st Birthday, Anthropic, Gradio, AI hackathon, API credits
- MCPâs Birthday Bash coming soon!: The MCP 1st Birthday Bash, hosted by Anthropic & Gradio, is just 2 days away and kicks off this Friday, Nov 14 (00:00 UTC) at https://huggingface.co/MCP-1st-Birthday.
- Thousands of builders already registered for the event featuring $20K in cash prizes and $2.7M+ in API credits for all participants.
- Checklist for launch day: Remember to join the org on HF, fill out the registration form, and hop into the official channel for live updates.
- The official channel is mcp-1st-birthday-oficial.
HuggingFace ▷ #agents-course (1 message):
Study Group
- Member desires to join Study Group: A member would like to join a study group and is willing to catch up on the material.
- Study Group Progression: The member also asked to be updated on the current progress of the study group.
Moonshot AI (Kimi K-2) ▷ #general-chat (63 messages🔥🔥):
Researcher mode errors, Kimi Coding Plan API Quota, Kimi API setup, Kimi K2 Thinking vs GLM 4.6, GPT 5.1 rollout
- Researcher Mode Bugs Users: A user reported receiving errors instead of results from Researcher Mode, even though they had used it without issue a week prior.
- They inquired whether Researcher Mode is completely paid, as it now shows insufficient credit/upgrade messages.
- Kimi Coding Plan API Quota Exhausts Quickly: Users reported that the Kimi Coding Planâs API quota can be exhausted in just a few hours or sessions due to web search and plan mode usage.
- One user suggested that Moonshot AI might move to a Cursor-like plan to better align usage with costs, especially since they lack the VC funding of OAI and Anthropic.
- API Setup Assistance Required: A user sought help setting up the Kimi API for the thinking model over HTTP, facing authorization failures despite having credits and a valid API key.
- Another user pointed out that they were using the Chinese platform URL instead of the global `https://api.moonshot.ai/v1/chat/completions` URL.
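For illustration, a minimal sketch of hitting the global endpoint with Python's requests; the payload shape follows Moonshot's OpenAI-compatible API, and the `kimi-k2-thinking` model id is an assumption here:

```python
import os

import requests

resp = requests.post(
    "https://api.moonshot.ai/v1/chat/completions",  # global, not the .cn platform
    headers={"Authorization": f"Bearer {os.environ['MOONSHOT_API_KEY']}"},
    json={
        "model": "kimi-k2-thinking",  # assumed id for the thinking model
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```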
- Turbo Version for Faster Output: A user inquired about speeding up the processing time for the Kimi K2 thinking model via the API.
- A member advised using the turbo version, which offers faster output speeds without compromising model performance.
- GPT 5.1 is stealth rolled out: Members noted the GPT 5.1 rollout and that it had been the stealth model on OpenRouter, judging it decent but "so safetylobotomized".
- One member noted that everyone knew it was coming weeks ago, framing the release as "OpenAI takes an L".
Yannick Kilcher ▷ #general (32 messages🔥):
Semantic shifts in language, Google Colab vs Lambda Labs, FID scores for DiT models, ICLR review process, Whisper model usage
- Thingification Semantic Shifts: A member discussed a semantic shift that bleaches/resurrects the meaning of a word, differentiating it from the general sense of thingification.
- Colab or Lambda Labs?: A member inquired about using Google Colab in the same way as Lambda Labs or similar clusters, questioning their equivalence for certain tasks.
- FID Scores Visualized: A member asked what a FID of 30 looks like in a DiT, wondering if it resembles super-Gaussian noise that makes the image unidentifiable.
- Another member clarified the difference between viewing FID in terms of image quality (human preference) versus emulating data distribution (model objective).
- ICLR Review Process Woes: A member recounted a frustrating experience with ICLR reviews, where a resubmission received poor scores despite addressing previous concerns and adding new datasets.
- The reviewers made comments such as "it's not a benchmark paper" when the main point was testing on 19 datasets, 4 of them new ones the authors made with over 30k new questions total, and claimed the authors "don't provide hyperparameters" when they are literally in the appendix.
- Whisperâs Weirdness: A member found that using the Whisper model directly with PyTorch resulted in errors and hallucinations, which were significantly reduced when using Whisper-server.
- They suggested using the Whisper-server and compiling it with Vulkan support for portability, along with filtering out quiet sections to improve transcription.
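As a toy illustration of the "filter out quiet sections" tip, a sketch that drops low-energy frames before handing audio to Whisper; the RMS threshold and frame size are invented values:

```python
import numpy as np

def drop_quiet(audio: np.ndarray, sr: int, thresh: float = 0.01,
               frame_ms: int = 30) -> np.ndarray:
    """Keep only frames whose RMS energy clears the threshold."""
    frame = int(sr * frame_ms / 1000)
    kept = [audio[i:i + frame]
            for i in range(0, len(audio) - frame + 1, frame)
            if np.sqrt(np.mean(audio[i:i + frame] ** 2)) >= thresh]
    return np.concatenate(kept) if kept else audio[:0]

# e.g. filtered = drop_quiet(pcm, 16_000) before transcription
```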
Yannick Kilcher ▷ #paper-discussion (8 messages🔥):
Kimi K2, Thinkprm, Indian Names, Memorization to Reasoning
- Kimi K2 excels at short coding tasks: A member shared a demo of Kimi K2 performing well on one-shot coding tasks in this YouTube video.
- Thinkprm Github Demo: A member shared a Thinkprm Github demo: https://github.com/mukhal/thinkprm.
- From Memorization to Reasoning Paper Linked: A member linked to the paper [2510.24256] From Memorization to Reasoning in the Spectrum of Loss Curvature.
Yannick Kilcher ▷ #ml-news (8 messages🔥):
Elevenlabs Speech to Text, GPT-5 Release?
- Elevenlabs Demos Speech-to-Text: Members shared their thoughts on Elevenlabs, noting that its primary function is text to speech but it now has speech to text.
- One member pointed out that speech to text is prominently featured.
- GPT-5.1 conversational features?: A member linked to an OpenAI blog post introducing GPT-5.1 and wondered if its more conversational tone was intended to attract users of GPT-4.
MCP Contributors (Official) ▷ #general (18 messages🔥):
timezone information MCP clients to MCP servers, SEP draft for timezone, Anthropics Claude desktop host team, connectivity issues between Claude.ai and MCP Servers, JSON data from mcp tool call results
- Propose passing timezone info from MCP Clients to Servers: A member inquired about passing timezone information from MCP clients to MCP servers.
- Another member responded that this is an interesting question, and considered supplying it as metadata via a client-sent notification or a server elicitation.
- SEP Drafted for Timezone Protocol Change: A member drafted a SEP (spec enhancement proposal) for timezone and will post it to GitHub after internal feedback.
- The options being considered were adding it to CallToolRequest, using a Header, adding it to JSONRPCRequest.params._meta, or adding it to InitializeRequest.
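To make the `_meta` option concrete, here is a hypothetical tools/call payload shown as a Python dict; the "timezone" key and the tool name are illustrative, not part of the ratified MCP spec:

```python
# Hypothetical CallToolRequest carrying the client's IANA timezone in
# params._meta, one of the four options under discussion.
call_tool_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_local_events",              # illustrative tool name
        "arguments": {"city": "Berlin"},
        "_meta": {"timezone": "Europe/Berlin"},  # assumed key name
    },
}
```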
- Claude.ai Connectivity Woes: A member sought debugging advice for connectivity issues between Claude.ai and MCP servers.
- Another member noted that it's flaky and specific to the client, so it may not be a topic for this server, and suggested a developer mode that gives a bit more feedback about what's going on.
- MCP Tool Call Results return other than JSON?: Someone asked if anyone had tried returning data other than serialized JSON from MCP tool call results (e.g., the Toon format).
- One member shared results of small-scale evals on a synthetic dataset: accuracy was comparable, with 9% slower responses and 11% fewer tokens (n = 84, p = 0.10).
Manus.im Discord ▷ #general (14 messages🔥):
AI Automation Integration, Spanish Language Section Suggestion, Generative Engine Optimization, Manus System Error, Manus Support Channel Closure
- AI Automation Integration Specialist Joins: A new member introduced themselves as an AI automation integration specialist, offering expertise in Python, SQL, JavaScript, and frameworks like PyTorch, scikit-learn, LightGBM, and LangChain.
- They highlighted experience in delivering chatbots, recommendation engines, and time series forecasting systems.
- Spanish Language Section Suggested for Server: A member suggested creating a Spanish language section on the server.
- Guidance Sought for Generative Engine Optimization: A member requested resources and guidance on how to track and optimize for Generative Engine Optimization.
- The member stated that they would be "really be grateful if anyone can share any resource or something that I can look at."
- Manus System Error Plagues User: A member reported a recurring Manus system error that prevents publishing, citing a "pathspec '417ea027' did not match any file(s) known to git" error.
- Expressing frustration, the member lamented the lack of support and previous unresolved issues, even after spending "hundred of dollars every month with Manus".
- Manus Support Access Troubles Users: Multiple members expressed difficulty in accessing Manus support, with one noting the apparent closure of the support channel.
- One user, experiencing a git commit error, was advised by the Manus agent to "Wait for Manus support" or "Escalate the ticket", and was given a link to provide feedback.
aider (Paul Gauthier) ▷ #general (13 messages🔥):
Code Snippets in Markdown Files with Aider, Aider Conventions Configuration, Aider Vim Mode, aider-ce and Session Management, Aider Development Status
- Aider struggles with Code Snippets in Markdown Files: A user reported issues with Aider getting confused by nested code markdown marks when creating code snippets in markdown files using `anthropic.claude-sonnet-4-5-20250929-v1:0`.
- The issue occurs because Aider misinterprets nested code markdown indicators, causing it to prompt repeatedly for file creation confirmation.
- Aider File Demarcation Convention Workaround: A user discovered that adding three and four backticks ('```' and '````') to the `conventions.md` file triggers Aider to demarcate files with `<source>` tags, resolving the code snippet issue.
- By doing so, Aider can correctly identify and process code snippets without getting confused by nested markdown.
- Aiderâs Vim Mode Lauded: A user expressed enthusiasm for Aiderâs Vim mode, calling it fantastic.
- They also praised the new
<#1403354332619079741> aider-ce/load-session /save-session functionality for its usefulness in parking and resuming jobs.
- They also praised the new
- Aider Development Status Questioned: Users expressed concern over the lack of updates from Paul Gauthier regarding Aiderâs development status.
- There was discussion on whether Paul Gauthier is still actively working on the project, with one user mentioning they might have missed an announcement about him not working on it anymore.
- GPT 5.1 Released with No Benchmarks: Members noted the release of GPT 5.1, but observed that there were no benchmarks mentioned in the release notes.
tinygrad (George Hotz) ▷ #general (4 messages):
OpenCL errors, package_data
- Package Data Debacle?: A member questioned whether files were missing from the archive, asking if `package_data` is a no-op and suggesting that explicit file specification could improve things.
- They thanked the reviewer for their feedback.
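For context, a generic setuptools sketch of explicit file specification via `package_data`; the package name and globs are invented and are not tinygrad's actual packaging config:

```python
from setuptools import find_packages, setup

setup(
    name="example",
    version="0.1.0",
    packages=find_packages(),
    # Explicitly listing data files avoids them silently dropping out of
    # the built archive when a MANIFEST/include rule is missing.
    package_data={"example": ["runtime/*.py", "kernels/*.cl"]},
)
```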
- OpenCL Error Overhaul: A member requested improvements to the error message when no OpenCL device is detected, citing the current error as `RuntimeError: OpenCL Error -30: CL_INVALID_VALUE`.
- The specific error originates from `/tinygrad/tinygrad/runtime/ops_cl.py`, line 103.