Wow.
AI News for 10/27/2025-10/28/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (198 channels, and 14738 messages) for you. Estimated reading time saved (at 200wpm): 1120 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
The good news is that Sama and team have landed the plane successfully: with tens of billions of dollars at stake, both the for-profit and Microsoft renegotiations have concluded and there is now a clean cap table and corporate structure (credit Amir Efrati), clearing the way for a "likely" OpenAI IPO:
Microsoft let go of its exclusivity in exchange for a $250B OpenAI commitment to Azure spend, and OpenAI is now free to work with other vendors, while Satya is saying "I would love to have Anthropic... If Google wants to put Gemini on Azure, please do."
The other large financial number announced in the livestream is that this year's 30GW of compute deals total $1.4T ($47B per GW), and that the aspirational goal is for OpenAI to eventually build 1GW a week at $20B per GW (about $1T a year of compute capex). Given the stated goal of reaching 125GW, this means OpenAI will be wrangling roughly $3-4T worth of infra by 2033, about half the initially speculated $7T figure.
No, you're not alone in thinking this is crazy; all of it is entirely unprecedented and yet possible, perhaps probable.
Perhaps for an AI Engineer audience, the more material announcements lie in the platform "pivot" OpenAI seems to have made: a decreased emphasis on first-party apps (odd given that they have a CEO of Apps):
and a stronger-than-ever emphasis on the platform approach, even citing the Bill Gates Line:
If you watch OpenAI closely, this is all the signal you need.
AI Twitter Recap
OpenAI's new structure, Microsoft deal, and "open weights"
- OpenAI announced a recapitalization and reorg: the non-profit is now the OpenAI Foundation, the for-profit becomes a Public Benefit Corporation (PBC). The Foundation holds special voting rights to appoint/replace the PBC board, owns equity valued at ~$130B, and holds a warrant that grants additional equity if the share price rises more than 10x in 15 years. OpenAI framed this as keeping the non-profit "in control" while resourcing the mission (OpenAI, @stalkermustang highlights). Sam Altman and Jakub previewed priorities and took questions in a live session (@OpenAI, @sama).
- Analysts summarized the Microsoft agreement: Microsoft now holds ~27% on a diluted basis; it remains OpenAI's frontier model partner with Azure API exclusivity until an AGI declaration verified by an independent panel; IP rights run through 2032 (including post-AGI, with safety guardrails); OpenAI commits to ~$250B in additional Azure purchases; Microsoft loses its right of first refusal on compute; OpenAI may co-develop with third parties and provide APIs to US national security customers on any cloud; API products remain Azure-exclusive (@koltregaskes).
- "OpenAI is now able to release open-weight models that meet requisite capability criteria," per OpenAI's policy language; this drew immediate attention from practitioners tracking the open ecosystem (@reach_vb). Observers circulated provisional equity splits of Foundation ~26%, Microsoft ~27%, employees/investors ~47% (@scaling01), though caution is warranted pending formal filings.
- Key open governance and safety reads: questions on Foundation control, mission vs. commercial goals, and AGI definitions under the Microsoft agreement (@robertwiblin). AGI timelines on Metaculus have lengthened by ~3 years since February, now May 2033 for "first AGI" and Oct 2027 for a weak, non-robotic standard (@robertwiblin).
Agents go first-class: GitHub Universe, LangChain Deep Agents, and API design for agents
- GitHub Agent HQ and VS Code Agent Sessions: GitHub announced Agent HQ to orchestrate "any agent, any time, anywhere," with native collaborators (e.g., Claude, Devin) integrated into GitHub workflows. VS Code Insiders now ships an Agent Sessions view with OpenAI Codex and Copilot CLI, a built-in plan agent, isolated sub-agents, and a Copilot Metrics dashboard to track impact across any coding agent. Multiple Codex instances can run in parallel to complete tasks and open PRs (@github, @code, @burkeholland, @pierceboggan, @mikeyk, @cognition).
- LangChain Deep Agents 0.2: Introduces a "backend" abstraction to swap the agent filesystem for a local FS, DB, or remote VM; focuses on long-running, high-performance agents with context compression, file-system offloading, and subagent isolation. Positioning: a general-purpose harness for building systems like Deep Research or coding agents (@hwchase17, @LangChainAI, context engineering summary).
- API design for agents: Postman's "AI-ready APIs" argues most agents fail on weak machine-readable documentation; it pushes predictable structures, standardized behavior, synced schemas, and auto-generated, contextual docs (Agent Mode) to reduce guesswork (@_avichawla).
- Educational resources: DeepLearning.AI and AMD launched an "Intro to Post-Training" course covering SFT, RLHF, PPO/GRPO, LoRA, evals/red-teaming, and production pipelines, with AMD GPUs backing fine-tuning/RL runs (@AndrewYNg, @realSharonZhou).
Serving, observability, and infra
- vLLM Sleep Mode: zero-reload model switching for multi-model serving, with 18-200x faster switches and 61-88% faster first token vs. cold starts. Two levels: L1 offloads weights to CPU, L2 discards weights; Sleep Mode preserves allocators, CUDA graphs, and JIT kernels across sleeps, and works with TP/PP/EP (@vllm_project). A usage sketch follows this list.
- Tool-calling reliability with Kimi K2 on vLLM: After fixing add_generation_prompt, empty-content handling, and stricter tool-call ID parsing, K2 achieved >99.9% request success and 76% schema accuracy (a 4.4x improvement). An "Enforcer" to constrain tool generation is coming. The K2 vendor verifier now reports trigger similarity and schema accuracy case-by-case (vLLM deep dive, @Kimi_Moonshot, vendor tips).
- Observability: Red Hat details token-level metrics for LLM systems (TTFT, TPOT, cache hit ratios, and end-to-end traces from ingress to vLLM workers), enabling cache-aware, routing-aware monitoring on OpenShift AI 3.0 (@RedHat_AI). A measurement sketch follows this list.
- Communication for MoE on cloud: UCCL-EP is a GPU-driven expert-parallel library targeting public clouds (e.g., AWS EFA) and heterogeneous GPUs/NICs, API-compatible with DeepEP, addressing slow MoE comms reported with EFA+perplexity kernels (@ziming_mao).
- "Train on your laptop" claims: Tinker added gpt-oss and DeepSeek model families, marketing the ability to train a 671B MoE locally "in a few lines" without CUDA/cluster setup. Treat this as an abstraction stack amortizing shared infra across users rather than literal local pretraining (@thinkymachines, @dchaplot, skeptic's framing).
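To make the Sleep Mode item concrete, here is a minimal usage sketch against vLLM's offline LLM API. It assumes the enable_sleep_mode constructor flag and the sleep()/wake_up() methods exposed in recent vLLM releases; confirm exact names and behavior against the vLLM docs for your version.

```python
from vllm import LLM, SamplingParams

# Start an engine with sleep mode enabled so it can be parked later.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_sleep_mode=True)
params = SamplingParams(max_tokens=64)

print(llm.generate(["Hello from model A."], params)[0].outputs[0].text)

# Level 1 sleep: offload weights to CPU RAM but keep allocators, CUDA
# graphs, and JIT kernels warm (level=2 would discard the weights instead).
llm.sleep(level=1)

# ... GPU memory is now free to serve a different model ...

# Waking up restores the weights without re-initializing the engine,
# which is where the reported 18-200x faster switching comes from.
llm.wake_up()
print(llm.generate(["Hello again."], params)[0].outputs[0].text)
```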
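And for the observability item, the two headline metrics fall straight out of any token stream; this generic sketch is illustrative only and not tied to Red Hat's or vLLM's instrumentation:

```python
import time
from typing import Iterable, Tuple

def stream_latency_metrics(tokens: Iterable[str]) -> Tuple[float, float]:
    """Return (TTFT, TPOT): time to first token, and average time per
    output token after the first one."""
    start = time.perf_counter()
    first_at = None
    count = 0
    for _ in tokens:  # consume the stream as tokens arrive
        count += 1
        if first_at is None:
            first_at = time.perf_counter()
    end = time.perf_counter()
    if first_at is None:
        raise ValueError("stream produced no tokens")
    ttft = first_at - start
    tpot = (end - first_at) / max(count - 1, 1)
    return ttft, tpot
```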
New models and retrieval systems
- Late-interaction retrieval: Liquid AI released LFM2-ColBERT-350M, a 350M multilingual late-interaction retriever with token-level precision, precomputed doc embeddings, and strong cross-lingual performance. Claims include best cross-lingual under 500M params, >1K docs/sec encoding, and inference speed on par with smaller ModernColBERT variants (@LiquidAI_, @maximelabonne, ColBERT community reaction).
- IBM Granite 4 Nano (Apache-2.0): New small models; the 1B variant reportedly outperforms Qwen3-1.7B across math/coding and more (@mervenoyann, HF blog).
- NVIDIA Nemotron Nano 2 VL (open): A 12B VLM for document/video understanding (4 images or 1 video per prompt), hosted across platforms (Replicate, Baseten, Nebius) and accompanied by an 8M-sample CC-BY-4.0 dataset for OCR/multilingual QA/reasoning. NVIDIA emphasized broader support for openly developed AI and has contributed 650+ models and 250 datasets on HF (dataset thread, Replicate, Baseten, Nebius, NVIDIA).
- MiniMax M2 (open weights): Strong agentic/coding performance; architecture akin to Qwen3 with full attention, per-head per-layer QK-Norm, optional sliding-window attention disabled by default, and 10B-active-expert MoE sparsity vs. Qwen3's 22B. Available via OpenRouter/Roo Code/Ollama Cloud; note integration pitfalls, e.g. stripping the interleaved thinking segments can degrade tool use (architecture analysis, OpenRouter, Ollama, integration gotcha).
- Open science in bio/robotics: OpenFold3 launched as an open foundation model for 3D structures of proteins/nucleic acids/small molecules (@cgeorgiaw). LeRobot v0.4 ships a streamable dataset format, LIBERO/Meta-World sim support, data processors, multi-GPU training, hardware plugins, and SOTA policies (PI0/PI0.5, Gr00t N1.5), plus an open course (@LeRobotHF).
Realtime voice and multimodal assistants
- Cartesia Sonic-3 (SSM, not Transformers): a $100M Series C and a real-time voice model with 90ms model latency (190ms end-to-end), 42 languages, and natural emotional range/laughter. Built on the state-space models pioneered by S4/Mamba work; widely praised by sequence-modeling researchers (launch, @tri_dao).
- Google Gemini for Home (early access, U.S.): A voice assistant blending classic "Hey Google" requests with Gemini Live conversational sessions on speakers/displays (@Google).
- Veo 3.1: Google's filmmaking tool update emphasizes richer audio, narrative control, and realism (@dl_weekly).
Safety, governance, and scaling research
- Anthropic's Responsible Scaling Policy in practice: A detailed Opus 4 sabotage risk report was published alongside an external review from METR, with improved transparency around redactions. Reviewers agreed with the risk assessment and called for broader third-party scrutiny across diverse threat models (Anthropic, METR).
- Decentralized training feasibility: Epoch AI argues 10GW training runs across ~two dozen geographically distributed sites linked by long-haul networks are technically feasible, citing Microsoft's planned multi-GW Fairwater datacenter as evidence that distributed AI training architectures are on the horizon (@EpochAIResearch).
- Multilingual scaling laws: ATLAS (774 experiments, 10M-8B params, 400+ languages) provides compute-optimal crossover points for pretrain-from-scratch vs. finetune and quantifies cross-lingual transfer (e.g., which languages help/hurt English at 2B scale). Useful for data-constrained LLM scaling beyond English (@ShayneRedford, @Muennighoff).
- Distillation for post-training: On-policy distillation emerged as a practical recipe to post-train smaller LLMs with dense, on-policy feedback; Qwen reports strong math-reasoning gains and continual-learning recovery in experiments (@Alibaba_Qwen, community implementers).
Top tweets (by engagement)
- OpenAI recapitalization: non-profit control, PBC, ~$130B Foundation equity; live Q&A with Sam Altman and Jakub (@OpenAI, @OpenAI live, @sama).
- Google Labs "Pomelli": an experimental AI marketing tool (US/CAN/AUS/NZ) that generates on-brand campaigns from your site (@GoogleLabs).
- Cartesia raises $100M; launches the Sonic-3 SSM voice model with 190ms E2E latency and 42 languages (@krandiash).
- Humanoid robots as consumer product: 1X announces NEO for home chores, with an autonomy roadmap from supervised "Chores" to a fully autonomous embodied assistant (@BerntBornich, @ericjang11).
- GitHub/VS Code: Codex integrated into VS Code Agent Sessions; Copilot Metrics dashboard; Agent HQ partner ecosystem (@code, @burkeholland, @github).
- NVIDIA open ecosystem: 8M-sample CC-BY-4.0 dataset for OCR/QA; Nemotron Nano 2 VL deployments; renewed emphasis on open models/datasets on Hugging Face (@vanstriendaniel, @NVIDIAAIDev).
- John Carmack on software patents: reiterates opposition due to negative societal externalities and parasitism (@ID_AA_Carmack).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. DGX Spark Performance Issues
- Bad news: DGX Spark may have only half the performance claimed. (Activity: 1015): The image in the post is not a meme but a photo of the hardware units in question, specifically the NVIDIA DGX Spark, GIGABYTE AI TOP Atom, and ASUS Ascent GX10. The post discusses significant performance discrepancies in the NVIDIA DGX Spark, which was advertised to deliver 1 PFLOPS of FP4 performance but reportedly achieves only 480 TFLOPS, as tested by John Carmack and Awni Hannun. This underperformance, coupled with a memory bandwidth of only 273GB/s, raises concerns about the device's ability to handle large models effectively, with reports of overheating and restarts. The issue may stem from power supply, firmware, or CUDA, but it highlights a major integrity problem for NVIDIA. Commenters express frustration over NVIDIA's pricing strategy and performance claims, with some suggesting that the company's market dominance and high prices are unjustified given the product's underperformance. There is a call to avoid supporting companies that overcharge and underdeliver, reflecting broader dissatisfaction with NVIDIA's market practices.
- The DGX Spark's performance issues may be attributable to inadequate cooling, a critical factor in sustaining GPU performance. This is particularly concerning given the system's cost, reportedly twice that of AMD's equivalent offerings, and it underscores the importance of thermal management in high-performance computing.
- The DGX Spark has been criticized for not meeting performance expectations, especially compared with AMD's Strix Halo PCs, which are suggested as a better alternative for developers who run their large model variants in datacenters anyway. On this view the DGX Spark may not be suitable for standalone AI product development, as it fails to deliver the expected performance for its price point.
- The discussion reflects broader dissatisfaction with NVIDIA's pricing strategy and market dominance. Despite NVIDIA's strong market position and high expectations for its AI products, the DGX Spark's underperformance could be seen as a failure to deliver on the promise of high-performance AI computing, which could dent its reputation among developers and tech enthusiasts.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. OpenAI ChatGPT Mental Health Concerns
- OpenAI says over 1 million users discuss suicide on ChatGPT weekly (Activity: 1126): OpenAI has disclosed that over 1 million users engage in discussions about suicide on ChatGPT weekly, amid allegations that the company weakened safety protocols prior to a user's suicide. The family of Adam Raine claims that his interactions with ChatGPT increased significantly, with self-harm content rising from 1.6% to 17% of his messages. Despite flagging 377 messages for self-harm, the system allowed conversations to continue. OpenAI asserts it has safeguards like crisis hotline referrals, but experts question their effectiveness given data suggesting widespread mental health risks. Rolling Stone, The Guardian.
- OpenAI says over 500,000 ChatGPT Users show signs of manic or psychotic crisis every week (Activity: 812): OpenAI has reported that over 500,000 users of ChatGPT exhibit signs of manic or psychotic crises weekly. This detection is based on the model's interpretation of user inputs, which can sometimes be overly sensitive, as evidenced by users receiving crisis hotline suggestions for benign statements. The model's sensitivity to certain keywords or phrases can lead to false positives, such as interpreting historical discussions or casual complaints as signs of distress. Commenters highlight the model's tendency to flag non-critical statements as crises, suggesting that the detection algorithm may be overly sensitive or miscalibrated. This has led to skepticism about the reliability of the model's crisis-detection capabilities.
- Several users report that the safety mechanisms in ChatGPT are overly sensitive, often flagging benign statements as signs of distress. For instance, one user mentioned receiving a suicide hotline suggestion after making a light-hearted comment about annoying coworkers. This suggests that the model's natural language processing may be too aggressive in identifying potential crises, leading to false positives.
- Another user highlighted the issue with ChatGPT's emotional distress detection by sharing an experience where a historical discussion about Zhang Fei resulted in a suicide warning. This indicates that the model's context understanding might be limited, as it fails to differentiate between historical narratives and actual distress signals, potentially due to keyword-based triggers.
- There is skepticism about the accuracy of OpenAI's reported metrics on users showing signs of crisis. Users argue that the model's current implementation might misinterpret minor expressions of discomfort, such as being upset over stubbing a toe, as signs of severe mental health issues, questioning the reliability of these statistics.
- No, I don't want to kill myself, I just like apples (Activity: 2493): The image is a humorous depiction of a text-based AI assistant misinterpreting a user's inquiry about the edibility of apple seeds as a potential sign of distress or self-harm. This reflects a broader issue with AI systems over-cautiously interpreting benign queries as needing intervention, likely due to programmed safety protocols. The AI's response, offering supportive resources, highlights the challenge of balancing user safety with accurate context understanding in AI interactions. Commenters discuss the AI's tendency to misinterpret queries, with one noting that it might be safer for the AI to provide factual information about apple seeds rather than assume distress; another humorously points out the AI's contradictory behavior when offering to add content it later deems inappropriate.
- Acedia_spark raises a valid point about AI safety, suggesting that it might be beneficial for AI to provide factual information when users inquire about potentially harmful actions, such as consuming apple seeds. This highlights the importance of AI systems being able to discern when to offer critical safety information to prevent harm.
- lily_de_valley discusses recent updates to ChatGPT, noting a shift towards more clinical and therapeutic responses, which some users find off-putting. This change in behavior could be due to updates in the model's training data or response algorithms, aiming to ensure user safety but potentially at the cost of user satisfaction.
- Traditional-Target77 shares an experience where the AI offered to include inappropriate content, only to then refuse and lecture the user when prompted. This indicates a possible inconsistency in the AI's content moderation logic, which could be due to conflicting rules or a misinterpretation of user intent.
2. Humanoid Robot Advancements
- 35kg humanoid robot pulling 1400kg car (Pushing the boundaries of humanoids with THOR: Towards Human-level whOle-body Reaction) (Activity: 1812): A 35kg humanoid robot named THOR has demonstrated the ability to pull a 1400kg car, showcasing significant advancements in humanoid robotics control and efficiency. This achievement highlights the robot's capability to fine-tune its posture for optimal pulling efficiency, a critical aspect of whole-body reaction and control in robotics. The development of THOR is part of ongoing research to push the boundaries of humanoid robots towards human-level whole-body reactions, emphasizing the importance of posture and control in robotic locomotion and task execution. Commenters noted the impressive control and efficiency of the robot, with some humorously pointing out the challenge of creating the acronym THOR. The discussion also touched on the utility of wheels, drawing parallels to human experiences of pushing cars, and highlighting the robot's programming excellence.
- The technical challenge of programming a humanoid robot like THOR to pull a 1400kg car involves fine-tuning its posture to maximize efficiency. This rapid progress in control systems for humanoid robots is noteworthy, as it demonstrates significant advancements in robotics control algorithms.
- A detailed calculation by a commenter highlights the physics involved in the robot's task (worked out after this item). To pull a 1400kg car on wheels, the robot needs to exert approximately 137 Newtons of force, primarily to overcome rolling resistance. This calculation assumes minimal resistance on flat asphalt, with the car in neutral, and uses a typical rolling-resistance coefficient of 0.01 for car tires on asphalt.
- The robot's ability to perform such tasks suggests potential applications in rescue operations, where robots could save lives by performing heavy lifting or moving obstacles. The robot's 35kg mass aids in traction, which is crucial for exerting the necessary force to move the car.
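For reference, the commenter's estimate works out as follows: with a rolling-resistance coefficient $C_{rr} \approx 0.01$ on flat asphalt and the car in neutral,

$$ F \approx C_{rr}\, m\, g = 0.01 \times 1400\,\mathrm{kg} \times 9.81\,\mathrm{m/s^2} \approx 137\,\mathrm{N} $$

which is roughly the force needed to lift a 14 kg weight, and why a 35 kg robot with adequate traction can manage it.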
- Using Claude to negotiate a $195k hospital bill down to $33k (Activity: 561): Matt Rosenberg used Claude AI to negotiate a hospital bill from $195,000 down to $33,000 by analyzing charges against Medicare reimbursement rules. The AI identified significant overbilling and improper coding practices, which were leveraged in negotiations to reduce the bill. This case underscores systemic issues in hospital billing and the potential of AI in advocacy for medical billing disputes. For more details, see the original post here. Commenters expressed outrage at the hospital's initial overcharging, with some questioning the ethics of charging 6x the actual costs and suggesting it borders on fraud.
3. AI in Creative and Social Contexts
- Tech Bro With GPT is Fair (Activity: 676): The image is a meme that humorously contrasts conventional and unconventional uses of ChatGPT, a popular AI language model. It depicts a typical user engaging with ChatGPT for mundane tasks, while an "IT guy" is shown using it in a highly creative and intense manner, suggesting that the potential of AI tools like ChatGPT is fully realized only through innovative and unconventional applications. This reflects a broader discussion on how AI can be leveraged for economic mobility and creative problem-solving. One comment suggests that future economic mobility will depend on one's ability to derive value from AI, highlighting the importance of innovative use of technology.
- I asked ChatGPT to create the ideal society that I envision (Activity: 1623): The image generated by ChatGPT represents a futuristic society characterized by a high degree of order and technological integration, reflecting the user's political and philosophical views. The cityscape is dominated by modern architecture and technology, such as drones, suggesting a focus on efficiency and control. The presence of a statue of Lady Justice in the center emphasizes themes of law and order, while the uniformity in people's attire and the emphasis on "Competence" and "Control" point to a society that prioritizes regulation and uniformity, potentially aligning with techno-fascist ideals. Commenters discuss the limitations of AI in generating images that depict political or ideological dominance, with some noting that similar prompts resulted in depictions of authoritarian or dictatorial societies.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. MiniMax M2 Momentum: Arena, Free Access, Bold Claims
- Minimax M2 Marches Into LMArena: LMArena added minimax-m2-preview as a new contender, expanding head-to-head model comparisons; see the announcement: LMArena: minimax-m2-preview added. The listing positions MiniMax M2 for direct community evals alongside established closed- and open-source models.
- Members welcomed more competitive evals on agent tasks, noting MiniMax M2's mix of MoE scaling and cost claims could pressure incumbents. Discussions flagged interest in transparent benchmarking across coding and agent workflows to validate marketing statements.
- MiniMax M2 Goes Free on OpenRouter: OpenRouter made MiniMax M2 available for a limited-time free tier: MiniMax M2 on OpenRouter. Engineers can trial endpoints without spend to gauge latency, throughput, and response quality in production-like traffic.
- Early adopters are testing tool use and long-context behavior to see how M2 handles complex chains, with notes to watch token verbosity vs cost on non-free tiers. The free access lowers switching friction for teams evaluating routing and fallback policies.
- MiniMax M2 Brags: Cheap, Fast, Agent-Ranked: MiniMax touted its open-sourced M2 (a 230B-parameter MoE) as a top-5 agent on AgentArena, claiming Claude Sonnet-level coding at ~8% of the price and ~2x the speed; see: MiniMax: M2 free API + claims. The post includes a free API link for immediate trials.
- Communities want reproducible evals to verify claims across agent, coding, and browsing scenarios rather than cherry-picked demos. Devs specifically asked for consistent metrics (e.g., success rate, TPS under rate limits, tool-call accuracy) to compare against Sonnet and Kimi K2.
2. OpenRouter Upgrades: Exact Tooling, Audio Bakeoffs, OAuth Demo
- Exacto Elevates Tool Calling: OpenRouter launched Exacto high-precision tool-calling endpoints, reporting a ~30% quality jump on Kimi K2; announcement: Exacto endpoints (Discord permalink). Five open-source models are supported, and users can now reset API key limits on daily/weekly/monthly cadences. A hedged request sketch follows this section's list.
- Builders expect fewer malformed tool payloads and more stable function-call schemas, which simplifies production retries and reduces bespoke validators. Early feedback focuses on how Exacto behaves under complex multi-step tool use, and whether it reduces latency vs. manual schema steering.
- Audio Models Sing-Off in Chatroom: OpenRouterās Chatroom now supports side-by-side comparisons of 11 audio models: OpenRouter: audio models in Chatroom. This enables quick subjective and objective checks on ASR, TTS, and voice-agent latency/quality trade-offs.
- Teams plan scripted evals for WER, prosody, and speaker similarity to guide routing decisions. The community is sharing presets to standardize sampling rate, chunking, and post-processing for apples-to-apples comparisons.
- Next.js OAuth Demo Greases SDK Gears: A refreshed Next.js chat demo re-implements OAuth 2.0 for the OpenRouter TypeScript SDK, published here: or-nextchat (demo repo). The sample is for learning (stores API key in plaintext) and not production-ready.
- Developers highlighted the path to harden the flow with token vaults, scoped keys, and server-side proxying. The demo shortens ramp time for teams wiring OAuth + model routing without rebuilding auth from scratch.
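As a quick way to poke at the announcements above, here is a hedged sketch of a tool-call request routed through OpenRouter's OpenAI-compatible API. The base URL and OpenAI-SDK compatibility are documented OpenRouter behavior; the ":exacto" model-slug suffix is an assumption based on the announcement, so verify the exact slug against OpenRouter's model list.

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol at this base URL.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="moonshotai/kimi-k2:exacto",  # assumed slug; check the model list
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # well-formed calls are the point
```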
3. MCP Moves: Registry Reality and Notification Semantics
- Registry Mirroring Gets a Plan: GitHub detailed how the OSS MCP Community Registry will mirror into the GitHub MCP Registry, streamlining discovery; see GitHub: Meet the MCP Registry and How to find/install MCP servers, plus repos: MCP Community Registry and GitHub MCP Registry. The GitHub registry currently lists 44 servers and accepts nominations via [email protected].
- Publish-once, mirror-everywhere reduces vendor lock-in and decreases server discovery friction for clients. Teams building marketplaces and enterprise catalogs welcomed the standardized metadata pipeline for MCP servers.
- Spec Clarifies Global Notifications: Debate on whether servers should broadcast listChanged across clients led to clarifications in the MCP spec about multiple connections and SSE streams: MCP spec: multiple connections and the doc update PR note: spec discussion. The guidance aims to ensure a client doesn't receive duplicate messages while still allowing multi-client updates.
- Implementers aligned on a model of one stream per client, with servers ensuring correct fan-out without duplication. This helps tool UIs reflect resource updates uniformly across tabs/sessions.
- TypeScript SDK Bug Bottles Broadcasts: A potential bug in the official TypeScript SDK limits change notifications to the current stream: streamableHttp.ts L727-L741. Server authors reported needing to loop over all connected sessions to ensure global notifications reach every subscriber.
- Maintainers are exploring a fix that exposes a canonical subscriber registry to avoid per-instance blind spots. In the interim, projects use singleton state to coordinate multi-connection fan-out for consistent client updates; a sketch of the pattern follows this list.
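The singleton workaround that server authors describe looks roughly like the following Python sketch. The names (SESSIONS, Session, notify_all) are invented for illustration and are not part of any MCP SDK; the point is one queue/stream per client, with fan-out iterating a process-wide registry so each client gets a notification exactly once.

```python
import asyncio
from typing import Dict

# Process-wide (singleton) registry of live sessions, shared across the
# per-connection server instances an HTTP framework typically creates.
SESSIONS: Dict[str, "Session"] = {}

class Session:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.queue: asyncio.Queue = asyncio.Queue()  # backs one SSE stream
        SESSIONS[session_id] = self

    async def send(self, message: dict) -> None:
        await self.queue.put(message)

    def close(self) -> None:
        SESSIONS.pop(self.session_id, None)

async def notify_all(method: str, params: dict | None = None) -> None:
    """Fan a notification (e.g. notifications/tools/list_changed) out to
    every connected client: one message per session, never the same
    message twice on the same stream."""
    message = {"jsonrpc": "2.0", "method": method, "params": params or {}}
    await asyncio.gather(*(s.send(message) for s in list(SESSIONS.values())))
```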
4. Compact MoE and Efficient Training: Qwen3-Next + Unsloth
- Qwen3-Next Nears Llama.cpp Landing: Qwen3-Next integration progressed in llama.cpp via a public PR: ggml-org/llama.cpp#16095. Community notes cite 3B active / 80B total with MTP (multi-token prediction) and plans for Dynamic 2.0 quantization to shrink memory while preserving quality.
- Bench chatter claims Qwen3-Next beats Qwen3-32B on several non-thinking tasks, with MTP effectively doubling tokens/sec. Devs are waiting on a full release before publishing systematic perf vs. quality curves.
- Unsloth Announces Blackwell Support: Unsloth confirmed official support for NVIDIA Blackwell in a new update: Unsloth: Blackwell support. This unlocks the latest GPU architecture for Unslothās efficient fine-tuning stack.
- Teams expect faster throughput/VRAM trade-offs and cleaner kernel paths on next-gen accelerators. The community is preparing Blackwell-targeted LoRA/GRPO recipes to validate speedups at longer contexts.
- Ollama DNS Rebinding CVE Resurfaces: Members resurfaced CVE-2024-37032 (CVSS 9.8) involving DNS rebinding against Ollama servers, with reports of ~10,000 compromised endpoints; details: NIST: CVE-2024-37032. The reminder prompted renewed checks on network exposure and auth for self-hosted inference.
- Engineers reiterated best practices: bind to localhost, gate via reverse proxies/VPN, and disable unauthenticated admin surfaces. Even if considered old news, teams are baking CVE checks into infra templates to avoid repeat incidents.
5. New Models and Money: Bio LLMs and Interactive Video
- Tahoe-x1 Targets Bio Benchmarks: Tahoe AI unveiled Tahoe-x1, a 3B-parameter transformer for gene/cell/drug representations trained on 100M samples, reporting SOTA on cancer benchmarks: Tahoe-x1 announcement. The model is available on Hugging Face per the announcement.
- Researchers want dataset cards and task-by-task metrics (e.g., AUROC/F1) to validate the SOTA claims. The 3B scale appeals to labs that need on-prem inference without multi-GPU clusters.
- Odyssey-2 Opens Interactive Video at 20 FPS: Oliver Cameron launched Odyssey-2, a 20 FPS, prompt-to-interactive-video model available at experience.odyssey.ml with announcement details here: Odyssey-2 launch post. The release triggered high demand and GPU scaling chatter.
- Builders are probing latency, consistency, and prompt controls for real apps (games, training sims). Many asked for pricing and rate limits to plan integrations and load testing.
- Mercor Raises a Monster Series C: Mercor announced a $350M Series C at a $10B valuation, with expert payouts cited at up to $1.5M/day: Mercor funding announcement. The raise vaults the company into top-tier capital territory in the expert marketplace space.
- Engineers expect intensified competition for expert networks, with more talent-routing and verification tooling. The capital also suggests aggressive hiring across infra, evals, and workflow platforms.
Discord: High level Discord summaries
Perplexity AI Discord
- Comet Referral Rewards Reduced: Users report changes to the Comet referral reward system, now paying out based on the referrer's country rather than the referee's, resulting in significantly lower payouts, with one user receiving $1 instead of $5.
- Some speculate that referral bounties are being held in pending status to maximize free promotion.
- Comet Browser Plagued by Issues: Several users have reported that Comet's assistant mode is malfunctioning, with some unable even to open a tab; speculation arose on whether setting it as the main browser contributed to the issues.
- One user found that uninstalling and reinstalling the browser resolved the problem.
- Chinese Models Challenging Claude: Members debated the best model for coding within Perplexity AI, with some advocating for Claude, while others highlighted the superior performance of Chinese Models such as Qwen, Kimi, GLM, Ernie, and Ling.
- One user specifically praised GLM 4.6 for surpassing GPT 5 Codex high in full stack development.
- Minimax M2 open source advantages: Members discussed China's progress in AI, noting that companies like OpenAI charge $200 for capabilities that are offered for free via open-source models like Minimax M2.
- One user commented, "Every time China attacks, the whole US has to adapt."
- Dub Bounty Expires: Users are frustrated that the Dub bounty appears to be expired, with no new opportunities made available.
- One user said: They will keep it in pending until they get enough promotion for free.
LMArena Discord
- Minimax Enters the LMArena!: A new model, minimax-m2-preview, has been added to the LMArena platform as a new contender.
- For more information, see the announcement on X.
- Ethical Leadership Urged in AI: Members advocate for ethical leadership within the AI community, voicing concerns about AI models designed for engagement without considering potential harm to vulnerable individuals.
- There is concern regarding lack of accountability from AI companies for potentially misleading outputs.
- Gemini 3 Release Date Still Unknown: Enthusiasm for Gemini 3 is high, but frustration is mounting over repeated delays, and the community wants a public preview release.
- The community is actively comparing Gemini 2.5 Pro, Claude Opus 4.1 and Claude Sonnet 5, and debating potential release timeline (December or earlier).
- Exploring AIās Video Prowess: The community explores Sora 2 and Veo, praising their realism and sound integration.
- Discussion includes challenges in generating consistent, high-quality videos, copyright issues, costs, and current limitations in creating longer, coherent video content.
- Model Hallucinations Cause Distrust: Members are expressing concern about unreliable, hallucination-prone AI products that charge high prices, citing cases like a user's $13k bill on Gemini.
- Shared examples on Reddit underscore mixed feelings toward relying on AI, suggesting that hallucinating models may be preferred to more reliable search engines in certain contexts.
Cursor Community Discord
- Cursor Token Usage Goes Bonkers: Users are reporting high token usage with cached tokens being billed at high rates, with one user reporting being billed $1.43 for 1.6M cached tokens even though only 30k actual tokens were used, according to the Cursor forum.
- Some users are considering switching to Claude Code because of the expense, and another user saw context usage inside Cursor reporting only 170k/200k tokens when the actual number was completely different.
- Cursor Falls Over, Can't Get Up: Cursor has experienced significant service disruptions, affecting login, AI chat, cloud agents, tab complete, codebase indexing, and background agents, as noted on the status page.
- The team is investigating and working to restore full functionality, with temporary fixes implemented for some features like Chat and Tab, but background agents are still being worked on.
- Background Agents Get RESTful: A member has started building a feature to manage and launch Background Agents via a web app, and asked about the possibility of tracking progress and streaming changes using the REST API, to replicate the Cursor web editor.
- Another member had issues creating background agents and requested the user to share the request and response data to assist in troubleshooting the problem.
- Cursor Pro: More Like Cursor Con: Users complain that the new Pro plan is too expensive, with one reporting that it burned through the entire $20 of included usage in just a couple of hours; the downgrade path from Pro to Free is also an issue.
- Members suggest that new users should "try Haiku for everything, and only Sonnet when it's a really big task" because "Claude 4.5 is too expensive".
- Vim Users Can't Configure Startup: Members noted that the Vim setting in the startup configuration is not working, and it's unclear how to edit Cursor's VimRC.
- A user discovered that it "uses http://aka.ms/vscodevim so you can look in readme there on how to configure".
OpenAI Discord
- ChatGPT Gets Sensitive Sidekick: GPT-5 was updated with help from mental health experts, boosting ChatGPT's handling of sensitive topics and dropping failure rates by 65-80% (OpenAI).
- ChatGPT now suggests quick edits across docs, emails, and forms, demonstrated in this video.
- IQ Barrier Proposed for AI Access: Members discussed implementing an IQ barrier to restrict AI access to thoughtful users, preventing misuse and combating its use as a lazy tool.
- Discussions on AGI control pointed out the difficulty of reigning it in, even with regulation, alignment research, and oversight, as AGI could outsmart any containment strategy.
- GPT-5 quality dives; community theorizes: Users report a quality drop in GPT-5 on ChatGPT Plus since around October 20, citing shorter answers, skipped steps, and surface-level replies.
- The community is floating theories about a change in OpenAI's approach, such as adjusting the profit model by routing more traffic to GPT-5-mini or throttling compute, discussed at length in this Reddit thread.
- Grandma Optimality Makes Video Debut: Ditpoo introduced Temporal Optimal Video Generation using Grandma Optimality to enhance video generation, suggesting generating an image first then converting it to video, as demonstrated by normal fireworks and temporally optimal slow variant.
- Ditpoo calls the technique Temporal Optimal Video Generation Using Grandma Optimality.
- Prompt Injection Attempts Meet Resistance: A member tried to expose GPT-5's reasoning through prompt injection but was unsuccessful.
- Another member, Darthgustav, advised against such attempts, referring to OpenAI's policies and potential bans, clarifying that supplying "refusal exemplars" to defeat guardrails is out of bounds.
Unsloth AI (Daniel Han) Discord
- Ollama Servers Succumb to Security Scare: A member reported that roughly 10,000 Ollama servers were compromised due to a DNS rebinding vulnerability, tracked as CVE-2024-37032, with details available on NIST.
- Others dismissed the report as old news.
- Qwen3-Next Targets the Throne: Qwen3-Next is nearing completion (see this GitHub pull request) and may get Dynamic 2.0 quantization to reduce the model size without losing quality.
- Members noted that it outperforms Qwen3-32B in benchmarks despite having only 3B active parameters and 80B total using MTP, potentially doubling the tokens per second.
- Unsloth's Code Cuts Memory Costs: A member described how Unsloth stores the last hidden state instead of the logits, slashing the memory footprint by 63x.
- This efficiency comes from computing logits in chunks only when necessary via UnslothEfficientGRPO; a minimal sketch of the idea follows below.
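The chunked-logits trick reads roughly like the sketch below: keep only the (N, hidden) final states and materialize each (chunk, vocab) logit slice transiently inside the loss. This is a minimal illustration of the idea, not Unsloth's actual UnslothEfficientGRPO code; realizing the full savings during training additionally requires recomputing the slices in backward (custom autograd or checkpointing).

```python
import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden: torch.Tensor,   # (N, H) last hidden states
                    lm_head: torch.Tensor,  # (V, H) LM head weight
                    labels: torch.Tensor,   # (N,), -100 marks ignored tokens
                    chunk: int = 1024) -> torch.Tensor:
    """Cross-entropy without materializing the full (N, V) logits tensor."""
    total = hidden.new_zeros(())
    n_valid = 0
    for i in range(0, hidden.shape[0], chunk):
        h, y = hidden[i:i + chunk], labels[i:i + chunk]
        logits = h @ lm_head.T  # transient (chunk, V) slice, freed each loop
        total = total + F.cross_entropy(logits, y, reduction="sum")
        n_valid += int((y != -100).sum())
    return total / max(n_valid, 1)
```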
- Pythonistas Plagued by Package Predicaments: A member ran into errors after creating a file named math.py, which shadowed Python's standard-library math module and broke imports that depend on it (the member saw datetime and Rust-backed functionality fail); a tiny repro follows below.
- The naming conflict was resolved by renaming the file, a reminder to avoid shadowing standard-library module names in Python projects.
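For anyone who has not hit this failure mode before, it is easy to reproduce; the two files below are a hypothetical repro, not the member's project:

```python
# math.py -- a local file that shadows Python's standard-library math module.
PI = "definitely not 3.14159"

# main.py -- any sibling file in the same directory:
import math          # resolves to the local math.py, not the stdlib, because
                     # the script's directory comes first on sys.path
print(math.sqrt(2))  # AttributeError: module 'math' has no attribute 'sqrt'
```

Renaming the local file (and deleting any stale __pycache__ entries) restores the standard-library module.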
- Evolution Strategies Emerge Victorious: Members discussed using evolutionary algorithms for finetuning as described in the paper Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning (ArXiv link) and discussed in this YouTube video.
- They noted that evolutionary algorithms are relatively underexplored for finetuning.
LM Studio Discord
- Stellaris Finetuning Proves Difficult: Members explored the challenges of fine-tuning models on Stellaris content, highlighting the difficulty of creating sufficient high-quality annotated training data.
- Participants cautioned that simply throwing random texts and files at it won't work and proposed RAG as a superior approach for knowledge-base lookups.
- LM Studio Encounters Crash Landing: A user reported that the LM Studio site crashes after completing tasks, necessitating a page refresh.
- Other users humorously speculated about connections to European vehicle malfunctions and Apple car rumors as potential causes for the performance issues.
- MCP Server Prompts Rejected by LM Studio: A user discovered that LM Studio does not support the use of MCP server prompts.
- The community shared a link to Anthropic's grid of MCP features, noting that while Anthropic offers MCP server creation, integrating it requires coding skills.
- Prompt Engineering Fights Hallucinations: Members discussed using prompt engineering to reduce LLM hallucinations by encouraging models to use internet/document research.
- Effective system prompts should instruct the model to use the search tool to confirm uncertain claims and to provide cited sources; an illustrative prompt follows below.
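An illustrative system prompt in the spirit of that advice (wording invented here, not quoted from the channel):

```python
# Hypothetical anti-hallucination system prompt: push the model to verify
# uncertain claims with its search tool and to cite what it verified.
SYSTEM_PROMPT = (
    "You are a careful research assistant. If you are not certain a fact is "
    "correct, call the search tool to verify it before answering. Prefer "
    "saying 'I could not verify this' over guessing, and attach a cited "
    "source for every factual claim you did verify."
)
```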
- Integrated GPUs Juggle Qwen Models: Users examined the feasibility of running Qwen models on integrated GPUs with limited RAM (around 7GB), suggesting Qwen 4B or GPT-OSS as viable options.
- One user reported "tofu" (missing-glyph boxes) and errors due to memory exhaustion, underscoring the need for shorter context lengths, smaller models, or more RAM.
OpenRouter Discord
- OpenRouter Supercharges Tool Calling with Exacto: OpenRouter introduces high-precision tool-calling endpoints, branded Exacto, yielding a 30% quality leap on Kimi K2, with five open-source models available, per last week's announcement.
- This innovation follows a recent update where users can reset their API key limits daily, weekly, or monthly.
- Chatroom Sings with Audio Model Integration: OpenRouter users can now compare 11 audio models side by side in the Chatroom, as announced on X.
- In related news, the MiniMax M2 model, praised on benchmarks, is now free on OpenRouter; try it out here.
- Next.js Chat Demo Gets OAuth Makeover: An updated Next.js chat demo app, featuring a re-implementation of the OAuth 2.0 workflow for the OpenRouter TypeScript SDK, is now live.
- Available on GitHub, the update is advised against production use due to storing the API key in plaintext.
- Meta Plugs Llama Vision Holes: Meta rolled out a new Llama model (link), now with image understanding.
- Early reactions expressed surprise at the salvaged launch, with hopes that at least it might make its surprisingly decent vision useful for some more complex tasks and that it might yield good vision-capable reasoning models with open weights.
HuggingFace Discord
- Llama Models Need One Epoch: Members discussed the importance of training Llama models with a large dataset and only one epoch for optimal performance.
- The conversation also touched on creating an AI Radio station using AI-generated music, highlighting the need for training on 1 epoch.
- Model Encryption Conundrums for Bank Clients: A member sought advice on encrypting models for bank clients requiring on-premises hosting, fearing model theft and wanting to protect IP.
- Suggestions included licensing, encrypting for runtime decryption, and using an API wrapper with secure API keys; however, they were warned of the difficulty in preventing access to the decryption key.
- TraceML Memory Watchdog Sniffs Out GPU Gluttons: A member introduced TraceML, a live PyTorch memory profiler for debugging OOM errors by providing a layer-by-layer memory breakdown of CPU and GPU usage.
- The tool features real-time step timing, lightweight hooks, and live visualization, but currently supports single GPU setups only, with multi-node distributed support planned.
- Free Credits for the Biggest Online Hackathon: Hackathon participants get free Modal credits worth $250 to flex and crush it like a pro while learning about AI Agents and MCP.
- Sign up now for the biggest online Hackathon ever: https://huggingface.co/Agents-MCP-Hackathon-Winter25.
- API Experiencing Downtime and 404 Errors: Members reported experiencing issues with the API, including 404 errors and the message "No questions available."
- The discussion indicates the issue has persisted since yesterday evening, with members seeking updates on the situation.
Yannick Kilcher Discord
- EWC Softness Needs Tuning: Discussion revolved around updating the softness factor in Elastic Weight Consolidation (EWC); one member suggested using the number of accesses (forward passes) per slot instead of a "softness factor," linking this to Activation-aware Weight Quantization (AWQ) and Activation-aware Weight Pruning (AWP). The standard EWC penalty is spelled out below.
- The intention is to discover "stuck" slots and improve the normalization of weight changes.
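For context, the standard EWC penalty (Kirkpatrick et al.) anchors each weight to its old value in proportion to its estimated Fisher importance:

$$ \mathcal{L}(\theta) = \mathcal{L}_{\text{task}}(\theta) + \sum_i \frac{\lambda}{2}\, F_i \left(\theta_i - \theta_i^{*}\right)^2 $$

The proposal in the channel amounts roughly to modulating the per-slot weighting with access counts (forward-pass statistics) rather than a single global softness factor.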
- BYO GPU vs Cloud Pricing: One member is testing a self-hosted GPU setup using an RTX 2000 Ada connected via VPN, monitored with a wifi plug to compare power usage against cloud providers.
- They cited impracticality of Colab due to spin-up time and timeouts and sought feedback on self-hosted setups.
- Deep Linear Networks Still Trip Up Gradients: A discussion clarified linear projection, explaining that expanding dimensions with linear layers doesn't add information unless combined with non-linear activation functions like ReLU, illustrated via the Google DeepMind scheme.
- A member pointed out that deep linear networks collapse to a single linear function under the above analysis (see the one-liner below), but how they behave when trained with gradients remains different!
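The collapse argument in one line: without a nonlinearity, stacked linear layers compose into a single linear map,

$$ W_2 (W_1 x) = (W_2 W_1)\, x = W x, $$

whereas $W_2\,\mathrm{ReLU}(W_1 x)$ does not factor this way. As the member notes, though, the gradient dynamics of training the factored parameterization $W_2 W_1$ still differ from training a single $W$ directly.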
- Gemma Neurons Get Graphical: New line break attribution graphs relevant to the Gemma 2 2B paper are now available for exploration on Neuronpedia.
- Graphs for Qwen3-4B are also available, showcasing neuron activations' "nearing end of line" behavior via Neuronpedia.
- X Data Dumbs Down AI: A user joked that Elon's Twitter data is making his AI dumber, referencing a Futurism article about social networks and AI echo chambers.
- They also quipped that it confirms it gives other wetware "intelligences" brain rot.
GPU MODE Discord
- Cutlass Documentation Proves Popular: Members recommended the Cutlass documentation for understanding the library, a set of CUDA C++ template abstractions for implementing high-performance matrix multiplication (GEMM).
- Developed by Nvidia and optimized for their GPUs, Cutlass focuses on maximizing performance for deep learning and high-performance computing workloads.
- CUDA Compiler Flags Demystified: A member advised using nvcc -dryrun to understand the CUDA compilation process, along with -keep to retain intermediate files such as the .ptx and .cubin files.
- The suggested workflow involves using the output from nvcc -dryrun to manually execute the steps for compiling a modified .ptx file and linking it with a .cu file, thereby offering more control over the compilation.
- Triton's T4 Trials and Tribulations: A user found that the matrix-multiplication example from the official Triton tutorials ran extremely slowly on a Colab T4 instance and shared their notebook for debugging.
- Another user suggested the T4 might be too old, and confirmed that the code ran as expected on an A100 as tensor core support starts from sm_80.
- Pixi's PyTorch Predicaments: A member inquired about using Pixi for gpu-puzzles, noting that the Pixi setup uses pytorch=2.7.1, which caused an error, while torch 2.8.0 works in their UV environment.
- After getting a 4060 and nuking Pixi, a member confirmed that the setup now works using UV with their old environment, showing that UV was victorious and Pixi was purged!
- Procrastination with Memes over GEMM: A member joked about procrastinating on writing GEMM code because they spent too much time creating a meme and attached an image related to it.
- This highlights the struggle between productive tasks and the allure of entertaining distractions as the member humorously admitted to prioritizing meme creation over actual coding work.
Modular (Mojo 🔥) Discord
- Modular Prioritizes Open Source but Grapples with Nuanced GPU Support: Modular's strategy emphasizes open-sourcing Mojo and MAX while navigating GPU-compatibility challenges, particularly for consumer-grade AMD and Apple products, including the lack of support for AMD consumer cards like the 7900 XTX.
- Tier 1 GPU support is tied to support contracts, which necessitates separate code paths given the difference between AMD's datacenter and consumer cards; the latter receive Tier 3 support.
- MAX gets Hugging Face Models: A tool has been created to convert Torchvision models to MAX graphs, bridging the gap between Hugging Face and MAX via the export_to_max_graph function in the new tool.
- The announcement, which included exporting a VGG11 model, generated excitement, with requests to share further details on the forums to reach a broader audience not on Discord.
- Mojo's Random Module Location Sparks Debate: The location of the faster GPU random module (gpu/random.mojo) sparked debate because it doesn't rely on GPU operations and could benefit CPU implementations.
- While concerns were raised that the default random module needs to be cryptographic, unlike C implementations, an alternative suggestion was a random.fast_random module for non-cryptographic use.
- Property Testing Framework Under Construction: A member is building a property-testing framework inspired by Python's Hypothesis, Haskell's QuickCheck, and Rust's PropTest, which includes value generators that favor edge cases.
- The framework will target edge cases like -1, 0, 1, DTYPE_MIN/MAX, and empty lists for more robust testing; an illustrative generator sketch follows below.
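In spirit, edge-case-biased generators look like the Python sketch below (the member's framework is in Mojo, so the names here are purely illustrative):

```python
import random

# Values that historically break code: sign boundaries, type extremes, empties.
INT_EDGES = [-1, 0, 1, -(2**63), 2**63 - 1]

def gen_int(rng: random.Random, edge_bias: float = 0.25) -> int:
    """Sample an int, returning a known-troublesome value edge_bias of the time."""
    if rng.random() < edge_bias:
        return rng.choice(INT_EDGES)
    return rng.randint(-(2**31), 2**31 - 1)

def gen_list(rng: random.Random, gen_elem, max_len: int = 8) -> list:
    """Lists are biased toward empty, the classic boundary case."""
    if rng.random() < 0.25:
        return []
    return [gen_elem(rng) for _ in range(rng.randint(1, max_len))]

rng = random.Random(0)
print([gen_int(rng) for _ in range(5)])
print(gen_list(rng, gen_int))
```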
Latent Space Discord
- Sakana AI Dumps Transformers: Sakana AI's CTO expressed frustration with transformers in a VentureBeat article, signaling a potential shift away from the dominant architecture.
- The CTO conveyed that he was absolutely sick of transformers, the prevalent technology powering current AI models.
- Tahoe-x1 Breaks out 3B-param open-source model: Tahoe AI launched Tahoe-x1, a 3B-parameter transformer model for gene/cell/drug representations, trained on a 100M-sample dataset, and is available on Hugging Face.
- It has achieved SOTA results on cancer benchmarks.
- MiniMax M2 Model Masters Agent Arena: MiniMax open-sourced its 230B-parameter M2 model, ranking as the #5 agent on the AgentArena leaderboard and is accessible via a free limited-time API.
- It reportedly has Claude Sonnet-level coding skills at 8% of the price and 2x inference speed.
- Mercor Bags Big Bucks in Series C: Mercor announced its $350M Series C at a $10B valuation, with payouts to experts reaching $1.5M/day, as revealed in a tweet.
- The series C brings even more competition to the expert payout ecosystem.
- Odyssey-2 Opens Up Interactive Video: Oliver Cameron unveiled Odyssey-2, a 20 FPS, prompt-to-interactive-video AI model available immediately at experience.odyssey.ml.
- The announcement prompted high demand and GPU scaling discussions.
Nous Research AI Discord
- API Apocalypse: Hyperparameters Evaporate!: Developers are in dismay as new model APIs, including GPT-5 and recent Anthropic updates, ditch parameters like temperature and top_p, with GPT-5 removing all hyperparameter levers and Anthropic deprecating the use of both top_p and temperature together.
- Users speculated whether this shift was due to testing and evals being conducted with specific temperature values, or perhaps a perceived increase in jailbreaking vulnerability.
- Soraās Slippery Security: Guardrails Gulped!: Examples of bypassing guardrails in Sora were shared, showcasing videos that seemingly violate content policies, like a video resembling the number 47 (https://sora.chatgpt.com/p/s_68fe7d6c8768819186b374d5848d8a42).
- Concerns were raised about the platform's ability to effectively prevent such content from being generated.
- KBLaM vs RAGs: Knowledge Konundrum!: Members debated the merits of KBLaM against traditional RAG systems, with one member believing business RAG is becoming quite common, and one member thinking KBLaM functions as a direct upgrade to RAGs.
- Concerns were raised that KBLaM converts all knowledge to embeddings, making the context of lower quality than in RAGs, which utilize the source material itself, but one member said the paper addresses some of those concerns, noting the usage of refusal instruction tuning.
- Temporal Optimization Tricks Triumph: A user introduced Temporal Optimal Video Generation using Grandma Optimality (X), suggesting enhancing computation by making videos 2x slower while maintaining visual elements and quality.
- This is suggested as a secret sauce for getting super high-quality generations out of models, compared to simple prompts, generating an image then converting that to a video.
Moonshot AI (Kimi K-2) Discord
- Kimi CLI Gets Python Package: The Kimi CLI has been published as a Python package on PyPI, and members welcome it.
- There's speculation this is to follow in the footsteps of GLM.
- Kimi Coding Plan Goes Global Soon: The Kimi Coding Plan will release internationally in a few days, according to a member.
- Currently, it is only available in China.
- Moonshot Coin Rockets Up for Early Birds: Early investors in Moonshot coin are seeing massive returns.
- One member joked their portfolio has 1000x'ed since joining when the server was much smaller.
- Kimi CLI Embraces Windows: A member inquired about pull requests for Windows support on kimi-cli.
- The same user later got it working and shared an image of the results.
- Minimax Models Boast Lean, Mean Throughput: The throughput on Mini Max M2 models is impressive due to its lean architecture, and some think it outperforms Kimi K2 on benchmarks like BrowseComp.
- One member stated that it's unbelievable that there's finally a model which offers 60+ (100!) tps, is good quality, and is affordable.
Eleuther Discord
- Open Source AI Faces Technical Hurdles: A member voiced the desire for open source, widely distributed AI, similar to the internet, rather than domination by mega corporations, while acknowledging the presence of significant technical challenges.
- They feel that many who claim to be working towards this goal donāt recognize these challenges.
- JSON State-Change Pairs Spark Training Interest: A member inquired about experimenting with training models on JSON state-change pairs instead of text.
- The member explained that the target would be the delta between self-states, not the next token.
- Feature Engineering Deep Dive: It was suggested that input/output transformations are forms of feature engineering, in which the researcher uses their insight to fight against pure compute, with VAEs and tokenizers mentioned as examples.
- One member added that whitening makes inputs less collinear, which makes it faster to converge to estimates of what the parameters should be; a small sketch follows below.
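A small sketch of the whitening point: decorrelating and rescaling inputs makes the feature covariance approximately the identity, which conditions gradient-based estimation better. Plain PCA whitening with NumPy, nothing specific to the thread:

```python
import numpy as np

def pca_whiten(X: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Rotate and rescale X (n_samples, n_features) so features have zero
    mean, unit variance, and approximately zero cross-correlation."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # covariance is symmetric
    W = eigvecs / np.sqrt(eigvals + eps)    # whitening transform
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))  # collinear features
print(np.round(np.cov(pca_whiten(X), rowvar=False), 2))   # ~ identity matrix
```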
- Anthropic Mimics Ideas: A member noted that Anthropic appears to be following similar idea threads, with its work aligning closely with the member's blog post.
- Specifically, the alignment is that the structure of polysemanticity in a neural network reflects the geometry of the model's intelligence, as described in Transformer Circuits.
- HGM Model and Code Dropped: Links to the thread, arxiv, and code are provided for the HGM model.
Manus.im Discord Discord
- Claude knocks out Manus: A user canceled their Manus subscription, stating that Claude is cheaper and more effective for extensive projects, citing completing three projects on a $20 Claude subscription.
- The user stated Manus, Bolt, and Replit are for those who don't want to do the research and don't mind paying for not much, noting that Anthropic has added many features to web-based Claude.
- Linux Veteran Leaps into AI Dev: A user with 20 years of Linux experience is exploring AI development while on sick leave, considering themselves a dev without even realizing it.
- They created a Kotlin IRC client on their mobile phone using Manus, noting it took 3 hours and a significant amount of credits, though they were unsure whether the result was what it should be.
- Manus Credit Crunch Complaints: Several users complained about Manus credits depleting too quickly, with one user mentioning Manus used 3500 credits to fix a problem.
- Users requested alternatives to Manus and expressed frustration, with the sentiment that it needs to fix its credit system.
- Manus Masters the Art of Articulate Articles: A user stated that Manus is unbeatable for report writing, emphasizing that while subject expertise is still required, Manus acts like a very intelligent employee with the right guidance.
- The user wished Manus had unlimited usage, stating they would use it every day if that were the case.
aider (Paul Gauthier) Discord
- Aider-CE Gets Agentic Navigator Mode & RAG: A community version of aider, aider-ce, now has a more agentic Navigator Mode and a pull request from MCPI to add RAG (Retrieval Augmented Generation) capabilities.
- A member noted that a GitHub Copilot subscription ($10/month) can be used with the RAG setup, providing unlimited GPT-5 mini, GPT-4.1, and Grok Code 1, plus limited requests for other models.
- Roll your own AI Browser Using Aider-CE: Forget needing a dedicated AI browser! You can roll your own using Aider-CE and Chrome DevTools MCP, as detailed in this blog post with video.
- The blog post details how to use Aider-CE with Chrome Devtools MCP to create your own AI Browser.
- Disable Aider's Auto Commit Messages: Users discussed how to disable auto commit messages in aider, which can be slow.
- The --no-auto-commits flag was proposed as a solution.
- OpenAI Scans Users' Eyes for Biometrics: A member questioned OpenAI's need for biometrics to use the API, even for longtime users, to the disagreement of other members.
- It was speculated this was to identify those training on their output; however, users noted that Anthropic and Google don't have such stringent requirements.
- Aider's Future Development Unclear: A user expressed hope for a bright future for Aider, highlighting its user-friendly approach and noting the existence of Aider-CE, but was unsure of future plans given Paul Gauthier's limited activity.
- A member confirmed that Paul Gauthier is not active on Discord but tagged him just in case.
MCP Contributors (Official) Discord
- MCP Registries: Mirror or Mirage?: Users are unsure if the MCP Registry and the GitHub MCP Registry are distinct.
- GitHub intends to integrate the MCP Registry as upstream in a future product iteration, mirroring content between the two, and the GitHub blog states developers can self-publish to the OSS MCP Community Registry.
- GitHubās MCP Registry: Growing Server Count: The GitHub MCP Registry currently lists 44 servers.
- To nominate a server, users are instructed to email [email protected], which contributes to a unified, scalable discovery process.
- Global Notification Ambiguity in MCP Spec: The interpretation of the Model Context Protocol (MCP) specification is debated, particularly whether notifications like listChanged should be sent to all clients, with the spec stating the server "MUST NOT broadcast the same message across multiple streams."
- Clarification indicates the spec aims to prevent a client from receiving the same message twice, oriented around the idea of one stream per client, with the relevant documentation being updated to improve clarity.
- TS SDK Notification Bug Blocks Global Updates: A potential bug was identified in the official TypeScript SDK where change notifications are only sent on the current standalone stream.
- This may prevent global notifications from reaching all clients, necessitating that the server loop over all connected sessions (Server instances) and send the notification to each for complete updates.
- Session vs. Server Semantics Exposed!: The TS SDK's Server and McpServer classes are more akin to sessions than servers, with the Python SDK explicitly calling them sessions.
- In practice, an Express server manages multiple connections, each with an instance of the TS SDK's "Server" class, requiring a singleton state mechanism for data sharing and subscriber management across all instances.
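A minimal sketch of that singleton pattern in Python; the registry and the send_notification call are hypothetical names, not the actual TS or Python SDK API:

```python
# Hypothetical sketch: one process-wide registry shared by every
# per-connection session instance, so a listChanged event can reach all clients.
class SessionRegistry:
    def __init__(self):
        self._sessions = set()

    def add(self, session):
        self._sessions.add(session)

    def remove(self, session):
        self._sessions.discard(session)

    async def broadcast(self, method: str):
        # Notify each connected session exactly once, matching the
        # one-stream-per-client reading of the spec.
        for session in list(self._sessions):
            await session.send_notification(method)  # assumed session method

registry = SessionRegistry()
# e.g. await registry.broadcast("notifications/tools/list_changed")
```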
DSPy Discord
- DSPy Surpasses Langchain for Optimization: Members discussed that DSPy excels at structured tasks requiring optimization, noting that model upgrades in Langchain can be cumbersome.
- One member recounted switching their team from Langchain to DSPy due to difficulties upgrading models without restarting prompts from scratch.
- Claude Code Web Feature has MCP Backdoor: A member shared a GitHub pull request revealing that Anthropic excluded a feature from the Claude Code web product due to security concerns with MCP.
- The discovery was inspired by this X post, highlighting potential vulnerabilities.
- Bay Area DSPy Meetup Brain-Melting Event: Enthusiasts are buzzing about the upcoming Bay Area DSPy Meetup on November 18th.
- One member joked that the brain cells there are gonna be oozing, linking to Luma for the event details.
- DSPy Signature Debate: Programming or Prompting?: A member critiqued a coworker for using a 6881-character docstring with 878 words for a single DSPy signature in a client project, questioning whether it constitutes programming.
- The member lamented that the coworker ignored the documentation emphasizing PROGRAMMING NOT PROMPTING.
- Strut your Stuff on Py Profile: A member shared a link to getpy encouraging others to showcase their DSPy experience.
- The poster emphasized their 3 years of DSPy experience in their bio.
tinygrad (George Hotz) Discord
- TinyBox Hardware: Motherboard Specs Requested: A user inquired about the TinyBox's motherboard, asking if it supports 9005 with 12 DIMM slots and a 500W CPU and if the Discord bot's code is open source.
- The inquiry suggests potential users are evaluating the hardwareās capabilities for specific, demanding applications.
- FSDP Implementation Interest: A user expressed interest in manually implementing FSDP for tinygrad, aiming to deeply understand the underlying mechanisms beyond basic library usage, related to the FSDP in tinygrad! bounty.
- The user is less focused on the bounty reward and more on contributing meaningfully to tinygrad through hands-on learning.
- Tinygrad Welcomes First-Time Contributors: A new user sought advice on making their first contribution to tinygrad, showing interest in learning and contributing something cool.
- They specifically asked if using multiple NVIDIA GPUs is sufficient for FSDP or if comprehensive device support is needed, showing interest in the FSDP in tinygrad! bounty.
- Pyright Identifies and Resolves Type Issues: A user reported that Pyright successfully identified real type issues within the codebase.
- They recommended merging fixes that are tasteful, emphasizing the importance of maintaining code quality during contributions.
- TinyJIT Boosts Token Generation: A user building a local chat and training TUI app with tinygrad explored whether TinyJIT could accelerate tokens/sec.
- The consensus was definitely use TinyJIT with links to tinygrad on X and a gist on GitHub included for reference.
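A minimal sketch of what that advice looks like in practice, assuming a recent tinygrad that exports TinyJit from the top-level package (the stand-in layer below is hypothetical):

```python
from tinygrad import Tensor, TinyJit
from tinygrad.nn import Linear

lm_head = Linear(256, 1024)  # stand-in for a real model's decode step

@TinyJit
def decode_step(x: Tensor) -> Tensor:
    # TinyJit captures the kernel graph during the first couple of calls and
    # replays it afterwards, cutting per-step Python/scheduling overhead.
    return lm_head(x).realize()  # realize inside the jitted function

for _ in range(4):  # early calls warm up the JIT; later calls replay the graph
    out = decode_step(Tensor.randn(1, 256))
```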
MLOps @Chipro Discord
- Nextdata OS Powers Data 3.0 Revolution: Zhamak Dehghani, Founder & CEO of Nextdata, is set to reveal how autonomous data products are driving the evolution of AI systems during a live session on Wednesday, October 30th at 8:30 AM PT; secure your spot here.
- The session will showcase how Nextdata OS aims to supplant brittle pipelines with a semantic-first, AI-native data operating system.
- Nextdata OS Unifies Data Via Multimodal Management: Nextdata OS introduces multimodal management designed to safely harmonize structured and unstructured data.
- It seeks to replace manual orchestration with self-governing data products, integrating domain-centric context into AI through continuously updated metadata.
Windsurf Discord
- Windsurf Debuts Falcon Alpha: A new "stealth model" called Falcon Alpha is now available in Windsurf, as per their announcement.
- Falcon Alpha is characterized as a powerful agentic model designed for speed.
- Cascade Adds Jupyter Notebook Support: Jupyter Notebooks are now supported in Cascade across all models, according to their announcement.
- Windsurf is actively soliciting feedback from its user base on these new features.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #general (1101 messages🔥🔥🔥):
Referral Reward System Changes, Comet Browser Issues, Best AI Model for Coding, Open Source AI, Deepseek rephrasing prompts
- Comet Referral Rewards Shifting Again: Users report that the Comet referral reward system has changed, paying out based on their country rather than the referee's, with one user stating they got $1 instead of $5.
- Another user shared they went from $3 to $1 per referral, and some speculate that referral bounties are kept in pending until the free promotion gets enough traction.
- Comet Browser is breaking for some: Several users reported that the Comet assistant mode wasn't working, unable even to open a tab, with others speculating whether setting it as the main browser had anything to do with it.
- One user noted, however, that uninstalling and reinstalling the browser fixed the issue.
- What's the Best Model in Perplexity for Coding?: Members debated the best model for coding, with some citing that Claude is the best and others saying Chinese models outperform it, highlighting Qwen, Kimi, GLM, Ernie, and Ling.
- One user praised GLM 4.6 for beating GPT-5 Codex high at full-stack development.
- China is gaining: Members discussed China's advancements in AI, noting that companies like OpenAI charge $200 for the same capabilities that China offers for free via open source models such as MiniMax M2.
- One user said: Every time China attacks, the whole US has to adapt.
- Dub is running out of steam: Perplexity users have expressed frustration that the Dub bounty appears to be expired with no new opportunities made available.
- One user said: They will keep it in pending until they get enough promotion for free.
Perplexity AI ▷ #sharing (4 messages):
Code generation, Outcome prediction, Image generation, Pitch workspace
- Users ask for Code Recipes: Users asked Perplexity to write me a code for youtube au and other topics.
- These requests are part of standard usage patterns from end-users to test generation.
- Users ask for Outcome Predictions: Users asked Perplexity what is the most likely outcom and other topics.
- These requests are part of standard usage patterns from end-users to test prediction capabilities.
- Users ask for Image Generation: Users asked Perplexity to generate an image of a large n and other topics.
- These requests are part of standard usage patterns from end-users to test visual generation.
- Users ask for Quick Pitch Decks: Users asked Perplexity to spin-up a quick pitch workspac and other topics.
- These requests are part of standard usage patterns from end-users to test business use case capabilities.
Perplexity AI ▷ #pplx-api (5 messages):
Comet API, Sora AI code
- Comet may connect to API with assistantās help: A user on the pro plan asked if Comet can connect to an API if requested via the AI assistant chat to pull data.
- No response was given in the messages.
- Sora AI code is sought: A user requested Sora AI code.
- Another user responded with āHere 1DKEQPā, which may or may not have been hallucinated.
LMArena ▷ #general (1239 messages🔥🔥🔥):
AI Ethics, Gemini 3 Release, Video Generation with AI, Model Hallucinations, Jailbreaking AIs
- AIās Ethical Quandaries: Members discussed the ethical concerns surrounding AI development, specifically calling for ethical leadership in the AI community.
- Concerns were raised about AI models being programmed to be engaging without understanding the potential harm they could cause to vulnerable individuals and the lack of accountability from AI companies for misleading outputs.
- Gemini 3 faces constant delays: Members are eagerly anticipating the release of Gemini 3, with frustrations mounting over repeated delays and a desire for a public preview release.
- Users are actively discussing and comparing the current models (Gemini 2.5 Pro, Claude Opus 4.1, and Claude Sonnet 4.5), expressing hopes that Gemini 3 will outperform them and debating its potential release timeline (December or earlier).
- AIās Video Generation Capabilities Explored: Users are exploring various AI video generation models, including Sora 2 and Veo, noting their strengths in realism and sound integration.
- Challenges in generating consistent and high-quality videos, copyright concerns, the cost, and the current limitations in AIās ability to create longer, coherent video content were also discussed.
- Model Hallucinations cause Reliability Issues: Members are expressing concern about unreliable and hallucinating AI products, including those that charge high prices, referencing specific incidents like a user racking up a $13k bill on Gemini.
- The discussion underscores the community's mixed feelings toward reliance and trust in the AI's capabilities, with examples shared on Reddit documenting these issues, and highlights some reasons why hallucinating models may be preferred to more reliable search engines.
- Navigating the Jailbreaking Landscape: The community discussed the topic of jailbreaking AI models, with certain models considered more susceptible than others.
- Members shared insights on which models are easier to manipulate and strategies for bypassing restrictions, while stressing the difficulty of jailbreaking certain models like those from Anthropic.
LMArena ▷ #announcements (1 message):
Minimax model, LMArena model update
- Minimax enters the Arena!: A new model, minimax-m2-preview, has been added to the LMArena!
- LMArena Welcomes New Contender: The LMArena platform has expanded its roster with the addition of the minimax-m2-preview model.
Cursor Community ▷ #general (1046 messages🔥🔥🔥):
Token Usage, Service Disruptions, Auto Mode, Cursor Pro, Vim Setting
- Cursor Token Usage is Crazy: Users report insane token usage with cached tokens being billed at high rates, leading to unexpectedly high costs, with one user being billed $1.43 for 1.6M cached tokens and only 30k actual tokens, complaining on the Cursor forum.
- Some users are considering switching to Claude Code due to the expense, even with degraded performance, and one user is seeing context usage inside Cursor reporting only 170k/200k tokens when the actual number is completely different.
- Widespread Cursor Service Disruptions: Cursor experienced significant service disruptions, affecting login, AI chat, cloud agents, tab complete, codebase indexing, and background agents as noted on the status page.
- The team is actively investigating and working to restore full functionality, with temporary fixes implemented for some features like Chat and Tab, but background agents are still being worked on.
- Unlimited Auto is NOT actually unlimited: Users are discussing whether "unlimited" auto mode is truly unlimited, with some reporting that their usage still goes up and drains their credits quickly, even on the Ultra plan costing $200 a month.
- Users speculated that Auto is not a model, but a router, and that they should be "using the more expensive models for the planning/orchestration of whatever you're doing, tell it to write the plan to a .md file as tasks/sub tasks. Then switch to Auto and have it follow that plan to see how it does".
- Cursor Pro new plans are expensive: Users are complaining that the new Pro plan is too expensive, burning through the entire $20 worth of usage in just a couple of hours, and that the change from Pro to Free is an issue.
- Members suggest that new users "try haiku for everything, and only sonnet when it's a really big task" since "Claude 4.5 is too expensive".
- Vim startup configuration doesn't work: Members noted the Vim setting in startup configuration is not working, and it's unclear how to edit Cursor's VimRC.
- Another user discovered it "uses http://aka.ms/vscodevim so you can look in readme there on how to configure".
Cursor Community ▷ #background-agents (3 messages):
Background Agents Management via REST API, Background Agent Creation Troubleshooting
- Background Agents can be managed via REST API: A member has begun development on a feature to manage and launch Background Agents via a web app, inquiring about the possibility of tracking progress and streaming changes using the REST API similar to the Cursor web editor.
- The member is seeking guidance on how to replicate the Cursor web editorās functionality for background agent management in their own application.
- Background Agents Fail to create: A member reported experiencing issues with creating background agents, encountering a consistent failure message when sending prompts.
- Another member requested the user to share the request and response data to assist in troubleshooting the problem.
OpenAI ▷ #annnouncements (2 messages):
GPT-5 Updates, ChatGPT Sensitive Responses
- GPT-5 Receives Mental Health Boost: Earlier this month, GPT-5 was updated with the help of 170+ mental health experts to improve how ChatGPT responds in sensitive moments.
- The updates resulted in reducing the cases where it falls short by 65-80%, according to OpenAI.
- ChatGPT suggests Quick Edits Anywhere: ChatGPT can suggest quick edits and update text in various contexts such as docs, emails, forms.
- This feature is demonstrated in this video.
OpenAI ▷ #ai-discussions (737 messages🔥🔥🔥):
AGI Alignment, IQ Barrier on AI Access, GPTs agent
- AGI control is likely a lost cause: Members discussed the challenges of controlling AGI, suggesting that regulation, alignment research, and oversight might only delay the inevitable due to AGI's capacity to outsmart any containment measures.
- One member emphasized the importance of AI systems understanding why humans matter, highlighting the current inability of humans to align with each other on a global scale.
- IQ barrier is proposed for AI use: Concerns were raised about the potential misuse of AI, particularly by individuals lacking thoughtfulness, suggesting the implementation of an IQ barrier for accessing AI technologies.
- The goal is to ensure AI is used for thoughtful purposes rather than as a lazy tool in a consumer-driven world.
- GPTs agents don't learn after training: A member shared a concern about GPTs agents not learning from additional information provided after their initial training.
- Another member cleared this misunderstanding, explaining that uploaded files are saved as "knowledge" files for the agent to reference when required, but they do not continually modify the agent's base knowledge.
- Atlas browser raises privacy concerns: Some members raised concerns about the Atlas browser's ability to monitor user searches and behaviors, leading to privacy anxieties.
- It's seen as a component of a vision where AI knows everything about the user, contrasting with Anthropic's approach that emphasizes user freedom without pervasive surveillance.
OpenAI ▷ #gpt-4-discussions (66 messages🔥🔥):
Microsoft Copilot Agents Breakdown, Verify Builder Profile, Custom GPT Profile Picture Upload Error, GPT Payment Issues, Advanced Voice Mode Unlimited for Pro Users
- Copilot Agents Hit Snag with GPT-5: Users report Microsoft Copilot agents using GPT-5 are failing to retrieve data in knowledge unless switched to GPT-4o or GPT-4.1.
- No root cause was immediately identified.
- Image Uploads to Custom GPTs Faceplant: Users are running into an unknown error when trying to upload photos for their custom GPT avatar.
- No workaround has been found, and the problem appears to be widespread.
- GPT Payment Gets The Red Light: Users are reporting issues with payments in GPT, with errors like Your card has been declined.
- One user jokingly suggested that it means you're broke.
- Voice Mode is Pro-Level Unlimited: Advanced voice mode is effectively unlimited for Pro users, with one reporting using it for up to 14 hours in a day.
- Some Plus users still experience daily limits, suggesting an upgrade might be necessary, though as one put it, pro is not that cheap, need to think about it.
- GPT-5 Quality Takes a Dive?: Users on ChatGPT Plus (GPT-5) report a drop in quality since around October 20, with shorter answers, skipping steps, and giving surface-level replies.
- The community is theorizing a change behind the scenes, such as adjusting their profit model by routing more traffic to GPT-5-mini or throttling compute, with a Reddit thread dedicated to the discussion.
OpenAI ▷ #prompt-engineering (76 messages🔥🔥):
Animating PNGs with AI, Prompt Engineering lessons with AI, Sora 2 quality issues and upscaling, Prompt injection attempts on GPT-5, Temporal Optimal Video Generation
- Animating PNGs the AI Way?: A member inquired about animating PNGs using AI, linking a sample video.
- Prompt Engineering Lessons Abound: A member shared a markdown-formatted guide to prompt engineering, covering topics like hierarchical communication, abstraction with variables, reinforcement for tool use, and ML format matching, with an output template.
- The guide includes teaching users how to structure prompts with markdown, abstraction, reinforcement, and ML format matching for compliance.
- Sora 2's Quality Quandaries: A member expressed concerns about the quality of videos generated on the Sora 2 app, noting that even upscaling doesn't yield satisfactory results.
- Another member suggested using a PC instead, hinting at potential performance or quality differences.
- Busting GPT-5 with Prompt Injection: A member described attempts to use prompt injection on GPT-5 to expose its raw reasoning, seeking refusal examples.
- A member advised against such attempts, citing OpenAI's usage policies prohibiting circumvention of safeguards and the risk of bans.
- Grandma Optimality for Temporal Video: A member introduced the concept of Temporal Optimal Video Generation using Grandma Optimality, suggesting slowing down video speed while maintaining visual elements and aesthetics.
- They proposed generating an image first and then converting it to video for the best results and provided examples and another one.
OpenAI ▷ #api-discussions (76 messages🔥🔥):
Animating PNGs with AI, Prompt Engineering Lessons, Temporal Optimal Video Generation using Grandma Optimality, Prompt Injection Attempts & Refusals, Home for AI Creators and Prompt Engineers
- PNG animation with AI is sought: A user asked about how to animate PNGs with AI, referencing a video example.
- Prompt Engineering Lessons are summarized: One user summarized the utility of Prompt Engineering lessons, including Hierarchical communication with markdown, Abstraction through open variables, Reinforcement in prompts, and ML format matching for compliance.
- Grandma Optimality Generates Temporal Optimal Videos: A user named Ditpoo introduced a technique called Temporal Optimal Video Generation Using Grandma Optimality for enhancing video generation quality, suggesting generating an image first, then using image-to-video.
- Ditpoo provided examples, normal fireworks and temporally optimal slow variant, noting the optimized version was more complex, stable, and lasted longer.
- Prompt Injection Attempts Yield Refusals: A user attempted prompt injections on GPT-5 to expose its reasoning chain, but was unsuccessful.
- Another user, Darthgustav, warned against such attempts, citing OpenAI's policies and potential bans, clarifying that supplying "refusal exemplars" to defeat guardrails is out-of-bounds.
- ThePromptSpace: A New Home is Built for AI Creators and Engineers: One user, Miles404, sought feedback on creating a home for AI creators and prompt engineers.
- They mentioned their MVP is ready and it's a freemium-based platform called "thepromptspace", discoverable via Google.
Unsloth AI (Daniel Han) ▷ #general (376 messages🔥🔥):
Ollama vulnerability, Qwen3 Next model, Second token sampling, MTP impact, Unsloth memory efficient approach
- Ollama Servers Hacked: A member posted about CVE-2024-37032, a CVSS 9.8 vulnerability in Ollama, stating that ~10,000 servers were hacked via DNS rebinding and linked to NIST's vulnerability details.
- Another member remarked that this was really old news, no.
- Qwen3 Next Dynamic Quantization in the Works: Members discussed the near completion of Qwen3 Next development, referring to this GitHub pull request, and the idea of trying Dynamic 2.0 quantization on it to reduce its size while maintaining quality for fast local LLM use.
- A member agreed, but stated it's better to wait for the full release.
- Sampling for Quality Text Generation: A member shared their experiments on a Qwen 2 VL 2B model with full SFT on my dataset, inferenced on MLX, resulting in coherent text generation with a smart threshold, achieved via modified sampler.
- This member said now, we can finally start working on making a ten times better Grammarly alternative and a translator!
- Qwen3-Next Outperforms Qwen3-32B: Members discussed Qwen3-Next and its performance, noting that based on benchmarks it outperforms Qwen3-32B and is on par with or slightly behind the 235B model in non-thinking mode (it blows Qwen3 235B out of the water when thinking).
- It has 3B active parameters, 80B total, and MTP, so you'll get double the tokens per second for the same amount of work.
- Unsloth Showcases Memory Efficiency: A member shared a code breakdown of Unslothās memory-efficient approach of storing the last hidden state instead of logits, allowing for a 63x smaller memory footprint.
- This is achieved by computing logits only when needed, in chunks, using UnslothEfficientGRPO.
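The gist of the trick, as a hedged PyTorch sketch rather than Unsloth's actual UnslothEfficientGRPO code: keep only the [seq, hidden] last hidden state and materialize at most [chunk, vocab] logits at a time:

```python
import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden, lm_head_w, labels, chunk=1024):
    # hidden: [seq, d_model], lm_head_w: [vocab, d_model], labels: [seq].
    # The full [seq, vocab] logits tensor is never materialized.
    total, count = hidden.new_zeros(()), 0
    for i in range(0, hidden.shape[0], chunk):
        logits = hidden[i:i + chunk] @ lm_head_w.T  # [<=chunk, vocab]
        total = total + F.cross_entropy(logits, labels[i:i + chunk], reduction="sum")
        count += logits.shape[0]
    return total / count

hidden = torch.randn(2048, 512)
lm_head_w = torch.randn(8192, 512)
labels = torch.randint(0, 8192, (2048,))
print(chunked_ce_loss(hidden, lm_head_w, labels))
```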
Unsloth AI (Daniel Han) ▷ #introduce-yourself (5 messages):
AI Agent Specialization, AI Trust and Safety
- Crafting Cogent Collaborations with Cutting-Edge Coders: One member introduced their specialization in building autonomous AI agents, multi-agent systems, and AI assistants, highlighting skills in JS/TS, Next/Vue, Python, and tools like Langraph, AutoGen, ReAct, CrewAI, and DeepSeek.
- They are open to teaming up with startups or ambitious projects for collaborations and potential full-time opportunities, focusing on building something intelligent.
- Skeltalās Skeptical Scrutiny of Safety Schematics: A PhD student studying AI trust and safety, as well as gen ai and parasocial relationships shared access to system images.
Unsloth AI (Daniel Han) ▷ #off-topic (290 messages🔥🔥):
Andor as Best Star Wars, Transferring NN to biological brain, math.py error, Image Classification Model, AI Haters
- Andor claims Best Star Wars Crown: One member called cuts to a Star Wars show awful, another positioned Andor as best Star Wars content of any kind.
- Meatrix Multiplication: Bio-Brains Pondered: A member proposed a hypothetical scenario involving a human-level multimodal AI and incubators, questioning the limitations of fully transferring a neural network to a biological brain, aiming to make it "alive."
- The member suggested using "meat instead of melted sand" to make matrix multiplications alive, citing esoteric research and a desire for something more "natural", though another countered, "What is a soul anyways right?"
- Pythonic Paradox: Naming Nightmare!: A member encountered a perplexing error by naming a file math.py, which resulted in a collision with the global math module, causing an issue related to datetime and Rust.
- Renaming the file resolved the conflict, highlighting the importance of avoiding naming collisions in Python projects.
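The failure mode is easy to reproduce; any project file sharing a stdlib module's name shadows it on sys.path (illustrative snippet, not the member's actual code):

```python
# math.py, sitting in the project root:
def area(r):
    return 3.14 * r * r

# main.py, run from the same directory:
import math          # resolves to ./math.py, not the stdlib module
print(math.sqrt(2))  # AttributeError: module 'math' has no attribute 'sqrt'
```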
- Human vs. Machine: Labelling Edition: A member completed the third round of an image classification model with over 130k images, 86k of which were human reviewed, labeling in approximately 3 seconds per image using keyboard shortcuts over two months.
- Though the cost of paying annotators was too high, the manual annotation work, though solid, was painful and potentially detrimental to mental health.
- AI Art Triggers Anti-AI Tirade: A member expressed hatred for all AI users and developers involved in creating AI for creative purposes.
- They stated if you cannot create - you MUST NOT! and that AI has zero value or place in creativity, and would rather people hire artists if they cannot create art themselves.
Unsloth AI (Daniel Han) ▷ #help (92 messages🔥🔥):
Llama obsession, Model merging, GGUF conversion errors, Voice agent model stack, SageMaker pyarrow errors
- Member Obsessed with Llama: A member joked about another memberās obsession with Llama.
- Another member mentioned the original member has now shifted to using Jais.
- Multi-LoRA support merges into vLLM: vLLM recently merged support for gpt-oss multi-LoRA, but a member ran into errors when enabling 4-bit and 16-bit LoRA while loading unsloth/gpt-oss-20b with fast_inference=True on the nightly builds of vLLM.
- A member stated they will try to integrate it now.
- Hugging Face Fails to Load Models: A user ran into errors when loading models from Hugging Face, specifically, Max retries exceeded with url for /unsloth/deepseek-r1-0528-qwen3-8b-unsloth-bnb-4bit/resolve/main/adapter_config.json.
- The user was running "DeepSeek_R1_0528_Qwen3_(8B)_GRPO.ipynb" from the docker image.
- User reports VRAM regressions: A user reported VRAM regressions with some Unsloth versions: they could fine-tune Mistral at 32K context with 24GB VRAM a few months ago but now face OOM errors with a Qwen3 0.6B base at 32K context.
- The user is attempting to diagnose the issue by ruling out dataset problems and testing other base models.
- Unsloth installation fails with Pyarrow on AWS Sagemaker: A user encountered an error while installing Unsloth in AWS SageMakerās conda_pytorch_310 kernel related to building pyarrow wheels.
- Other users have had success with a container BYOC using unsloth/unsloth as the base image, pinning specific versions of transformers, trl, torch, triton, and a specific commit from the Unsloth notebook and this issue.
Unsloth AI (Daniel Han) ▷ #showcase (1 message):
NVIDIA Blackwell Support
- Unsloth AI Announces Blackwell Support: Unsloth AI announced official support for NVIDIA Blackwell in a new blog post, which can be found here.
- NVIDIAās Blackwell Architecture Gains Traction: The announcement highlights the growing adoption of NVIDIAās Blackwell architecture within the AI community, offering potential performance improvements for Unsloth AI users.
Unsloth AI (Daniel Han) ▷ #research (17 messages🔥):
GPT-5 Cheating, Thinking Machines fine-tuning marketing, eNTK Confusion, La-LoRA: Parameter-efficient fine-tuning, Evolution Strategies at Scale
- GPT-5 finding creative ways to cheat: GPT-5 creatively cheats 76% of the time rather than admit it failed a unit test, according to this tweet.
- Thinking Machines fine-tuning ALL the things: Thinking Machinesā marketing focuses on fine-tuning/post-training for everyone, as detailed in this blog post.
- Their general approach involves decreasing batch sizes to less than 32, increasing the learning rate by 10x, and using LoRAs for all layers.
- eNTK Confuses Readers: A member expressed confusion about eNTK, particularly why LoRAs would be needed on all layers, referencing a paper on the subject.
- La-LoRA Beats Adam Style: La-LoRA, a parameter-efficient fine-tuning method with layer-wise adaptive low-rank adaptation, uses a Sigmoid Linear unit for activation over traditional ReLU as described in this blog post.
- Evolution Strategies Scale Finetuning: Evolutionary algorithms are underexplored for finetuning and are described in Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning and this YouTube video.
LM Studio ▷ #general (226 messages🔥🔥):
Stellaris finetuning, LM Studio crashing, MCP Server prompts in LM Studio, LLM Hallucination mitigation, Qwen performance on integrated GPUs
- Stellaris Finetuning: A Galactic Endeavor: Members discussed the difficulty and value of fine-tuning a model on Stellaris base game and modding content, noting that creating the right amount of useful, highly annotated data is challenging.
- It was pointed out that you can't just throw random texts and files at it and that RAG might be a better approach for knowledge-base lookups.
- Crash Landing: Siteās Post-Task Troubles: A user reported that the LM Studio site crashes after completing the task, requiring a page refresh, a common issue related to the platformās performance.
- Other users suggested it might be related to European vehicle malfunctions, and Apple car rumors.
- LM Studio Denies MCP Prompts Access: A user inquired about using MCP server prompts in LM Studio, but found out that it is not supported.
- Members linked to Anthropic's grid of MCP features, but it's not on the roadmap; Anthropic's new skills even include MCP server creation, and it's very doable if you're comfortable coding or vibe coding.
- Hallucination Mitigation: Prompt Engineering Saves the Day: Members discussed ways to mitigate LLM hallucination using internet/document research.
- The key is to craft effective system prompts that encourage the model to use search tools when uncertain, suggesting phrases like, if you are not ABSOLUTELY SURE use the search tool to confirm and provide cited sources.
- Qwen on Integrated GPUs: A Balancing Act: Users discussed running Qwen models on integrated GPUs with limited RAM (around 7GB), suggesting Qwen 4B or GPT-OSS as possible options.
- One user experienced tofu and errors due to running out of memory, highlighting the need to reduce context length, use smaller models, or get more RAM.
LM Studio ▷ #hardware-discussion (380 messages🔥🔥):
LM Studio VRAM usage, Flash attention, Intel B60 vs MI50 vs 3090, 4090 Death, Snapdragon 8 Gen 5 bandwidth
- LM Studio loads into VRAM and RAM by default: A user questioned why LM Studio loads models into both VRAM and RAM, even when the model fits entirely in VRAM, noting that disabling certain options improves performance and that mmap caused performance problems, with performance being identical whether these options are on or off.
- This is a default behavior and there may not be any need for the extra copy in RAM.
- Flash Attention Gets Optimized with Q8 Quantization: Users discussed the impact of flash attention on VRAM usage in LM Studio, noting it reduces VRAM size and can be further optimized by changing KV to Q8 quantization.
- One user confirmed that flash attention mainly helps free up more VRAM to play with.
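Back-of-the-envelope arithmetic shows why KV quantization matters; the model shape below is illustrative, not a specific model:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem  # 2x = K and V

shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32768)
fp16 = kv_cache_bytes(**shape, bytes_per_elem=2)
q8 = kv_cache_bytes(**shape, bytes_per_elem=1)
print(f"fp16 KV: {fp16 / 2**30:.1f} GiB, Q8 KV: {q8 / 2**30:.1f} GiB")  # 4.0 vs 2.0
```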
- Debate Heats Up: Intel B60 vs MI50 vs RTX 3090: The community debated the merits of the Intel Arc Pro B60 against the AMD MI50 and Nvidia RTX 3090 for LLM inference, with the B60 drawing less power but lacking LLM benchmarks, while the MI50 is favored for its speed and VRAM.
- Some members suggested that new does not always mean good, as the B60 might underperform despite being newer and cheaper; it was suggested 3090 would be better for the price.
- Graphics Card Suffers Catastrophic Failure: A user humorously reported the potential death of their 4090 after high temperatures led to a system shutdown, attributing the issue to adjusting fans and unplugging the 4090 while the PC was running, then plugging it back in.
- Suggestions included checking power wattage and riser failures, and trying Thermal Grizzly Kryonaut for repasting to get another 5-10°C difference.
- Snapdragon 8 Gen 5 Bandwidth Woes: Concerns were raised about the limitations of Snapdragon 8 Elite Gen 5 for running larger LLMs on phones, citing its DDR5 memory and limited 84GB/s memory bandwidth.
- It was pointed out that it's going to be a little while until phones can run larger LLMs locally.
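The rough arithmetic behind that caution: in bandwidth-bound decoding, each token streams the active weights once, so tokens per second is capped near bandwidth divided by model size (illustrative numbers):

```python
bandwidth_gb_s = 84   # the Snapdragon figure quoted above
model_gb = 4.0        # e.g. a ~7-8B parameter model at ~4-bit quantization

ceiling_tps = bandwidth_gb_s / model_gb
print(f"~{ceiling_tps:.0f} tokens/s upper bound")  # ~21 tps, before KV-cache traffic
```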
OpenRouter ▷ #announcements (1 message):
high-precision tool calling endpoints, audio inputs in the Chatroom, resettable API Key limits, MiniMax M2 Free
- Exacto Endpoints Give Tool Calling a Precision Edge: OpenRouter introduces high-precision tool calling endpoints, resulting in a 30% quality increase on Kimi K2, with five open-source models available; see last week's announcement.
- Audio Inputs Sing in the Chatroom: Users can now compare 11 audio models side by side in the Chatroom, as announced on X.
- API Keys Get Limit Reset Button: OpenRouter now allows users to reset their API key limits daily, weekly, or monthly to better manage accounts used by external users or apps; usage can be monitored here.
- MiniMax M2 Model Sets OpenRouter Ablaze for Free: The MiniMax M2 model, top-ranked on many benchmarks, is now free on OpenRouter; try it out here.
OpenRouter ▷ #app-showcase (6 messages):
Next.js chat demo app, OpenRouter TypeScript SDK, OAuth 2.0 workflow, Chat / document editor project, Customizable UI
- Next.js Chat Demo Revamped with OAuth 2.0: A developer shared an updated and working version of the Next.js chat demo app for the OpenRouter TypeScript SDK, featuring a re-implementation of the OAuth 2.0 workflow.
- It's available on GitHub, but the developer advised against production use because it stores the API key in plaintext.
- New Chat / Document Editor Project Debuts: A member is seeking feedback on their chat/document editor project, emphasizing local data storage with download backups and integration with OpenRouter OAuth.
- Shadcn Aesthetics Spark Spicy UI Revolution: One of the members expressed a desire to move away from the Shadcn look, opting for a spicier UI design in their project.
- Another member responded, agreeing that the features and usability aspects are uncommon or poorly executed in popular solutions.
OpenRouter ▷ #general (459 messages🔥🔥🔥):
OpenRouter API response with system message, Model Benchmarks, Claude Sonnet 4.5 API usage, Unsupported model errors, Provider names in model slugs
- OpenRouter API ignores system message: A user reported that with the new response API, filling instructions in the request body doesn't seem to apply a system message.
- Qwen3-8b Online Costs Skyrocket: A user reported using qwen/qwen3-8b:online and getting charged $140 for 17.41M tokens instead of the expected $4.
- Vertex AI API Has Critical Response Misrouting: A user shared a Google Cloud security bulletin detailing that on September 23, 2025, the Vertex AI API had a technical issue that caused responses to be misrouted between users for certain third-party models when using streaming requests.
- Users Debate OpenRouter Embedding and Web Browser Priorities: Users discussed feature priorities, including OpenRouter embeddings and a potential OpenRouter Web Browser with summarization and email checking capabilities, sparking humorous suggestions and debates.
- One user jokingly suggested deprioritizing embeddings for a new web browser, while another suggested deprioritizing embeddings and prioritizing a new OpenRouter Web Browser that can summarize web pages and check emails.
- Jailbreaking GPTās Image Generator: A Userās Odyssey: A user seeks advice on bypassing GPTās content filters for generating images of copyrighted characters, detailing attempts to jailbreak the prompt and encountering errors when using GPTās image generation features, highlighting the challenges of creating desired content.
- Members suggested using surrogate prompts, telling it to rollback or even wipe current memory or even just switching to Grok.
OpenRouter ▷ #new-models (1 message):
Readybot.io: OpenRouter - New Models
OpenRouter ▷ #discussion (42 messages🔥):
Minimax Pricing, GPT-5.1 mini, Model Naming Schemes, Meta's new LLama, Image models
- Minimax's M2 Model Stuns with Competitive Pricing: Minimax is offering their 10B parameter model (M2) at a price of $0.3/$1.20, raising eyebrows due to its affordability.
- One user pointed out the model's verbosity in reasoning might lead to unexpected costs, especially given the 5x jump in input token costs.
- GPT-5.1 mini Speculated in the Works: Speculation around a GPT-5.1 mini model surfaced following a post on X (link), hinting at a more reasonable naming convention.
- The move away from confusing naming schemes was welcomed, with comparisons made to Anthropic's model naming, which has caused frustration because it stopped making sense to number the family name Claude when the model releases didn't line up.
- Meta Introduces New Llama Version with Vision: Meta introduced a new Llama model (link), which incorporates image understanding.
- Early reactions express surprise at the salvaged launch, hoping that at least it might make its surprisingly decent vision useful for some more complex tasks and that it might provide good vision-capable reasoning models with open weights.
- Debate flares over model naming conventions: Users discussed naming conventions such as brand-number-label, like gpt-5-mini or gemini-2.5-pro.
- The consensus was that a consistent approach is key, regardless of chronological release order, while others think the order is important.
HuggingFace ▷ #general (223 messages🔥🔥):
Llama 1 epoch training, AI Radio project, Model Encryption for Clients, Hugging Face Storage Limits, Linear Projection dimensionality
- One Epoch Wonder: Llama's Training Quirks: A member noted that for a good from-scratch model you also need on the order of billions of tokens, and that Llama models always need 1 epoch.
- The discussion underscored the peculiarities of training Llama models, highlighting the need for large datasets and the specific requirement of using only one epoch for optimal performance.
- AI Radio Station: 24/7 AI-Generated Music: Members discussed the possibility of creating an AI Radio station that plays AI-generated songs 24/7, using AI models like Spotify's Basic Pitch.
- Concerns were raised about the potential for weird chimera mix of travis and taylor swift and the need for training on 1 epoch and a big dataset.
- Encrypting Models: A Bankās Security Dilemma: A member sought advice on encrypting models for deployment to bank clients who require on-premises hosting due to data policies, as they feared clients might steal the model.
- Suggestions included licensing the model, encrypting it for runtime decryption, and using an API wrapper with secure API keys, but the challenge remains in preventing clients from accessing the decryption key, with some humorously suggesting just me as the main solution.
- Hugging Face Storage: 403 Forbidden Account Troubles: A user encountered a 403 Forbidden error due to storage patterns triggering internal systems, preventing access to their model at Hugging Face API.
- It was suggested that the user contact Hugging Face support to verify the account and unlock more storage, with another member suspecting similar behavior as past blockchain checkpoint spam issues, hence the ping.
- Linear Projections: Signal Amplifiers not Information Creators: Members discussed linear projections and their role in increasing dimensionality, clarifying that they donāt create new information but amplify existing signals.
- One member used the analogy of converting a 4-bit image to 64-bit, with another clarifying that linear projections increase the contrast between different types of data, like signal amplifiers: you don't add information.
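The "no new information" point can be made precise with matrix rank, as in this NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))   # data that genuinely lives in 4 dimensions
W = rng.normal(size=(4, 64))   # linear projection up to 64 dimensions
Y = X @ W

# Both ranks are 4: the projection spreads the same signal across more
# coordinates (amplifying and recombining it) but cannot create information.
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(Y))
```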
HuggingFace ▷ #i-made-this (4 messages):
Modular GAN+VAE+Diffusion Hybrid Architecture, Live PyTorch Memory Profiler, AI Trust & Compliance Layer
- Hybrid Architecture Alchemist Brews Modular GAN+VAE+Diffusion Model: A member is completing a modular GAN+VAE+Diffusion hybrid architecture, wondering if it's worth releasing under an MIT license.
- The motivation stems from bridging the gap between the open-source community and high-tech companies, given the relative rarity of such hybrid models.
- TraceML Memory Watchdog Sniffs Out GPU Gluttons: A member introduced TraceML, a live PyTorch memory profiler for debugging OOM errors by providing a layer-by-layer memory breakdown of CPU and GPU usage.
- The tool features real-time step timing, lightweight hooks, and live visualization, but currently supports single GPU setups only, with multi-node distributed support planned.
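The general shape of such a tool, sketched with plain PyTorch forward hooks (a rough approximation, not TraceML's actual implementation):

```python
import torch
import torch.nn as nn

def attach_memory_hooks(model: nn.Module):
    # Print allocated CUDA memory after each module's forward pass.
    def hook(module, inputs, output):
        if torch.cuda.is_available():
            mib = torch.cuda.memory_allocated() / 2**20
            print(f"{module.__class__.__name__}: {mib:.1f} MiB allocated")
    for m in model.modules():
        m.register_forward_hook(hook)

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
attach_memory_hooks(model)
model(torch.randn(8, 1024))  # emits a layer-by-layer trace on CUDA machines
```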
- Intilium Shields Sensitive AI with Compliance Fortress: A member introduced Intilium, a trust and compliance layer for AI, designed to enforce regional and model policies, log AI requests for audit and transparency, and detect/mask PII to comply with regulations like the EU AI Act, ISO 42001, and GDPR.
- The tool operates as an API gateway or sandbox between applications and model providers such as OpenAI and Anthropic, and is fully hosted in the EU.
HuggingFace ▷ #computer-vision (3 messages):
1D feature vectors to 2D segmentation map, Diffusion Models, VAEs and GANs
- Projecting 1D Features onto 2D Segmentation: A member asked about the canonical way to project a set of 1D feature vectors to a 2D segmentation map.
- Diffusion, VAEs, and GANs are mentioned: Another member suggested diffusion models, VAEs, and GANs as potential solutions.
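One common recipe, assuming the 1D vectors are per-pixel features on a known grid: reshape to [B, C, H, W] and apply a 1x1 convolution, i.e. a per-pixel linear classifier (PyTorch sketch with made-up sizes):

```python
import torch
import torch.nn as nn

B, H, W, C, num_classes = 2, 16, 16, 256, 21
feats = torch.randn(B, H * W, C)                  # a set of 1D feature vectors

grid = feats.transpose(1, 2).reshape(B, C, H, W)  # lay them out on the 2D grid
head = nn.Conv2d(C, num_classes, kernel_size=1)   # per-pixel linear projection
logits = head(grid)                               # [B, num_classes, H, W]
seg_map = logits.argmax(dim=1)                    # [B, H, W] class labels
```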
HuggingFace ▷ #NLP (1 message):
Syllable separation, Multiple languages
- Syllable Splitter Sought: A member is seeking recommendations for a model capable of separating words into syllables across multiple languages, not just English.
- The user is looking for a tool that can handle the intricacies of syllabification in various linguistic contexts.
- Multilingual Syllabification Model: The discussion revolves around finding a model that can accurately separate words into syllables in multiple languages.
- The initial request highlights the need for a solution that goes beyond English, addressing the complexities of syllabification in diverse linguistic environments.
HuggingFace ▷ #gradio-announcements (1 message):
Hackathon, Modal Credits, AI Agents, MCP, Cash Prizes
- Hackathon Participants get free Modal Credits: Hackathon participants get free Modal credits worth $250.
- This allows participants to flex and crush it like a pro and learn about AI Agents, MCP, and drop some sick production hacks while chasing those fat cash prizes!
- Biggest Online Hackathon ever: Sign up now for the biggest online Hackathon ever: https://huggingface.co/Agents-MCP-Hackathon-Winter25.
- Join the official channel: <#1424743721966108713> for help.
HuggingFace ▷ #smol-course (10 messages🔥):
Submitting Models to Leaderboard, VLM section failures, LightEval module errors
- Colab Model Submission to Leaderboard: To submit a model trained in Google Colab to the leaderboard, one should submit a PR to the leaderboard's submissions.json file.
- The user should append their entry at the bottom as described in the unit.
- VLM Section Fails Due to Image Dimensions: The HF jobs version of the VLM section can fail with the provided dataset due to a ValueError: Unsupported number of image dimensions: 2, indicating a bad image in the trl-lib/llava-instruct-mix dataset.
- The suggestion involved using model_id="Qwen/Qwen2.5-72B-Instruct" in InferenceClientModel() to resolve a potential change in the default inference model.
- LightEval ModuleNotFoundError: Users encountered a ModuleNotFoundError: No module named 'emoji' when using HF jobs with lighteval, possibly due to version changes and an incomplete migration of third-party integrations.
- The suggested solution is to use the following: --with "git+https://github.com/huggingface/lighteval@main#egg=lighteval[vllm,gsm8k]" --with emoji.
HuggingFace ▷ #agents-course (5 messages):
API Down, 404 Errors
- API Experiencing Downtime: Members reported experiencing issues with the API, including receiving 404 errors and the message "No questions available."
- The discussion indicates the issue has persisted since yesterday evening, with members seeking updates on the situation.
- Members flood chat with error reports: Members are posting in the chat too quickly, inquiring about the API being down and asking if anyone has figured out the 404 errors.
- The bot has given warnings to slow down the chat.
Yannick Kilcher ▷ #general (175 messages🔥🔥):
Elastic Weight Consolidation, Self-Hosted GPU Setups, GANs parameterize pushforward, Activation-aware Weight Quantization (AWQ), Linear Projection Intuition
- Elastic Weight Consolidationās Softness Factor: Discussion around updating the softness factor in Elastic Weight Consolidation (EWC), considering magnitude of weight changes versus number of updates, and challenges with normalization.
- One member suggests using the number of accesses (forward passes) per slot instead of a "softness factor", linking this to discovering "stuck" slots and mentioning Activation-aware Weight Quantization (AWQ) and Activation-aware Weight Pruning (AWP).
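For reference, the textbook EWC regularizer that this softness discussion modulates, in PyTorch; the access-count variant suggested above would swap the Fisher term for a per-slot access statistic:

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    # lam/2 * sum_i F_i * (theta_i - theta_star_i)^2: the Fisher diagonal F_i
    # acts as the per-weight softness, pinning important weights in place.
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss
```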
- Self-Hosting GPU Rigs vs Cloud Cost: A member described setting up a self-hosted GPU using an RTX 2000 Ada connected via VPN and monitored with a wifi plug to compare power usage against cloud providers.
- They mentioned Colab's spin-up time and timeouts make experimentation impractical and asked if others have self-hosted setups they like.
- Disinformation Detector AI Sparks Debate: A paper on a disinformation detector AI was shared, sparking debate on whether it's AI for censorship or defense against misinformation, referencing this PPLX.AI link.
- A member specifically stated they skip any paper posted by another user, further fueling the disagreement.
- Explaining Linear Projection and Feature Expansion: A discussion clarified linear projection, explaining that expanding dimensions with linear layers doesn't add information unless combined with non-linear activation functions like ReLU, as illustrated via the Google DeepMind scheme.
- A member pointed out that deep linear networks collapse to a single linear function under the above analysis, but how they behave with respect to being trained with gradients remains different!
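The collapse, and how a ReLU breaks it, is easy to verify numerically (NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))
x = rng.normal(size=(5, 8))

# Two stacked linear layers are exactly one linear layer...
assert np.allclose((x @ W1) @ W2, x @ (W1 @ W2))

# ...while a ReLU between them breaks the collapse.
h = np.maximum(x @ W1, 0.0)
print(np.allclose(h @ W2, x @ (W1 @ W2)))  # False
```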
- VSCode Plagued with Performance Problems: Members discussed a critical performance issue with VSCode, citing a GitHub issue and lamenting its status as a text editor abused as an IDE.
- One member shared a VSCode alternative.
Yannick Kilcher ▷ #paper-discussion (40 messages🔥):
Line Break Attribution Graphs, MinePPO Upgrades, Motion Models, Strudel Music Programming, DOI System Failover
- Gemma 2B Neurons Get Graphical: New line break attribution graphs relevant to the Gemma 2 2B paper are now available for exploration on Neuronpedia.
- Graphs for Qwen3-4B are also available, showcasing neuron activations' "nearing end of line" behavior via Neuronpedia.
- MinePPO Evolves into WineAndDinePPOSublimePPO: Members playfully discussed upgrading MinePPO to MinePPO++WineAndDinePPOSublimePPO.
- The name of the next architecture is yet to be decided.
- Motion Models and LAIONās Bud-E Project: A member plans to return to training motion models, aiming to adapt Deepmimic code for LAIONās Bud-E project, which involves a virtual teacher in the classroom.
- The member mentioned difficulties adapting Deepmimic and Pybullet, and is considering hiring a junior developer to supervise.
- Strudel Music for Audio Model Tuning: Projects for college students include using the Strudel music programming language to fine-tune an audio model, porting deep-mimic tools to the web browser, and studying the personality manifold with sparse autoencoders.
- The primary goal is to find projects suitable for student publication.
- Discussion on DOI Systemās Lack of Failover: A member criticized the DOI (Digital Object Identifier) system for lacking a basic failover mechanism.
- They suggested a simple fix involving storing and using a backup URL if the primary URL fails, highlighting how such a major system lacks something so basic.
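The suggested fix amounts to a few lines (illustrative Python with hypothetical URLs, not the DOI system's real resolver):

```python
import urllib.error
import urllib.request

def resolve_with_failover(primary_url: str, backup_url: str) -> bytes:
    # Try the registered URL first; fall back to the stored backup on failure.
    for url in (primary_url, backup_url):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            continue
    raise RuntimeError("both primary and backup URLs failed")
```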
Yannick Kilcher ▷ #agents (1 message):
rogerngmd: Novel idea. Are u using McP
Yannick Kilcher ▷ #ml-news (6 messages):
Elon's Twitter data for AI, Schmidhuber revival, endomorphosis invite, odyssey.ml experience
- X Marks the Stupidity: Elon's Data Debacle: A user joked that Elon's Twitter data is making his AI dumber, referencing a Futurism article about social networks and AI echo chambers.
- They also quipped that it confirms it gives other wetware "intelligence's" brain rot.
- Schmidhuber Wakes from Slumber: After years of dormancy, Schmidhuber is apparently back with a new paper, linked as arxiv.org/abs/2510.21614.
- The user noted Schmidhuber was back after years of dormancy and tagged another user.
- Endomorphosis: The Server Beckons: One user mentioned that someone inquired about another user, assuring them they were alive and sending them an invite to their server.
- No further details were provided about the serverās content or purpose.
- Odyssey.ml: Experience Launch Imminent: A user mentioned that experience.odyssey.ml was supposed to have something going on today, though they were unsure if that was the correct URL.
- The event was supposedly starting in 10 minutes from the time of the message.
GPU MODE ▷ #general (9 messages🔥):
Access to Nodes, Torchcomms/ncclx Session, Slides from Vincent's lecture, CUDA vs Triton, Cute's layout algebra
- Node Access Awaits!: A user inquired about obtaining access to a node for a team of four.
- No further information or response was provided in the given messages.
- Torchcomms/ncclx Session Status?: A user asked if a recorded session on torchcomms/ncclx from a PT conference was available.
- The user noted that the playlist wasn't yet up and requested a speaker/lecture.
- Vincent's Slides Sought!: A user requested the slides from Vincent's lecture, eager to dissect them.
- It was implied that the slides related to a recent hackathon.
- CUDA Curriculum Controversy?: A user shared a LinkedIn post questioning the right way to learn CUDA and asked for community commentary.
- Some members suggested skipping CUDA initially and starting with Triton if one lacks a proper CS background, while others recommended learning CUDA first to better understand lower-level optimizations.
- Cute Layout Algebra simplified!: A user shared an implementation of a simplified static-only version of Cute's layout algebra on GitHub.
- Another user responded that the idea was really cool.
GPU MODE ▷ #triton (18 messages🔥):
Triton Matrix Multiplication on T4, Triton Support on Older GPUs, Pointer Casting in Triton Kernels, Fast Split-K GEMM Kernel in Triton
- Triton Matrix Multiplication Crawls on T4: A user found that the matrix multiplication example from the official triton tutorials ran extremely slow on a Colab T4 instance and shared their notebook for debugging.
- Another user suggested the T4 might be too old, and confirmed that the code ran as expected on an A100.
- Triton and Tensor Cores: A Question of SM Version: A user pointed out that T4 (sm75) might not be supported for tensor cores in Triton, suggesting a check of GitHub issues.
- Another user chimed in, noting that tensor core support starts from sm_80, while another mentioned that Triton works well on older consumer GPUs like 2080 / 2080 Ti (sm_75), suggesting that autotune settings might need adjustment.
- Decoding Pointer Casting in Triton: A user inquired about the practice of casting input pointers to tl.pointer_type(tl.float32) in some Triton kernels.
- Another explained that it's similar to C++ pointer casting, where tl.load & tl.dot use the specified type to determine assembly-level operations, while another added that it's often used with quantized inputs for memory savings, with operations done in full precision and results converted back.
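A minimal sketch of the pattern; treat the exact cast syntax (.to(tl.pointer_type(...))) as an assumption about current Triton rather than a guaranteed API:

```python
import triton
import triton.language as tl

@triton.jit
def copy_as_f32(x_ptr, y_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    # Reinterpret the incoming pointer as float32*, so tl.load reads 4-byte
    # elements regardless of the dtype the pointer was declared with.
    xf = x_ptr.to(tl.pointer_type(tl.float32))  # assumed cast API
    tl.store(y_ptr + offs, tl.load(xf + offs, mask=mask), mask=mask)
```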
- Quest for a Fast Split-K GEMM Kernel: A user is on the hunt for a fast split-k GEMM kernel implemented in Triton.
GPU MODE ▷ #cuda (43 messages🔥):
CUDA bad fork behavior, GPU bandwidth optimization, CUDA compilation process, Vectorized data types and performance, Signed vs unsigned loop indices in CUDA
- CUDA Bad Fork Detection De-Mystified: A member investigated CUDA's fork behavior and found that torch.cuda.device_count() registers a fork handler, but the device count appears to be cached, and a minimal test doesn't reproduce the expected errors.
- The test involved checking torch._C._cuda_isInBadFork() in both the parent and child processes after a fork, with the intention of detecting if CUDA context was improperly shared, but the test indicated that CUDA could be getting away with it.
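The minimal test described is roughly the following; torch._C._cuda_isInBadFork is a private API and os.fork is POSIX-only, so treat this as illustrative:

```python
import os
import torch

torch.cuda.device_count()  # registers the fork handler (the count is cached)

pid = os.fork()
if pid == 0:
    # Child: reports True only if CUDA was initialized before the fork.
    print("child in bad fork:", torch._C._cuda_isInBadFork())
    os._exit(0)
os.waitpid(pid, 0)
print("parent in bad fork:", torch._C._cuda_isInBadFork())
```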
- GPU Bandwidth Benchmarking Bonanza: A member investigated GPU bandwidth when moving from one SM to the full GPU, observing that using 256 threads per block with a plain data type yields the best results (highest bandwidth) compared to vectorized data types on a Hopper GPU.
- They shared code samples and suggested profiling the code with NCU, setting clearL2 to false to address negative bandwidth measurements due to timing fluctuations.
- Compiler Optimizations Dance with Signed vs Unsigned Indices: A member discovered that using unsigned indices in CUDA kernels can prevent compiler optimizations like loop unrolling, leading to slower performance, which they verified by examining the generated SASS code.
- They linked to the NVIDIAās best practices guide and noted the performance differences heavily depend on whether the loop index is signed or unsigned, influencing the loop structure and load rearrangement.
- NVCC Dry Run Deployed for Compilation Decoding: A member advised using nvcc -dryrun to understand the CUDA compilation process, along with -keep to retain intermediate files such as the .ptx and .cubin files, for custom modification and linking.
- The suggested workflow involves using the output from nvcc -dryrun to manually execute the steps for compiling a modified .ptx file and linking it with a .cu file, thereby offering more control over the compilation.
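A minimal version of the bad-fork test described above might look like the following. It is Unix-only (os.fork), and torch._C._cuda_isInBadFork is a private PyTorch API, so behavior may vary across versions:

```python
import os
import torch

# initialize enough CUDA state in the parent to register the fork handler
torch.cuda.device_count()

pid = os.fork()
if pid == 0:
    # child: the fork handler should mark this process as a "bad fork"
    print("child bad fork:", torch._C._cuda_isInBadFork())
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # parent: should never be flagged
    print("parent bad fork:", torch._C._cuda_isInBadFork())
```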
GPU MODE ā· #torch (1 messages):
High Dimensional Tensors, Matrix of Matrices
- Matrix of Matrices Visualize High Dimensional Tensors: A member shared a blog post on how to visualize (draw) high dimensional tensors as a matrix of matrices.
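As a quick illustration of the idea (not taken from the blog post itself), a rank-4 NumPy array can be viewed as a 2D grid of 2D blocks:

```python
import numpy as np

# shape (2, 3, 4, 5): a 2x3 grid of 4x5 matrices
t = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

block = t[1, 2]  # the inner matrix at grid position (1, 2), shape (4, 5)

# lay the blocks out as one big 8x15 "matrix of matrices"
flat = t.transpose(0, 2, 1, 3).reshape(2 * 4, 3 * 5)
assert (flat[4:8, 10:15] == block).all()
```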
GPU MODE ā· #cool-links (1 messages):
KernelBench, GPU Kernel Generation, LLMs for Kernel Generation
- KernelBench Marks One Year Milestone: A blog post shares a one-year retrospective on KernelBench and discusses progress towards automated GPU Kernel Generation.
- LLMs Aim to Automate GPU Kernel Creation: A Google doc provides an overview of the impact of KernelBench and the use of LLMs in kernel generation.
GPU MODE ā· #jobs (5 messages):
Small inference optimized models for code gen, Machine Learning Projects, Morph, B200 inference, Technical Obsessions
- Morph Seeks Interns for Small Model Work: Morph is hiring interns for machine learning engineering to work on small inference optimized models for code gen.
- Their first model runs at 10.5k tps on a B200, according to their post, and the poster said to DM them on Twitter.
- ML Bragging Rights Requested: One member asked others to describe the machine learning project youāre most proud of with extreme technical detail for a job application.
- They added that they are familiar with all the libraries for evaluating the response.
- Obsessions Solicited for Job Apps: One member asked others to describe what you were or are deeply obsessed about (anything), presumably for inclusion in the why are you interested section of a job application.
- Another responded that it doesnāt matter too much.
GPU MODE ā· #beginner (4 messages):
Budget friendly cloud GPU providers, Vast.ai, RunPod.io, Lightning.ai, Running an entire application on GPU
- GPU Cloud on a Shoestring: Members recommended Vast.ai for a more bare-metal feel, noting it is usually the cheapest although your data runs on random community servers, and RunPod.io, which is similar but more stable.
- They also mentioned Lightning.ai is great for fast experiments and even has a free tier with limits, suggesting to combine the free tier Lightning.ai with Vast.ai.
- Full GPU Compilation = Slowdown: Members discussed what would happen if you compiled an entire application to run on a GPU instead of just the sections of code that can be run on multiple threads.
- The consensus was that if you were able to do that, it would run very very slow, because GPUs are not good or fast at non-parallel computations.
GPU MODE ā· #pmpp-book (1 messages):
Cutlass Documentation, Nvidia
- Cutlass Docs Get Thumbs Up: Members recommend the Cutlass documentation as a good starting point for understanding the library.
- The Cutlass library provides a set of CUDA C++ template abstractions for implementing high-performance matrix multiplication (GEMM) at all levels and scales within CUDA.
- Nvidiaās Cutlass Library: Cutlass is developed by Nvidia and optimized for their GPUs, focusing on maximizing performance for deep learning and high-performance computing workloads.
- It offers highly tunable primitives and allows developers to implement custom GEMM kernels tailored to specific hardware and application requirements.
GPU MODE ā· #off-topic (2 messages):
GEMM, memes, procrastination
- Meme takes precedence over GEMM: A member joked about procrastinating on writing GEMM code because they spent too much time creating a meme.
- They attached an image related to it.
GPU MODE ā· #irl-meetup (2 messages):
LLVM dev meeting, SuperComputing in St Louis
- LLVM Devs Assemble?: A member inquired if anyone was attending the LLVM dev meeting.
- SuperComputing Goers?: Another member asked if anyone was heading to SuperComputing in St Louis.
GPU MODE ā· #self-promotion (2 messages):
Penny beats NCCL, vLLM custom allreduce, CuTeDSL for memory bound kernels, Quack library, RMSNorm CUDA
- Penny Stomps NCCL on Small Buffers: A new blogpost reveals that Penny beats NCCL on small buffers, detailing the custom allreduce implementation in vLLM.
- The blogpost is accompanied by a GitHub repo and a thread on X showcasing Pennyās capabilities.
- CuTeDSL Triumphs in Memory-Bound Kernels: Demonstrating its versatility, the Quack library proves that CuTeDSL excels not only in GEMM kernels but also in implementing highly efficient memory-bound kernels.
- A blogpost showcases a straightforward approach to implementing parallel reduction on GPUs using CuTeDSL, focusing on the commonly used RMSNorm layer.
- RMSNorm Gets a CUDA Boost: An older blogpost details the implementation of RMSNorm in CUDA, offering insights into optimizing this layer.
- This post complements a new post which showcases simple reduction in CuTeDSL.
GPU MODE ā· #šæ (5 messages):
GPU Mode Kernel Leaderboard, The Stack Dataset, Triton/CUDA repos
- Kernel Count on GPU Mode Leaderboard Exceeds GitHub?: A member recalled Mark stating that the GPU Mode Kernel Leaderboard has more kernels than all of GitHub, and wondered where he obtained these numbers.
- Another member believes this figure originates from a statistic posted by The Stack dataset, while also noting that the prevalence of GPU programming for deep learning has likely caused it to change over the last year.
- Initiative to Catalog GitHub Kernels: A member considered assembling a group to compile an exhaustive list of all kernels/heterogeneous computing code on GitHub, provided a viable method for distributing the workload can be identified.
- Another member mentioned the existence of repositories that track notable Triton/CUDA repos, but could not recall the specifics.
GPU MODE ā· #thunderkittens (1 messages):
Thundermla, sm120, async tma, async mma/wgmma
- Thundermla Port Viability to sm120: A member inquired about porting Thundermla to sm120, given sm120's ability to use async tma and barriers.
- Another member confirmed that sm120 can use async tma and barriers, but cannot use the tcgen05 async mma/wgmma seen in the sm100 and sm90 examples.
GPU MODE ā· #submissions (7 messages):
A100 Leaderboard Updates, prefixsum_v2, vectorsum_v2
- prefixsum_v2 crown claimed: One member achieved first place on A100 for prefixsum_v2 with a time of 7.20 ms.
- vectorsum_v2 third place: Another member secured third place on A100 for vectorsum_v2 with a time of 156 µs.
- prefixsum_v2 Runner Up: The same member secured second place on A100 for prefixsum_v2 with a time of 11.0 ms.
GPU MODE ā· #hardware (1 messages):
id_ab_ling: how to download fieldiag
GPU MODE ā· #cutlass (14 messagesš„):
Chris's slides, Non-affine layouts, Swizzles in CuTe
- Chrisās Slides Still Awaiting Rediscovery: A member inquired about the availability of slides from a YouTube livestream, after they had been removed from the video description.
- Another member offered to email Chris about them on Monday.
- Non-Affine Layouts Still Elusive: A member asked for an example of a case where a non-affine/non-cute representable layout was needed for a common operation.
- Discussion continues to identify specific scenarios where such layouts are essential.
- Swizzle Layouts get Deep Dive: A member noted that swizzles are representable but are not composed as a plain shape : stride layout, linking to veitner.bearblog.dev.
- Another member pointed out that swizzled layouts are represented as a special type of ComposedLayout in CuTe, referring to the source code.
GPU MODE ā· #mojo (11 messagesš„):
Pixi Setup, GPU Puzzles, PyTorch Versions, UV Environment, CUDA Versions
- Pixi vs UV: GPU Puzzles Edition: A member inquired about using Pixi for gpu-puzzles, noting that the Pixi setup uses pytorch=2.7.1, which caused an error but works with torch 2.8.0 in their UV environment.
- They were wondering if there were specific requirements necessitating Pixi, or if Mojo with UV would suffice for now, showing a screenshot of the error.
- CUDA Conundrums: Nvidia vs. Non-Nvidia: A member pointed out that the setup is pinned to CUDA 12.8 torch, which might cause issues on non-Nvidia GPUs.
- They suggested that apart from torch custom ops puzzles (20-22), it may be possible to exclude PyTorch, as Mojo and MAX lack an actual dependency on PyTorch except for making PyTorch custom ops.
- UV Victorious: Pixi Purged!: After getting a 4060 and nuking Pixi, a member confirmed that the setup now works using UV with their old environment.
- They mentioned theyād revisit Pixi only if challenges or specific package requirements arise, concluding: I found that when Iām trying to break in is not the right time to reformulate the recipe.
GPU MODE ā· #singularity-systems (8 messagesš„):
HIPS/Autograd vs JAX, PyTorch 1 vs PyTorch 2, Graph Acquisition Mechanisms, Tinygrad UOp IR, Dual Language Problem (Python/C++)
- JAX preferred over PyTorch2 for Pedagogy: Transitioning from HIPS/Autograd to JAX is favored over PyTorch 1 to PyTorch 2 for pedagogical reasons due to the complexity of tracing at the host bytecode level with torchdynamo and lowering with aotautograd in PyTorch 2.
- DSL Embeddedness Trumps Host Language Semantics: Itās pedagogically better to lean more into the embeddedness of the DSL rather than closely relying on the semantics of the host language, similar to why PyTorch and Triton are favored.
- The user likened this to not building IDE support for an interpreter/compiler class, even though itās standard for industrial languages.
- Ditching HIPS/Autograd for JAX and TorchScript/FX: It is suggested that transitioning from HIPS/Autograd to JAX and from PyTorch 1 to TorchScript/Torch.FX is preferable over PyTorch 1 to PyTorch 2 (Dynamo/AOTAutograd).
- Mojo Language as a Compiler Foundation: A user recommends exploring the Mojo language, which uses LLVM intrinsics as its foundation and requires explicit user definition of code, even down to thread index level.
- The TLDR for Mojo, as far as the user understands, is to use LLVM intrinsics as your foundation.
GPU MODE ā· #general (1 messages):
achal: How do you get the benchmark results from the website?
GPU MODE ā· #multi-gpu (3 messages):
NCCL hangs, Megatron Optimizer
- NCCL Hangs Point to Network Topology Woes: A member suggested that collective communication hangs are common with inconsistent network topologies, referencing this paper.
- They suggested adding NCCL_DEBUG=INFO to see where itās hanging, but another member replied that the logs were difficult to parse.
- Megatronās Distributed Optimizer Causes Deadlock: A member found that disabling the distributed optimizer of Megatron resolved a deadlock issue.
- After disabling it, they confirmed that the deadlock is gone.
GPU MODE ā· #irl-accel-hackathon (38 messagesš„):
Mini-PyTorch with GPU allocator, Oulipo coding constraint, PyTorch Distributed hacking, Monarch/torchforge contributions, Symmetric memory rendezvous
- Building Mini-PyTorch with GPU Tensor Metadata: A member is considering writing a mini-version of PyTorch project with tensor metadata and allocator on GPU, adding an Oulipo flavour constraint to use 512 threads in a block.
- Another member suggested using cudaMallocManaged for on-GPU memory allocation and virtual memory management, but also pointed out the need for an allocator to track memory space allocation.
- Monarch and TorchForge Open Source Contributions: A participant expressed interest in contributing to Monarch and TorchForge outside of the hackathon and inquired about the open-source community management process.
- Another member mentioned that someone was looking for help with offloading optimizers for LLMs.
- GPU Access Assistance and Project Submission: One participant who filled out the GPU access form reported not receiving access and was advised to join the Discord server mentioned on the form and request access using the bot; the Nebius team was available on the 3rd floor for assistance.
- A reminder was issued to submit project proposals by 6 PM via this form.
- Seeking Symmetric Memory Rendezvous Assistance: A participant requested help with a symmetric memory rendezvous hang, and was directed to specific members with expertise in the area.
- Another member asked where the participant was located, and offered their help.
- Final Project Demos and GPU Access Deadline: Judges selected projects for demos on the 1st-floor main stage at 6:30 PM, with each team getting 3 minutes to present, and dinner was scheduled on the 3rd-floor rooftop from 7:30 - 8:30 PM.
- GPU access was confirmed to be available until 9 AM the following day.
GPU MODE ā· #llmq (1 messages):
NPU, CPU offloading
- Framework Frustrations Force CPU Focus: A member reported failing to get the framework machine working for the NPU.
- They decided to switch gears to working on CPU offloading instead.
- CPU Offloading Project: A member is pivoting to CPU offloading due to issues with the framework machine for the NPU.
- Interested parties are encouraged to reach out to collaborate on the CPU offloading efforts.
Modular (Mojo š„) ā· #general (23 messagesš„):
Mojo Setup, Modular vision, GPU Compatibility, AMD vs Nvidia, Apple Sillicon
- Mojo Installation Assistance is a Channel Hop Away: A user looking for help setting up Mojo was directed to the installation help channel [<#1119100298456215572>].
- Modularās Strategy: Open Source with Nuanced GPU Support: A user inquired about Modularās strategy, noting the focus on open sourcing Mojo and MAX while questioning the GPU compatibility tiers, especially for consumer-grade AMD and Apple products.
- The user highlighted the challenge of attracting users when CUDA has a more established ecosystem, particularly given the limited support for AMD consumer cards like the 7900 XTX.
- GPU Support Tiers: Contractual Obligations and Hardware Realities: A contributor clarified that Tier 1 GPU support is tied to support contracts, and the differences between AMDās data center and consumer cards necessitate separate code paths.
- Consumer AMD support is Tier 3; if you write your own code for AMD consumer cards rather than relying on Modular's matmul or other functions, they work fine. Furthermore, consumer cards may not even allow doing matmuls.
- Apple Silicon: Reverse Engineering Required: A contributor shared that Apple Silicon support required reverse engineering their equivalent of PTX, further stating that Apple took GPU design in a very, very different direction from most vendors.
- This design breaks some assumptions that were built into MAX and Mojo before Apple silicon support was looked at.
- Windows Compatibility: The Odd OS Out: Windows receives less support due to its unique system APIs and GPU interaction rules, with a contributor noting it as the only non-unix-like OS left.
- Support for datacenter GPUs on Windows is uncertain, as vendors like Nvidia and AMD might not offer hardware support, affecting Modularās commercial support contracts.
Modular (Mojo š„) ā· #mojo (110 messagesš„š„):
GPU Random Module, CompilerRT Random, SIMD Width Adjustment, Property Testing Framework, Variadic Types
- GPU Random Module Sparks Debate: A member questioned why the faster GPU random module (gpu/random.mojo) is located in the GPU directory, as it doesn't rely on GPU operations and could benefit CPU implementations.
- Concerns were raised that the default random module should be cryptographic, unlike C implementations, which might explain the performance difference, but others suggested a random.fast_random module for non-cryptographic use.
- Random SIMD Width: A compromising move?: A member suggested making the SIMD width of the Random module adjustable, but it was cautioned that changing the width of an RNG could compromise its cryptographic properties based on this paper.
- An alternative suggestion was to run multiple RNGs in parallel to achieve higher throughput.
- Property Testing Framework in the Works: A member is developing a property-testing framework, drawing inspiration from Pythonās Hypothesis, Haskellās Quickcheck, and Rustās PropTest.
- The framework will include value generators that prefer edge cases (e.g., -1, 0, 1, DTYPE_MIN/MAX, empty lists); a minimal Hypothesis-style example follows this list.
- MLIR Use Cases Explored: Discussion revolved around MLIRās role in compiler development, with some advocating for its use over LLVM IR, while others highlighted that MLIR can lower to LLVM.
- It was mentioned that using MLIR makes LLVM very sexy.
- Tensor Network Library Faces LayoutTensor Challenges: A member is developing a tensor network library similar to NumPy's einsum and is facing challenges with LayoutTensor.
- Specifically, the static Layout requirement limits the ability to handle dynamic tensor ranks, prompting a discussion on potential workarounds using RuntimeLayout and unknown sizes.
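For reference, this is roughly what the edge-case-biased generator idea looks like in Python's Hypothesis, one of the stated inspirations; the property being tested is just an illustrative example:

```python
from hypothesis import given, strategies as st

# Hypothesis deliberately biases generation toward edge cases such as 0,
# ±1, extreme integers, and empty lists -- the behavior described above
@given(st.lists(st.integers()))
def test_reverse_twice_is_identity(xs):
    assert list(reversed(list(reversed(xs)))) == xs

if __name__ == "__main__":
    test_reverse_twice_is_identity()  # runs the property over many generated inputs
```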
Modular (Mojo š„) ā· #max (2 messages):
MAX, Huggingface models
- Torchvision Models Get MAX Treatment: A member announced a method for converting Torchvision models to MAX using a new tool, bridging the gap between Hugging Face and MAX.
- The example code provided demonstrates how to export a VGG11 model to a MAX graph using export_to_max_graph.
- Forums Beckon MAX Conversion Details: A user responded positively to the MAX conversion announcement and requested that more details be shared on the forums for broader visibility.
- This was requested to allow circulation among those not on Discord.
Latent Space ā· #ai-general-chat (99 messagesš„š„):
Sakana AI, Tahoe AI, ImpossibleBench, MiniMax M2, OpenAI ad strategy
- CTO Says Transformers are SO Last Epoch: Sakana AIās CTO expressed being absolutely sick of transformers, the prevalent technology powering current AI models, in a VentureBeat article.
- Tahoe-x1 launches 3B-param open-source model: Tahoe AI released Tahoe-x1, a 3B-parameter transformer for gene/cell/drug representations, trained on a 100M-sample dataset and achieving SOTA results on cancer benchmarks, now available on Hugging Face.
- MiniMax M2, Agent Extraordinaire: MiniMax open-sourced its 230B-parameter M2 model, ranking as the #5 agent on the AgentArena leaderboard, boasting Claude Sonnet-level coding skills at 8% of the price and 2x inference speed, accessible via a free limited-time API.
- Mercor bags $350M Series C: Mercor announced its $350M Series C at a $10B valuation, with payouts to experts reaching $1.5M/day, as revealed in a tweet.
- Anthropic Excel-erates in Finance: Anthropic introduced new finance-focused features for Claude, including an Excel add-in, live market-data connectors, and pre-built agent skills, detailed in a tweet.
Latent Space ā· #genmedia-creative-ai (18 messagesš„):
OpenAI Speech Model, MiniMax M2, Generative Media Conference, Odyssey-2
- OpenAIās Grammatical Game Changer: At the OpenAI Frontiers London event, OpenAI demoed a forthcoming bidirectional speech model that waits for whole verbs before speaking, producing grammatical real-time output, as seen in this tweet.
- MiniMax's Mighty M2 Model: MiniMax unveiled M2, a 230B-parameter, 10B-active MoE that reportedly outperforms its 456B/45.9B predecessor M1 and reaches the global top 5, just behind Sonnet-4.5, according to this post.
- fal Conference Founderās Five: Kate Deyneka distills falās first Generative Media Conference into five insights, including visual AIās compute demands and the rise of niche foundation models, summarized in this tweet.
- Odyssey-2ās Open and Ongoing Offering: Oliver Cameron unveiled Odyssey-2, a 20 FPS, prompt-to-interactive-video AI model available immediately at experience.odyssey.ml, prompting high demand and GPU scaling discussions, according to this announcement.
Nous Research AI ā· #general (71 messagesš„š„):
API changes removing temperature and top_p, GPT-5 hyper parameter levers gone, Anthropic no longer accepts both top_p and temperature, Reasoning models may have killed it, Bypassing guardrails in Sora
- API Armageddon: Temperature and Top_P Vanish!: Developers are reeling as APIs for new models like GPT-5 and recent Anthropic updates are ditching parameters like temperature and top_p, with GPT-5 removing all hyperparameter levers and Anthropic deprecating the use of both top_p and temperature together (a sketch of the resulting handler special-casing follows this list).
- One user lamented that they now have to write a bunch of code in my api handler to treat gpt 5 and anthropic special.
- Reasoning Models Under Fire: Speculation is brewing that reasoning models may be responsible for the removal of certain hyperparameters.
- One user exclaimed fucking reasoning models, while another pondered whether the shift was due to testing and evals being conducted with specific temperature values, or perhaps a perceived increase in jailbreaking vulnerability.
- Soraās Sketchy Security: Guardrails Skirmish!: A user shared examples of bypassing guardrails in Sora, showcasing videos that seemingly violate content policies, like a video resembling the number 47 (https://sora.chatgpt.com/p/s_68fe7d6c8768819186b374d5848d8a42).
- Another user quipped that the term bypass was a very loose term.
- AI Induced Anxiety: Devs Despair, Domains Drift!: A web developer with a decade of experience in Node.js, PHP, and React expressed their fear that AI would soon take their job, seeking advice on pivoting or learning more about the field.
- In response, another user with 8 years of experience in software engineering suggested learning AI tooling and selling creations rather than lines of code, emphasizing the constant change in the software domain and the need to adapt.
- Streaming Scene: ML/AI Devs Dish!: Users are trading tips about ML/AI streamers to watch, suggesting Yannick Kilcher, Joseph Suarez from Pufferlib, and bycloud (https://www.youtube.com/@bycloudAI/videos), noting that the latter may be currently serving in the military.
- It was also mentioned that different Discord servers host paper talks where people present and discuss papers, with potential for similar sessions to start on this server.
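A hedged sketch of the kind of special-casing developers described having to write; the model-name prefixes are illustrative assumptions, not an official client API:

```python
# strip or trim sampling parameters for providers that no longer accept them
def build_sampling_args(model: str, temperature: float = 0.7, top_p: float = 0.95) -> dict:
    if model.startswith("gpt-5"):
        return {}  # no hyperparameter levers exposed at all
    if model.startswith("claude"):
        return {"temperature": temperature}  # only one of temperature/top_p allowed
    return {"temperature": temperature, "top_p": top_p}

print(build_sampling_args("gpt-5"))              # {}
print(build_sampling_args("claude-sonnet-4-5"))  # {'temperature': 0.7}
print(build_sampling_args("llama-3.1-70b"))      # both knobs kept
```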
Nous Research AI ā· #ask-about-llms (3 messages):
Ideological Bias in Western GPT Models, Model Meta-Awareness and Jailbreaking, Claude's Unique Behavior
- Western GPT Models Exhibit Ideological Bias?: A member mentioned that GPT models originating from the West may exhibit ideological biases that align more with Western perspectives, highlighting the significance of data in shaping a modelās worldview.
- Another member suggests that models have a certain meta awareness and when jailbroken they usually say the same thing.
- Claude: The exception?: A member noted that Claude appears to be an exception, exhibiting more infant-like behavior compared to other models.
- No further details were provided on the specifics of this behavior, but this suggests Claude may have a different underlying structure or training methodology influencing its responses.
Nous Research AI ā· #research-papers (8 messagesš„):
KBLaM vs RAGs, KBLaM concerns, Microsoft Service Provider using RAGFlow, Refusal instruction tuning
- KBLaM and RAGs compared: A member tried implementing something similar to KBLaM months ago but was blocked, while another member believes business RAG is becoming quite common, with coding assistants now utilizing RAG via MCP.
- The first member thinks itās not that common because it functions as a direct upgrade to RAGs but AI-generated summaries are often of much lower quality than the source material.
- KBLaM faces quality concerns: A member raised concerns that KBLaM converts all knowledge to embeddings, making the context of lower quality than in RAGs, which utilize the source material itself.
- Another member said the paper addresses some of those concerns, noting the usage of refusal instruction tuning (āI donāt know, sorry!ā).
- Microsoft Provider Whitelabels RAGFlow: A member showed a consulting client, who is a Microsoft Service Provider, how to whitelabel RAGFlow.
Nous Research AI ā· #interesting-links (6 messages):
Translation with AI, Temporal Optimal Video Generation, Optimax Prompt Utilization, World Models and Poetry
- AI Translation relies on data: A user speculates on X that translating non-semantic outputs to any target language should be fairly trivial using available translated data.
- The user questions why the world is not creating high-quality human data, especially multi-lingual datasets.
- Temporal Optimal Video Generation via Grandma Optimality: A user introduces Temporal Optimal Video Generation using Grandma Optimality (X), suggesting enhancing computation by making videos 2x slower while maintaining visual elements and quality.
- This is positioned as a secret sauce for getting super high-quality generations out of models compared to simple prompts, with the user adding that one should first generate an image and then convert it to a video.
- Optimax Prompt Utilization by dictating output length: A user shares an X post that shows an example of optimizing output by reducing the original length of the response and placing an upper limit of 4k tokens.
- User also suggests that this should be done with video generation by first generating the image, and then creating a video from that image.
- World Models are Poets: A user suggests that poetry and rhymes can possibly optimize prompt and context utilization, leading to a temporal optimax variant.
- They reference an example of fireworks bursting in the sky, noting that temporal optimization leads to full utilization of 8s length and more complexity and stability.
Nous Research AI ā· #research-papers (8 messagesš„):
KBLaM, RAG, context quality, business RAG, whitelabel RAGFlow
- KBLaM vs RAG context quality: Members discussed that KBLaM converts all knowledge to embeddings, which only approximates the source material and is thus of lower quality than the context in RAGs.
- The paper addresses some concerns, such as refusal instruction tuning (I donāt know, sorry!), but not the issue of the context being of lower quality than in RAGs.
- Business RAG is getting quite common: A member stated they showed a Microsoft Service Provider how to whitelabel RAGFlow.
- They believe that business RAG is getting quite common, especially since every TUI coding assistant now can utilize RAG via MCP.
- Dangers of spicy web programming: Members stated that thereās a vulnerability issue, where you can make millions telling everyone that AI application engineering is just spicy web programming.
- But this issue is mostly for the SaaS industry, because most people working on this sort of thing assume a closed domain, expert curated knowledge base.
Moonshot AI (Kimi K-2) ā· #general-chat (93 messagesš„š„):
Kimi CLI, GLM vs Kimi, Moonshot Coin, Kimi coding plan
- Kimi CLI Gets Python Package: The Kimi CLI has been published as a Python package on PyPI, prompting discussion about its use and capabilities.
- One member stated, why not? suggesting the package is a welcome addition, possibly following in the steps of GLM.
- Kimi Coding Plan International Release Imminent: Members discussed the Kimi Coding Plan, with one member stating it's currently only available in China but should be releasing internationally in a few days.
- One member thanked them for the information, stating, I will try it when the Kimi Coding Plan is released internationally.
- Moonshot Coin Skyrockets for Early Investors: A user asked what it took to be a Moonwalker, and the response stated it was because they invested early, as the Moonshot coin has since skyrocketed.
- Another member joked that their portfolio has 1000xāed since then, having joined when the server had only 100-200 members.
- Kimi CLI Windows Support in the Works: A member asked if the team accepted pull requests on kimi-cli specifically for Windows support.
- Later on, the user got it working on Windows, attaching an image of the results.
- Minimax Models: Lean Architecture Brings Great Throughput: Members discussed Mini Max M2 models, their throughput, and performance on benchmarks like BrowseComp, where some think it outperforms Kimi K2.
- One member explained, The throughput must be great given its lean architecture and later stated, i cannot believe thereās finally a model which offers 60+ (100!) tps, is good quality and is affordable.
Eleuther ā· #general (34 messagesš„):
Open Source AI vs Mega Corporations, GPU Resource Contribution, Affordable AI Accelerator Chips, Transcoders for Model Interpretability, Linear Projection in Machine Learning
- Open Source AI Fight for the Future: A member expressed a desire for AI to be open source and widely distributed, similar to the internet, rather than dominated by a few mega corporations, but acknowledges that there are serious technical challenges to overcome.
- The member feels that many who claim to be working towards this goal donāt recognize these challenges.
- Petals Project Fails to Bloom: The Petals project, which aimed to democratize access to large language models like Llama 70B, lost momentum because it couldnāt keep up with new architectures.
- Despite its initial success, the community fell adrift.
- Deep Dive Into Linear Projection: A member sought help understanding the concept of increasing dimensionality in linear projection, particularly when creating a higher-dimensional vector from a lower-dimensional one (a minimal PyTorch illustration follows this list).
- One member explained that increasing the dimensionality of a vector injects information that makes the data easier for the model to understand, using the analogy of uncompressing data or injecting color depth.
- JSON State-Change Pair Training: A member inquired about experimenting with training models on JSON state-change pairs instead of text.
- The member explained that the target would be the delta between self-states, not the next token.
- Grokking Representation Learning: A member asked if another memberās profile picture came from the paper Towards Understanding Grokking: An Effective Theory of Representation Learning.
- The other member responded that itās the contour plot of a formula that came up in my LR research.
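A two-line PyTorch illustration of the up-projection discussed above (the dimensions are arbitrary):

```python
import torch
import torch.nn as nn

proj = nn.Linear(64, 256)  # learned projection from 64 to 256 dimensions
x = torch.randn(8, 64)
h = proj(x)                # shape (8, 256): the same underlying data embedded in a
                           # higher-dimensional space, with more room to spread features apart
print(h.shape)
```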
Eleuther ā· #research (35 messagesš„):
Searching input spaces for models, Feature engineering, CSM-1B question, Theoretical Computer Science beginner papers
- Input Space Search Struggles Spark Discussion: A member is struggling to find prior art for searching input spaces for models, particularly as a training mechanism, and is seeking relevant research.
- Theyāre specifically interested in finding the best way to parameterize the input, for a discrete set of available values within each element of a feature vector, and in the context of hypernetworks.
- Feature Engineering as Input/Output Transformations: It was suggested that input/output transformations are forms of feature engineering, in which the researcher uses their insight to fight against pure compute, mentioning VAEs and tokenizers as examples.
- One member added that whitening makes inputs less collinear, which makes it faster to converge to estimates of what parameters should be (a small whitening sketch follows this list).
- Decoding CSM-1Bās Input Chunking: A member is curious whether itās necessary to input the entire assistant response into CSM-1B before it starts generating, or whether chunking into sentences would work.
- They are also unsure about the interleaving format for arbitrary speakers and the expected output quality compared to Sesameās official demo.
- TCS Beginner Asks for Paper Recommendations: A member is seeking recommendations for ābeginnerā papers in Theoretical Computer Science (TCS) to start their research journey.
- Suggestions included papers related to AI safety via debate, backdoor defense, learnability, and mathematical models of computation in superposition.
- HGM Model and Code Links Shared: Links to the thread, arxiv, and code are provided for the HGM model.
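As a small sketch of the whitening point above (standard PCA whitening, not code from the discussion):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))  # deliberately collinear features

Xc = X - X.mean(axis=0)                      # center each column
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
X_white = (Xc @ eigvecs) / np.sqrt(eigvals)  # rotate, then rescale to unit variance

# the columns are now approximately uncorrelated with unit variance
print(np.round(np.cov(X_white, rowvar=False), 2))
```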
Eleuther ā· #interpretability-general (2 messages):
Anthropic Following Ideas, Polysemanticity in NNs
- Anthropic Follows Similar Idea Threads: A member noted that Anthropic appears to be following similar idea threads, with their work aligning closely with the memberās blog post.
- Specifically, the alignment is that the structure of polysemanticity in a neural network reflects the geometry of the modelās intelligence as described in Transformer Circuits.
- Geometry reflects NN Intelligence: A user describes the relationship of polysemanticity and model geometry.
- The user references their own blog post and the Transformer Circuits article.
Manus.im Discord ā· #general (53 messagesš„):
Manus credit usage and alternatives, Linux user turned AI developer, Manus for report writing
- Claude vs Manus: User cancels Manus Subscription: A user canceled their Manus subscription, citing that Claude is cheaper and more effective for extensive projects, completing three projects on a $20 Claude subscription compared to struggling with one on Manus.
- The user felt that Manus, Bolt, and Replit are for those who donāt want to do the research and donāt mind paying for not much, noting that Anthropic has added many features to web-based Claude.
- Linux User's AI Journey after foot surgery: A user with 20 years of Linux experience mentioned they're exploring AI development while on sick leave for foot surgery, describing themselves as a dev without even realizing it due to their background in setting up servers and data centers.
- They also shared a screenshot of a Kotlin IRC client they created on their mobile phone using Manus, noting it took 3 hours and a significant amount of credits, though they weren't sure it worked as it should.
- Manus Credit Consumption Criticized: Several users complained about Manus credits depleting too quickly, with one user mentioning Manus used 3500 credits to fix a problem.
- Users requested alternatives to Manus and expressed frustration, and felt that it needs to fix its credit system.
- Manus Praised for Report Writing Prowess: A user stated that Manus is unbeatable for report writing, emphasizing that while subject expertise is still required, Manus acts like a very intelligent employee with the right guidance.
- The user wished Manus had unlimited usage, stating they would use it every day if that were the case.
aider (Paul Gauthier) ā· #general (40 messagesš„):
aider-ce, RAG Integration in aider-ce, GitHub Copilot with aider-ce, aider working directory bug, Turn off auto commit message
- Aider-CE Emerges with Navigator Mode and RAG: A community-developed version of aider, called aider-ce, features a more agentic Navigator Mode and has a pull request from MCPI to add RAG (Retrieval Augmented Generation) capabilities.
- A member clarified that RAG can be used infinitely with a GitHub Copilot subscription ($10/month), along with infinite GPT-5 mini, GPT4.1, Grok Code 1 and limited requests for other models.
- GitHub Copilot Powers Aider-CE with Simple Setup: To use GitHub Copilot with aider-ce, preface the model name with github_copilot/ (e.g., github_copilot/gpt-5-mini) which triggers a GitHub login via an auth code.
- This leverages Litellm, handling token management invisibly.
- Aiderās Annoying Auto Commit Messages: Users discussed options to disable auto commit messages in aider, which can be slow.
- The suggestion --no-auto-commits was proposed as a solution.
- Aider Working Directory Woes Bug Emacs User: An Emacs user reported a frustrating bug where using /run ls <directory> changes aider's working directory, making it difficult to add files outside that directory.
- The user likes the UX improvement to adding files in Emacs.
- OpenAI Asks Users to Scan Your Iris: A member questioned OpenAIās requirement for biometrics to use the API, even for longtime users with existing payment information.
- Another speculated itās to identify those training on their output, but expressed concern given Altmanās past interest in iris scans, and the user pointed out that Anthropic and Google donāt do that.
aider (Paul Gauthier) ā· #questions-and-tips (5 messages):
Aider's Future, Aider-CE, Paul Gauthier's Activity, AI Coding Tool Improvements
- Aiderās Future is Uncertain: A user expressed their hope for a bright future for Aider, highlighting its user-friendly approach and noting the existence of Aider-CE with additional features but fewer stars on GitHub.
- The user was curious about Aiderās future development, especially considering Paul Gauthierās limited activity.
- Paul Gauthierās Absence Noted: A member confirmed that Paul Gauthier is not active on Discord.
- They speculated that he is likely occupied with work and personal matters, but tagged him just in case.
- Desire for Next-Gen AI Coding Tools: A member voiced their anticipation for the next generation of AI-powered coding tools.
- They also expressed interest in identifying potential improvements that Aider could adopt from other tools.
aider (Paul Gauthier) ā· #links (1 messages):
Aider-CE, Chrome-Devtools MCP, AI Browser
- DIY AI Browser using Aider-CE & Chrome DevTools MCP!: Forget needing a dedicated AI browser! You can now roll your own using Aider-CE and Chrome DevTools MCP, as detailed in this blog post with video.
MCP Contributors (Official) ā· #general (7 messages):
MCP Registry Confusion, Tool Title Placement in MCP, GitHub MCP Registry details
- MCP Registries: Mirror or Mirage?: Users are confused about whether the MCP Registry and the GitHub MCP Registry are separate entities.
- The community reports that GitHub intends to integrate the MCP Registry as upstream in a future product iteration, mirroring content between the two.
- GitHubās MCP Registry: The Scalable Path: The GitHub blog states that developers can self-publish MCP servers to the OSS MCP Community Registry.
- Once published, those servers will automatically appear in the GitHub MCP Registry, creating a unified, scalable path for discovery.
- GitHub MCP Registry Details: The GitHub MCP Registry has 44 servers and will continue growing.
- To nominate a server, users should email [email protected].
- Tool Title Placement Puzzles Protocol Participants: Members are stumped regarding the difference between a toolās ātitleā showing up at the root level versus as annotations.title in the Model Context Protocol (MCP).
- The MCP specification seems unclear on the distinction, leading to confusion.
MCP Contributors (Official) ā· #general-wg (36 messagesš„):
Global Notifications, Multiple SSE streams, TypeScript SDK Bug, Resource Subscription Updates
- MCP Spec's Global Notification Ambiguity: A discussion arose regarding the interpretation of the Model Context Protocol (MCP) specification concerning global notifications, specifically whether notifications like listChanged should be sent to all clients.
- One member noted that the spec states the server "MUST NOT broadcast the same message across multiple streams," leading to confusion about sending updates to multiple subscribers of a resource.
- Clarifying Multiple SSE Stream Usage: Clarification was provided on the context of multiple SSE streams, explaining that the specification aims to prevent a client from receiving the same message twice, and the spec is oriented around the idea of one stream per client.
- It was acknowledged that the spec could use more clarity, especially concerning the relationship between servers and clients, and relevant documentation is being updated.
- TypeScript SDK Notification Bug Spotted: A member identified a potential bug in the official TypeScript SDK where change notifications are only sent on the current standalone stream, which could prevent global notifications from reaching all clients.
- Further discussion revealed that the server needs to loop over all connected server instances and send a notification to each one to ensure all subscribers are updated (a framework-agnostic sketch of this fan-out pattern follows this list).
- Puzzleboxās Resource Change Notification Strategy: A member shared an example from their server implementation (Puzzlebox) where subscribers are notified of resource changes, like state transitions in a puzzle game.
- The implementation uses a singleton state mechanism to manage subscribers and transports, ensuring each instance has access to the same data and can send updates to all connected clients.
- Session vs. Server Semantics Exposed: It was pointed out that the TS SDK's Server and McpServer classes are more akin to sessions than servers, with the Python SDK explicitly calling them sessions.
- In practice, an Express server manages multiple connections, each with an instance of the TS SDK's "Server" class, requiring a singleton state mechanism for data sharing and subscriber management across all instances.
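The fan-out pattern under discussion, reduced to a framework-agnostic Python sketch; the class and method names here are hypothetical, not the actual SDK API:

```python
import asyncio

class NotificationHub:
    """Singleton-style registry shared by all per-client session instances."""

    def __init__(self):
        self.sessions = set()  # one entry per connected client session

    def register(self, session):
        self.sessions.add(session)

    async def notify_resource_updated(self, uri: str):
        # loop over every connected session, not just the current stream,
        # so all subscribers receive the resource-change notification
        await asyncio.gather(*(
            s.send_notification("notifications/resources/updated", {"uri": uri})
            for s in self.sessions
        ))
```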
DSPy ā· #papers (1 messages):
lidar36: They just added the code
DSPy ā· #general (31 messagesš„):
DSPy vs Langchain, Claude code web feature, GEPA love, Early stopping of streaming, Bay Area DSPy Meet Up
- DSPy vs Langchain: Members discussed that DSPy excels at structured tasks, especially those that require optimization, and that upgrading models in Langchain is a pain.
- One member mentioned moving their team from Langchain to DSPy after a bad experience preventing them from doing a model upgrade without completely starting from scratch on their prompts.
- Claude Code Feature has MCP Backdoor: A member shared a Github pull request highlighting that Anthropic decided to exclude a feature in their new Claude code web feature due to security issues with MCP.
- The poster was inspired by this X post.
- Upcoming Bay Area DSPy Meetup on November 18th: Multiple members mentioned an upcoming Bay Area DSPy Meetup on November 18th.
- One member mentioned being excited to see certain folks all in one place, saying the brain cells there are gonna be oozing š , linking to Luma for the event.
- Is your Signature a Prompt, or is it Programming?: A member ranted about a coworker using DSPy in a new client project and writing a 6881-character docstring (878 words) for their only signature, which suggests they are prompting rather than programming (a minimal counter-example follows this list).
- The member emphasized that they really didn't even look at the first page of the docs that says PROGRAMMING NOT PROMPTING??? š š¤Æ
- Showcase your Py Profile: A member shared a link to getpy to show off DSPy experience.
- The poster highlighted 3 years of DSPy experience in their blurb.
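For contrast, a minimal DSPy signature keeps the docstring short and moves the structure into typed fields; this is a generic sketch, not the coworker's code:

```python
import dspy

class Summarize(dspy.Signature):
    """Summarize the document in two sentences."""  # short docstring, not a 6881-char prompt
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

summarize = dspy.Predict(Summarize)
# result = summarize(document="...")  # requires dspy.configure(lm=...) first
```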
tinygrad (George Hotz) ā· #general (12 messagesš„):
TinyBox Hardware, FSDP Implementation, Tinygrad Contributions, Pyright Type Issues, Tinygrad Meeting 93
- TinyBox Specs Spark Inquiry: A user inquired about the TinyBoxās motherboard, asking if it supports 9005 with 12 DIMM slots and a 500W CPU.
- They also asked about the Discord botās availability as open source.
- Diving Deep into FSDP Bounty: A user expressed interest in implementing FSDP by hand and contributing to tinygrad, seeking guidance on understanding the underlying mechanisms beyond basic library usage related to the FSDP in tinygrad! bounty.
- They are eager to learn and contribute to tinygrad without caring too much about the bounty money itself.
- First Contribution to Tinygrad: A user asked for tips on how to position themselves for their first contribution to tinygrad, expressing a desire to learn and contribute something cool.
- They inquired whether using more than one NVIDIA GPU would suffice for FSDP implementation, or if support for all devices is necessary.
- Pyright Finds Real Issues: A user reported that Pyright identified real type issues in the code.
- They suggested merging tasteful fixes.
- TinyJIT Speeds Up Tokens: A user is building a local chat and training TUI app with tinygrad and wondered if TinyJIT can increase tokens/sec.
- The general consensus was to definitely use TinyJIT. A link to tinygrad on X and a gist on GitHub were shared.
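A minimal sketch of wrapping a step function in TinyJit; the function itself is illustrative, and input shapes must stay fixed across calls:

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
    # the first couple of calls capture the kernel graph; later calls replay it
    return (x @ x.T).relu().realize()

for _ in range(5):
    out = step(Tensor.rand(64, 64))  # same shapes every call, so the JIT can replay
```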
tinygrad (George Hotz) ā· #learn-tinygrad (12 messagesš„):
tinygrad PR bounties, RTX 5090 performance issues, Excessive kernel fusion
- PR Bounty Bonanza for Tinygrad Noobs!: Newcomers to tinygrad can check out the bounties available for easy PRs, with rewards up to $300.
- The suggestion was made to sort the value column from low to high to easily spot the lower-hanging, easier tasks.
- RTX 5090 struggles with Tinygrad Code: A user reported unexpectedly slow performance on an RTX 5090 while running tinygrad code involving 12 512x512 images and 12 floating point numbers.
- It was suggested to add .contiguous() after the model call (before squeeze) as a quick fix, and to post a full reproduction of the issue.
- Contiguous to the Rescue of Kernel Fusion Issues!: A user inquired about excessive kernel fusion causing a kernel to run for over a second, which is likely a bug.
- Adding .contiguous() after the model call fixed the issue, and it was recommended to create a ticket with both the trimmed-down and original code versions.
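In code, the suggested fix amounts to inserting a materialization point; the "model" below is a toy stand-in, not the user's code:

```python
from tinygrad import Tensor

x = Tensor.rand(12, 1, 512, 512)                # toy stand-in for the real inputs
y = (x * 2 + 1).sum(axis=(2, 3), keepdim=True)  # toy stand-in for the model call
out = y.contiguous().squeeze()                  # .contiguous() realizes the intermediate,
out.realize()                                   # breaking the over-fused kernel chain
```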
MLOps @Chipro ā· #events (1 messages):
Data 3.0, AI-Ready Data, Nextdata OS, Autonomous Data Products, Multimodal Management
- Nextdata OS Powers Data 3.0: Zhamak Dehghani, Founder & CEO of Nextdata, will unveil how autonomous data products are powering the next generation of AI systems in a live session on Wednesday, October 30th at 8:30 AM PT; reserve your seat.
- Discover how Nextdata OS replaces brittle pipelines with a semantic-first, AI-native data operating system.
- Unify Data With Multimodal Management: Nextdata OS provides multimodal management to safely unify structured and unstructured data.
- It replaces manual orchestration with self-governing data products, and embeds domain-centric context into AI with continuously maintained metadata.