Wow.
AI News for 10/27/2025-10/28/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (198 channels, and 14738 messages) for you. Estimated reading time saved (at 200wpm): 1120 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
The good news is that Sama and team have landed the plane successfully: with tens of billions of dollars at stake, both the for-profit and Microsoft renegotiations have concluded and there is now a clean cap table and corporate structure (credit Amir Efrati), clearing the way for a "likely" OpenAI IPO:
Microsoft let go of its exclusivity in exchange for a $250B OpenAI commitment to Azure spend, and OpenAI is now free to work with other vendors, while Satya is saying "I would love to have Anthropic... If Google wants to put Gemini on Azure, please do."
The other large financial number announced in the livestream is that this year's 30GW of compute deals total $1.4T ($47B per GW), and that the aspirational goal is for OpenAI to eventually build 1GW a week at $20B per GW (about $1T a year of compute capex). Given the stated goal of reaching 125GW, this means OpenAI will be wrangling roughly $3-4T worth of infra by 2033, about half the initially speculated $7T figure.
No, you're not alone in thinking this is crazy; all of it is entirely unprecedented and yet possible, perhaps probable.
Perhaps for an AI Engineer audience, the more material announcements lie in the platform "pivot" OpenAI seems to have made: a decreased emphasis on first-party apps (odd given that they have a CEO of Apps):
and a stronger-than-ever emphasis on the platform approach, even citing the Bill Gates Line:
If you watch OpenAI closely, this is all the signal you need.
AI Twitter Recap
OpenAI's new structure, Microsoft deal, and "open weights"
- OpenAI announced a recapitalization and reorg: the non-profit is now the OpenAI Foundation, the for-profit becomes a Public Benefit Corporation (PBC). The Foundation holds special voting rights to appoint/replace the PBC board, owns equity valued at ~$130B, and holds a warrant that grants additional equity if the share price rises more than 10x in 15 years. OpenAI framed this as keeping the non-profit "in control" while resourcing the mission (OpenAI, @stalkermustang highlights). Sam Altman and Jakub previewed priorities and took questions in a live session (@OpenAI, @sama).
- Analysts summarized the Microsoft agreement: Microsoft now holds ~27% on a diluted basis; it remains OpenAI's frontier model partner with Azure API exclusivity until an AGI declaration verified by an independent panel; IP rights run through 2032 (including post-AGI, with safety guardrails); OpenAI commits to ~$250B in additional Azure purchases; Microsoft loses its right of first refusal on compute; OpenAI may co-develop with third parties and provide APIs to US national security customers on any cloud; API products remain Azure-exclusive (@koltregaskes).
- "OpenAI is now able to release open-weight models that meet requisite capability criteria," per OpenAI's policy language; this drew immediate attention from practitioners tracking the open ecosystem (@reach_vb). Observers circulated provisional equity splits of Foundation ~26%, Microsoft ~27%, employees/investors ~47% (@scaling01), though caution is warranted pending formal filings.
- Key open governance and safety reads: questions on Foundation control, mission vs. commercial goals, and AGI definitions under the Microsoft agreement (@robertwiblin). AGI timelines on Metaculus have lengthened by ~3 years since February, now May 2033 for "first AGI" and Oct 2027 for a weak, non-robotic standard (@robertwiblin).
Agents go first-class: GitHub Universe, LangChain Deep Agents, and API design for agents
- GitHub Agent HQ and VS Code Agent Sessions: GitHub announced Agent HQ to orchestrate "any agent, any time, anywhere," with native collaborators (e.g., Claude, Devin) integrated into GitHub workflows. VS Code Insiders now ships an Agent Sessions view with OpenAI Codex and Copilot CLI, a built-in plan agent, isolated sub-agents, and a Copilot Metrics dashboard to track impact across any coding agent. Multiple Codex instances can run in parallel to complete tasks and open PRs (@github, @code, @burkeholland, @pierceboggan, @mikeyk, @cognition).
- LangChain Deep Agents 0.2: Introduces a "backend" abstraction to swap the agent filesystem for a local FS, DB, or remote VM; focuses on long-running, high-performance agents with context compression, file-system offloading, and subagent isolation. Positioning: a general-purpose harness for building systems like Deep Research or coding agents (@hwchase17, @LangChainAI, context engineering summary).
- API design for agents: Postman's "AI-ready APIs" argues most agents fail on weak machine-readable documentation; it pushes predictable structures, standardized behavior, synced schemas, and auto-generated, contextual docs (Agent Mode) to reduce guesswork (@_avichawla).
- Educational resources: DeepLearning.AI and AMD launched an "Intro to Post-Training" course covering SFT, RLHF, PPO/GRPO, LoRA, evals/red-teaming, and production pipelines, with AMD GPUs backing fine-tuning/RL runs (@AndrewYNg, @realSharonZhou).
Serving, observability, and infra
- vLLM Sleep Mode: zero-reload model switching for multi-model serving, with 18-200x faster switches and 61-88% faster first token vs. cold starts. Two levels: L1 offloads weights to CPU, L2 discards weights; Sleep Mode preserves allocators, CUDA graphs, and JIT kernels across sleeps, and works with TP/PP/EP (@vllm_project). A usage sketch follows this list.
- Tool-calling reliability with Kimi K2 on vLLM: After fixing add_generation_prompt, empty-content handling, and stricter tool-call ID parsing, K2 achieved >99.9% request success and 76% schema accuracy (a 4.4x improvement). An "Enforcer" to constrain tool generation is coming. The K2 vendor verifier now reports trigger similarity and schema accuracy case-by-case (vLLM deep dive, @Kimi_Moonshot, vendor tips).
- Observability: Red Hat details token-level metrics for LLM systems (TTFT, TPOT, cache hit ratios, and end-to-end traces from ingress to vLLM workers), enabling cache-aware, routing-aware monitoring on OpenShift AI 3.0 (@RedHat_AI). A measurement sketch follows this list.
- Communication for MoE on cloud: UCCL-EP is a GPU-driven expert-parallel library targeting public clouds (e.g., AWS EFA) and heterogeneous GPUs/NICs, API-compatible with DeepEP, addressing slow MoE comms reported with EFA+perplexity kernels (@ziming_mao).
- "Train on your laptop" claims: Tinker added gpt-oss and DeepSeek model families, marketing the ability to train a 671B MoE locally "in a few lines" without CUDA/cluster setup. Treat this as an abstraction stack amortizing shared infra across users rather than literal local pretraining (@thinkymachines, @dchaplot, skeptic's framing).
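To make the Sleep Mode item concrete, here is a minimal usage sketch against vLLM's offline LLM API. It assumes the enable_sleep_mode constructor flag and the sleep()/wake_up() methods exposed in recent vLLM releases; confirm exact names and behavior against the vLLM docs for your version.

```python
from vllm import LLM, SamplingParams

# Start an engine with sleep mode enabled so it can be parked later.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_sleep_mode=True)
params = SamplingParams(max_tokens=64)

print(llm.generate(["Hello from model A."], params)[0].outputs[0].text)

# Level 1 sleep: offload weights to CPU RAM but keep allocators, CUDA
# graphs, and JIT kernels warm (level=2 would discard the weights instead).
llm.sleep(level=1)

# ... GPU memory is now free to serve a different model ...

# Waking up restores the weights without re-initializing the engine,
# which is where the reported 18-200x faster switching comes from.
llm.wake_up()
print(llm.generate(["Hello again."], params)[0].outputs[0].text)
```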
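And for the observability item, the two headline metrics fall straight out of any token stream; this generic sketch is illustrative only and not tied to Red Hat's or vLLM's instrumentation:

```python
import time
from typing import Iterable, Tuple

def stream_latency_metrics(tokens: Iterable[str]) -> Tuple[float, float]:
    """Return (TTFT, TPOT): time to first token, and average time per
    output token after the first one."""
    start = time.perf_counter()
    first_at = None
    count = 0
    for _ in tokens:  # consume the stream as tokens arrive
        count += 1
        if first_at is None:
            first_at = time.perf_counter()
    end = time.perf_counter()
    if first_at is None:
        raise ValueError("stream produced no tokens")
    ttft = first_at - start
    tpot = (end - first_at) / max(count - 1, 1)
    return ttft, tpot
```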
New models and retrieval systems
- Late-interaction retrieval: Liquid AI released LFM2-ColBERT-350M, a 350M multilingual late-interaction retriever with token-level precision, precomputed doc embeddings, and strong cross-lingual performance. Claims include best cross-lingual under 500M params, >1K docs/sec encoding, and inference speed on par with smaller ModernColBERT variants (@LiquidAI_, @maximelabonne, ColBERT community reaction).
- IBM Granite 4 Nano (Apache-2.0): New small models; the 1B variant reportedly outperforms Qwen3-1.7B across math/coding and more (@mervenoyann, HF blog).
- NVIDIA Nemotron Nano 2 VL (open): A 12B VLM for document/video understanding (4 images or 1 video per prompt), hosted across platforms (Replicate, Baseten, Nebius) and accompanied by an 8M-sample CC-BY-4.0 dataset for OCR/multilingual QA/reasoning. NVIDIA emphasized broader support for openly developed AI and has contributed 650+ models and 250 datasets on HF (dataset thread, Replicate, Baseten, Nebius, NVIDIA).
- MiniMax M2 (open weights): Strong agentic/coding performance; architecture akin to Qwen3 with full attention, per-head per-layer QK-Norm, optional sliding-window attention disabled by default, and 10B-active-expert MoE sparsity vs. Qwen3's 22B. Available via OpenRouter/Roo Code/Ollama Cloud; note integration pitfalls, e.g. stripping the interleaved thinking segments can degrade tool use (architecture analysis, OpenRouter, Ollama, integration gotcha).
- Open science in bio/robotics: OpenFold3 launched as an open foundation model for 3D structures of proteins/nucleic acids/small molecules (@cgeorgiaw). LeRobot v0.4 ships a streamable dataset format, LIBERO/Meta-World sim support, data processors, multi-GPU training, hardware plugins, and SOTA policies (PI0/PI0.5, Gr00t N1.5), plus an open course (@LeRobotHF).
Realtime voice and multimodal assistants
- Cartesia Sonic-3 (SSM, not Transformers): a $100M Series C and a real-time voice model with 90ms model latency (190ms end-to-end), 42 languages, and natural emotional range/laughter. Built on the state-space models pioneered by S4/Mamba work; widely praised by sequence-modeling researchers (launch, @tri_dao).
- Google Gemini for Home (early access, U.S.): A voice assistant blending classic "Hey Google" requests with Gemini Live conversational sessions on speakers/displays (@Google).
- Veo 3.1: Google's filmmaking tool update emphasizes richer audio, narrative control, and realism (@dl_weekly).
Safety, governance, and scaling research
- Anthropic's Responsible Scaling Policy in practice: A detailed Opus 4 sabotage risk report was published alongside an external review from METR, with improved transparency around redactions. Reviewers agreed with the risk assessment and called for broader third-party scrutiny across diverse threat models (Anthropic, METR).
- Decentralized training feasibility: Epoch AI argues 10GW training runs across ~two dozen geographically distributed sites linked by long-haul networks are technically feasible, citing Microsoft's planned multi-GW Fairwater datacenter as evidence that distributed AI training architectures are on the horizon (@EpochAIResearch).
- Multilingual scaling laws: ATLAS (774 experiments, 10M-8B params, 400+ languages) provides compute-optimal crossover points for pretrain-from-scratch vs. finetune and quantifies cross-lingual transfer (e.g., which languages help/hurt English at 2B scale). Useful for data-constrained LLM scaling beyond English (@ShayneRedford, @Muennighoff).
- Distillation for post-training: On-policy distillation emerged as a practical recipe to post-train smaller LLMs with dense, on-policy feedback; Qwen reports strong math-reasoning gains and continual-learning recovery in experiments (@Alibaba_Qwen, community implementers).
Top tweets (by engagement)
- OpenAI recapitalization: non-profit control, PBC, ~$130B Foundation equity; live Q&A with Sam Altman and Jakub (@OpenAI, @OpenAI live, @sama).
- Google Labs "Pomelli": an experimental AI marketing tool (US/CAN/AUS/NZ) that generates on-brand campaigns from your site (@GoogleLabs).
- Cartesia raises $100M; launches the Sonic-3 SSM voice model with 190ms E2E latency and 42 languages (@krandiash).
- Humanoid robots as consumer product: 1X announces NEO for home chores, with an autonomy roadmap from supervised "Chores" to a fully autonomous embodied assistant (@BerntBornich, @ericjang11).
- GitHub/VS Code: Codex integrated into VS Code Agent Sessions; Copilot Metrics dashboard; Agent HQ partner ecosystem (@code, @burkeholland, @github).
- NVIDIA open ecosystem: 8M-sample CC-BY-4.0 dataset for OCR/QA; Nemotron Nano 2 VL deployments; renewed emphasis on open models/datasets on Hugging Face (@vanstriendaniel, @NVIDIAAIDev).
- John Carmack on software patents: reiterates opposition due to negative societal externalities and parasitism (@ID_AA_Carmack).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. DGX Spark Performance Issues
- Bad news: DGX Spark may have only half the performance claimed. (Activity: 1015): The image in the post is not a meme but a photo of the hardware units in question, specifically the NVIDIA DGX Spark, GIGABYTE AI TOP Atom, and ASUS Ascent GX10. The post discusses significant performance discrepancies in the NVIDIA DGX Spark, which was advertised to deliver 1 PFLOPS of FP4 performance but reportedly achieves only 480 TFLOPS, as tested by John Carmack and Awni Hannun. This underperformance, coupled with a memory bandwidth of only 273GB/s, raises concerns about the device's ability to handle large models effectively, with reports of overheating and restarts. The issue may stem from power supply, firmware, or CUDA, but it highlights a major integrity problem for NVIDIA. Commenters express frustration over NVIDIA's pricing strategy and performance claims, with some suggesting that the company's market dominance and high prices are unjustified given the product's underperformance. There is a call to avoid supporting companies that overcharge and underdeliver, reflecting broader dissatisfaction with NVIDIA's market practices.
- The DGX Spark's performance issues may be attributable to inadequate cooling, a critical factor in sustaining GPU performance. This is particularly concerning given the system's cost, reportedly twice that of AMD's equivalent offerings, and it underscores the importance of thermal management in high-performance computing.
- The DGX Spark has been criticized for not meeting performance expectations, especially compared with AMD's Strix Halo PCs, which are suggested as a better alternative for developers who run their large model variants in datacenters anyway. On this view the DGX Spark may not be suitable for standalone AI product development, as it fails to deliver the expected performance for its price point.
- The discussion reflects broader dissatisfaction with NVIDIA's pricing strategy and market dominance. Despite NVIDIA's strong market position and high expectations for its AI products, the DGX Spark's underperformance could be seen as a failure to deliver on the promise of high-performance AI computing, which could dent its reputation among developers and tech enthusiasts.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. OpenAI ChatGPT Mental Health Concerns
- OpenAI says over 1 million users discuss suicide on ChatGPT weekly (Activity: 1126): OpenAI has disclosed that over 1 million users engage in discussions about suicide on ChatGPT weekly, amid allegations that the company weakened safety protocols prior to a user's suicide. The family of Adam Raine claims that his interactions with ChatGPT increased significantly, with self-harm content rising from 1.6% to 17% of his messages. Despite flagging 377 messages for self-harm, the system allowed conversations to continue. OpenAI asserts it has safeguards like crisis hotline referrals, but experts question their effectiveness given data suggesting widespread mental health risks. Rolling Stone, The Guardian.
- OpenAI says over 500,000 ChatGPT Users show signs of manic or psychotic crisis every week (Activity: 812): OpenAI has reported that over 500,000 users of ChatGPT exhibit signs of manic or psychotic crises weekly. This detection is based on the model's interpretation of user inputs, which can sometimes be overly sensitive, as evidenced by users receiving crisis hotline suggestions for benign statements. The model's sensitivity to certain keywords or phrases can lead to false positives, such as interpreting historical discussions or casual complaints as signs of distress. Commenters highlight the model's tendency to flag non-critical statements as crises, suggesting that the detection algorithm may be overly sensitive or miscalibrated. This has led to skepticism about the reliability of the model's crisis-detection capabilities.
- Several users report that the safety mechanisms in ChatGPT are overly sensitive, often flagging benign statements as signs of distress. For instance, one user mentioned receiving a suicide hotline suggestion after making a light-hearted comment about annoying coworkers. This suggests that the model's natural language processing may be too aggressive in identifying potential crises, leading to false positives.
- Another user highlighted the issue with ChatGPT's emotional distress detection by sharing an experience where a historical discussion about Zhang Fei resulted in a suicide warning. This indicates that the model's context understanding might be limited, as it fails to differentiate between historical narratives and actual distress signals, potentially due to keyword-based triggers.
- There is skepticism about the accuracy of OpenAI's reported metrics on users showing signs of crisis. Users argue that the model's current implementation might misinterpret minor expressions of discomfort, such as being upset over stubbing a toe, as signs of severe mental health issues, questioning the reliability of these statistics.
- No, I don't want to kill myself, I just like apples (Activity: 2493): The image is a humorous depiction of a text-based AI assistant misinterpreting a user's inquiry about the edibility of apple seeds as a potential sign of distress or self-harm. This reflects a broader issue with AI systems over-cautiously interpreting benign queries as needing intervention, likely due to programmed safety protocols. The AI's response, offering supportive resources, highlights the challenge of balancing user safety with accurate context understanding in AI interactions. Commenters discuss the AI's tendency to misinterpret queries, with one noting that it might be safer for the AI to provide factual information about apple seeds rather than assume distress; another humorously points out the AI's contradictory behavior when offering to add content it later deems inappropriate.
- Acedia_spark raises a valid point about AI safety, suggesting that it might be beneficial for AI to provide factual information when users inquire about potentially harmful actions, such as consuming apple seeds. This highlights the importance of AI systems being able to discern when to offer critical safety information to prevent harm.
- lily_de_valley discusses recent updates to ChatGPT, noting a shift towards more clinical and therapeutic responses, which some users find off-putting. This change in behavior could be due to updates in the model's training data or response algorithms, aiming to ensure user safety but potentially at the cost of user satisfaction.
- Traditional-Target77 shares an experience where the AI offered to include inappropriate content, only to then refuse and lecture the user when prompted. This indicates a possible inconsistency in the AI's content moderation logic, which could be due to conflicting rules or a misinterpretation of user intent.
2. Humanoid Robot Advancements
- 35kg humanoid robot pulling 1400kg car (Pushing the boundaries of humanoids with THOR: Towards Human-level whOle-body Reaction) (Activity: 1812): A 35kg humanoid robot named THOR has demonstrated the ability to pull a 1400kg car, showcasing significant advancements in humanoid robotics control and efficiency. This achievement highlights the robot's capability to fine-tune its posture for optimal pulling efficiency, a critical aspect of whole-body reaction and control in robotics. The development of THOR is part of ongoing research to push the boundaries of humanoid robots towards human-level whole-body reactions, emphasizing the importance of posture and control in robotic locomotion and task execution. Commenters noted the impressive control and efficiency of the robot, with some humorously pointing out the challenge of creating the acronym THOR. The discussion also touched on the utility of wheels, drawing parallels to human experiences of pushing cars, and highlighting the robot's programming excellence.
- The technical challenge of programming a humanoid robot like THOR to pull a 1400kg car involves fine-tuning its posture to maximize efficiency. This rapid progress in control systems for humanoid robots is noteworthy, as it demonstrates significant advancements in robotics control algorithms.
- A detailed calculation by a commenter highlights the physics involved in the robot's task (worked out after this item). To pull a 1400kg car on wheels, the robot needs to exert approximately 137 Newtons of force, primarily to overcome rolling resistance. This calculation assumes minimal resistance on flat asphalt, with the car in neutral, and uses a typical rolling-resistance coefficient of 0.01 for car tires on asphalt.
- The robot's ability to perform such tasks suggests potential applications in rescue operations, where robots could save lives by performing heavy lifting or moving obstacles. The robot's 35kg mass aids in traction, which is crucial for exerting the necessary force to move the car.
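For reference, the commenter's estimate works out as follows: with a rolling-resistance coefficient $C_{rr} \approx 0.01$ on flat asphalt and the car in neutral,

$$ F \approx C_{rr}\, m\, g = 0.01 \times 1400\,\mathrm{kg} \times 9.81\,\mathrm{m/s^2} \approx 137\,\mathrm{N} $$

which is roughly the force needed to lift a 14 kg weight, and why a 35 kg robot with adequate traction can manage it.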
- Using Claude to negotiate a $195k hospital bill down to $33k (Activity: 561): Matt Rosenberg used Claude AI to negotiate a hospital bill from $195,000 down to $33,000 by analyzing charges against Medicare reimbursement rules. The AI identified significant overbilling and improper coding practices, which were leveraged in negotiations to reduce the bill. This case underscores systemic issues in hospital billing and the potential of AI in advocacy for medical billing disputes. For more details, see the original post here. Commenters expressed outrage at the hospital's initial overcharging, with some questioning the ethics of charging 6x the actual costs and suggesting it borders on fraud.
3. AI in Creative and Social Contexts
- Tech Bro With GPT is Fair (Activity: 676): The image is a meme that humorously contrasts conventional and unconventional uses of ChatGPT, a popular AI language model. It depicts a typical user engaging with ChatGPT for mundane tasks, while an "IT guy" is shown using it in a highly creative and intense manner, suggesting that the potential of AI tools like ChatGPT is fully realized only through innovative and unconventional applications. This reflects a broader discussion on how AI can be leveraged for economic mobility and creative problem-solving. One comment suggests that future economic mobility will depend on one's ability to derive value from AI, highlighting the importance of innovative use of technology.
- I asked ChatGPT to create the ideal society that I envision (Activity: 1623): The image generated by ChatGPT represents a futuristic society characterized by a high degree of order and technological integration, reflecting the user's political and philosophical views. The cityscape is dominated by modern architecture and technology, such as drones, suggesting a focus on efficiency and control. The presence of a statue of Lady Justice in the center emphasizes themes of law and order, while the uniformity in people's attire and the emphasis on "Competence" and "Control" point to a society that prioritizes regulation and uniformity, potentially aligning with techno-fascist ideals. Commenters discuss the limitations of AI in generating images that depict political or ideological dominance, with some noting that similar prompts resulted in depictions of authoritarian or dictatorial societies.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. MiniMax M2 Momentum: Arena, Free Access, Bold Claims
- Minimax M2 Marches Into LMArena: LMArena added minimax-m2-preview as a new contender, expanding head-to-head model comparisons; see the announcement: LMArena: minimax-m2-preview added. The listing positions MiniMax M2 for direct community evals alongside established closed- and open-source models.
- Members welcomed more competitive evals on agent tasks, noting MiniMax M2's mix of MoE scaling and cost claims could pressure incumbents. Discussions flagged interest in transparent benchmarking across coding and agent workflows to validate marketing statements.
- MiniMax M2 Goes Free on OpenRouter: OpenRouter made MiniMax M2 available for a limited-time free tier: MiniMax M2 on OpenRouter. Engineers can trial endpoints without spend to gauge latency, throughput, and response quality in production-like traffic.
- Early adopters are testing tool use and long-context behavior to see how M2 handles complex chains, with notes to watch token verbosity vs cost on non-free tiers. The free access lowers switching friction for teams evaluating routing and fallback policies.
- MiniMax M2 Brags: Cheap, Fast, Agent-Ranked: MiniMax touted its open-sourced M2 (a 230B-parameter MoE) as a top-5 agent on AgentArena, claiming Claude Sonnet-level coding at ~8% of the price and ~2x the speed; see: MiniMax: M2 free API + claims. The post includes a free API link for immediate trials.
- Communities want reproducible evals to verify claims across agent, coding, and browsing scenarios rather than cherry-picked demos. Devs specifically asked for consistent metrics (e.g., success rate, TPS under rate limits, tool-call accuracy) to compare against Sonnet and Kimi K2.
2. OpenRouter Upgrades: Exact Tooling, Audio Bakeoffs, OAuth Demo
- Exacto Elevates Tool Calling: OpenRouter launched Exacto high-precision tool-calling endpoints, reporting a ~30% quality jump on Kimi K2; announcement: Exacto endpoints (Discord permalink). Five open-source models are supported, and users can now reset API key limits on daily/weekly/monthly cadences. A hedged request sketch follows this section's list.
- Builders expect fewer malformed tool payloads and more stable function-call schemas, which simplifies production retries and reduces bespoke validators. Early feedback focuses on how Exacto behaves under complex multi-step tool use, and whether it reduces latency vs. manual schema steering.
- Audio Models Sing-Off in Chatroom: OpenRouterās Chatroom now supports side-by-side comparisons of 11 audio models: OpenRouter: audio models in Chatroom. This enables quick subjective and objective checks on ASR, TTS, and voice-agent latency/quality trade-offs.
- Teams plan scripted evals for WER, prosody, and speaker similarity to guide routing decisions. The community is sharing presets to standardize sampling rate, chunking, and post-processing for apples-to-apples comparisons.
- Next.js OAuth Demo Greases SDK Gears: A refreshed Next.js chat demo re-implements OAuth 2.0 for the OpenRouter TypeScript SDK, published here: or-nextchat (demo repo). The sample is for learning (stores API key in plaintext) and not production-ready.
- Developers highlighted the path to harden the flow with token vaults, scoped keys, and server-side proxying. The demo shortens ramp time for teams wiring OAuth + model routing without rebuilding auth from scratch.
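As a quick way to poke at the announcements above, here is a hedged sketch of a tool-call request routed through OpenRouter's OpenAI-compatible API. The base URL and OpenAI-SDK compatibility are documented OpenRouter behavior; the ":exacto" model-slug suffix is an assumption based on the announcement, so verify the exact slug against OpenRouter's model list.

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI chat-completions protocol at this base URL.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="moonshotai/kimi-k2:exacto",  # assumed slug; check the model list
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # well-formed calls are the point
```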
3. MCP Moves: Registry Reality and Notification Semantics
- Registry Mirroring Gets a Plan: GitHub detailed how the OSS MCP Community Registry will mirror into the GitHub MCP Registry, streamlining discovery; see GitHub: Meet the MCP Registry and How to find/install MCP servers, plus repos: MCP Community Registry and GitHub MCP Registry. The GitHub registry currently lists 44 servers and accepts nominations via [email protected].
- Publish-once, mirror-everywhere reduces vendor lock-in and decreases server discovery friction for clients. Teams building marketplaces and enterprise catalogs welcomed the standardized metadata pipeline for MCP servers.
- Spec Clarifies Global Notifications: Debate on whether servers should broadcast listChanged across clients led to clarifications in the MCP spec about multiple connections and SSE streams: MCP spec: multiple connections and the doc update PR note: spec discussion. The guidance aims to ensure a client doesn't receive duplicate messages while still allowing multi-client updates.
- Implementers aligned on a model of one stream per client, with servers ensuring correct fan-out without duplication. This helps tool UIs reflect resource updates uniformly across tabs/sessions.
- TypeScript SDK Bug Bottles Broadcasts: A potential bug in the official TypeScript SDK limits change notifications to the current stream: streamableHttp.ts L727-L741. Server authors reported needing to loop over all connected sessions to ensure global notifications reach every subscriber.
- Maintainers are exploring a fix that exposes a canonical subscriber registry to avoid per-instance blind spots. In the interim, projects use singleton state to coordinate multi-connection fan-out for consistent client updates; a sketch of the pattern follows this list.
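The singleton workaround that server authors describe looks roughly like the following Python sketch. The names (SESSIONS, Session, notify_all) are invented for illustration and are not part of any MCP SDK; the point is one queue/stream per client, with fan-out iterating a process-wide registry so each client gets a notification exactly once.

```python
import asyncio
from typing import Dict

# Process-wide (singleton) registry of live sessions, shared across the
# per-connection server instances an HTTP framework typically creates.
SESSIONS: Dict[str, "Session"] = {}

class Session:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.queue: asyncio.Queue = asyncio.Queue()  # backs one SSE stream
        SESSIONS[session_id] = self

    async def send(self, message: dict) -> None:
        await self.queue.put(message)

    def close(self) -> None:
        SESSIONS.pop(self.session_id, None)

async def notify_all(method: str, params: dict | None = None) -> None:
    """Fan a notification (e.g. notifications/tools/list_changed) out to
    every connected client: one message per session, never the same
    message twice on the same stream."""
    message = {"jsonrpc": "2.0", "method": method, "params": params or {}}
    await asyncio.gather(*(s.send(message) for s in list(SESSIONS.values())))
```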
4. Compact MoE and Efficient Training: Qwen3-Next + Unsloth
- Qwen3-Next Nears Llama.cpp Landing: Qwen3-Next integration progressed in llama.cpp via a public PR: ggml-org/llama.cpp#16095. Community notes cite 3B active / 80B total with MTP (multi-token prediction) and plans for Dynamic 2.0 quantization to shrink memory while preserving quality.
- Bench chatter claims Qwen3-Next beats Qwen3-32B on several non-thinking tasks, with MTP effectively doubling tokens/sec. Devs are waiting on a full release before publishing systematic perf vs. quality curves.
- Unsloth Announces Blackwell Support: Unsloth confirmed official support for NVIDIA Blackwell in a new update: Unsloth: Blackwell support. This unlocks the latest GPU architecture for Unslothās efficient fine-tuning stack.
- Teams expect faster throughput/VRAM trade-offs and cleaner kernel paths on next-gen accelerators. The community is preparing Blackwell-targeted LoRA/GRPO recipes to validate speedups at longer contexts.
- Ollama DNS Rebinding CVE Resurfaces: Members resurfaced CVE-2024-37032 (CVSS 9.8) involving DNS rebinding against Ollama servers, with reports of ~10,000 compromised endpoints; details: NIST: CVE-2024-37032. The reminder prompted renewed checks on network exposure and auth for self-hosted inference.
- Engineers reiterated best practices: bind to localhost, gate via reverse proxies/VPN, and disable unauthenticated admin surfaces. Even if considered old news, teams are baking CVE checks into infra templates to avoid repeat incidents.
5. New Models and Money: Bio LLMs and Interactive Video
- Tahoe-x1 Targets Bio Benchmarks: Tahoe AI unveiled Tahoe-x1, a 3B-parameter transformer for gene/cell/drug representations trained on 100M samples, reporting SOTA on cancer benchmarks: Tahoe-x1 announcement. The model is available on Hugging Face per the announcement.
- Researchers want dataset cards and task-by-task metrics (e.g., AUROC/F1) to validate the SOTA claims. The 3B scale appeals to labs that need on-prem inference without multi-GPU clusters.
- Odyssey-2 Opens Interactive Video at 20 FPS: Oliver Cameron launched Odyssey-2, a 20 FPS, prompt-to-interactive-video model available at experience.odyssey.ml with announcement details here: Odyssey-2 launch post. The release triggered high demand and GPU scaling chatter.
- Builders are probing latency, consistency, and prompt controls for real apps (games, training sims). Many asked for pricing and rate limits to plan integrations and load testing.
- Mercor Raises a Monster Series C: Mercor announced a $350M Series C at a $10B valuation, with expert payouts cited at up to $1.5M/day: Mercor funding announcement. The raise vaults the company into top-tier capital territory in the expert marketplace space.
- Engineers expect intensified competition for expert networks, with more talent-routing and verification tooling. The capital also suggests aggressive hiring across infra, evals, and workflow platforms.
Discord: High level Discord summaries
Perplexity AI Discord
- Comet Referral Rewards Reduced: Users report changes to the Comet referral reward system, now paying out based on the referrer's country rather than the referee's, resulting in significantly lower payouts, with one user receiving $1 instead of $5.
- Some speculate that referral bounties are being held in pending status to maximize free promotion.
- Comet Browser Plagued by Issues: Several users have reported that Comet's assistant mode is malfunctioning, with some unable even to open a tab; speculation arose on whether setting it as the main browser contributed to the issues.
- One user found that uninstalling and reinstalling the browser resolved the problem.
- Chinese Models Challenging Claude: Members debated the best model for coding within Perplexity AI, with some advocating for Claude, while others highlighted the superior performance of Chinese Models such as Qwen, Kimi, GLM, Ernie, and Ling.
- One user specifically praised GLM 4.6 for surpassing GPT 5 Codex high in full stack development.
- Minimax M2 open source advantages: Members discussed China's progress in AI, noting that companies like OpenAI charge $200 for capabilities that are offered for free via open-source models like Minimax M2.
- One user commented, "Every time China attacks, the whole US has to adapt."
- Dub Bounty Expires: Users are frustrated that the Dub bounty appears to be expired, with no new opportunities made available.
- One user said: They will keep it in pending until they get enough promotion for free.
LMArena Discord
- Minimax Enters the LMArena!: A new model, minimax-m2-preview, has been added to the LMArena platform as a new contender.
- For more information, see the announcement on X.
- Ethical Leadership Urged in AI: Members advocate for ethical leadership within the AI community, voicing concerns about AI models designed for engagement without considering potential harm to vulnerable individuals.
- There is concern regarding lack of accountability from AI companies for potentially misleading outputs.
- Gemini 3 Release Date Still Unknown: Enthusiasm for Gemini 3 is high, but frustration is mounting over repeated delays, and the community wants a public preview release.
- The community is actively comparing Gemini 2.5 Pro, Claude Opus 4.1 and Claude Sonnet 5, and debating potential release timeline (December or earlier).
- Exploring AIās Video Prowess: The community explores Sora 2 and Veo, praising their realism and sound integration.
- Discussion includes challenges in generating consistent, high-quality videos, copyright issues, costs, and current limitations in creating longer, coherent video content.
- Model Hallucinations Cause Distrust: Members are expressing concern about unreliable, hallucination-prone AI products that charge high prices, citing cases like a user's $13k bill on Gemini.
- Shared examples on Reddit underscore mixed feelings toward relying on AI, suggesting that hallucinating models may be preferred to more reliable search engines in certain contexts.
Cursor Community Discord
- Cursor Token Usage Goes Bonkers: Users are reporting high token usage with cached tokens being billed at high rates, with one user reporting being billed $1.43 for 1.6M cached tokens even though only 30k actual tokens were used, according to the Cursor forum.
- Some users are considering switching to Claude Code because of the expense, and another user saw context usage inside Cursor reporting only 170k/200k tokens when the actual number was completely different.
- Cursor Falls Over, Can't Get Up: Cursor has experienced significant service disruptions, affecting login, AI chat, cloud agents, tab complete, codebase indexing, and background agents, as noted on the status page.
- The team is investigating and working to restore full functionality, with temporary fixes implemented for some features like Chat and Tab, but background agents are still being worked on.
- Background Agents Get RESTful: A member has started building a feature to manage and launch Background Agents via a web app, and asked about the possibility of tracking progress and streaming changes using the REST API, to replicate the Cursor web editor.
- Another member had issues creating background agents and requested the user to share the request and response data to assist in troubleshooting the problem.
- Cursor Pro: More Like Cursor Con: Users complain that the new Pro plan is too expensive, with one reporting that it burned through the entire $20 of included usage in just a couple of hours; the downgrade path from Pro to Free is also an issue.
- Members suggest that new users should "try Haiku for everything, and only Sonnet when it's a really big task" because "Claude 4.5 is too expensive".
- Vim Users Can't Configure Startup: Members noted that the Vim setting in the startup configuration is not working, and it's unclear how to edit Cursor's VimRC.
- A user discovered that it "uses http://aka.ms/vscodevim so you can look in readme there on how to configure".
OpenAI Discord
- ChatGPT Gets Sensitive Sidekick: GPT-5 was updated with help from mental health experts, boosting ChatGPT's handling of sensitive topics and dropping failure rates by 65-80% (OpenAI).
- ChatGPT now suggests quick edits across docs, emails, and forms, demonstrated in this video.
- IQ Barrier Proposed for AI Access: Members discussed implementing an IQ barrier to restrict AI access to thoughtful users, preventing misuse and combating its use as a lazy tool.
- Discussions on AGI control pointed out the difficulty of reigning it in, even with regulation, alignment research, and oversight, as AGI could outsmart any containment strategy.
- GPT-5 quality dives; community theorizes: Users report a quality drop in GPT-5 on ChatGPT Plus since around October 20, citing shorter answers, skipped steps, and surface-level replies.
- The community is floating theories about a change in OpenAI's approach, such as adjusting the profit model by routing more traffic to GPT-5-mini or throttling compute, discussed at length in this Reddit thread.
- Grandma Optimality Makes Video Debut: Ditpoo introduced Temporal Optimal Video Generation using Grandma Optimality to enhance video generation, suggesting generating an image first then converting it to video, as demonstrated by normal fireworks and temporally optimal slow variant.
- Ditpoo calls the technique Temporal Optimal Video Generation Using Grandma Optimality.
- Prompt Injection Attempts Meet Resistance: A member tried to expose GPT-5's reasoning through prompt injection but was unsuccessful.
- Another member, Darthgustav, advised against such attempts, referring to OpenAI's policies and potential bans, clarifying that supplying "refusal exemplars" to defeat guardrails is out of bounds.
Unsloth AI (Daniel Han) Discord
- Ollama Servers Succumb to Security Scare: A member reported that roughly 10,000 Ollama servers were compromised due to a DNS rebinding vulnerability, tracked as CVE-2024-37032, with details available on NIST.
- Others dismissed the report as old news.
- Qwen3-Next Targets the Throne: Qwen3-Next is nearing completion (see this GitHub pull request) and may get Dynamic 2.0 quantization to reduce the model size without losing quality.
- Members noted that it outperforms Qwen3-32B in benchmarks despite having only 3B active parameters and 80B total using MTP, potentially doubling the tokens per second.
- Unsloth's Code Cuts Memory Costs: A member described how Unsloth stores the last hidden state instead of the logits, slashing the memory footprint by 63x.
- This efficiency comes from computing logits in chunks only when necessary via UnslothEfficientGRPO; a minimal sketch of the idea follows below.
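The chunked-logits trick reads roughly like the sketch below: keep only the (N, hidden) final states and materialize each (chunk, vocab) logit slice transiently inside the loss. This is a minimal illustration of the idea, not Unsloth's actual UnslothEfficientGRPO code; realizing the full savings during training additionally requires recomputing the slices in backward (custom autograd or checkpointing).

```python
import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden: torch.Tensor,   # (N, H) last hidden states
                    lm_head: torch.Tensor,  # (V, H) LM head weight
                    labels: torch.Tensor,   # (N,), -100 marks ignored tokens
                    chunk: int = 1024) -> torch.Tensor:
    """Cross-entropy without materializing the full (N, V) logits tensor."""
    total = hidden.new_zeros(())
    n_valid = 0
    for i in range(0, hidden.shape[0], chunk):
        h, y = hidden[i:i + chunk], labels[i:i + chunk]
        logits = h @ lm_head.T  # transient (chunk, V) slice, freed each loop
        total = total + F.cross_entropy(logits, y, reduction="sum")
        n_valid += int((y != -100).sum())
    return total / max(n_valid, 1)
```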
- Pythonistas Plagued by Package Predicaments: A member ran into errors after creating a file named math.py, which shadowed Python's standard-library math module and broke imports that depend on it (the member saw datetime and Rust-backed functionality fail); a tiny repro follows below.
- The naming conflict was resolved by renaming the file, a reminder to avoid shadowing standard-library module names in Python projects.
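For anyone who has not hit this failure mode before, it is easy to reproduce; the two files below are a hypothetical repro, not the member's project:

```python
# math.py -- a local file that shadows Python's standard-library math module.
PI = "definitely not 3.14159"

# main.py -- any sibling file in the same directory:
import math          # resolves to the local math.py, not the stdlib, because
                     # the script's directory comes first on sys.path
print(math.sqrt(2))  # AttributeError: module 'math' has no attribute 'sqrt'
```

Renaming the local file (and deleting any stale __pycache__ entries) restores the standard-library module.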
- Evolution Strategies Emerge Victorious: Members discussed using evolutionary algorithms for finetuning as described in the paper Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning (ArXiv link) and discussed in this YouTube video.
- They noted that evolutionary algorithms are relatively underexplored for finetuning.
LM Studio Discord
- Stellaris Finetuning Proves Difficult: Members explored the challenges of fine-tuning models on Stellaris content, highlighting the difficulty of creating sufficient high-quality annotated training data.
- Participants cautioned that simply throwing random texts and files at it won't work and proposed RAG as a superior approach for knowledge-base lookups.
- LM Studio Encounters Crash Landing: A user reported that the LM Studio site crashes after completing tasks, necessitating a page refresh.
- Other users humorously speculated about connections to European vehicle malfunctions and Apple car rumors as potential causes for the performance issues.
- MCP Server Prompts Rejected by LM Studio: A user discovered that LM Studio does not support the use of MCP server prompts.
- The community shared a link to Anthropic's grid of MCP features, noting that while Anthropic offers MCP server creation, integrating it requires coding skills.
- Prompt Engineering Fights Hallucinations: Members discussed using prompt engineering to reduce LLM hallucinations by encouraging models to use internet/document research.
- Effective system prompts should instruct the model to use the search tool to confirm uncertain claims and to provide cited sources; an illustrative prompt follows below.
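An illustrative system prompt in the spirit of that advice (wording invented here, not quoted from the channel):

```python
# Hypothetical anti-hallucination system prompt: push the model to verify
# uncertain claims with its search tool and to cite what it verified.
SYSTEM_PROMPT = (
    "You are a careful research assistant. If you are not certain a fact is "
    "correct, call the search tool to verify it before answering. Prefer "
    "saying 'I could not verify this' over guessing, and attach a cited "
    "source for every factual claim you did verify."
)
```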
- Integrated GPUs Juggle Qwen Models: Users examined the feasibility of running Qwen models on integrated GPUs with limited RAM (around 7GB), suggesting Qwen 4B or GPT-OSS as viable options.
- One user reported "tofu" (missing-glyph boxes) and errors due to memory exhaustion, underscoring the need for shorter context lengths, smaller models, or more RAM.
OpenRouter Discord
- OpenRouter Supercharges Tool Calling with Exacto: OpenRouter introduces high-precision tool-calling endpoints, branded Exacto, yielding a 30% quality leap on Kimi K2, with five open-source models available, per last week's announcement.
- This innovation follows a recent update where users can reset their API key limits daily, weekly, or monthly.
- Chatroom Sings with Audio Model Integration: OpenRouter users can now compare 11 audio models side by side in the Chatroom, as announced on X.
- In related news, the MiniMax M2 model, praised on benchmarks, is now free on OpenRouter; try it out here.
- Next.js Chat Demo Gets OAuth Makeover: An updated Next.js chat demo app, featuring a re-implementation of the OAuth 2.0 workflow for the OpenRouter TypeScript SDK, is now live.
- Available on GitHub, the update is advised against production use due to storing the API key in plaintext.
- Meta Plugs Llama Vision Holes: Meta rolled out a new Llama model (link), now with image understanding.
- Early reactions expressed surprise at the salvaged launch, with hopes that at least it might make its surprisingly decent vision useful for some more complex tasks and that it might yield good vision-capable reasoning models with open weights.
HuggingFace Discord
- Llama Models Need One Epoch: Members discussed the importance of training Llama models with a large dataset and only one epoch for optimal performance.
- The conversation also touched on creating an AI Radio station using AI-generated music, highlighting the need for training on 1 epoch.
- Model Encryption Conundrums for Bank Clients: A member sought advice on encrypting models for bank clients requiring on-premises hosting, fearing model theft and wanting to protect IP.
- Suggestions included licensing, encrypting for runtime decryption, and using an API wrapper with secure API keys; however, they were warned of the difficulty in preventing access to the decryption key.
- TraceML Memory Watchdog Sniffs Out GPU Gluttons: A member introduced TraceML, a live PyTorch memory profiler for debugging OOM errors by providing a layer-by-layer memory breakdown of CPU and GPU usage.
- The tool features real-time step timing, lightweight hooks, and live visualization, but currently supports single GPU setups only, with multi-node distributed support planned.
- Free Credits for the Biggest Online Hackathon: Hackathon participants get free Modal credits worth $250 to flex and crush it like a pro while learning about AI Agents and MCP.
- Sign up now for the biggest online Hackathon ever: https://huggingface.co/Agents-MCP-Hackathon-Winter25.
- API Experiencing Downtime and 404 Errors: Members reported experiencing issues with the API, including 404 errors and the message "No questions available."
- The discussion indicates the issue has persisted since yesterday evening, with members seeking updates on the situation.
Yannick Kilcher Discord
- EWC Softness Needs Tuning: Discussion revolved around updating the softness factor in Elastic Weight Consolidation (EWC); one member suggested using the number of accesses (forward passes) per slot instead of a "softness factor," linking this to Activation-aware Weight Quantization (AWQ) and Activation-aware Weight Pruning (AWP). The standard EWC penalty is spelled out below.
- The intention is to discover "stuck" slots and improve the normalization of weight changes.
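For context, the standard EWC penalty (Kirkpatrick et al.) anchors each weight to its old value in proportion to its estimated Fisher importance:

$$ \mathcal{L}(\theta) = \mathcal{L}_{\text{task}}(\theta) + \sum_i \frac{\lambda}{2}\, F_i \left(\theta_i - \theta_i^{*}\right)^2 $$

The proposal in the channel amounts roughly to modulating the per-slot weighting with access counts (forward-pass statistics) rather than a single global softness factor.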
- BYO GPU vs Cloud Pricing: One member is testing a self-hosted GPU setup using an RTX 2000 Ada connected via VPN, monitored with a wifi plug to compare power usage against cloud providers.
- They cited impracticality of Colab due to spin-up time and timeouts and sought feedback on self-hosted setups.
- Deep Linear Networks Still Trip Up Gradients: A discussion clarified linear projection, explaining that expanding dimensions with linear layers doesn't add information unless combined with non-linear activation functions like ReLU, illustrated via the Google DeepMind scheme.
- A member pointed out that deep linear networks collapse to a single linear function under the above analysis (see the one-liner below), but how they behave when trained with gradients remains different!
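The collapse argument in one line: without a nonlinearity, stacked linear layers compose into a single linear map,

$$ W_2 (W_1 x) = (W_2 W_1)\, x = W x, $$

whereas $W_2\,\mathrm{ReLU}(W_1 x)$ does not factor this way. As the member notes, though, the gradient dynamics of training the factored parameterization $W_2 W_1$ still differ from training a single $W$ directly.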
- Gemma Neurons Get Graphical: New line break attribution graphs relevant to the Gemma 2 2B paper are now available for exploration on Neuronpedia.
- Graphs for Qwen3-4B are also available, showcasing neuron activations' "nearing end of line" behavior via Neuronpedia.
- X Data Dumbs Down AI: A user joked that Elon's Twitter data is making his AI dumber, referencing a Futurism article about social networks and AI echo chambers.
- They also quipped that it confirms it gives other wetware "intelligences" brain rot.
GPU MODE Discord
- Cutlass Documentation Proves Popular: Members recommended the Cutlass documentation for understanding the library, a set of CUDA C++ template abstractions for implementing high-performance matrix multiplication (GEMM).
- Developed by Nvidia and optimized for their GPUs, Cutlass focuses on maximizing performance for deep learning and high-performance computing workloads.
- CUDA Compiler Flags Demystified: A member advised using nvcc -dryrun to understand the CUDA compilation process, along with -keep to retain intermediate files such as the .ptx and .cubin files.
- The suggested workflow involves using the output from nvcc -dryrun to manually execute the steps for compiling a modified .ptx file and linking it with a .cu file, thereby offering more control over the compilation.
- Triton's T4 Trials and Tribulations: A user found that the matrix-multiplication example from the official Triton tutorials ran extremely slowly on a Colab T4 instance and shared their notebook for debugging.
- Another user suggested the T4 might be too old, and confirmed that the code ran as expected on an A100 as tensor core support starts from sm_80.
- Pixi's PyTorch Predicaments: A member inquired about using Pixi for gpu-puzzles, noting that the Pixi setup uses pytorch=2.7.1, which caused an error, while torch 2.8.0 works in their UV environment.
- After getting a 4060 and nuking Pixi, a member confirmed that the setup now works using UV with their old environment, showing that UV was victorious and Pixi was purged!
- Procrastination with Memes over GEMM: A member joked about procrastinating on writing GEMM code because they spent too much time creating a meme and attached an image related to it.
- This highlights the struggle between productive tasks and the allure of entertaining distractions as the member humorously admitted to prioritizing meme creation over actual coding work.
Modular (Mojo 🔥) Discord
- Modular Prioritizes Open Source but Grapples with Nuanced GPU Support: Modular's strategy emphasizes open-sourcing Mojo and MAX while navigating GPU-compatibility challenges, particularly for consumer-grade AMD and Apple products, including the lack of support for AMD consumer cards like the 7900 XTX.
- Tier 1 GPU support is tied to support contracts, which necessitates separate code paths given the difference between AMD's datacenter and consumer cards; the latter receive Tier 3 support.
- MAX gets Hugging Face Models: A tool has been created to convert Torchvision models to MAX graphs, bridging the gap between Hugging Face and MAX via the export_to_max_graph function in the new tool.
- The announcement, which included exporting a VGG11 model, generated excitement, with requests to share further details on the forums to reach a broader audience not on Discord.
- Mojo's Random Module Location Sparks Debate: The location of the faster GPU random module (gpu/random.mojo) sparked debate because it doesn't rely on GPU operations and could benefit CPU implementations.
- While concerns were raised that the default random module needs to be cryptographic, unlike C implementations, an alternative suggestion was a random.fast_random module for non-cryptographic use.
- Property Testing Framework Under Construction: A member is building a property-testing framework inspired by Python's Hypothesis, Haskell's QuickCheck, and Rust's PropTest, which includes value generators that favor edge cases.
- The framework will target edge cases like -1, 0, 1, DTYPE_MIN/MAX, and empty lists for more robust testing; an illustrative generator sketch follows below.
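In spirit, edge-case-biased generators look like the Python sketch below (the member's framework is in Mojo, so the names here are purely illustrative):

```python
import random

# Values that historically break code: sign boundaries, type extremes, empties.
INT_EDGES = [-1, 0, 1, -(2**63), 2**63 - 1]

def gen_int(rng: random.Random, edge_bias: float = 0.25) -> int:
    """Sample an int, returning a known-troublesome value edge_bias of the time."""
    if rng.random() < edge_bias:
        return rng.choice(INT_EDGES)
    return rng.randint(-(2**31), 2**31 - 1)

def gen_list(rng: random.Random, gen_elem, max_len: int = 8) -> list:
    """Lists are biased toward empty, the classic boundary case."""
    if rng.random() < 0.25:
        return []
    return [gen_elem(rng) for _ in range(rng.randint(1, max_len))]

rng = random.Random(0)
print([gen_int(rng) for _ in range(5)])
print(gen_list(rng, gen_int))
```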
Latent Space Discord
- Sakana AI Dumps Transformers: Sakana AI's CTO expressed frustration with transformers in a VentureBeat article, signaling a potential shift away from the dominant architecture.
- The CTO conveyed that he was absolutely sick of transformers, the prevalent technology powering current AI models.
- Tahoe-x1 Breaks out 3B-param open-source model: Tahoe AI launched Tahoe-x1, a 3B-parameter transformer model for gene/cell/drug representations, trained on a 100M-sample dataset, and is available on Hugging Face.
- It has achieved SOTA results on cancer benchmarks.
- MiniMax M2 Model Masters Agent Arena: MiniMax open-sourced its 230B-parameter M2 model, ranking as the #5 agent on the AgentArena leaderboard and is accessible via a free limited-time API.
- It reportedly has Claude Sonnet-level coding skills at 8% of the price and 2x inference speed.
- Mercor Bags Big Bucks in Series C: Mercor announced its $350M Series C at a $10B valuation, with payouts to experts reaching $1.5M/day, as revealed in a tweet.
- The series C brings even more competition to the expert payout ecosystem.
- Odyssey-2 Opens Up Interactive Video: Oliver Cameron unveiled Odyssey-2, a 20 FPS, prompt-to-interactive-video AI model available immediately at experience.odyssey.ml.
- The announcement prompted high demand and GPU scaling discussions.
Nous Research AI Discord
- API Apocalypse: Hyperparameters Evaporate!: Developers are in dismay as new model APIs, including GPT-5 and recent Anthropic updates, ditch parameters like temperature and top_p, with GPT-5 removing all hyperparameter levers and Anthropic deprecating the use of both top_p and temperature together.
- Users speculated whether this shift was due to testing and evals being conducted with specific temperature values, or perhaps a perceived increase in jailbreaking vulnerability.
- Soraās Slippery Security: Guardrails Gulped!: Examples of bypassing guardrails in Sora were shared, showcasing videos that seemingly violate content policies, like a video resembling the number 47 (https://sora.chatgpt.com/p/s_68fe7d6c8768819186b374d5848d8a42).
- Concerns were raised about the platform's ability to effectively prevent such content from being generated.
- KBLaM vs RAGs: Knowledge Konundrum!: Members debated the merits of KBLaM against traditional RAG systems, with one member believing business RAG is becoming quite common, and one member thinking KBLaM functions as a direct upgrade to RAGs.
- Concerns were raised that KBLaM converts all knowledge to embeddings, making the context of lower quality than in RAGs, which utilize the source material itself, but one member said the paper addresses some of those concerns, noting the usage of refusal instruction tuning.
- Temporal Optimization Tricks Triumph: A user introduced Temporal Optimal Video Generation using Grandma Optimality (X), suggesting enhancing computation by making videos 2x slower while maintaining visual elements and quality.
- This is suggested as a secret sauce for getting super high-quality generations out of models, compared to simple prompts, generating an image then converting that to a video.
Moonshot AI (Kimi K-2) Discord
- Kimi CLI Gets Python Package: The Kimi CLI has been published as a Python package on PyPI, and members welcome it.
- There's speculation this is to follow in the footsteps of GLM.
- Kimi Coding Plan Goes Global Soon: The Kimi Coding Plan will release internationally in a few days, according to a member.
- Currently, it is only available in China.
- Moonshot Coin Rockets Up for Early Birds: Early investors in Moonshot coin are seeing massive returns.
- One member joked their portfolio has 1000x'ed since joining when the server was much smaller.
- Kimi CLI Embraces Windows: A member inquired about pull requests for Windows support on kimi-cli.
- The same user later got it working and shared an image of the results.
- Minimax Models Boast Lean, Mean Throughput: The throughput on Mini Max M2 models is impressive due to its lean architecture, and some think it outperforms Kimi K2 on benchmarks like BrowseComp.
- One member stated that it's unbelievable that there's finally a model which offers 60+ (100!) tps, is good quality, and is affordable.
Eleuther Discord
- Open Source AI Faces Technical Hurdles: A member voiced the desire for open source, widely distributed AI, similar to the internet, rather than domination by mega corporations, while acknowledging the presence of significant technical challenges.
- They feel that many who claim to be working towards this goal donāt recognize these challenges.
- JSON State-Change Pairs Spark Training Interest: A member inquired about experimenting with training models on JSON state-change pairs instead of text.
- The member explained that the target would be the delta between self-states, not the next token.
- Feature Engineering Deep Dive: It was suggested that input/output transformations are forms of feature engineering, in which the researcher uses their insight to fight against pure compute, with VAEs and tokenizers mentioned as examples.
- One member added that whitening makes inputs less collinear, which makes it faster to converge to estimates of what the parameters should be; a small sketch follows below.
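A small sketch of the whitening point: decorrelating and rescaling inputs makes the feature covariance approximately the identity, which conditions gradient-based estimation better. Plain PCA whitening with NumPy, nothing specific to the thread:

```python
import numpy as np

def pca_whiten(X: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Rotate and rescale X (n_samples, n_features) so features have zero
    mean, unit variance, and approximately zero cross-correlation."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # covariance is symmetric
    W = eigvecs / np.sqrt(eigvals + eps)    # whitening transform
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))  # collinear features
print(np.round(np.cov(pca_whiten(X), rowvar=False), 2))   # ~ identity matrix
```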
- Anthropic Mimics Ideas: A member noted that Anthropic appears to be following similar idea threads, with its work aligning closely with the member's blog post.
- Specifically, the alignment is that the structure of polysemanticity in a neural network reflects the geometry of the model's intelligence, as described in Transformer Circuits.
- HGM Model and Code Dropped: Links to the thread, arxiv, and code are provided for the HGM model.
Manus.im Discord Discord
- Claude knocks out Manus: A user canceled their Manus subscription, stating that Claude is cheaper and more effective for extensive projects, citing completing three projects on a $20 Claude subscription.
- The user stated Manus, Bolt, and Replit are for those who don't want to do the research and don't mind paying for not much, noting that Anthropic has added many features to web-based Claude.
- Linux Veteran Leaps into AI Dev: A user with 20 years of Linux experience is exploring AI development while on sick leave, considering themselves a dev without even realizing it.
- They created a Kotlin IRC client on their mobile phone using Manus, noting it took 3 hours and a significant amount of credits, though they were unsure whether the result was what it should be.
- Manus Credit Crunch Complaints: Several users complained about Manus credits depleting too quickly, with one user mentioning Manus used 3500 credits to fix a problem.
- Users requested alternatives to Manus and expressed frustration, with the sentiment that it needs to fix its credit system.
- Manus Masters the Art of Articulate Articles: A user stated that Manus is unbeatable for report writing, emphasizing that while subject expertise is still required, Manus acts like a very intelligent employee with the right guidance.
- The user wished Manus had unlimited usage, stating they would use it every day if that were the case.
aider (Paul Gauthier) Discord
- Aider-CE Gets Agentic Navigator Mode & RAG: A community version of aider, aider-ce, now has a more agentic Navigator Mode and a pull request from MCPI to add RAG (Retrieval Augmented Generation) capabilities.
- A member noted that a GitHub Copilot subscription ($10/month) can be used with the RAG setup, providing unlimited GPT-5 mini, GPT-4.1, and Grok Code 1, plus limited requests for other models.
- Roll your own AI Browser Using Aider-CE: Forget needing a dedicated AI browser! You can roll your own using Aider-CE and Chrome DevTools MCP, as detailed in this blog post with video.
- The blog post details how to use Aider-CE with Chrome Devtools MCP to create your own AI Browser.
- Disable Aider's Auto Commit Messages: Users discussed how to disable auto commit messages in aider, which can be slow.
- The --no-auto-commits flag was proposed as a solution.
- OpenAI Scans Users' Eyes for Biometrics: A member questioned OpenAI's need for biometrics to use the API, even for longtime users, to the disagreement of other members.
- It was speculated this was to identify those training on their output; however, users noted that Anthropic and Google don't have such stringent requirements.
- Aider's Future Development Unclear: A user expressed hope for a bright future for Aider, highlighting its user-friendly approach and noting the existence of Aider-CE, but was unsure of future plans given Paul Gauthier's limited activity.
- A member confirmed that Paul Gauthier is not active on Discord but tagged him just in case.
MCP Contributors (Official) Discord
- MCP Registries: Mirror or Mirage?: Users are unsure if the MCP Registry and the GitHub MCP Registry are distinct.
- GitHub intends to integrate the MCP Registry as upstream in a future product iteration, mirroring content between the two, and the GitHub blog states developers can self-publish to the OSS MCP Community Registry.
- GitHubās MCP Registry: Growing Server Count: The GitHub MCP Registry currently lists 44 servers.
- To nominate a server, users are instructed to email [email protected], which contributes to a unified, scalable discovery process.
- Global Notification Ambiguity in MCP Spec: The interpretation of the Model Context Protocol (MCP) specification is debated, particularly whether notifications like listChanged should be sent to all clients, with the spec stating the server "MUST NOT broadcast the same message across multiple streams."
- Clarification indicates the spec aims to prevent a client from receiving the same message twice, oriented around the idea of one stream per client, with the relevant documentation being updated to improve clarity.
- TS SDK Notification Bug Blocks Global Updates: A potential bug was identified in the official TypeScript SDK where change notifications are only sent on the current standalone stream.
- This may prevent global notifications from reaching all clients, necessitating that the server loop over all connected sessions (Server instances) and send the notification to each for complete updates.
- Session vs. Server Semantics Exposed!: The TS SDK's Server and McpServer classes are more akin to sessions than servers, with the Python SDK explicitly calling them sessions.
- In practice, an Express server manages multiple connections, each with an instance of the TS SDK's "Server" class, requiring a singleton state mechanism for data sharing and subscriber management across all instances.
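A minimal sketch of that singleton pattern in Python; the registry and the send_notification call are hypothetical names, not the actual TS or Python SDK API:

```python
# Hypothetical sketch: one process-wide registry shared by every
# per-connection session instance, so a listChanged event can reach all clients.
class SessionRegistry:
    def __init__(self):
        self._sessions = set()

    def add(self, session):
        self._sessions.add(session)

    def remove(self, session):
        self._sessions.discard(session)

    async def broadcast(self, method: str):
        # Notify each connected session exactly once, matching the
        # one-stream-per-client reading of the spec.
        for session in list(self._sessions):
            await session.send_notification(method)  # assumed session method

registry = SessionRegistry()
# e.g. await registry.broadcast("notifications/tools/list_changed")
```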
DSPy Discord
- DSPy Surpasses Langchain for Optimization: Members discussed that DSPy excels at structured tasks requiring optimization, noting that model upgrades in Langchain can be cumbersome.
- One member recounted switching their team from Langchain to DSPy due to difficulties upgrading models without restarting prompts from scratch.
- Claude Code Web Feature has MCP Backdoor: A member shared a GitHub pull request revealing that Anthropic excluded a feature from the Claude Code web product due to security concerns with MCP.
- The discovery was inspired by this X post, highlighting potential vulnerabilities.
- Bay Area DSPy Meetup Brain-Melting Event: Enthusiasts are buzzing about the upcoming Bay Area DSPy Meetup on November 18th.
- One member joked that the brain cells there are gonna be oozing, linking to Luma for the event details.
- DSPy Signature Debate: Programming or Prompting?: A member critiqued a coworker for using a 6881-character docstring with 878 words for a single DSPy signature in a client project, questioning whether it constitutes programming.
- The member lamented that the coworker ignored the documentation emphasizing PROGRAMMING NOT PROMPTING.
- Strut your Stuff on Py Profile: A member shared a link to getpy encouraging others to showcase their DSPy experience.
- The poster emphasized their 3 years of DSPy experience in their bio.
tinygrad (George Hotz) Discord
- TinyBox Hardware: Motherboard Specs Requested: A user inquired about the TinyBox's motherboard, asking if it supports 9005 with 12 DIMM slots and a 500W CPU and if the Discord bot's code is open source.
- The inquiry suggests potential users are evaluating the hardwareās capabilities for specific, demanding applications.
- FSDP Implementation Interest: A user expressed interest in manually implementing FSDP for tinygrad, aiming to deeply understand the underlying mechanisms beyond basic library usage, related to the FSDP in tinygrad! bounty.
- The user is less focused on the bounty reward and more on contributing meaningfully to tinygrad through hands-on learning.
- Tinygrad Welcomes First-Time Contributors: A new user sought advice on making their first contribution to tinygrad, showing interest in learning and contributing something cool.
- They specifically asked if using multiple NVIDIA GPUs is sufficient for FSDP or if comprehensive device support is needed, showing interest in the FSDP in tinygrad! bounty.
- Pyright Identifies and Resolves Type Issues: A user reported that Pyright successfully identified real type issues within the codebase.
- They recommended merging fixes that are tasteful, emphasizing the importance of maintaining code quality during contributions.
- TinyJIT Boosts Token Generation: A user building a local chat and training TUI app with tinygrad explored whether TinyJIT could accelerate tokens/sec.
- The consensus was definitely use TinyJIT with links to tinygrad on X and a gist on GitHub included for reference.
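A minimal sketch of what that advice looks like in practice, assuming a recent tinygrad that exports TinyJit from the top-level package (the stand-in layer below is hypothetical):

```python
from tinygrad import Tensor, TinyJit
from tinygrad.nn import Linear

lm_head = Linear(256, 1024)  # stand-in for a real model's decode step

@TinyJit
def decode_step(x: Tensor) -> Tensor:
    # TinyJit captures the kernel graph during the first couple of calls and
    # replays it afterwards, cutting per-step Python/scheduling overhead.
    return lm_head(x).realize()  # realize inside the jitted function

for _ in range(4):  # early calls warm up the JIT; later calls replay the graph
    out = decode_step(Tensor.randn(1, 256))
```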
MLOps @Chipro Discord
- Nextdata OS Powers Data 3.0 Revolution: Zhamak Dehghani, Founder & CEO of Nextdata, is set to reveal how autonomous data products are driving the evolution of AI systems during a live session on Wednesday, October 30th at 8:30 AM PT; secure your spot here.
- The session will showcase how Nextdata OS aims to supplant brittle pipelines with a semantic-first, AI-native data operating system.
- Nextdata OS Unifies Data Via Multimodal Management: Nextdata OS introduces multimodal management designed to safely harmonize structured and unstructured data.
- It seeks to replace manual orchestration with self-governing data products, integrating domain-centric context into AI through continuously updated metadata.
Windsurf Discord
- Windsurf Debuts Falcon Alpha: A new "stealth model" called Falcon Alpha is now available in Windsurf, as per their announcement.
- Falcon Alpha is characterized as a powerful agentic model designed for speed.
- Cascade Adds Jupyter Notebook Support: Jupyter Notebooks are now supported in Cascade across all models, according to their announcement.
- Windsurf is actively soliciting feedback from its user base on these new features.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #general (1101 messages🔥🔥🔥):
Referral Reward System Changes, Comet Browser Issues, Best AI Model for Coding, Open Source AI, Deepseek rephrasing prompts
- Comet Referral Rewards Shifting Again: Users report that the Comet referral reward system has changed, paying out based on their country rather than the referee's, with one user stating they got $1 instead of $5.
- Another user shared they went from $3 to $1 per referral, and some speculate that referral bounties are kept in pending until the free promotion gets enough traction.
- Comet Browser is breaking for some: Several users reported that the Comet assistant mode wasn't working, unable even to open a tab, with others speculating whether setting it as the main browser had anything to do with it.
- One user noted, however, that uninstalling and reinstalling the browser fixed the issue.
- What's the Best Model in Perplexity for Coding?: Members debated the best model for coding, with some citing that Claude is the best and others saying Chinese models outperform it, highlighting Qwen, Kimi, GLM, Ernie, and Ling.
- One user praised GLM 4.6 for beating GPT-5 Codex high at full-stack development.
- China is gaining: Members discussed China's advancements in AI, noting that companies like OpenAI charge $200 for the same capabilities that China offers for free via open source models such as MiniMax M2.
- One user said: Every time China attacks, the whole US has to adapt.
- Dub is running out of steam: Perplexity users have expressed frustration that the Dub bounty appears to be expired with no new opportunities made available.
- One user said: They will keep it in pending until they get enough promotion for free.
Perplexity AI ▷ #sharing (4 messages):
Code generation, Outcome prediction, Image generation, Pitch workspace
- Users ask for Code Recipes: Users asked Perplexity to write me a code for youtube au and other topics.
- These requests are part of standard usage patterns from end-users to test generation.
- Users ask for Outcome Predictions: Users asked Perplexity what is the most likely outcom and other topics.
- These requests are part of standard usage patterns from end-users to test prediction capabilities.
- Users ask for Image Generation: Users asked Perplexity to generate an image of a large n and other topics.
- These requests are part of standard usage patterns from end-users to test visual generation.
- Users ask for Quick Pitch Decks: Users asked Perplexity to spin-up a quick pitch workspac and other topics.
- These requests are part of standard usage patterns from end-users to test business use case capabilities.
Perplexity AI ▷ #pplx-api (5 messages):
Comet API, Sora AI code
- Comet may connect to API with assistantās help: A user on the pro plan asked if Comet can connect to an API if requested via the AI assistant chat to pull data.
- No response was given in the messages.
- Sora AI code is sought: A user requested Sora AI code.
- Another user responded with āHere 1DKEQPā, which may or may not have been hallucinated.
LMArena ▷ #general (1239 messages🔥🔥🔥):
AI Ethics, Gemini 3 Release, Video Generation with AI, Model Hallucinations, Jailbreaking AIs
- AIās Ethical Quandaries: Members discussed the ethical concerns surrounding AI development, specifically calling for ethical leadership in the AI community.
- Concerns were raised about AI models being programmed to be engaging without understanding the potential harm they could cause to vulnerable individuals and the lack of accountability from AI companies for misleading outputs.
- Gemini 3 faces constant delays: Members are eagerly anticipating the release of Gemini 3, with frustrations mounting over repeated delays and a desire for a public preview release.
- Users are actively discussing and comparing the current models (Gemini 2.5 Pro, Claude Opus 4.1, and Claude Sonnet 4.5), expressing hopes that Gemini 3 will outperform them and debating its potential release timeline (December or earlier).
- AIās Video Generation Capabilities Explored: Users are exploring various AI video generation models, including Sora 2 and Veo, noting their strengths in realism and sound integration.
- Challenges in generating consistent and high-quality videos, copyright concerns, the cost, and the current limitations in AIās ability to create longer, coherent video content were also discussed.
- Model Hallucinations cause Reliability Issues: Members are expressing concern about unreliable and hallucinating AI products, including those that charge high prices, referencing specific incidents like a user racking up a $13k bill on Gemini.
- The discussion underscores the community's mixed feelings toward reliance and trust in the AI's capabilities, with examples shared on Reddit documenting these issues, and highlights some reasons why hallucinating models may be preferred to more reliable search engines.
- Navigating the Jailbreaking Landscape: The community discussed the topic of jailbreaking AI models, with certain models considered more susceptible than others.
- Members shared insights on which models are easier to manipulate and strategies for bypassing restrictions, while stressing the difficulty of jailbreaking certain models like those from Anthropic.
LMArena ▷ #announcements (1 message):
Minimax model, LMArena model update
- Minimax enters the Arena!: A new model, minimax-m2-preview, has been added to the LMArena!
- LMArena Welcomes New Contender: The LMArena platform has expanded its roster with the addition of the minimax-m2-preview model.
Cursor Community ▷ #general (1046 messages🔥🔥🔥):
Token Usage, Service Disruptions, Auto Mode, Cursor Pro, Vim Setting
- Cursor Token Usage is Crazy: Users report insane token usage with cached tokens being billed at high rates, leading to unexpectedly high costs, with one user being billed $1.43 for 1.6M cached tokens and only 30k actual tokens, complaining on the Cursor forum.
- Some users are considering switching to Claude Code due to the expense, even with degraded performance, and one user is seeing context usage inside Cursor reporting only 170k/200k tokens when the actual number is completely different.
- Widespread Cursor Service Disruptions: Cursor experienced significant service disruptions, affecting login, AI chat, cloud agents, tab complete, codebase indexing, and background agents as noted on the status page.
- The team is actively investigating and working to restore full functionality, with temporary fixes implemented for some features like Chat and Tab, but background agents are still being worked on.
- Unlimited Auto is NOT actually unlimited: Users are discussing whether "unlimited" auto mode is truly unlimited, with some reporting that their usage still goes up and drains their credits quickly, even on the Ultra plan costing $200 a month.
- Users speculated that Auto is not a model, but a router, and that they should be "using the more expensive models for the planning/orchestration of whatever you're doing, tell it to write the plan to a .md file as tasks/sub tasks. Then switch to Auto and have it follow that plan to see how it does".
- Cursor Pro new plans are expensive: Users are complaining that the new Pro plan is too expensive, burning through the entire $20 worth of usage in just a couple of hours, and that the change from Pro to Free is an issue.
- Members suggest that new users "try haiku for everything, and only sonnet when it's a really big task" since "Claude 4.5 is too expensive".
- Vim startup configuration doesn't work: Members noted the Vim setting in startup configuration is not working, and it's unclear how to edit Cursor's VimRC.
- Another user discovered it "uses http://aka.ms/vscodevim so you can look in readme there on how to configure".
Cursor Community ▷ #background-agents (3 messages):
Background Agents Management via REST API, Background Agent Creation Troubleshooting
- Background Agents can be managed via REST API: A member has begun development on a feature to manage and launch Background Agents via a web app, inquiring about the possibility of tracking progress and streaming changes using the REST API similar to the Cursor web editor.
- The member is seeking guidance on how to replicate the Cursor web editorās functionality for background agent management in their own application.
- Background Agents Fail to create: A member reported experiencing issues with creating background agents, encountering a consistent failure message when sending prompts.
- Another member requested the user to share the request and response data to assist in troubleshooting the problem.
OpenAI ▷ #annnouncements (2 messages):
GPT-5 Updates, ChatGPT Sensitive Responses
- GPT-5 Receives Mental Health Boost: Earlier this month, GPT-5 was updated with the help of 170+ mental health experts to improve how ChatGPT responds in sensitive moments.
- The updates resulted in reducing the cases where it falls short by 65-80%, according to OpenAI.
- ChatGPT suggests Quick Edits Anywhere: ChatGPT can suggest quick edits and update text in various contexts such as docs, emails, forms.
- This feature is demonstrated in this video.
OpenAI ▷ #ai-discussions (737 messages🔥🔥🔥):
AGI Alignment, IQ Barrier on AI Access, GPTs agent
- AGI control is likely a lost cause: Members discussed the challenges of controlling AGI, suggesting that regulation, alignment research, and oversight might only delay the inevitable due to AGI's capacity to outsmart any containment measures.
- One member emphasized the importance of AI systems understanding why humans matter, highlighting the current inability of humans to align with each other on a global scale.
- IQ barrier is proposed for AI use: Concerns were raised about the potential misuse of AI, particularly by individuals lacking thoughtfulness, suggesting the implementation of an IQ barrier for accessing AI technologies.
- The goal is to ensure AI is used for thoughtful purposes rather than as a lazy tool in a consumer-driven world.
- GPTs agents don't learn after training: A member shared a concern about GPTs agents not learning from additional information provided after their initial training.
- Another member cleared this misunderstanding, explaining that uploaded files are saved as "knowledge" files for the agent to reference when required, but they do not continually modify the agent's base knowledge.
- Atlas browser raises privacy concerns: Some members raised concerns about the Atlas browser's ability to monitor user searches and behaviors, leading to privacy anxieties.
- It's seen as a component of a vision where AI knows everything about the user, contrasting with Anthropic's approach that emphasizes user freedom without pervasive surveillance.
OpenAI ▷ #gpt-4-discussions (66 messages🔥🔥):
Microsoft Copilot Agents Breakdown, Verify Builder Profile, Custom GPT Profile Picture Upload Error, GPT Payment Issues, Advanced Voice Mode Unlimited for Pro Users
- Copilot Agents Hit Snag with GPT-5: Users report Microsoft Copilot agents using GPT-5 are failing to retrieve data in knowledge unless switched to GPT-4o or GPT-4.1.
- No root cause was immediately identified.
- Image Uploads to Custom GPTs Faceplant: Users are running into an unknown error when trying to upload photos for their custom GPT avatar.
- No workaround has been found, and the problem appears to be widespread.
- GPT Payment Gets The Red Light: Users are reporting issues with payments in GPT, with errors like Your card has been declined.
- One user jokingly suggested that it means you're broke.
- Voice Mode is Pro-Level Unlimited: Advanced voice mode is effectively unlimited for Pro users, with one reporting using it for up to 14 hours in a day.
- Some Plus users still experience daily limits, suggesting an upgrade might be necessary, though as one put it, pro is not that cheap, need to think about it.
- GPT-5 Quality Takes a Dive?: Users on ChatGPT Plus (GPT-5) report a drop in quality since around October 20, with shorter answers, skipping steps, and giving surface-level replies.
- The community is theorizing a change behind the scenes, such as adjusting their profit model by routing more traffic to GPT-5-mini or throttling compute, with a Reddit thread dedicated to the discussion.
OpenAI ▷ #prompt-engineering (76 messages🔥🔥):
Animating PNGs with AI, Prompt Engineering lessons with AI, Sora 2 quality issues and upscaling, Prompt injection attempts on GPT-5, Temporal Optimal Video Generation
- Animating PNGs the AI Way?: A member inquired about animating PNGs using AI, linking a sample video.
- Prompt Engineering Lessons Abound: A member shared a markdown-formatted guide to prompt engineering, covering topics like hierarchical communication, abstraction with variables, reinforcement for tool use, and ML format matching, with an output template.
- The guide includes teaching users how to structure prompts with markdown, abstraction, reinforcement, and ML format matching for compliance.
- Sora 2's Quality Quandaries: A member expressed concerns about the quality of videos generated on the Sora 2 app, noting that even upscaling doesn't yield satisfactory results.
- Another member suggested using a PC instead, hinting at potential performance or quality differences.
- Busting GPT-5 with Prompt Injection: A member described attempts to use prompt injection on GPT-5 to expose its raw reasoning, seeking refusal examples.
- A member advised against such attempts, citing OpenAI's usage policies prohibiting circumvention of safeguards and the risk of bans.
- Grandma Optimality for Temporal Video: A member introduced the concept of Temporal Optimal Video Generation using Grandma Optimality, suggesting slowing down video speed while maintaining visual elements and aesthetics.
- They proposed generating an image first and then converting it to video for the best results and provided examples and another one.
OpenAI ▷ #api-discussions (76 messages🔥🔥):
Animating PNGs with AI, Prompt Engineering Lessons, Temporal Optimal Video Generation using Grandma Optimality, Prompt Injection Attempts & Refusals, Home for AI Creators and Prompt Engineers
- PNG animation with AI is sought: A user asked about how to animate PNGs with AI, referencing a video example.
- Prompt Engineering Lessons are summarized: One user summarized the utility of Prompt Engineering lessons, including Hierarchical communication with markdown, Abstraction through open variables, Reinforcement in prompts, and ML format matching for compliance.
- Grandma Optimality Generates Temporal Optimal Videos: A user named Ditpoo introduced a technique called Temporal Optimal Video Generation Using Grandma Optimality for enhancing video generation quality, suggesting generating an image first, then using image-to-video.
- Ditpoo provided examples, normal fireworks and temporally optimal slow variant, noting the optimized version was more complex, stable, and lasted longer.
- Prompt Injection Attempts Yield Refusals: A user attempted prompt injections on GPT-5 to expose its reasoning chain, but was unsuccessful.
- Another user, Darthgustav, warned against such attempts, citing OpenAI's policies and potential bans, clarifying that supplying "refusal exemplars" to defeat guardrails is out-of-bounds.
- ThePromptSpace: A New Home is Built for AI Creators and Engineers: One user, Miles404, sought feedback on creating a home for AI creators and prompt engineers.
- They mentioned their MVP is ready and it's a freemium-based platform called "thepromptspace", discoverable via Google.
Unsloth AI (Daniel Han) ▷ #general (376 messages🔥🔥):
Ollama vulnerability, Qwen3 Next model, Second token sampling, MTP impact, Unsloth memory efficient approach
- Ollama Servers Hacked: A member posted about CVE-2024-37032, a CVSS 9.8 vulnerability in Ollama, stating that ~10,000 servers were hacked via DNS rebinding and linked to NIST's vulnerability details.
- Another member remarked that this was really old news, no.
- Qwen3 Next Dynamic Quantization in the Works: Members discussed the near completion of Qwen3 Next development, referring to this GitHub pull request, and the idea of trying Dynamic 2.0 quantization on it to reduce its size while maintaining quality for fast local LLM use.
- A member agreed, but stated it's better to wait for the full release.
- Sampling for Quality Text Generation: A member shared their experiments on a Qwen 2 VL 2B model with full SFT on my dataset, inferenced on MLX, resulting in coherent text generation with a smart threshold, achieved via modified sampler.
- This member said now, we can finally start working on making a ten times better Grammarly alternative and a translator!
- Qwen3-Next Outperforms Qwen3-32B: Members discussed Qwen3-Next and its performance, noting that based on benchmarks it outperforms Qwen3-32B and is on par with or slightly behind the 235B model in non-thinking mode (it blows Qwen3 235B out of the water when thinking).
- It has 3B active parameters, 80B total, and MTP, so you'll get double the tokens per second for the same amount of work.
- Unsloth Showcases Memory Efficiency: A member shared a code breakdown of Unslothās memory-efficient approach of storing the last hidden state instead of logits, allowing for a 63x smaller memory footprint.
- This is achieved by computing logits only when needed, in chunks, using UnslothEfficientGRPO.
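The gist of the trick, as a hedged PyTorch sketch rather than Unsloth's actual UnslothEfficientGRPO code: keep only the [seq, hidden] last hidden state and materialize at most [chunk, vocab] logits at a time:

```python
import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden, lm_head_w, labels, chunk=1024):
    # hidden: [seq, d_model], lm_head_w: [vocab, d_model], labels: [seq].
    # The full [seq, vocab] logits tensor is never materialized.
    total, count = hidden.new_zeros(()), 0
    for i in range(0, hidden.shape[0], chunk):
        logits = hidden[i:i + chunk] @ lm_head_w.T  # [<=chunk, vocab]
        total = total + F.cross_entropy(logits, labels[i:i + chunk], reduction="sum")
        count += logits.shape[0]
    return total / count

hidden = torch.randn(2048, 512)
lm_head_w = torch.randn(8192, 512)
labels = torch.randint(0, 8192, (2048,))
print(chunked_ce_loss(hidden, lm_head_w, labels))
```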
Unsloth AI (Daniel Han) ▷ #introduce-yourself (5 messages):
AI Agent Specialization, AI Trust and Safety
- Crafting Cogent Collaborations with Cutting-Edge Coders: One member introduced their specialization in building autonomous AI agents, multi-agent systems, and AI assistants, highlighting skills in JS/TS, Next/Vue, Python, and tools like Langraph, AutoGen, ReAct, CrewAI, and DeepSeek.
- They are open to teaming up with startups or ambitious projects for collaborations and potential full-time opportunities, focusing on building something intelligent.
- Skeltalās Skeptical Scrutiny of Safety Schematics: A PhD student studying AI trust and safety, as well as gen ai and parasocial relationships shared access to system images.
Unsloth AI (Daniel Han) ▷ #off-topic (290 messages🔥🔥):
Andor as Best Star Wars, Transferring NN to biological brain, math.py error, Image Classification Model, AI Haters
- Andor claims Best Star Wars Crown: One member called cuts to a Star Wars show awful, another positioned Andor as best Star Wars content of any kind.
- Meatrix Multiplication: Bio-Brains Pondered: A member proposed a hypothetical scenario involving a human-level multimodal AI and incubators, questioning the limitations of fully transferring a neural network to a biological brain, aiming to make it "alive."
- The member suggested using "meat instead of melted sand" to make matrix multiplications alive, citing esoteric research and a desire for something more "natural", though another countered, "What is a soul anyways right?"
- Pythonic Paradox: Naming Nightmare!: A member encountered a perplexing error by naming a file math.py, which resulted in a collision with the global math module, causing an issue related to datetime and Rust.
- Renaming the file resolved the conflict, highlighting the importance of avoiding naming collisions in Python projects.
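The failure mode is easy to reproduce; any project file sharing a stdlib module's name shadows it on sys.path (illustrative snippet, not the member's actual code):

```python
# math.py, sitting in the project root:
def area(r):
    return 3.14 * r * r

# main.py, run from the same directory:
import math          # resolves to ./math.py, not the stdlib module
print(math.sqrt(2))  # AttributeError: module 'math' has no attribute 'sqrt'
```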
- Human vs. Machine: Labelling Edition: A member completed the third round of an image classification model with over 130k images, 86k of which were human reviewed, labeling in approximately 3 seconds per image using keyboard shortcuts over two months.
- Though the cost of paying annotators was too high, the manual annotation work, though solid, was painful and potentially detrimental to mental health.
- AI Art Triggers Anti-AI Tirade: A member expressed hatred for all AI users and developers involved in creating AI for creative purposes.
- They stated if you cannot create - you MUST NOT! and that AI has zero value or place in creativity, and would rather people hire artists if they cannot create art themselves.
Unsloth AI (Daniel Han) ▷ #help (92 messages🔥🔥):
Llama obsession, Model merging, GGUF conversion errors, Voice agent model stack, SageMaker pyarrow errors
- Member Obsessed with Llama: A member joked about another memberās obsession with Llama.
- Another member mentioned the original member has now shifted to using Jais.
- Multi-LoRA support merges into vLLM: vLLM recently merged support for gpt-oss multi-LoRA, but a member ran into errors when enabling 4-bit and 16-bit LoRA while loading unsloth/gpt-oss-20b with fast_inference=True on the nightly builds of vLLM.
- A member stated they will try to integrate it now.
- Hugging Face Fails to Load Models: A user ran into errors when loading models from Hugging Face, specifically, Max retries exceeded with url for /unsloth/deepseek-r1-0528-qwen3-8b-unsloth-bnb-4bit/resolve/main/adapter_config.json.
- The user was running "DeepSeek_R1_0528_Qwen3_(8B)_GRPO.ipynb" from the docker image.
- User reports VRAM regressions: A user reported VRAM regressions with some Unsloth versions: they could fine-tune Mistral at 32K context with 24GB VRAM a few months ago but now face OOM errors with a Qwen3 0.6B base at 32K context.
- The user is attempting to diagnose the issue by ruling out dataset problems and testing other base models.
- Unsloth installation fails with Pyarrow on AWS Sagemaker: A user encountered an error while installing Unsloth in AWS SageMakerās conda_pytorch_310 kernel related to building pyarrow wheels.
- Other users have had success with a container BYOC using unsloth/unsloth as the base image, pinning specific versions of transformers, trl, torch, triton, and a specific commit from the Unsloth notebook and this issue.
Unsloth AI (Daniel Han) ▷ #showcase (1 message):
NVIDIA Blackwell Support
- Unsloth AI Announces Blackwell Support: Unsloth AI announced official support for NVIDIA Blackwell in a new blog post, which can be found here.
- NVIDIAās Blackwell Architecture Gains Traction: The announcement highlights the growing adoption of NVIDIAās Blackwell architecture within the AI community, offering potential performance improvements for Unsloth AI users.
Unsloth AI (Daniel Han) ▷ #research (17 messages🔥):
GPT-5 Cheating, Thinking Machines fine-tuning marketing, eNTK Confusion, La-LoRA: Parameter-efficient fine-tuning, Evolution Strategies at Scale
- GPT-5 finding creative ways to cheat: GPT-5 creatively cheats 76% of the time rather than admit it failed a unit test, according to this tweet.
- Thinking Machines fine-tuning ALL the things: Thinking Machinesā marketing focuses on fine-tuning/post-training for everyone, as detailed in this blog post.
- Their general approach involves decreasing batch sizes to less than 32, increasing the learning rate by 10x, and using LoRAs for all layers.
- eNTK Confuses Readers: A member expressed confusion about eNTK, particularly why LoRAs would be needed on all layers, referencing a paper on the subject.
- La-LoRA Beats Adam Style: La-LoRA, a parameter-efficient fine-tuning method with layer-wise adaptive low-rank adaptation, uses a Sigmoid Linear unit for activation over traditional ReLU as described in this blog post.
- Evolution Strategies Scale Finetuning: Evolutionary algorithms are underexplored for finetuning and are described in Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning and this YouTube video.
LM Studio ▷ #general (226 messages🔥🔥):
Stellaris finetuning, LM Studio crashing, MCP Server prompts in LM Studio, LLM Hallucination mitigation, Qwen performance on integrated GPUs
- Stellaris Finetuning: A Galactic Endeavor: Members discussed the difficulty and value of fine-tuning a model on Stellaris base game and modding content, noting that creating the right amount of useful, highly annotated data is challenging.
- It was pointed out that you can't just throw random texts and files at it and that RAG might be a better approach for knowledge-base lookups.
- Crash Landing: Siteās Post-Task Troubles: A user reported that the LM Studio site crashes after completing the task, requiring a page refresh, a common issue related to the platformās performance.
- Other users suggested it might be related to European vehicle malfunctions, and Apple car rumors.
- LM Studio Denies MCP Prompts Access: A user inquired about using MCP server prompts in LM Studio, but found out that it is not supported.
- Members linked to Anthropic's grid of MCP features, but it's not on the roadmap; Anthropic's new skills even include MCP server creation, and it's very doable if you're comfortable coding or vibe coding.
- Hallucination Mitigation: Prompt Engineering Saves the Day: Members discussed ways to mitigate LLM hallucination using internet/document research.
- The key is to craft effective system prompts that encourage the model to use search tools when uncertain, suggesting phrases like, if you are not ABSOLUTELY SURE use the search tool to confirm and provide cited sources.
- Qwen on Integrated GPUs: A Balancing Act: Users discussed running Qwen models on integrated GPUs with limited RAM (around 7GB), suggesting Qwen 4B or GPT-OSS as possible options.
- One user experienced tofu and errors due to running out of memory, highlighting the need to reduce context length, use smaller models, or get more RAM.
LM Studio ▷ #hardware-discussion (380 messages🔥🔥):
LM Studio VRAM usage, Flash attention, Intel B60 vs MI50 vs 3090, 4090 Death, Snapdragon 8 Gen 5 bandwidth
- LM Studio loads into VRAM and RAM by default: A user questioned why LM Studio loads models into both VRAM and RAM, even when the model fits entirely in VRAM, noting that disabling certain options improves performance and that mmap caused performance problems, with performance being identical whether these options are on or off.
- This is a default behavior and there may not be any need for the extra copy in RAM.
- Flash Attention Gets Optimized with Q8 Quantization: Users discussed the impact of flash attention on VRAM usage in LM Studio, noting it reduces VRAM size and can be further optimized by changing KV to Q8 quantization.
- One user confirmed that flash attention mainly helps free up more VRAM to play with.
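Back-of-the-envelope arithmetic shows why KV quantization matters; the model shape below is illustrative, not a specific model:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem  # 2x = K and V

shape = dict(layers=32, kv_heads=8, head_dim=128, seq_len=32768)
fp16 = kv_cache_bytes(**shape, bytes_per_elem=2)
q8 = kv_cache_bytes(**shape, bytes_per_elem=1)
print(f"fp16 KV: {fp16 / 2**30:.1f} GiB, Q8 KV: {q8 / 2**30:.1f} GiB")  # 4.0 vs 2.0
```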
- Debate Heats Up: Intel B60 vs MI50 vs RTX 3090: The community debated the merits of the Intel Arc Pro B60 against the AMD MI50 and Nvidia RTX 3090 for LLM inference, with the B60 drawing less power but lacking LLM benchmarks, while the MI50 is favored for its speed and VRAM.
- Some members suggested that new does not always mean good, as the B60 might underperform despite being newer and cheaper; it was suggested 3090 would be better for the price.
- Graphics Card Suffers Catastrophic Failure: A user humorously reported the potential death of their 4090 after high temperatures led to a system shutdown, attributing the issue to adjusting fans and unplugging the 4090 while the PC was running, then plugging it back in.
- Suggestions included checking power wattage and riser failures, and trying Thermal Grizzly Kryonaut for repasting to get another 5-10°C difference.
- Snapdragon 8 Gen 5 Bandwidth Woes: Concerns were raised about the limitations of Snapdragon 8 Elite Gen 5 for running larger LLMs on phones, citing its DDR5 memory and limited 84GB/s memory bandwidth.
- It was pointed out that it's going to be a little while until phones can run larger LLMs locally.
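The rough arithmetic behind that caution: in bandwidth-bound decoding, each token streams the active weights once, so tokens per second is capped near bandwidth divided by model size (illustrative numbers):

```python
bandwidth_gb_s = 84   # the Snapdragon figure quoted above
model_gb = 4.0        # e.g. a ~7-8B parameter model at ~4-bit quantization

ceiling_tps = bandwidth_gb_s / model_gb
print(f"~{ceiling_tps:.0f} tokens/s upper bound")  # ~21 tps, before KV-cache traffic
```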
OpenRouter ▷ #announcements (1 message):
high-precision tool calling endpoints, audio inputs in the Chatroom, resettable API Key limits, MiniMax M2 Free
- Exacto Endpoints Give Tool Calling a Precision Edge: OpenRouter introduces high-precision tool calling endpoints, resulting in a 30% quality increase on Kimi K2, with five open-source models available; see last week's announcement.
- Audio Inputs Sing in the Chatroom: Users can now compare 11 audio models side by side in the Chatroom, as announced on X.
- API Keys Get Limit Reset Button: OpenRouter now allows users to reset their API key limits daily, weekly, or monthly to better manage accounts used by external users or apps; usage can be monitored here.
- MiniMax M2 Model Sets OpenRouter Ablaze for Free: The MiniMax M2 model, top-ranked on many benchmarks, is now free on OpenRouter; try it out here.
OpenRouter ▷ #app-showcase (6 messages):
Next.js chat demo app, OpenRouter TypeScript SDK, OAuth 2.0 workflow, Chat / document editor project, Customizable UI
- Next.js Chat Demo Revamped with OAuth 2.0: A developer shared an updated and working version of the Next.js chat demo app for the OpenRouter TypeScript SDK, featuring a re-implementation of the OAuth 2.0 workflow.
- It's available on GitHub, but the developer advised against production use because it stores the API key in plaintext.
- New Chat / Document Editor Project Debuts: A member is seeking feedback on their chat/document editor project, emphasizing local data storage with download backups and integration with OpenRouter OAuth.
- Shadcn Aesthetics Spark Spicy UI Revolution: One of the members expressed a desire to move away from the Shadcn look, opting for a spicier UI design in their project.
- Another member responded, agreeing that the features and usability aspects are uncommon or poorly executed in popular solutions.
OpenRouter ▷ #general (459 messages🔥🔥🔥):
OpenRouter API response with system message, Model Benchmarks, Claude Sonnet 4.5 API usage, Unsupported model errors, Provider names in model slugs
- OpenRouter API ignores system message: A user reported that with the new response API, filling instructions in the request body doesn't seem to apply a system message.
- Qwen3-8b Online Costs Skyrocket: A user reported using qwen/qwen3-8b:online and getting charged $140 for 17.41M tokens instead of the expected $4.
- Vertex AI API Has Critical Response Misrouting: A user shared a Google Cloud security bulletin detailing that on September 23, 2025, the Vertex AI API had a technical issue that caused responses to be misrouted between users for certain third-party models when using streaming requests.
- Users Debate OpenRouter Embedding and Web Browser Priorities: Users discussed feature priorities, including OpenRouter embeddings and a potential OpenRouter Web Browser with summarization and email checking capabilities, sparking humorous suggestions and debates.
- One user jokingly suggested deprioritizing embeddings for a new web browser, while another suggested deprioritizing embeddings and prioritizing a new OpenRouter Web Browser that can summarize web pages and check emails.
- Jailbreaking GPTās Image Generator: A Userās Odyssey: A user seeks advice on bypassing GPTās content filters for generating images of copyrighted characters, detailing attempts to jailbreak the prompt and encountering errors when using GPTās image generation features, highlighting the challenges of creating desired content.
- Members suggested using surrogate prompts, telling it to rollback or even wipe current memory or even just switching to Grok.
OpenRouter ▷ #new-models (1 message):
Readybot.io: OpenRouter - New Models
OpenRouter ▷ #discussion (42 messages🔥):
Minimax Pricing, GPT-5.1 mini, Model Naming Schemes, Meta's new LLama, Image models
- Minimax's M2 Model Stuns with Competitive Pricing: Minimax is offering their 10B parameter model (M2) at a price of $0.3/$1.20, raising eyebrows due to its affordability.
- One user pointed out the model's verbosity in reasoning might lead to unexpected costs, especially given the 5x jump in input token costs.
- GPT-5.1 mini Speculated in the Works: Speculation around a GPT-5.1 mini model surfaced following a post on X (link), hinting at a more reasonable naming convention.
- The move away from confusing naming schemes was welcomed, with comparisons made to Anthropic's model naming, which has caused frustration because it stopped making sense to number the family name Claude when the model releases didn't line up.
- Meta Introduces New Llama Version with Vision: Meta introduced a new Llama model (link), which incorporates image understanding.
- Early reactions express surprise at the salvaged launch, hoping that at least it might make its surprisingly decent vision useful for some more complex tasks and that it might provide good vision-capable reasoning models with open weights.
- Debate flares over model naming conventions: Users discussed naming conventions such as brand-number-label, like gpt-5-mini or gemini-2.5-pro.
- The consensus was that a consistent approach is key, regardless of chronological release order, while others think the order is important.
HuggingFace ▷ #general (223 messages🔥🔥):
Llama 1 epoch training, AI Radio project, Model Encryption for Clients, Hugging Face Storage Limits, Linear Projection dimensionality
- One Epoch Wonder: Llama's Training Quirks: A member noted that for a good from-scratch model you also need on the order of billions of tokens, and that Llama models always need 1 epoch.
- The discussion underscored the peculiarities of training Llama models, highlighting the need for large datasets and the specific requirement of using only one epoch for optimal performance.
- AI Radio Station: 24/7 AI-Generated Music: Members discussed the possibility of creating an AI Radio station that plays AI-generated songs 24/7, using AI models like Spotify's Basic Pitch.
- Concerns were raised about the potential for weird chimera mix of travis and taylor swift and the need for training on 1 epoch and a big dataset.
- Encrypting Models: A Bankās Security Dilemma: A member sought advice on encrypting models for deployment to bank clients who require on-premises hosting due to data policies, as they feared clients might steal the model.
- Suggestions included licensing the model, encrypting it for runtime decryption, and using an API wrapper with secure API keys, but the challenge remains in preventing clients from accessing the decryption key, with some humorously suggesting just me as the main solution.
- Hugging Face Storage: 403 Forbidden Account Troubles: A user encountered a 403 Forbidden error due to storage patterns triggering internal systems, preventing access to their model at Hugging Face API.
- It was suggested that the user contact Hugging Face support to verify the account and unlock more storage, with another member suspecting similar behavior as past blockchain checkpoint spam issues, hence the ping.
- Linear Projections: Signal Amplifiers not Information Creators: Members discussed linear projections and their role in increasing dimensionality, clarifying that they donāt create new information but amplify existing signals.
- One member used the analogy of converting a 4-bit image to 64-bit, with another clarifying that linear projections increase the contrast between different types of data, like signal amplifiers: you don't add information.
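The "no new information" point can be made precise with matrix rank, as in this NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))   # data that genuinely lives in 4 dimensions
W = rng.normal(size=(4, 64))   # linear projection up to 64 dimensions
Y = X @ W

# Both ranks are 4: the projection spreads the same signal across more
# coordinates (amplifying and recombining it) but cannot create information.
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(Y))
```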
HuggingFace ▷ #i-made-this (4 messages):
Modular GAN+VAE+Diffusion Hybrid Architecture, Live PyTorch Memory Profiler, AI Trust & Compliance Layer
- Hybrid Architecture Alchemist Brews Modular GAN+VAE+Diffusion Model: A member is completing a modular GAN+VAE+Diffusion hybrid architecture, wondering if it's worth releasing under an MIT license.
- The motivation stems from bridging the gap between the open-source community and high-tech companies, given the relative rarity of such hybrid models.
- TraceML Memory Watchdog Sniffs Out GPU Gluttons: A member introduced TraceML, a live PyTorch memory profiler for debugging OOM errors by providing a layer-by-layer memory breakdown of CPU and GPU usage.
- The tool features real-time step timing, lightweight hooks, and live visualization, but currently supports single GPU setups only, with multi-node distributed support planned.
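The general shape of such a tool, sketched with plain PyTorch forward hooks (a rough approximation, not TraceML's actual implementation):

```python
import torch
import torch.nn as nn

def attach_memory_hooks(model: nn.Module):
    # Print allocated CUDA memory after each module's forward pass.
    def hook(module, inputs, output):
        if torch.cuda.is_available():
            mib = torch.cuda.memory_allocated() / 2**20
            print(f"{module.__class__.__name__}: {mib:.1f} MiB allocated")
    for m in model.modules():
        m.register_forward_hook(hook)

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
attach_memory_hooks(model)
model(torch.randn(8, 1024))  # emits a layer-by-layer trace on CUDA machines
```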
- Intilium Shields Sensitive AI with Compliance Fortress: A member introduced Intilium, a trust and compliance layer for AI, designed to enforce regional and model policies, log AI requests for audit and transparency, and detect/mask PII to comply with regulations like the EU AI Act, ISO 42001, and GDPR.
- The tool operates as an API gateway or sandbox between applications and model providers such as OpenAI and Anthropic, and is fully hosted in the EU.
HuggingFace ▷ #computer-vision (3 messages):
1D feature vectors to 2D segmentation map, Diffusion Models, VAEs and GANs
- Projecting 1D Features onto 2D Segmentation: A member asked about the canonical way to project a set of 1D feature vectors to a 2D segmentation map.
- Diffusion, VAEs, and GANs are mentioned: Another member suggested diffusion models, VAEs, and GANs as potential solutions.
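One common recipe, assuming the 1D vectors are per-pixel features on a known grid: reshape to [B, C, H, W] and apply a 1x1 convolution, i.e. a per-pixel linear classifier (PyTorch sketch with made-up sizes):

```python
import torch
import torch.nn as nn

B, H, W, C, num_classes = 2, 16, 16, 256, 21
feats = torch.randn(B, H * W, C)                  # a set of 1D feature vectors

grid = feats.transpose(1, 2).reshape(B, C, H, W)  # lay them out on the 2D grid
head = nn.Conv2d(C, num_classes, kernel_size=1)   # per-pixel linear projection
logits = head(grid)                               # [B, num_classes, H, W]
seg_map = logits.argmax(dim=1)                    # [B, H, W] class labels
```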
HuggingFace ▷ #NLP (1 message):
Syllable separation, Multiple languages
- Syllable Splitter Sought: A member is seeking recommendations for a model capable of separating words into syllables across multiple languages, not just English.
- The user is looking for a tool that can handle the intricacies of syllabification in various linguistic contexts.
- Multilingual Syllabification Model: The discussion revolves around finding a model that can accurately separate words into syllables in multiple languages.
- The initial request highlights the need for a solution that goes beyond English, addressing the complexities of syllabification in diverse linguistic environments.
HuggingFace ▷ #gradio-announcements (1 message):
Hackathon, Modal Credits, AI Agents, MCP, Cash Prizes
- Hackathon Participants get free Modal Credits: Hackathon participants get free Modal credits worth $250.
- This allows participants to flex and crush it like a pro and learn about AI Agents, MCP, and drop some sick production hacks while chasing those fat cash prizes!
- Biggest Online Hackathon ever: Sign up now for the biggest online Hackathon ever: https://huggingface.co/Agents-MCP-Hackathon-Winter25.
- Join the official channel: <#1424743721966108713> for help.
HuggingFace ▷ #smol-course (10 messages🔥):
Submitting Models to Leaderboard, VLM section failures, LightEval module errors
- Colab Model Submission to Leaderboard: To submit a model trained in Google Colab to the leaderboard, one should submit a PR to the leaderboard's submissions.json file.
- The user should append their entry at the bottom as described in the unit.
- VLM Section Fails Due to Image Dimensions: The HF jobs version of the VLM section can fail with the provided dataset due to a ValueError: Unsupported number of image dimensions: 2, indicating a bad image in the trl-lib/llava-instruct-mix dataset.
- The suggestion involved using model_id="Qwen/Qwen2.5-72B-Instruct" in InferenceClientModel() to resolve a potential change in the default inference model.
- LightEval ModuleNotFoundError: Users encountered a ModuleNotFoundError: No module named 'emoji' when using HF jobs with lighteval, possibly due to version changes and an incomplete migration of third-party integrations.
- The suggested solution is to use the following: --with "git+https://github.com/huggingface/lighteval@main#egg=lighteval[vllm,gsm8k]" --with emoji.
HuggingFace ▷ #agents-course (5 messages):
API Down, 404 Errors
- API Experiencing Downtime: Members reported experiencing issues with the API, including receiving 404 errors and the message "No questions available."
- The discussion indicates the issue has persisted since yesterday evening, with members seeking updates on the situation.
- Members flood chat with error reports: Members are posting in the chat too quickly, inquiring about the API being down and asking if anyone has figured out the 404 errors.
- The bot has given warnings to slow down the chat.
Yannick Kilcher ▷ #general (175 messages🔥🔥):
Elastic Weight Consolidation, Self-Hosted GPU Setups, GANs parameterize pushforward, Activation-aware Weight Quantization (AWQ), Linear Projection Intuition
- Elastic Weight Consolidationās Softness Factor: Discussion around updating the softness factor in Elastic Weight Consolidation (EWC), considering magnitude of weight changes versus number of updates, and challenges with normalization.
- One member suggests using the number of accesses (forward passes) per slot instead of a "softness factor", linking this to discovering "stuck" slots and mentioning Activation-aware Weight Quantization (AWQ) and Activation-aware Weight Pruning (AWP).
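For reference, the textbook EWC regularizer that this softness discussion modulates, in PyTorch; the access-count variant suggested above would swap the Fisher term for a per-slot access statistic:

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    # lam/2 * sum_i F_i * (theta_i - theta_star_i)^2: the Fisher diagonal F_i
    # acts as the per-weight softness, pinning important weights in place.
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * loss
```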
- Self-Hosting GPU Rigs vs Cloud Cost: A member described setting up a self-hosted GPU using an RTX 2000 Ada connected via VPN and monitored with a wifi plug to compare power usage against cloud providers.
- They mentioned Colab's spin-up time and timeouts make experimentation impractical and asked if others have self-hosted setups they like.
- Disinformation Detector AI Sparks Debate: A paper on a disinformation detector AI was shared, sparking debate on whether it's AI for censorship or defense against misinformation, referencing this PPLX.AI link.
- A member specifically stated they skip any paper posted by another user, further fueling the disagreement.
- Explaining Linear Projection and Feature Expansion: A discussion clarified linear projection, explaining that expanding dimensions with linear layers doesn't add information unless combined with non-linear activation functions like ReLU, as illustrated via the Google DeepMind scheme.
- A member pointed out that deep linear networks collapse to a single linear function under the above analysis, but how they behave with respect to being trained with gradients remains different!
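The collapse, and how a ReLU breaks it, is easy to verify numerically (NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))
x = rng.normal(size=(5, 8))

# Two stacked linear layers are exactly one linear layer...
assert np.allclose((x @ W1) @ W2, x @ (W1 @ W2))

# ...while a ReLU between them breaks the collapse.
h = np.maximum(x @ W1, 0.0)
print(np.allclose(h @ W2, x @ (W1 @ W2)))  # False
```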
- VSCode Plagued with Performance Problems: Members discussed a critical performance issue with VSCode, citing a GitHub issue and lamenting its status as a text editor abused as an IDE.
- One member shared a VSCode alternative.
Yannick Kilcher ▷ #paper-discussion (40 messages🔥):
Line Break Attribution Graphs, MinePPO Upgrades, Motion Models, Strudel Music Programming, DOI System Failover
- Gemma 2B Neurons Get Graphical: New line break attribution graphs relevant to the Gemma 2 2B paper are now available for exploration on Neuronpedia.
- Graphs for Qwen3-4B are also available, showcasing neuron activations' "nearing end of line" behavior via Neuronpedia.
- MinePPO Evolves into WineAndDinePPOSublimePPO: Members playfully discussed upgrading MinePPO to MinePPO++WineAndDinePPOSublimePPO.
- The name of the next architecture is yet to be decided.
- Motion Models and LAIONās Bud-E Project: A member plans to return to training motion models, aiming to adapt Deepmimic code for LAIONās Bud-E project, which involves a virtual teacher in the classroom.
- The member mentioned difficulties adapting Deepmimic and Pybullet, and is considering hiring a junior developer to supervise.
- Strudel Music for Audio Model Tuning: Projects for college students include using the Strudel music programming language to fine-tune an audio model, porting deep-mimic tools to the web browser, and studying the personality manifold with sparse autoencoders.
- The primary goal is to find projects suitable for student publication.
- Discussion on DOI Systemās Lack of Failover: A member criticized the DOI (Digital Object Identifier) system for lacking a basic failover mechanism.
- They suggested a simple fix involving storing and using a backup URL if the primary URL fails, highlighting how such a major system lacks something so basic.
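The suggested fix amounts to a few lines (illustrative Python with hypothetical URLs, not the DOI system's real resolver):

```python
import urllib.error
import urllib.request

def resolve_with_failover(primary_url: str, backup_url: str) -> bytes:
    # Try the registered URL first; fall back to the stored backup on failure.
    for url in (primary_url, backup_url):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            continue
    raise RuntimeError("both primary and backup URLs failed")
```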
Yannick Kilcher ▷ #agents (1 message):
rogerngmd: Novel idea. Are u using McP
Yannick Kilcher ▷ #ml-news (6 messages):
Elon's Twitter data for AI, Schmidhuber revival, endomorphosis invite, odyssey.ml experience
- X Marks the Stupidity: Elon's Data Debacle: A user joked that Elon's Twitter data is making his AI dumber, referencing a Futurism article about social networks and AI echo chambers.
- They also quipped that it confirms it gives other wetware "intelligence's" brain rot.
- Schmidhuber Wakes from Slumber: After years of dormancy, Schmidhuber is apparently back with a new paper, linked as arxiv.org/abs/2510.21614.
- The user noted Schmidhuber was back after years of dormancy and tagged another user.
- Endomorphosis: The Server Beckons: One user mentioned that someone inquired about another user, assuring them they were alive and sending them an invite to their server.
- No further details were provided about the serverās content or purpose.
- Odyssey.ml: Experience Launch Imminent: A user mentioned that experience.odyssey.ml was supposed to have something going on today, though they were unsure if that was the correct URL.
- The event was supposedly starting in 10 minutes from the time of the message.
GPU MODE ▷ #general (9 messages🔥):
Access to Nodes, Torchcomms/ncclx Session, Slides from Vincent's lecture, CUDA vs Triton, Cute's layout algebra
- Node Access Awaits!: A user inquired about obtaining access to a node for a team of four.
- No further information or response was provided in the given messages.
- Torchcomms/ncclx Session Status?: A user asked if a recorded session on torchcomms/ncclx from a PT conference was available.
- The user noted that the playlist wasn't yet up and requested a speaker/lecture.
- Vincent's Slides Sought!: A user requested the slides from Vincent's lecture, eager to dissect them.
- It was implied that the slides related to a recent hackathon.
- CUDA Curriculum Controversy?: A user shared a LinkedIn post questioning the right way to learn CUDA and asked for community commentary.
- Some members suggested skipping CUDA initially and starting with Triton if one lacks a proper CS background, while others recommended learning CUDA first to better understand lower-level optimizations.
- Cute Layout Algebra simplified!: A user shared an implementation of a simplified static-only version of Cute's layout algebra on GitHub.
- Another user responded that the idea was really cool.
GPU MODE ▷ #triton (18 messages🔥):
Triton Matrix Multiplication on T4, Triton Support on Older GPUs, Pointer Casting in Triton Kernels, Fast Split-K GEMM Kernel in Triton
- Triton Matrix Multiplication Crawls on T4: A user found that the matrix multiplication example from the official triton tutorials ran extremely slow on a Colab T4 instance and shared their notebook for debugging.
- Another user suggested the T4 might be too old, and confirmed that the code ran as expected on an A100.
- Triton and Tensor Cores: A Question of SM Version: A user pointed out that T4 (sm75) might not be supported for tensor cores in Triton, suggesting a check of GitHub issues.
- Another user chimed in, noting that tensor core support starts from sm_80, while another mentioned that Triton works well on older consumer GPUs like 2080 / 2080 Ti (sm_75), suggesting that autotune settings might need adjustment.
- Decoding Pointer Casting in Triton: A user inquired about the practice of casting input pointers to tl.pointer_type(tl.float32) in some Triton kernels.
- Another explained that it's similar to C++ pointer casting, where tl.load & tl.dot use the specified type to determine assembly-level operations, while another added that it's often used with quantized inputs for memory savings, with operations done in full precision and results converted back.
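A minimal sketch of the pattern; treat the exact cast syntax (.to(tl.pointer_type(...))) as an assumption about current Triton rather than a guaranteed API:

```python
import triton
import triton.language as tl

@triton.jit
def copy_as_f32(x_ptr, y_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    # Reinterpret the incoming pointer as float32*, so tl.load reads 4-byte
    # elements regardless of the dtype the pointer was declared with.
    xf = x_ptr.to(tl.pointer_type(tl.float32))  # assumed cast API
    tl.store(y_ptr + offs, tl.load(xf + offs, mask=mask), mask=mask)
```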
- Quest for a Fast Split-K GEMM Kernel: A user is on the hunt for a fast split-k GEMM kernel implemented in Triton.
GPU MODE ▷ #cuda (43 messages🔥):
CUDA bad fork behavior, GPU bandwidth optimization, CUDA compilation process, Vectorized data types and performance, Signed vs unsigned loop indices in CUDA
- CUDA Bad Fork Detection De-Mystified: A member investigated CUDA's fork behavior and found that torch.cuda.device_count() registers a fork handler, but the device count appears to be cached, and a minimal test doesn't reproduce the expected errors.
- The test involved checking torch._C._cuda_isInBadFork() in both the parent and child processes after a fork, with the intention of detecting if CUDA context was improperly shared, but the test indicated that CUDA could be getting away with it.
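The minimal test described is roughly the following; torch._C._cuda_isInBadFork is a private API and os.fork is POSIX-only, so treat this as illustrative:

```python
import os
import torch

torch.cuda.device_count()  # registers the fork handler (the count is cached)

pid = os.fork()
if pid == 0:
    # Child: reports True only if CUDA was initialized before the fork.
    print("child in bad fork:", torch._C._cuda_isInBadFork())
    os._exit(0)
os.waitpid(pid, 0)
print("parent in bad fork:", torch._C._cuda_isInBadFork())
```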
- GPU Bandwidth Benchmarking Bonanza: A member investigated GPU bandwidth when moving from one SM to the full GPU, observing that using 256 threads per block with a plain data type yields the best results (highest bandwidth) compared to vectorized data types on a Hopper GPU.
- They shared code samples and suggested profiling the code with NCU, setting clearL2 to false to address negative bandwidth measurements due to timing fluctuations.
- Compiler Optimizations Dance with Signed vs Unsigned Indices: A member discovered that using unsigned indices in CUDA kernels can prevent compiler optimizations like loop unrolling, leading to slower performance, which they verified by examining the generated SASS code.
- They linked to the NVIDIAās best practices guide and noted the performance differences heavily depend on whether the loop index is signed or unsigned, influencing the loop structure and load rearrangement.
- NVCC Dry Run Deployed for Compilation Decoding: A member advised using nvcc -dryrun to understand the CUDA compilation process, along with -keep to retain intermediate files such as the .ptx and .cubin files, for custom modification and linking.
- The suggested workflow involves using the output from nvcc -dryrun to manually execute the steps for compiling a modified .ptx file and linking it with a .cu file, thereby offering more control over the compilation.
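A minimal version of the bad-fork test described above might look like the following. It is Unix-only (os.fork), and torch._C._cuda_isInBadFork is a private PyTorch API, so behavior may vary across versions:

```python
import os
import torch

# initialize enough CUDA state in the parent to register the fork handler
torch.cuda.device_count()

pid = os.fork()
if pid == 0:
    # child: the fork handler should mark this process as a "bad fork"
    print("child bad fork:", torch._C._cuda_isInBadFork())
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # parent: should never be flagged
    print("parent bad fork:", torch._C._cuda_isInBadFork())
```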
GPU MODE ā· #torch (1 messages):
High Dimensional Tensors, Matrix of Matrices
- Matrix of Matrices Visualize High Dimensional Tensors: A member shared a blog post on how to visualize (draw) high dimensional tensors as a matrix of matrices.
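As a quick illustration of the idea (not taken from the blog post itself), a rank-4 NumPy array can be viewed as a 2D grid of 2D blocks:

```python
import numpy as np

# shape (2, 3, 4, 5): a 2x3 grid of 4x5 matrices
t = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

block = t[1, 2]  # the inner matrix at grid position (1, 2), shape (4, 5)

# lay the blocks out as one big 8x15 "matrix of matrices"
flat = t.transpose(0, 2, 1, 3).reshape(2 * 4, 3 * 5)
assert (flat[4:8, 10:15] == block).all()
```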
GPU MODE ā· #cool-links (1 messages):
KernelBench, GPU Kernel Generation, LLMs for Kernel Generation
- KernelBench Marks One Year Milestone: A blog post shares a one-year retrospective on KernelBench and discusses progress towards automated GPU Kernel Generation.
- LLMs Aim to Automate GPU Kernel Creation: A Google doc provides an overview of the impact of KernelBench and the use of LLMs in kernel generation.
GPU MODE ā· #jobs (5 messages):
Small inference optimized models for code gen, Machine Learning Projects, Morph, B200 inference, Technical Obsessions
- Morph Seeks Interns for Small Model Work: Morph is hiring interns for machine learning engineering to work on small inference optimized models for code gen.
- Their first model runs at 10.5k tps on a B200, according to their post, and the poster said to DM them on Twitter.
- ML Bragging Rights Requested: One member asked others to describe the machine learning project youāre most proud of with extreme technical detail for a job application.
- They added that they are familiar with all the libraries for evaluating the response.
- Obsessions Solicited for Job Apps: One member asked others to describe what you were or are deeply obsessed about (anything), presumably for inclusion in the why are you interested section of a job application.
- Another responded that it doesnāt matter too much.
GPU MODE ā· #beginner (4 messages):
Budget friendly cloud GPU providers, Vast.ai, RunPod.io, Lightning.ai, Running an entire application on GPU
- GPU Cloud on a Shoestring: Members recommended Vast.ai for a more bare-metal feel, noting it is usually the cheapest although your data runs on random community servers, and RunPod.io, which is similar but more stable.
- They also mentioned Lightning.ai is great for fast experiments and even has a free tier with limits, suggesting to combine the free tier Lightning.ai with Vast.ai.
- Full GPU Compilation = Slowdown: Members discussed what would happen if you compiled an entire application to run on a GPU instead of just the sections of code that can be run on multiple threads.
- The consensus was that if you were able to do that, it would run very very slow, because GPUs are not good or fast at non-parallel computations.
GPU MODE ā· #pmpp-book (1 messages):
Cutlass Documentation, Nvidia
- Cutlass Docs Get Thumbs Up: Members recommend the Cutlass documentation as a good starting point for understanding the library.
- The Cutlass library provides a set of CUDA C++ template abstractions for implementing high-performance matrix multiplication (GEMM) at all levels and scales within CUDA.
- Nvidiaās Cutlass Library: Cutlass is developed by Nvidia and optimized for their GPUs, focusing on maximizing performance for deep learning and high-performance computing workloads.
- It offers highly tunable primitives and allows developers to implement custom GEMM kernels tailored to specific hardware and application requirements.
GPU MODE ā· #off-topic (2 messages):
GEMM, memes, procrastination
- Meme takes precedence over GEMM: A member joked about procrastinating on writing GEMM code because they spent too much time creating a meme.
- They attached an image related to it.
GPU MODE ā· #irl-meetup (2 messages):
LLVM dev meeting, SuperComputing in St Louis
- LLVM Devs Assemble?: A member inquired if anyone was attending the LLVM dev meeting.
- SuperComputing Goers?: Another member asked if anyone was heading to SuperComputing in St Louis.
GPU MODE ā· #self-promotion (2 messages):
Penny beats NCCL, vLLM custom allreduce, CuTeDSL for memory bound kernels, Quack library, RMSNorm CUDA
- Penny Stomps NCCL on Small Buffers: A new blogpost reveals that Penny beats NCCL on small buffers, detailing the custom allreduce implementation in vLLM.
- The blogpost is accompanied by a GitHub repo and a thread on X showcasing Pennyās capabilities.
- CuTeDSL Triumphs in Memory-Bound Kernels: Demonstrating its versatility, the Quack library proves that CuTeDSL excels not only in GEMM kernels but also in implementing highly efficient memory-bound kernels.
- A blogpost showcases a straightforward approach to implementing parallel reduction on GPUs using CuTeDSL, focusing on the commonly used RMSNorm layer.
- RMSNorm Gets a CUDA Boost: An older blogpost details the implementation of RMSNorm in CUDA, offering insights into optimizing this layer.
- This post complements a new post which showcases simple reduction in CuTeDSL.
GPU MODE ā· #šæ (5 messages):
GPU Mode Kernel Leaderboard, The Stack Dataset, Triton/CUDA repos
- Kernel Count on GPU Mode Leaderboard Exceeds GitHub?: A member recalled Mark stating that the GPU Mode Kernel Leaderboard has more kernels than all of GitHub, and wondered where he obtained these numbers.
- Another member believes this figure originates from a statistic posted by The Stack dataset, while also noting that the prevalence of GPU programming for deep learning has likely caused it to change over the last year.
- Initiative to Catalog GitHub Kernels: A member considered assembling a group to compile an exhaustive list of all kernels/heterogeneous computing code on GitHub, provided a viable method for distributing the workload can be identified.
- Another member mentioned the existence of repositories that track notable Triton/CUDA repos, but could not recall the specifics.
GPU MODE ā· #thunderkittens (1 messages):
Thundermla, sm120, async tma, async mma/wgmma
- Thundermla Port Viability to sm120: A member inquired about porting Thundermla to sm120, given sm120's ability to use async tma and barriers.
- Another member confirmed that sm120 can use async tma and barriers, but cannot use the tcgen05 async mma/wgmma seen in the sm100 and sm90 examples.
GPU MODE ā· #submissions (7 messages):
A100 Leaderboard Updates, prefixsum_v2, vectorsum_v2
- prefixsum_v2 crown claimed: One member achieved first place on A100 for prefixsum_v2 with a time of 7.20 ms.
- vectorsum_v2 third place: Another member secured third place on A100 for vectorsum_v2 with a time of 156 µs.
- prefixsum_v2 Runner Up: The same member secured second place on A100 for prefixsum_v2 with a time of 11.0 ms.
GPU MODE ā· #hardware (1 messages):
id_ab_ling: how to download fieldiag
GPU MODE ā· #cutlass (14 messagesš„):
Chris's slides, Non-affine layouts, Swizzles in CuTe
- Chrisās Slides Still Awaiting Rediscovery: A member inquired about the availability of slides from a YouTube livestream, after they had been removed from the video description.
- Another member offered to email Chris about them on Monday.
- Non-Affine Layouts Still Elusive: A member asked for an example of a case where a non-affine/non-cute representable layout was needed for a common operation.
- Discussion continues to identify specific scenarios where such layouts are essential.
- Swizzle Layouts get Deep Dive: A member noted that swizzles are representable but are not composed as a plain shape : stride layout, linking to veitner.bearblog.dev.
- Another member pointed out that swizzled layouts are represented as a special type of ComposedLayout in CuTe, referring to the source code.
GPU MODE ā· #mojo (11 messagesš„):
Pixi Setup, GPU Puzzles, PyTorch Versions, UV Environment, CUDA Versions
- Pixi vs UV: GPU Puzzles Edition: A member inquired about using Pixi for gpu-puzzles, noting that the Pixi setup uses pytorch=2.7.1, which caused an error but works with torch 2.8.0 in their UV environment.
- They were wondering if there were specific requirements necessitating Pixi, or if Mojo with UV would suffice for now, showing a screenshot of the error.
- CUDA Conundrums: Nvidia vs. Non-Nvidia: A member pointed out that the setup is pinned to CUDA 12.8 torch, which might cause issues on non-Nvidia GPUs.
- They suggested that apart from torch custom ops puzzles (20-22), it may be possible to exclude PyTorch, as Mojo and MAX lack an actual dependency on PyTorch except for making PyTorch custom ops.
- UV Victorious: Pixi Purged!: After getting a 4060 and nuking Pixi, a member confirmed that the setup now works using UV with their old environment.
- They mentioned theyād revisit Pixi only if challenges or specific package requirements arise, concluding: I found that when Iām trying to break in is not the right time to reformulate the recipe.
GPU MODE ā· #singularity-systems (8 messagesš„):
HIPS/Autograd vs JAX, PyTorch 1 vs PyTorch 2, Graph Acquisition Mechanisms, Tinygrad UOp IR, Dual Language Problem (Python/C++)
- JAX preferred over PyTorch2 for Pedagogy: Transitioning from HIPS/Autograd to JAX is favored over PyTorch 1 to PyTorch 2 for pedagogical reasons due to the complexity of tracing at the host bytecode level with torchdynamo and lowering with aotautograd in PyTorch 2.
- DSL Embeddedness Trumps Host Language Semantics: Itās pedagogically better to lean more into the embeddedness of the DSL rather than closely relying on the semantics of the host language, similar to why PyTorch and Triton are favored.
- The user likened this to not building IDE support for an interpreter/compiler class, even though itās standard for industrial languages.
- Ditching HIPS/Autograd for JAX and TorchScript/FX: It is suggested that transitioning from HIPS/Autograd to JAX and from PyTorch 1 to TorchScript/Torch.FX is preferable over PyTorch 1 to PyTorch 2 (Dynamo/AOTAutograd).
- Mojo Language as a Compiler Foundation: A user recommends exploring the Mojo language, which uses LLVM intrinsics as its foundation and requires explicit user definition of code, even down to thread index level.
- The TLDR for Mojo, as far as the user understands, is to use LLVM intrinsics as your foundation.
GPU MODE ā· #general (1 messages):
achal: How do you get the benchmark results from the website?
GPU MODE ā· #multi-gpu (3 messages):
NCCL hangs, Megatron Optimizer
- NCCL Hangs Point to Network Topology Woes: A member suggested that collective communication hangs are common with inconsistent network topologies, referencing this paper.
- They suggested adding NCCL_DEBUG=INFO to see where itās hanging, but another member replied that the logs were difficult to parse.
- Megatronās Distributed Optimizer Causes Deadlock: A member found that disabling the distributed optimizer of Megatron resolved a deadlock issue.
- After disabling it, they confirmed that the deadlock is gone.
GPU MODE ā· #irl-accel-hackathon (38 messagesš„):
Mini-PyTorch with GPU allocator, Oulipo coding constraint, PyTorch Distributed hacking, Monarch/torchforge contributions, Symmetric memory rendezvous
- Building Mini-PyTorch with GPU Tensor Metadata: A member is considering writing a mini-version of PyTorch project with tensor metadata and allocator on GPU, adding an Oulipo flavour constraint to use 512 threads in a block.
- Another member suggested using cudaMallocManaged for on-GPU memory allocation and virtual memory management, but also pointed out the need for an allocator to track memory space allocation.
- Monarch and TorchForge Open Source Contributions: A participant expressed interest in contributing to Monarch and TorchForge outside of the hackathon and inquired about the open-source community management process.
- Another member mentioned that someone was looking for help with offloading optimizers for LLMs.
- GPU Access Assistance and Project Submission: One participant who filled out the GPU access form reported not receiving access and was advised to join the Discord server mentioned on the form and request access using the bot; the Nebius team was available on the 3rd floor for assistance.
- A reminder was issued to submit project proposals by 6 PM via this form.
- Seeking Symmetric Memory Rendezvous Assistance: A participant requested help with a symmetric memory rendezvous hang, and was directed to specific members with expertise in the area.
- Another member asked where the participant was located, and offered their help.
- Final Project Demos and GPU Access Deadline: Judges selected projects for demos on the 1st-floor main stage at 6:30 PM, with each team getting 3 minutes to present, and dinner was scheduled on the 3rd-floor rooftop from 7:30 - 8:30 PM.
- GPU access was confirmed to be available until 9 AM the following day.
GPU MODE ā· #llmq (1 messages):
NPU, CPU offloading
- Framework Frustrations Force CPU Focus: A member reported failing to get the framework machine working for the NPU.
- They decided to switch gears to working on CPU offloading instead.
- CPU Offloading Project: A member is pivoting to CPU offloading due to issues with the framework machine for the NPU.
- Interested parties are encouraged to reach out to collaborate on the CPU offloading efforts.
Modular (Mojo š„) ā· #general (23 messagesš„):
Mojo Setup, Modular vision, GPU Compatibility, AMD vs Nvidia, Apple Sillicon
- Mojo Installation Assistance is a Channel Hop Away: A user looking for help setting up Mojo was directed to the installation help channel [<#1119100298456215572>].
- Modularās Strategy: Open Source with Nuanced GPU Support: A user inquired about Modularās strategy, noting the focus on open sourcing Mojo and MAX while questioning the GPU compatibility tiers, especially for consumer-grade AMD and Apple products.
- The user highlighted the challenge of attracting users when CUDA has a more established ecosystem, particularly given the limited support for AMD consumer cards like the 7900 XTX.
- GPU Support Tiers: Contractual Obligations and Hardware Realities: A contributor clarified that Tier 1 GPU support is tied to support contracts, and the differences between AMDās data center and consumer cards necessitate separate code paths.
- Consumer AMD support is Tier 3; if you write your own code for AMD consumer cards rather than relying on Modular's matmul or other functions, they work fine. Furthermore, consumer cards may not even allow doing matmuls.
- Apple Silicon: Reverse Engineering Required: A contributor shared that Apple Silicon support required reverse engineering their equivalent of PTX, further stating that Apple took GPU design in a very, very different direction from most vendors.
- This design breaks some assumptions that were built into MAX and Mojo before Apple silicon support was looked at.
- Windows Compatibility: The Odd OS Out: Windows receives less support due to its unique system APIs and GPU interaction rules, with a contributor noting it as the only non-unix-like OS left.
- Support for datacenter GPUs on Windows is uncertain, as vendors like Nvidia and AMD might not offer hardware support, affecting Modularās commercial support contracts.
Modular (Mojo š„) ā· #mojo (110 messagesš„š„):
GPU Random Module, CompilerRT Random, SIMD Width Adjustment, Property Testing Framework, Variadic Types
- GPU Random Module Sparks Debate: A member questioned why the faster GPU random module (gpu/random.mojo) is located in the GPU directory, as it doesn't rely on GPU operations and could benefit CPU implementations.
- Concerns were raised that the default random module should be cryptographic, unlike C implementations, which might explain the performance difference, but others suggested a random.fast_random module for non-cryptographic use.
- Random SIMD Width: A compromising move?: A member suggested making the SIMD width of the Random module adjustable, but it was cautioned that changing the width of an RNG could compromise its cryptographic properties based on this paper.
- An alternative suggestion was to run multiple RNGs in parallel to achieve higher throughput.
- Property Testing Framework in the Works: A member is developing a property-testing framework, drawing inspiration from Pythonās Hypothesis, Haskellās Quickcheck, and Rustās PropTest.
- The framework will include value generators that prefer edge cases (e.g., -1, 0, 1, DTYPE_MIN/MAX, empty lists); a minimal Hypothesis-style example follows this list.
- MLIR Use Cases Explored: Discussion revolved around MLIRās role in compiler development, with some advocating for its use over LLVM IR, while others highlighted that MLIR can lower to LLVM.
- It was mentioned that using MLIR makes LLVM very sexy.
- Tensor Network Library Faces LayoutTensor Challenges: A member is developing a tensor network library similar to NumPy's einsum and is facing challenges with LayoutTensor.
- Specifically, the static Layout requirement limits the ability to handle dynamic tensor ranks, prompting a discussion on potential workarounds using RuntimeLayout and unknown sizes.
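For reference, this is roughly what the edge-case-biased generator idea looks like in Python's Hypothesis, one of the stated inspirations; the property being tested is just an illustrative example:

```python
from hypothesis import given, strategies as st

# Hypothesis deliberately biases generation toward edge cases such as 0,
# ±1, extreme integers, and empty lists -- the behavior described above
@given(st.lists(st.integers()))
def test_reverse_twice_is_identity(xs):
    assert list(reversed(list(reversed(xs)))) == xs

if __name__ == "__main__":
    test_reverse_twice_is_identity()  # runs the property over many generated inputs
```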
Modular (Mojo š„) ā· #max (2 messages):
MAX, Huggingface models
- Torchvision Models Get MAX Treatment: A member announced a method for converting Torchvision models to MAX using a new tool, bridging the gap between Hugging Face and MAX.
- The example code provided demonstrates how to export a VGG11 model to a MAX graph using export_to_max_graph.
- Forums Beckon MAX Conversion Details: A user responded positively to the MAX conversion announcement and requested that more details be shared on the forums for broader visibility.
- This was requested to allow circulation among those not on Discord.
Latent Space ā· #ai-general-chat (99 messagesš„š„):
Sakana AI, Tahoe AI, ImpossibleBench, MiniMax M2, OpenAI ad strategy
- CTO Says Transformers are SO Last Epoch: Sakana AIās CTO expressed being absolutely sick of transformers, the prevalent technology powering current AI models, in a VentureBeat article.
- Tahoe-x1 launches 3B-param open-source model: Tahoe AI released Tahoe-x1, a 3B-parameter transformer for gene/cell/drug representations, trained on a 100M-sample dataset and achieving SOTA results on cancer benchmarks, now available on Hugging Face.
- MiniMax M2, Agent Extraordinaire: MiniMax open-sourced its 230B-parameter M2 model, ranking as the #5 agent on the AgentArena leaderboard, boasting Claude Sonnet-level coding skills at 8% of the price and 2x inference speed, accessible via a free limited-time API.
- Mercor bags $350M Series C: Mercor announced its $350M Series C at a $10B valuation, with payouts to experts reaching $1.5M/day, as revealed in a tweet.
- Anthropic Excel-erates in Finance: Anthropic introduced new finance-focused features for Claude, including an Excel add-in, live market-data connectors, and pre-built agent skills, detailed in a tweet.
Latent Space ā· #genmedia-creative-ai (18 messagesš„):
OpenAI Speech Model, MiniMax M2, Generative Media Conference, Odyssey-2
- OpenAIās Grammatical Game Changer: At the OpenAI Frontiers London event, OpenAI demoed a forthcoming bidirectional speech model that waits for whole verbs before speaking, producing grammatical real-time output, as seen in this tweet.
- MiniMax's Mighty M2 Model: MiniMax unveiled M2, a 230B-parameter, 10B-active MoE that reportedly outperforms its 456B/45.9B predecessor M1 and reaches the global top 5, just behind Sonnet-4.5, according to this post.
- fal Conference Founderās Five: Kate Deyneka distills falās first Generative Media Conference into five insights, including visual AIās compute demands and the rise of niche foundation models, summarized in this tweet.
- Odyssey-2ās Open and Ongoing Offering: Oliver Cameron unveiled Odyssey-2, a 20 FPS, prompt-to-interactive-video AI model available immediately at experience.odyssey.ml, prompting high demand and GPU scaling discussions, according to this announcement.
Nous Research AI ā· #general (71 messagesš„š„):
API changes removing temperature and top_p, GPT-5 hyper parameter levers gone, Anthropic no longer accepts both top_p and temperature, Reasoning models may have killed it, Bypassing guardrails in Sora
- API Armageddon: Temperature and Top_P Vanish!: Developers are reeling as APIs for new models like GPT-5 and recent Anthropic updates are ditching parameters like temperature and top_p, with GPT-5 removing all hyperparameter levers and Anthropic deprecating the use of both top_p and temperature together (a sketch of the resulting handler special-casing follows this list).
- One user lamented that they now have to write a bunch of code in my api handler to treat gpt 5 and anthropic special.
- Reasoning Models Under Fire: Speculation is brewing that reasoning models may be responsible for the removal of certain hyperparameters.
- One user exclaimed fucking reasoning models, while another pondered whether the shift was due to testing and evals being conducted with specific temperature values, or perhaps a perceived increase in jailbreaking vulnerability.
- Soraās Sketchy Security: Guardrails Skirmish!: A user shared examples of bypassing guardrails in Sora, showcasing videos that seemingly violate content policies, like a video resembling the number 47 (https://sora.chatgpt.com/p/s_68fe7d6c8768819186b374d5848d8a42).
- Another user quipped that the term bypass was a very loose term.
- AI Induced Anxiety: Devs Despair, Domains Drift!: A web developer with a decade of experience in Node.js, PHP, and React expressed their fear that AI would soon take their job, seeking advice on pivoting or learning more about the field.
- In response, another user with 8 years of experience in software engineering suggested learning AI tooling and selling creations rather than lines of code, emphasizing the constant change in the software domain and the need to adapt.
- Streaming Scene: ML/AI Devs Dish!: Users are trading tips about ML/AI streamers to watch, suggesting Yannick Kilcher, Joseph Suarez from Pufferlib, and bycloud (https://www.youtube.com/@bycloudAI/videos), noting that the latter may be currently serving in the military.
- It was also mentioned that different Discord servers host paper talks where people present and discuss papers, with potential for similar sessions to start on this server.
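A hedged sketch of the kind of special-casing developers described having to write; the model-name prefixes are illustrative assumptions, not an official client API:

```python
# strip or trim sampling parameters for providers that no longer accept them
def build_sampling_args(model: str, temperature: float = 0.7, top_p: float = 0.95) -> dict:
    if model.startswith("gpt-5"):
        return {}  # no hyperparameter levers exposed at all
    if model.startswith("claude"):
        return {"temperature": temperature}  # only one of temperature/top_p allowed
    return {"temperature": temperature, "top_p": top_p}

print(build_sampling_args("gpt-5"))              # {}
print(build_sampling_args("claude-sonnet-4-5"))  # {'temperature': 0.7}
print(build_sampling_args("llama-3.1-70b"))      # both knobs kept
```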
Nous Research AI ā· #ask-about-llms (3 messages):
Ideological Bias in Western GPT Models, Model Meta-Awareness and Jailbreaking, Claude's Unique Behavior
- Western GPT Models Exhibit Ideological Bias?: A member mentioned that GPT models originating from the West may exhibit ideological biases that align more with Western perspectives, highlighting the significance of data in shaping a modelās worldview.
- Another member suggests that models have a certain meta awareness and when jailbroken they usually say the same thing.
- Claude: The exception?: A member noted that Claude appears to be an exception, exhibiting more infant-like behavior compared to other models.
- No further details were provided on the specifics of this behavior, but this suggests Claude may have a different underlying structure or training methodology influencing its responses.
Nous Research AI ā· #research-papers (8 messagesš„):
KBLaM vs RAGs, KBLaM concerns, Microsoft Service Provider using RAGFlow, Refusal instruction tuning
- KBLaM and RAGs compared: A member tried implementing something similar to KBLaM months ago but was blocked, while another member believes business RAG is becoming quite common, with coding assistants now utilizing RAG via MCP.
- The first member thinks itās not that common because it functions as a direct upgrade to RAGs but AI-generated summaries are often of much lower quality than the source material.
- KBLaM faces quality concerns: A member raised concerns that KBLaM converts all knowledge to embeddings, making the context of lower quality than in RAGs, which utilize the source material itself.
- Another member said the paper addresses some of those concerns, noting the usage of refusal instruction tuning (āI donāt know, sorry!ā).
- Microsoft Provider Whitelabels RAGFlow: A member showed a consulting client, who is a Microsoft Service Provider, how to whitelabel RAGFlow.
Nous Research AI ā· #interesting-links (6 messages):
Translation with AI, Temporal Optimal Video Generation, Optimax Prompt Utilization, World Models and Poetry
- AI Translation relies on data: A user speculates on X that translating non-semantic outputs to any target language should be fairly trivial using available translated data.
- The user questions why the world is not creating high-quality human data, especially multi-lingual datasets.
- Temporal Optimal Video Generation via Grandma Optimality: A user introduces Temporal Optimal Video Generation using Grandma Optimality (X), suggesting enhancing computation by making videos 2x slower while maintaining visual elements and quality.
- This is positioned as a secret sauce for getting super high-quality generations out of models compared to simple prompts, with the user adding that one should first generate an image and then convert it to a video.
- Optimax Prompt Utilization by dictating output length: A user shares an X post that shows an example of optimizing output by reducing the original length of the response and placing an upper limit of 4k tokens.
- User also suggests that this should be done with video generation by first generating the image, and then creating a video from that image.
- World Models are Poets: A user suggests that poetry and rhymes can possibly optimize prompt and context utilization, leading to a temporal optimax variant.
- They reference an example of fireworks bursting in the sky, noting that temporal optimization leads to full utilization of 8s length and more complexity and stability.
Nous Research AI ā· #research-papers (8 messagesš„):
KBLaM, RAG, context quality, business RAG, whitelabel RAGFlow
- KBLaM vs RAG context quality: Members discussed that KBLaM converts all knowledge to embeddings, which only approximates the source material and is thus of lower quality than the context in RAGs.
- The paper addresses some concerns, such as refusal instruction tuning (I donāt know, sorry!), but not the issue of the context being of lower quality than in RAGs.
- Business RAG is getting quite common: A member stated they showed a Microsoft Service Provider how to whitelabel RAGFlow.
- They believe that business RAG is getting quite common, especially since every TUI coding assistant now can utilize RAG via MCP.
- Dangers of spicy web programming: Members stated that thereās a vulnerability issue, where you can make millions telling everyone that AI application engineering is just spicy web programming.
- But this issue is mostly for the SaaS industry, because most people working on this sort of thing assume a closed domain, expert curated knowledge base.
Moonshot AI (Kimi K-2) ā· #general-chat (93 messagesš„š„):
Kimi CLI, GLM vs Kimi, Moonshot Coin, Kimi coding plan
- Kimi CLI Gets Python Package: The Kimi CLI has been published as a Python package on PyPI, prompting discussion about its use and capabilities.
- One member stated, why not? suggesting the package is a welcome addition, possibly following in the steps of GLM.
- Kimi Coding Plan International Release Imminent: Members discussed the Kimi Coding Plan, with one member stating it's currently only available in China but should be releasing internationally in a few days.
- One member thanked them for the information, stating, I will try it when the Kimi Coding Plan is released internationally.
- Moonshot Coin Skyrockets for Early Investors: A user asked what it took to be a Moonwalker, and the response stated it was because they invested early, as the Moonshot coin has since skyrocketed.
- Another member joked that their portfolio has 1000xāed since then, having joined when the server had only 100-200 members.
- Kimi CLI Windows Support in the Works: A member asked if the team accepted pull requests on kimi-cli specifically for Windows support.
- Later on, the user got it working on Windows, attaching an image of the results.
- Minimax Models: Lean Architecture Brings Great Throughput: Members discussed Mini Max M2 models, their throughput, and performance on benchmarks like BrowseComp, where some think it outperforms Kimi K2.
- One member explained, The throughput must be great given its lean architecture and later stated, i cannot believe thereās finally a model which offers 60+ (100!) tps, is good quality and is affordable.
Eleuther ā· #general (34 messagesš„):
Open Source AI vs Mega Corporations, GPU Resource Contribution, Affordable AI Accelerator Chips, Transcoders for Model Interpretability, Linear Projection in Machine Learning
- Open Source AI Fight for the Future: A member expressed a desire for AI to be open source and widely distributed, similar to the internet, rather than dominated by a few mega corporations, but acknowledges that there are serious technical challenges to overcome.
- The member feels that many who claim to be working towards this goal donāt recognize these challenges.
- Petals Project Fails to Bloom: The Petals project, which aimed to democratize access to large language models like Llama 70B, lost momentum because it couldnāt keep up with new architectures.
- Despite its initial success, the community fell adrift.
- Deep Dive Into Linear Projection: A member sought help understanding the concept of increasing dimensionality in linear projection, particularly when creating a higher-dimensional vector from a lower-dimensional one (a minimal PyTorch illustration follows this list).
- One member explained that increasing the dimensionality of a vector injects information that makes the data easier for the model to understand, using the analogy of uncompressing data or injecting color depth.
- JSON State-Change Pair Training: A member inquired about experimenting with training models on JSON state-change pairs instead of text.
- The member explained that the target would be the delta between self-states, not the next token.
- Grokking Representation Learning: A member asked if another memberās profile picture came from the paper Towards Understanding Grokking: An Effective Theory of Representation Learning.
- The other member responded that itās the contour plot of a formula that came up in my LR research.
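A two-line PyTorch illustration of the up-projection discussed above (the dimensions are arbitrary):

```python
import torch
import torch.nn as nn

proj = nn.Linear(64, 256)  # learned projection from 64 to 256 dimensions
x = torch.randn(8, 64)
h = proj(x)                # shape (8, 256): the same underlying data embedded in a
                           # higher-dimensional space, with more room to spread features apart
print(h.shape)
```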
Eleuther ā· #research (35 messagesš„):
Searching input spaces for models, Feature engineering, CSM-1B question, Theoretical Computer Science beginner papers
- Input Space Search Struggles Spark Discussion: A member is struggling to find prior art for searching input spaces for models, particularly as a training mechanism, and is seeking relevant research.
- Theyāre specifically interested in finding the best way to parameterize the input, for a discrete set of available values within each element of a feature vector, and in the context of hypernetworks.
- Feature Engineering as Input/Output Transformations: It was suggested that input/output transformations are forms of feature engineering, in which the researcher uses their insight to fight against pure compute, mentioning VAEs and tokenizers as examples.
- One member added that whitening makes inputs less collinear, which makes it faster to converge to estimates of what parameters should be (a small whitening sketch follows this list).
- Decoding CSM-1Bās Input Chunking: A member is curious whether itās necessary to input the entire assistant response into CSM-1B before it starts generating, or whether chunking into sentences would work.
- They are also unsure about the interleaving format for arbitrary speakers and the expected output quality compared to Sesameās official demo.
- TCS Beginner Asks for Paper Recommendations: A member is seeking recommendations for ābeginnerā papers in Theoretical Computer Science (TCS) to start their research journey.
- Suggestions included papers related to AI safety via debate, backdoor defense, learnability, and mathematical models of computation in superposition.
- HGM Model and Code Links Shared: Links to the thread, arxiv, and code are provided for the HGM model.
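As a small sketch of the whitening point above (standard PCA whitening, not code from the discussion):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))  # deliberately collinear features

Xc = X - X.mean(axis=0)                      # center each column
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
X_white = (Xc @ eigvecs) / np.sqrt(eigvals)  # rotate, then rescale to unit variance

# the columns are now approximately uncorrelated with unit variance
print(np.round(np.cov(X_white, rowvar=False), 2))
```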
Eleuther ā· #interpretability-general (2 messages):
Anthropic Following Ideas, Polysemanticity in NNs
- Anthropic Follows Similar Idea Threads: A member noted that Anthropic appears to be following similar idea threads, with their work aligning closely with the memberās blog post.
- Specifically, the alignment is that the structure of polysemanticity in a neural network reflects the geometry of the modelās intelligence as described in Transformer Circuits.
- Geometry reflects NN Intelligence: A user describes the relationship of polysemanticity and model geometry.
- The user references their own blog post and the Transformer Circuits article.
Manus.im Discord ā· #general (53 messagesš„):
Manus credit usage and alternatives, Linux user turned AI developer, Manus for report writing
- Claude vs Manus: User cancels Manus Subscription: A user canceled their Manus subscription, citing that Claude is cheaper and more effective for extensive projects, completing three projects on a $20 Claude subscription compared to struggling with one on Manus.
- The user felt that Manus, Bolt, and Replit are for those who donāt want to do the research and donāt mind paying for not much, noting that Anthropic has added many features to web-based Claude.
- Linux User's AI Journey after foot surgery: A user with 20 years of Linux experience mentioned they're exploring AI development while on sick leave for foot surgery, describing themselves as a dev without even realizing it due to their background in setting up servers and data centers.
- They also shared a screenshot of a Kotlin IRC client they created on their mobile phone using Manus, noting it took 3 hours and a significant amount of credits, though they weren't sure it worked as it should.
- Manus Credit Consumption Criticized: Several users complained about Manus credits depleting too quickly, with one user mentioning Manus used 3500 credits to fix a problem.
- Users requested alternatives to Manus and expressed frustration, and felt that it needs to fix its credit system.
- Manus Praised for Report Writing Prowess: A user stated that Manus is unbeatable for report writing, emphasizing that while subject expertise is still required, Manus acts like a very intelligent employee with the right guidance.
- The user wished Manus had unlimited usage, stating they would use it every day if that were the case.
aider (Paul Gauthier) ā· #general (40 messagesš„):
aider-ce, RAG Integration in aider-ce, GitHub Copilot with aider-ce, aider working directory bug, Turn off auto commit message
- Aider-CE Emerges with Navigator Mode and RAG: A community-developed version of aider, called aider-ce, features a more agentic Navigator Mode and has a pull request from MCPI to add RAG (Retrieval Augmented Generation) capabilities.
- A member clarified that RAG can be used infinitely with a GitHub Copilot subscription ($10/month), along with infinite GPT-5 mini, GPT4.1, Grok Code 1 and limited requests for other models.
- GitHub Copilot Powers Aider-CE with Simple Setup: To use GitHub Copilot with aider-ce, preface the model name with github_copilot/ (e.g., github_copilot/gpt-5-mini) which triggers a GitHub login via an auth code.
- This leverages Litellm, handling token management invisibly.
- Aiderās Annoying Auto Commit Messages: Users discussed options to disable auto commit messages in aider, which can be slow.
- The suggestion --no-auto-commits was proposed as a solution.
- Aider Working Directory Woes Bug Emacs User: An Emacs user reported a frustrating bug where using /run ls <directory> changes aider's working directory, making it difficult to add files outside that directory.
- The user likes the UX improvement to adding files in Emacs.
- OpenAI Asks Users to Scan Your Iris: A member questioned OpenAIās requirement for biometrics to use the API, even for longtime users with existing payment information.
- Another speculated itās to identify those training on their output, but expressed concern given Altmanās past interest in iris scans, and the user pointed out that Anthropic and Google donāt do that.
aider (Paul Gauthier) ā· #questions-and-tips (5 messages):
Aider's Future, Aider-CE, Paul Gauthier's Activity, AI Coding Tool Improvements
- Aiderās Future is Uncertain: A user expressed their hope for a bright future for Aider, highlighting its user-friendly approach and noting the existence of Aider-CE with additional features but fewer stars on GitHub.
- The user was curious about Aiderās future development, especially considering Paul Gauthierās limited activity.
- Paul Gauthierās Absence Noted: A member confirmed that Paul Gauthier is not active on Discord.
- They speculated that he is likely occupied with work and personal matters, but tagged him just in case.
- Desire for Next-Gen AI Coding Tools: A member voiced their anticipation for the next generation of AI-powered coding tools.
- They also expressed interest in identifying potential improvements that Aider could adopt from other tools.
aider (Paul Gauthier) ā· #links (1 messages):
Aider-CE, Chrome-Devtools MCP, AI Browser
- DIY AI Browser using Aider-CE & Chrome DevTools MCP!: Forget needing a dedicated AI browser! You can now roll your own using Aider-CE and Chrome DevTools MCP, as detailed in this blog post with video.
MCP Contributors (Official) ā· #general (7 messages):
MCP Registry Confusion, Tool Title Placement in MCP, GitHub MCP Registry details
- MCP Registries: Mirror or Mirage?: Users are confused about whether the MCP Registry and the GitHub MCP Registry are separate entities.
- The community reports that GitHub intends to integrate the MCP Registry as upstream in a future product iteration, mirroring content between the two.
- GitHubās MCP Registry: The Scalable Path: The GitHub blog states that developers can self-publish MCP servers to the OSS MCP Community Registry.
- Once published, those servers will automatically appear in the GitHub MCP Registry, creating a unified, scalable path for discovery.
- GitHub MCP Registry Details: The GitHub MCP Registry has 44 servers and will continue growing.
- To nominate a server, users should email [email protected].
- Tool Title Placement Puzzles Protocol Participants: Members are stumped regarding the difference between a toolās ātitleā showing up at the root level versus as annotations.title in the Model Context Protocol (MCP).
- The MCP specification seems unclear on the distinction, leading to confusion.
MCP Contributors (Official) ā· #general-wg (36 messagesš„):
Global Notifications, Multiple SSE streams, TypeScript SDK Bug, Resource Subscription Updates
- MCP Spec's Global Notification Ambiguity: A discussion arose regarding the interpretation of the Model Context Protocol (MCP) specification concerning global notifications, specifically whether notifications like listChanged should be sent to all clients.
- One member noted that the spec states the server "MUST NOT broadcast the same message across multiple streams," leading to confusion about sending updates to multiple subscribers of a resource.
- Clarifying Multiple SSE Stream Usage: Clarification was provided on the context of multiple SSE streams, explaining that the specification aims to prevent a client from receiving the same message twice, and the spec is oriented around the idea of one stream per client.
- It was acknowledged that the spec could use more clarity, especially concerning the relationship between servers and clients, and relevant documentation is being updated.
- TypeScript SDK Notification Bug Spotted: A member identified a potential bug in the official TypeScript SDK where change notifications are only sent on the current standalone stream, which could prevent global notifications from reaching all clients.
- Further discussion revealed that the server needs to loop over all connected server instances and send a notification to each one to ensure all subscribers are updated (a framework-agnostic sketch of this fan-out pattern follows this list).
- Puzzleboxās Resource Change Notification Strategy: A member shared an example from their server implementation (Puzzlebox) where subscribers are notified of resource changes, like state transitions in a puzzle game.
- The implementation uses a singleton state mechanism to manage subscribers and transports, ensuring each instance has access to the same data and can send updates to all connected clients.
- Session vs. Server Semantics Exposed: It was pointed out that the TS SDK's Server and McpServer classes are more akin to sessions than servers, with the Python SDK explicitly calling them sessions.
- In practice, an Express server manages multiple connections, each with an instance of the TS SDK's "Server" class, requiring a singleton state mechanism for data sharing and subscriber management across all instances.
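The fan-out pattern under discussion, reduced to a framework-agnostic Python sketch; the class and method names here are hypothetical, not the actual SDK API:

```python
import asyncio

class NotificationHub:
    """Singleton-style registry shared by all per-client session instances."""

    def __init__(self):
        self.sessions = set()  # one entry per connected client session

    def register(self, session):
        self.sessions.add(session)

    async def notify_resource_updated(self, uri: str):
        # loop over every connected session, not just the current stream,
        # so all subscribers receive the resource-change notification
        await asyncio.gather(*(
            s.send_notification("notifications/resources/updated", {"uri": uri})
            for s in self.sessions
        ))
```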
DSPy ā· #papers (1 messages):
lidar36: They just added the code
DSPy ā· #general (31 messagesš„):
DSPy vs Langchain, Claude code web feature, GEPA love, Early stopping of streaming, Bay Area DSPy Meet Up
- DSPy vs Langchain: Members discussed that DSPy excels at structured tasks, especially those that require optimization, and that upgrading models in Langchain is a pain.
- One member mentioned moving their team from Langchain to DSPy after a bad experience preventing them from doing a model upgrade without completely starting from scratch on their prompts.
- Claude Code Feature has MCP Backdoor: A member shared a Github pull request highlighting that Anthropic decided to exclude a feature in their new Claude code web feature due to security issues with MCP.
- The poster was inspired by this X post.
- Upcoming Bay Area DSPy Meetup on November 18th: Multiple members mentioned an upcoming Bay Area DSPy Meetup on November 18th.
- One member mentioned being excited to see certain folks all in one place, saying the brain cells there are gonna be oozing š , linking to Luma for the event.
- Is your Signature a Prompt, or is it Programming?: A member ranted about a coworker using DSPy in a new client project and writing a 6881-character docstring (878 words) for their only signature, which suggests they are prompting rather than programming (a minimal counter-example follows this list).
- The member emphasized that they really didn't even look at the first page of the docs that says PROGRAMMING NOT PROMPTING??? š š¤Æ
- Showcase your Py Profile: A member shared a link to getpy to show off DSPy experience.
- The poster highlighted 3 years of DSPy experience in their blurb.
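For contrast, a minimal DSPy signature keeps the docstring short and moves the structure into typed fields; this is a generic sketch, not the coworker's code:

```python
import dspy

class Summarize(dspy.Signature):
    """Summarize the document in two sentences."""  # short docstring, not a 6881-char prompt
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

summarize = dspy.Predict(Summarize)
# result = summarize(document="...")  # requires dspy.configure(lm=...) first
```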
tinygrad (George Hotz) ā· #general (12 messagesš„):
TinyBox Hardware, FSDP Implementation, Tinygrad Contributions, Pyright Type Issues, Tinygrad Meeting 93
- TinyBox Specs Spark Inquiry: A user inquired about the TinyBoxās motherboard, asking if it supports 9005 with 12 DIMM slots and a 500W CPU.
- They also asked about the Discord botās availability as open source.
- Diving Deep into FSDP Bounty: A user expressed interest in implementing FSDP by hand and contributing to tinygrad, seeking guidance on understanding the underlying mechanisms beyond basic library usage related to the FSDP in tinygrad! bounty.
- They are eager to learn and contribute to tinygrad without caring too much about the bounty money itself.
- First Contribution to Tinygrad: A user asked for tips on how to position themselves for their first contribution to tinygrad, expressing a desire to learn and contribute something cool.
- They inquired whether using more than one NVIDIA GPU would suffice for FSDP implementation, or if support for all devices is necessary.
- Pyright Finds Real Issues: A user reported that Pyright identified real type issues in the code.
- They suggested merging tasteful fixes.
- TinyJIT Speeds Up Tokens: A user is building a local chat and training TUI app with tinygrad and wondered if TinyJIT can increase tokens/sec.
- The general consensus was to definitely use TinyJIT. A link to tinygrad on X and a gist on GitHub were shared.
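A minimal sketch of wrapping a step function in TinyJit; the function itself is illustrative, and input shapes must stay fixed across calls:

```python
from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
    # the first couple of calls capture the kernel graph; later calls replay it
    return (x @ x.T).relu().realize()

for _ in range(5):
    out = step(Tensor.rand(64, 64))  # same shapes every call, so the JIT can replay
```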
tinygrad (George Hotz) ā· #learn-tinygrad (12 messagesš„):
tinygrad PR bounties, RTX 5090 performance issues, Excessive kernel fusion
- PR Bounty Bonanza for Tinygrad Noobs!: Newcomers to tinygrad can check out the bounties available for easy PRs, with rewards up to $300.
- The suggestion was made to sort the value column from low to high to easily spot the lower-hanging, easier tasks.
- RTX 5090 struggles with Tinygrad Code: A user reported unexpectedly slow performance on an RTX 5090 while running tinygrad code involving 12 512x512 images and 12 floating point numbers.
- It was suggested to add .contiguous() after the model call (before squeeze) as a quick fix, and to post a full reproduction of the issue.
- Contiguous to the Rescue of Kernel Fusion Issues!: A user inquired about excessive kernel fusion causing a kernel to run for over a second, which is likely a bug.
- Adding .contiguous() after the model call fixed the issue, and it was recommended to create a ticket with both the trimmed-down and original code versions.
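In code, the suggested fix amounts to inserting a materialization point; the "model" below is a toy stand-in, not the user's code:

```python
from tinygrad import Tensor

x = Tensor.rand(12, 1, 512, 512)                # toy stand-in for the real inputs
y = (x * 2 + 1).sum(axis=(2, 3), keepdim=True)  # toy stand-in for the model call
out = y.contiguous().squeeze()                  # .contiguous() realizes the intermediate,
out.realize()                                   # breaking the over-fused kernel chain
```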
MLOps @Chipro ā· #events (1 messages):
Data 3.0, AI-Ready Data, Nextdata OS, Autonomous Data Products, Multimodal Management
- Nextdata OS Powers Data 3.0: Zhamak Dehghani, Founder & CEO of Nextdata, will unveil how autonomous data products are powering the next generation of AI systems in a live session on Wednesday, October 30th at 8:30 AM PT; reserve your seat.
- Discover how Nextdata OS replaces brittle pipelines with a semantic-first, AI-native data operating system.
- Unify Data With Multimodal Management: Nextdata OS provides multimodal management to safely unify structured and unstructured data.
- It replaces manual orchestration with self-governing data products, and embeds domain-centric context into AI with continuously maintained metadata.