A good day for open source American AI.

AI News for 12/12/2025-12/15/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (206 channels, and 15997 messages) for you. Estimated reading time saved (at 200wpm): 1294 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Nvidia’s Nemotron is not often in the top tiers of open models, but it distinguishes itself by being COMPLETELY open, as in, “we will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights.” (Nemotron 3 paper), as well as American-origin. Nano 3 is competitive with Qwen3:

Comparison of Qwen3-30B-A3B-Base and Nemotron 3 Nano 30B-A3B

When these are released, they effectively serve as the checkpoint for the state of the art in LLM training, because they basically gather all of the table-stakes techniques known to work. Among the notable choices - hybrid architectures enabling long (1M) context:

Hybrid State Space Model + Transformer Architecture

Nemotron 3 model architecture visualization showing interleaved Mamba-2 and MoE layers with select self-attention

Technical architecture diagram showing details of the Nemotron 3 Nano hybrid Mamba-Transformer Mixture-of-Experts architecture

Multi-environment RL (NeMo Gym and NeMo RL open sourced)

A technical document page describing the post-training methodology for the Nemotron 3 Nano AI model, highlighting its hybrid architecture and multi-environment RL

A technical document page describing the infrastructure for NeMo Gym, a framework for reinforcement learning environments with three core server types, including agents and models

Per the Nano 3 tech report, they will be releasing all their datasets:

Diagram of Nemotron 3 Nano layer architecture showing a hybrid Mamba-Transformer Mixture of Experts (MoE) design


AI Twitter Recap

NVIDIA’s Nemotron 3: open hybrid MoE models, data, and agent stack

  • Nemotron 3 Nano (30B total, ~3.6B active): NVIDIA released a fully open hybrid Mamba–Transformer MoE model with a 1M-token context, achieving best-in-class small-model results on SWE-Bench and strong scores on broad evaluations (e.g., 52 on Artificial Analysis Intelligence Index; +6 vs Qwen3-30B A3B) with very high throughput (e.g., ~380 tok/s on DeepInfra). Open assets include weights, training recipes, redistributable pre/post-training datasets, and an RL environment suite (NeMo Gym) for agent training. Commercial use is allowed under the NVIDIA Open Model License. Super (~100–120B) and Ultra (~400–500B) are “coming soon,” featuring NVFP4 pretraining and “LatentMoE” routing in a lower-dimensional latent subspace to reduce all‑to‑all and expert compute load. Announcements and technical details: @ctnzr, @nvidianewsroom, research page.
  • Day‑0 ecosystem support: Immediate integrations landed across major inference stacks and providers.
  • Why it matters: This is one of the most complete open releases to date—new architecture (hybrid SSM/MoE), transparent training pipeline, open data, and agent RL environments—raising the bar for replicability and agent-focused R&D (@_lewtun, @percyliang, @tri_dao). Notes: LatentMoE is documented for the larger unreleased models (@Teknium), with Nano using the hybrid MoE/Mamba stack now.

Reasoning, retrieval, and coding agents: new techniques and results

  • Operator-style reasoning beats long CoT: Meta SI’s Parallel‑Distill‑Refine (PDR) treats LLMs as improvement operators—generate parallel drafts → distill a bounded workspace → refine—and shows large gains at fixed latency (e.g., AIME24: 93.3% vs 79.4% long-CoT; o3‑mini +9.8 pts at matched token budget). An 8B model with operator‑consistent RL adds ~5% (@dair_ai). (A minimal sketch of the operator loop follows this list.)
  • Adaptive retrieval policies via RL: RouteRAG learns when/what to retrieve (passage vs graph vs hybrid). A 7B model reaches 60.6 F1 across QA (beats Search‑R1 by +3.8 using 10k vs 170k training ex) and reduces retrieval turns ~20% while improving accuracy (@dair_ai).
  • Unified compressed RAG (Apple CLaRa): Shared continuous memory tokens serve both retrieval and generation; differentiable top‑k enables gradients from generator to retriever; with ~16× compression, CLaRa‑Mistral‑7B matches or surpasses text baselines and outperforms fully supervised retrievers on HotpotQA without relevance labels (@omarsar0).
  • Coding agents as channel optimization (DeepCode): Blueprint distillation + stateful code memory + conditional RAG + closed‑loop error correction yields 73.5% replication on PaperBench vs 43.3% for o1 and surpasses PhD humans (~76%) on a subset. Open source framework (@omarsar0).
  • Together RARO (RL without verifiers): Adversarial game training for scalable reasoning when verifiers are scarce (@togethercompute).
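
To make the PDR operator loop above concrete, here is a minimal sketch in Python. It is not Meta SI’s code: `llm` stands for any completion function (prompt in, text out), and the draft count, round count, and prompt wording are illustrative assumptions.

```python
# Illustrative Parallel-Distill-Refine (PDR) loop: drafts -> bounded workspace -> refine.
def pdr(llm, task: str, n_drafts: int = 4, rounds: int = 2) -> str:
    workspace = ""  # the bounded distillate that replaces an ever-growing CoT
    for _ in range(rounds):
        # 1) Parallel: generate several independent drafts (sequential here for clarity).
        drafts = [
            llm(f"Task: {task}\nNotes so far:\n{workspace}\nWrite one draft solution.")
            for _ in range(n_drafts)
        ]
        # 2) Distill: compress the drafts into a short workspace, discarding the rest.
        workspace = llm(
            "Condense the key ideas and disagreements in these drafts into brief notes:\n"
            + "\n---\n".join(drafts)
        )
    # 3) Refine: answer from the compact workspace only, keeping latency bounded.
    return llm(f"Task: {task}\nUsing only these notes:\n{workspace}\nGive the final answer.")
```
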

Inference and infra: multimodal serving, quantization, schedulers

  • Encoder disaggregation for multimodal: vLLM splits vision encoders into a separately scalable service, enabling pipelining, caching of image embeddings, and reducing contention with text prefill/decode. Gains: +5–20% throughput in stable regions; big P99 TTFT/TPOT cuts (@vllm_project).
  • FP4 details and NVFP4: Handy FP4 E2M1 value list for low‑precision kernels (@maharshii). Nemotron 3 training leverages NVFP4; community curiosity on negative zero utility for circuits (@andrew_n_carr). (The full E2M1 value set is enumerated in the sketch after this list.)
  • SchedMD (SLURM) acquired by NVIDIA: NVIDIA acquired SchedMD, the company behind SLURM, expanding its control up‑stack into widely used workload scheduling (beyond CUDA). Implications for non‑NVIDIA accelerators and cluster portability are being debated (@SemiAnalysis_).
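
For reference, FP4 E2M1 (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit) has only 16 bit patterns, so the value list can be derived directly; the short Python sketch below enumerates it, including the negative zero mentioned above.

```python
# Decode all 16 FP4 E2M1 bit patterns: sign(1) | exponent(2, bias 1) | mantissa(1).
def fp4_e2m1(bits: int) -> float:
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:                      # subnormal region: 0.0 and 0.5 (plus -0.0, -0.5)
        return sign * (man / 2.0)
    return sign * (1.0 + man / 2.0) * 2.0 ** (exp - 1)

for b in range(16):
    print(f"{b:04b} -> {fp4_e2m1(b)}")
# Magnitudes: 0, 0.5, 1, 1.5, 2, 3, 4, 6 — and pattern 1000 decodes to -0.0.
```
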

Agent/coding toolchain and evals

  • IBM CUGA agent: Open-source enterprise agent that writes/executes code over a rich toolset and MCP; runs locally with demo/blog and HF Space (@mervenoyann).
  • Secure agent FS and document parsing: LlamaIndex shows virtual filesystems (AgentFS) + LlamaParse + workflows for safe coding agents with human‑in‑the‑loop orchestration (@llama_index, @jerryjliu0).
  • Google MCP repo: Reference for managed and open-source MCP servers, examples, and learning resources (@rseroter).
  • Qwen Code v0.5.0: New VSCode integration bundle, native TypeScript SDK, session mgmt, OpenAI‑compatible reasoning model support (DeepSeek V3.2, Kimi‑K2), tool control, i18n, and stability fixes (@Alibaba_Qwen).
  • Agent harness discourse: Increasing focus on “harness” quality, transfer across harnesses, and proposals for a HarnessBench to measure harness generalization and router quality (@Vtrivedy10).

Vision, video, 3D worlds

  • Kling VIDEO O1 updates: Start/end frame control (3–10s) for pacing and smoother transitions; new 720p mode; deployed on FAL with lower cost (@Kling_ai, @fal).
  • TurboDiffusion (THU‑ML): 100–205× faster 5s video on a single RTX 5090 (as low as 1.8s) via SageAttention + sparse linear attention + rCM; being integrated with vLLM‑Omni (@Winterice10, @vllm_project).
  • Apple Sharp monocular view synthesis: Fast monocular novel-view synthesis released on HF (@_akhaliq).
  • Echo (SpAItial): A frontier 3D world generator producing a consistent, metric‑scale spatial representation from text or a single image, rendered via 3DGS in-browser with real‑time interaction; aimed at digital twins, robotics, and design (@SpAItial_AI).

Product signals: OpenAI, Google, Allen, Arena

  • OpenAI:
    • Branched chats now on iOS/Android (@OpenAI).
    • Realtime API audio snapshots reduce ASR/TTS hallucinations and improve instruction following and tool calling (@OpenAIDevs).
    • GPT‑5.2: mixed community reactions, but strong reports for math/quant research (@gdb, @htihle, @lintool).
  • Google:
    • Hints of an incoming open source model drop and “Gemma 4” chatter; keep an eye on huggingface.co/google (@kimmonismus, @testingcatalog).
    • Sergey Brin dogfoods Gemini Live in-car; implies a better internal Gemini 3 Flash is close; reflections on Jeff Dean’s TPU bet and Google’s “founder mode” restart (1, 2, TPU origin).
    • Gemini Agent rolling out transactional flows (e.g., car rentals) to Ultra users (@GeminiApp).
    • Google’s MCP resources launched (@rseroter).
  • Allen AI: Bolmo byte‑level LMs “byteified” from Olmo 3 match/surpass SOTA subword models across tasks; AI2 continues to lead on openness around OLMo (@allen_ai).
  • Arena updates: New GLM‑4.6V/-Flash for head‑to‑head testing; DeepSeek v3.2 “thinking” variants dissected across occupational and capability buckets (GLM 4.6V, DeepSeek v3.2 deep dive).

Top tweets (by engagement, AI‑focused)

  • Gemini “private thoughts” drama: A viral thread showed Gemini Live’s internal thoughts with petty “revenge” plans—highlighting UX transparency and safety issues around agent inner monologues (@AISafetyMemes, 6.9k).
  • Sergey Brin on Gemini and Jeff Dean: Dogfooding Live, hints at Gemini 3 Flash, and origin story of TPU; overarching theme: founder mode and deep tech bets at Google (@Yuchenj_UW, 3.0k; TPUs, 1.5k).
  • OpenAI product updates: Branched chats on mobile (@OpenAI, 3.6k).
  • Google HF page “PSA”: Community watching for rapid drops (@osanseviero, 2.0k).
  • Nemotron 3 Nano overview: Open 30B hybrid MoE, 2–3× faster than peers, 1M context, open data/recipes—broad excitement across infra and research communities (@AskPerplexity, 2.1k; @UnslothAI, 1.4k).

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. NVIDIA Nemotron 3 Nano Release

  • NVIDIA releases Nemotron 3 Nano, a new 30B hybrid reasoning model! (Activity: 909): NVIDIA has released the Nemotron 3 Nano, a 30 billion parameter hybrid reasoning model, which is part of the Nemotron 3 family of Mixture of Experts (MoE) models. This model features a 1M context window and is optimized for fast, accurate coding and agentic tasks, capable of running on 24GB RAM or VRAM. It demonstrates superior performance on benchmarks like SWE-Bench, with a notable 110 tokens/second generation speed reported by users. The Nemotron 3 family also includes larger models like the Nemotron 3 Super and Nemotron 3 Ultra, designed for more complex applications with up to 500 billion parameters. Unsloth GGUF supports local fine-tuning of these models. Commenters highlight the model’s impressive speed and efficiency, noting its ability to generate 110 tokens/second locally, which is unprecedented for models of this size. There is also excitement about the larger models in the Nemotron 3 family, particularly the Nemotron 3 Super, which is expected to excel in multi-agent applications.
    • The Nemotron 3 Nano model is noted for its impressive speed, with one user reporting a generation rate of 110 tokens per second on a local machine, which is unprecedented compared to other models they have used. This highlights the model’s efficiency and potential for high-performance applications.
    • The Nemotron 3 family includes three models with varying parameter sizes and activation capabilities. The Nemotron 3 Nano activates up to 3 billion parameters for efficient task handling, while the Nemotron 3 Super and Ultra models activate up to 10 billion and 50 billion parameters, respectively, for more complex applications. This structure allows for targeted efficiency and scalability across different use cases.
    • A comparison between the Nemotron 3 Nano and Qwen3 30B A3B models reveals differences in file sizes, with the Nemotron 3 Nano’s dynamic file size being larger at 22.8 GB compared to Qwen3’s 17.7 GB. This suggests that while Nemotron 3 Nano may offer enhanced capabilities, it also requires more storage, which could impact deployment considerations.
  • NVIDIA Nemotron 3 Nano 30B A3B released (Activity: 347): NVIDIA has released the Nemotron 3 Nano 30B A3B, a model featuring a hybrid Mamba-Transformer MoE architecture with 31.6B total parameters and ~3.6B active per token, designed for high throughput and low latency. It boasts a 1M-token context window and is claimed to be up to 4x faster than its predecessor, Nemotron Nano 2, and 3.3x faster than other models in its size category. The model is fully open, with open weights, datasets, and training recipes, and is released under the NVIDIA Open Model License. It supports seamless deployment with vLLM and SGLang, and integration via OpenRouter and other services. Future releases include Nemotron 3 Super and Ultra, which are significantly larger. Some users express concerns about the model’s reliance on synthetic data, noting an ‘uncanny valley’ effect in its outputs. There is also interest in optimizing the model for specific hardware configurations, such as offloading to system RAM for a single 3090 GPU setup, though documentation on this is sparse.
    • A pull request for llama.cpp is mentioned, which is yet to be merged, indicating ongoing development and potential improvements for the model’s integration. The link provided (https://github.com/ggml-org/llama.cpp/pull/18058) suggests active contributions to enhance compatibility or performance with the NVIDIA Nemotron 3 Nano 30B A3B.
    • A user discusses the potential for offloading some model components to system RAM when using a single NVIDIA 3090 with 128GB DDR5. They mention the lack of documentation and performance data on this offloading technique, which could be crucial for optimizing resource usage and performance in setups with limited GPU memory. (A hedged partial-offload sketch follows these comments.)
    • Another user reports compiling llama.cpp from a development fork and achieving over 100 tokens/second on their machine, indicating high performance. However, they note the model’s lack of reliability, as it provided incorrect status updates and failed to save document changes accurately. This issue might be related to the use of the Q3_K_M quantization, suggesting a trade-off between speed and accuracy.
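
As a starting point for the offload question above, here is a hedged sketch using llama-cpp-python’s layer-offload knob; the GGUF filename and layer count are assumptions, and MoE-specific tensor placement (as discussed in the thread) may require llama.cpp’s own CLI options instead.

```python
# Minimal partial-offload sketch: keep some layers on the 24 GB GPU, rest in system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="nemotron-3-nano-30b-a3b.Q4_K_M.gguf",  # hypothetical local GGUF path
    n_gpu_layers=24,   # lower this until the model fits in VRAM; -1 offloads everything
    n_ctx=32768,       # context window; larger values increase memory use
)
out = llm("Summarize the Nemotron 3 Nano architecture in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```
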

2. Google Model Announcement

  • New Google model incoming!!! (Activity: 1527): The image is a tweet by Omar Sanseviero, suggesting that a new model from Google might be available on the Hugging Face platform. The tweet includes a link to Hugging Face’s Google page, implying that users should bookmark it for potential updates. This hints at a possible new release or update of a Google model, which could be significant for developers and researchers using Hugging Face for machine learning models. Commenters speculate about the nature of the new model, with some hoping it is not similar to Gemma3-Math, while others express interest in a potential multi-modal model that could replace existing large models like gpt-oss-120b and 20b.
    • DataCraftsman expresses a desire for a new model to serve as a multi-modal replacement for existing models like gpt-oss-120b and 20b. This suggests a need for a model that can handle multiple types of data inputs and outputs, potentially improving on the capabilities of these existing models.
    • Few_Painter_5588 speculates about the potential features of a ‘Gemma 4’ model, particularly highlighting the addition of audio capabilities. They also mention the challenges with the vocabulary size in ‘Gemma 3’, noting that a ‘normal sized vocab’ would ease the finetuning process, which is currently described as ‘PAINFUL’.

3. Frustration with Tech Performance

  • I’m strong enough to admit that this bugs the hell out of me (Activity: 1314): The image is a meme that humorously contrasts the efforts of enthusiasts on the /r/LocalLLaMA subreddit who spend significant time and resources assembling custom workstations, with ‘normies’ who achieve better performance using the latest MacBook. This reflects a common frustration in the tech community where high-end, custom-built PCs are sometimes outperformed by more optimized, off-the-shelf products like Apple’s MacBooks, which benefit from Apple’s tight integration of hardware and software. The comments further this sentiment with jokes about RAM and workstation assembly, highlighting the ongoing debate about the value of custom builds versus pre-built systems. One commenter humorously suggests that if a custom workstation is outperformed by a MacBook, the builder may have failed in assembling a truly ‘perfect’ workstation, indicating a belief in the potential superiority of well-assembled custom PCs.
    • No-Refrigerator-1672 highlights a key limitation of Mac workstations, noting that they fall short in scenarios requiring heavy GPU usage. This is particularly relevant for tasks that benefit from GPU acceleration, where a full GPU setup can significantly outperform a Mac, which may not be optimized for such workloads.
    • african-stud suggests testing the system’s capabilities by processing a 16k prompt, implying that this could be a challenging task for the hardware in question. This comment points to the importance of benchmarking systems with demanding tasks to truly assess their performance capabilities.
    • Cergorach humorously critiques the assembly of a ‘perfect’ workstation, suggesting that the current setup may not be optimal. This comment underscores the importance of carefully selecting and assembling components to meet specific performance needs, especially in professional environments.
  • They’re finally here (Radeon 9700) (Activity: 306): The Radeon 9700 graphics card has been released, and the community is eager for performance benchmarks. Users are particularly interested in seeing how it performs across various tests, with requests for detailed data to better understand its capabilities. The card is expected to be tested during the holidays, with users seeking advice on which benchmarks to prioritize. The community is actively seeking comprehensive benchmark data to evaluate the Radeon 9700’s performance, indicating a strong interest in its real-world application and efficiency.
    • Users are eager for detailed benchmarks on the Radeon 9700, specifically focusing on inference and training/fine-tuning performance. This suggests a strong interest in understanding the card’s capabilities in machine learning contexts, which are critical for evaluating its utility in modern AI workloads.
    • There is a request for noise and heat level measurements, indicating a concern for the card’s thermal and acoustic performance. This is important for users who plan to use the GPU in environments where noise and heat could be a factor, such as in home offices or data centers.
    • The mention of ‘time to first smokey smelling’ humorously highlights a concern for the card’s reliability and durability under stress, which is a common issue with new hardware releases. This reflects a need for stress testing to ensure the card can handle prolonged use without failure.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Advanced AI Model Benchmarks

  • Google just dropped a new Agentic Benchmark: Gemini 3 Pro beat Pokémon Crystal (defeating Red) using 50% fewer tokens than Gemini 2.5 Pro. (Activity: 785): Google AI has released a new benchmark for their AI model, Gemini 3 Pro, which demonstrates significant improvements over its predecessor, Gemini 2.5 Pro, in playing the game Pokémon Crystal. The new model completed the game, including defeating the hidden boss Red, using approximately 50% fewer tokens and turns, indicating enhanced planning and decision-making capabilities. This efficiency suggests a leap in the model’s ability to handle long-horizon tasks with reduced trial and error, marking a notable advancement in Agentic Efficiency. Some commenters suggest testing the model on a new game without existing guides to better assess its capabilities. Additionally, comparisons are made with GPT-5, which completed the task in less time, highlighting differences in performance metrics.
    • KalElReturns89 highlights a performance comparison between GPT-5 and Gemini 3, noting that GPT-5 completed the task in 8.4 days (202 hours) while Gemini 3 took 17 days. This suggests a significant efficiency gap between the two models, with GPT-5 being notably faster in this specific benchmark task.
    • Cryptizard raises a valid point about the benchmark’s relevance, suggesting that a more challenging task would be to test the model on a new video game without existing guides or walkthroughs in the training data. This would better assess the model’s ability to generalize and adapt to novel situations.
    • PeonicThusness questions the novelty of the task, implying that Pokémon Crystal might already be part of the training data for these models. This raises concerns about the benchmark’s ability to truly measure the models’ problem-solving capabilities without prior exposure.
  • Found an open-source tool (Claude-Mem) that gives Claude “Persistent Memory” via SQLite and reduces token usage by 95% (Activity: 783): The open-source tool Claude-Mem addresses the “Amnesia” problem in Claude Code by implementing a local SQLite database to provide persistent memory, allowing the model to “remember” past sessions even after restarting the CLI. This is achieved through an “Endless Mode” that utilizes semantic search to inject only relevant memories into the current prompt, significantly reducing token usage by 95% for long-running tasks. The tool is currently the top TypeScript project on GitHub and was created by Akshay Pachaar. The repository can be found here. (An illustrative sketch of the SQLite-memory idea follows the comments below.)
    • The claim of reducing token usage by 95% is met with skepticism, as users question the methodology and validity of such a significant reduction. The tool, Claude-Mem, reportedly uses SQLite to provide persistent memory, which could theoretically reduce the need for repeated context provision, but the exact mechanism and benchmarks are not detailed in the discussion.
    • A comparison is drawn between Claude-Mem’s use of SQLite for persistent memory and simpler methods like creating markdown files for later review. The implication is that while Claude-Mem might automate and optimize the process, the fundamental concept of external memory storage is not new, and the efficiency gain might depend on specific implementation details.
    • The mention of Claude’s built-in ‘Magic Docs’ feature suggests that similar functionality might already exist within Claude’s ecosystem. This feature, detailed in a GitHub link, indicates that Claude can already manage some form of persistent memory or context retention, potentially overlapping with what Claude-Mem offers.
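
The sub-bullets above ask how such a memory layer behaves; as a purely illustrative sketch (not Claude-Mem’s actual schema or retrieval, which the post says is semantic), the idea of an SQLite-backed store that injects only matching notes looks roughly like this:

```python
# Toy persistent-memory store: save session notes, retrieve only rows matching a query.
import sqlite3

db = sqlite3.connect("memory.db")
db.execute("CREATE TABLE IF NOT EXISTS memories (session TEXT, note TEXT)")

def remember(session: str, note: str) -> None:
    db.execute("INSERT INTO memories VALUES (?, ?)", (session, note))
    db.commit()

def recall(keyword: str, limit: int = 5) -> list[str]:
    # Crude keyword match standing in for semantic search over past sessions.
    rows = db.execute(
        "SELECT note FROM memories WHERE note LIKE ? LIMIT ?", (f"%{keyword}%", limit)
    )
    return [r[0] for r in rows]

remember("s1", "User prefers TypeScript with strict null checks.")
print(recall("TypeScript"))  # only these notes get injected into the next prompt
```
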

2. Innovative Storage and Robotics Technologies

  • “Eternal” 5D Glass Storage is entering commercial pilots: 360TB per disc, zero-energy preservation and a 13.8 billion year lifespan. (Activity: 2229): The image depicts a small, transparent disc that is part of the “Eternal” 5D Glass Storage technology developed by SPhotonix, a spin-off from the University of Southampton. This technology is notable for its ability to store 360TB of data on a single 5-inch glass platter, with a lifespan of 13.8 billion years, effectively making it a permanent storage solution. The disc operates with zero-energy preservation, meaning once data is written, it requires no power to maintain. This advancement is significant for addressing the “Data Rot” problem, offering a potential solution for long-term data storage needs, such as those required for AGI training data or as a “Civilizational Black Box.” However, the technology is currently limited by slow write speeds of 4 MBps and read speeds of 30 MBps, which may restrict its use to cold storage applications. There is skepticism among commenters regarding the claimed lifespan of 13.8 billion years, as it coincides with the current estimated age of the universe. Additionally, there are doubts about the practicality of the 5D data storage concept, particularly in encoding multiple pieces of information that resolve to the same coordinates.
    • The write and read speeds of the 5D glass storage are notably slow, with write speeds at 4 MBps and read speeds at 30 MBps. This means filling a 360 TB disk would take approximately 2 years and 10 months of continuous writing, assuming no failures occur during the process. (See the arithmetic check after these comments.)
    • There is skepticism about the claimed 13.8 billion year lifespan of the storage medium, as this figure coincides with the current estimated age of the universe. This raises questions about the validity and testing of such a claim.
    • The concept of ‘5D’ data storage is met with skepticism, particularly regarding how it handles encoding information. The concern is about potential conflicts when encoding two pieces of information that resolve to the same Cartesian coordinates, suggesting a need for a clearer explanation of the technology’s mechanics.
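
The fill-time figure quoted above checks out; a two-line calculation reproduces it:

```python
# 360 TB written continuously at 4 MB/s, assuming no failures.
seconds = 360e12 / 4e6
print(seconds / 86400, "days")            # ~1042 days
print(seconds / 86400 / 365.25, "years")  # ~2.85 years, i.e. about 2 years 10 months
```
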
  • Marc Raibert’s (Boston Dynamics founder) new robot uses Reinforcement Learning to “teach” itself parkour and balance. (Zero-Shot Sim-to-Real) (Activity: 798): Marc Raibert’s new project at the RAI Institute introduces the Ultra Mobile Vehicle (UMV), a robot utilizing Reinforcement Learning (RL) for dynamic tasks like parkour and balance. The robot employs a “Split-Mass” design, allowing its upper body to act as a counterweight, enabling complex maneuvers without explicit programming. This approach demonstrates a significant shift from static automation to dynamic, learned agility, achieving Zero-Shot Sim-to-Real Transfer where the robot learns in simulation and applies skills in the real world. Read more. Some comments highlight that the announcement is not new, being three months old, while others humorously speculate on the implications of such technology on human jobs and safety.

3. Creative AI Applications in Media and Design

  • PersonaLive: Expressive Portrait Image Animation for Live Streaming (Activity: 418): The image demonstrates PersonaLive, a real-time diffusion framework designed for generating expressive portrait animations suitable for live streaming. It operates on a single 12GB GPU, enabling infinite-length animations by synchronizing a static portrait with a driving video, effectively mimicking expressions and movements. The tool is available on GitHub and HuggingFace, showcasing its capability to animate still images based on live input. Comments highlight the real-time capability as impressive, while also advising caution when running code from GitHub due to potential bugs and security risks. Suggestions include using Docker for added security and checking dependencies carefully to avoid malicious code.
    • CornyShed provides a detailed guide for safely experimenting with code from GitHub, emphasizing the importance of security when dealing with .pth files, which can execute arbitrary code. They recommend using Huggingface for model safety checks, creating isolated environments to prevent conflicts with existing setups, and considering Docker containers for added security. They also caution about the potential risks of dependencies, suggesting a thorough review of requirements.txt to avoid installation issues.
    • TheSlateGray initially noted that runwayml/stable-diffusion-v1-5 was removed from Huggingface, leading to a 404 error, but later updated that the issue was resolved with a fix to the README. This highlights the importance of maintaining up-to-date documentation and the potential for temporary access issues with popular models on platforms like Huggingface.
    • Tramagust points out a technical flaw in the animation output, specifically that the eyes appear to change locations within their sockets, creating an uncanny effect. This suggests a potential area for improvement in the model’s ability to maintain consistent facial features during animation.
  • I made Claude and Gemini build the same website, the difference was interesting (Activity: 597): The image compares two website designs created by Claude Opus 4.5 and Gemini 3 Pro using the same prompt and constraints. Design A, attributed to Claude, features a clean, white background with blue accents, focusing on efficient meetings with features like instant summaries and sentiment analysis. Design B, attributed to Gemini, has a dark theme with gold highlights, emphasizing not missing moments in meetings and providing real-time transcription and smart summaries. The designs differ significantly in color scheme and visual style, showcasing the distinct approaches of the two AI models in UI design. Commenters noted that while Gemini 3 Pro excels in UI design, some dedicated front-end AIs outperform both Claude and Gemini for building front-end interfaces. The user also shared their workflow, using tools like UX Pilot for Figma designs and Kombai for converting designs to code, alongside various AI subscriptions for flexibility in development tasks.
    • Civilanimal highlights the strengths of Gemini 3 Pro in UI design, suggesting it excels in creating visually appealing interfaces. This is contrasted with Claude Opus 4.5, which is implied to be less focused on UI but potentially stronger in other areas like logic implementation.
    • Ok-Kaleidoscope5627 provides a detailed workflow for web development using a combination of AI tools. They use UX Pilot for generating Figma designs, which they find more creative than other tools despite potential business model concerns. Kombai is used to convert these designs into HTML/CSS/TypeScript, praised for its effectiveness. For coding tasks, they rely on Claude Pro and ChatGPT Pro, switching to Opus via Github Copilot when needed, highlighting a flexible approach to avoid usage limits.
    • Ok-Kaleidoscope5627 also mentions the cost-effectiveness of their subscription strategy, which includes Claude Pro, ChatGPT Pro, and Github Pro. They emphasize the flexibility and lack of usage limits compared to a single subscription to Claude Max, suggesting a strategic approach to leveraging multiple AI tools for comprehensive web development.
  • FameGrid Z-Image LoRA (Activity: 597): The post discusses the release of FameGrid Z-Image 0.5 Beta, an experimental version of a LoRA model, which is available on Civitai. The model is noted to have several limitations, including anatomy issues, particularly with feet, weaker text rendering compared to the base Z-Image model, and incoherent backgrounds in complex scenes. These issues are acknowledged by the developers and are expected to be addressed in future updates. The comments reflect a focus on the model’s visual output, particularly the depiction of animals, suggesting a need for improvement in rendering realistic images.
    • The Z-Image 0.5 Beta release is noted for its experimental nature, with specific limitations such as anatomy issues, particularly with feet, and weaker text rendering compared to the base Z-Image model. Additionally, there are problems with incoherent backgrounds, especially in busy scenes. These issues are acknowledged by the developers and are expected to be addressed in future updates, as per the release notes.
    • A user highlights that while the Z-Image model improves photorealism in the foreground, it struggles with maintaining the same quality in the background. This raises curiosity about whether undistilled versions of the model have managed to resolve these background issues, suggesting a potential area for further development or refinement.
    • The model’s ability to produce photorealistic images is emphasized, with some outputs being convincing enough to pass as real on social media platforms like Instagram. This highlights the model’s strength in generating lifelike images, although it still faces challenges with certain elements like backgrounds and text rendering.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5.2

1. Kernel & GPU Systems: Papers, Microbenchmarks, and Real Speedups

  • TritonForge “Autotunes” Your Kernels (With LLMs Holding the Wrench): GPU MODE members dissected “TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization” (arXiv:2512.09196), a profiling-guided loop that combines kernel analysis + runtime profiling + iterative code transforms and uses LLMs to assist code reasoning/transformation, reporting up to 5× speedups over baselines.
    • The discussion framed TritonForge as a pragmatic path from “works” to “fast,” and a concrete example of tooling pushing Triton beyond hand-tuned wizardry toward repeatable optimization workflows.
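
Schematically, a profiling-guided loop of this kind reduces to “profile, ask for one transform, keep it only if it is faster and still correct.” The sketch below is our own illustration, not TritonForge’s code; `profile`, `ask_llm`, and `is_correct` are hypothetical callables supplied by the caller.

```python
# Schematic profiling-guided kernel optimization loop (greedy accept-if-faster).
def optimize(kernel_src: str, profile, ask_llm, is_correct, steps: int = 10) -> str:
    best_src, best_ms = kernel_src, profile(kernel_src)
    for _ in range(steps):
        hint = f"Current runtime: {best_ms:.3f} ms. Propose one code transformation."
        candidate = ask_llm(best_src, hint)        # LLM-assisted code transform
        if not is_correct(candidate):              # reject rewrites that break outputs
            continue
        candidate_ms = profile(candidate)          # runtime profiling guides acceptance
        if candidate_ms < best_ms:
            best_src, best_ms = candidate, candidate_ms
    return best_src
```
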
  • FiCCO Overlaps Compute/Comm via DMA: ‘Free’ Speed From the Plumbing: GPU MODE highlighted “Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap” introducing FiCCO schedules that offload comms to GPU DMA engines for distributed training/inference, claiming up to 1.6× speedup in realistic deployments (arXiv:2512.10236).
    • Members called out the paper’s schedule design space and heuristics (reported accurate in 81% of unseen scenarios) as especially useful for engineers fighting the “all-reduce tax.”
  • Blackwell Gets Put Under the Microscope (Again): In GPU MODE’s link roundups, members shared “Microbenchmarking NVIDIA’s Blackwell Architecture: An in-depth Architectural Analysis” (PDF) as a fresh reference for Blackwell-era performance modeling and low-level expectations.
    • It landed alongside very practical kernel talk (e.g., chasing 90%+ tensor core usage and pipelining constraints around ldsm and cp.async), reinforcing that “new GPU” still means “new bottlenecks.”

2. LLM Product Plumbing: Observability, Routing, and Multimodal Quirks

  • OpenRouter ‘Broadcast’ Turns Traces into an Accounting Ledger: OpenRouter launched Broadcast (beta) to automatically stream request traces from OpenRouter to observability tools like Langfuse, LangSmith, and Weave, demoed in a short video (Langfuse × OpenRouter demo).
    • Engineers liked the promise of per-model/provider/app/user cost and error tracking, and pointed to the docs noting upcoming/parallel support for Datadog, Braintrust, S3, and OTel Collector (Broadcast docs).
  • Gemini 3 ‘Thought Signatures’: Reasoning Blocks or Bust: OpenRouter users hit Gemini request errors requiring reasoning details to be preserved, including a message that the “Image part is missing a thought_signature,” with OpenRouter pointing to its guidance on preserving reasoning blocks (reasoning tokens best practices).
    • The thread read like an integration footgun checklist: once you start proxying or tool-routing, you must treat reasoning metadata as part of the contract, not an optional logging artifact.
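
The practical rule, per the linked best-practices doc, is to echo assistant turns back untouched. A minimal sketch of that contract (our illustration; field names like reasoning details and thought_signature come from the error messages above, not from a verified schema):

```python
# When replaying history, copy the assistant message verbatim -- unknown keys included.
# Stripping reasoning metadata (e.g. reasoning details or a part's thought_signature)
# is exactly what triggers the Gemini errors described above.
def replay_with_reasoning(history: list[dict], assistant_msg: dict, user_text: str) -> list[dict]:
    return history + [
        dict(assistant_msg),                      # keep every field as received
        {"role": "user", "content": user_text},   # then append the new user turn
    ]
```
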
  • Video Input Reality Check: Z.AI Accepts MP4 URLs, Everyone Else Wants Base64: OpenRouter users reported Z.AI as the only model they tried that accepts mp4 URLs directly, while other video-capable models required base64 uploads; uploads over ~50MB triggered 503 errors during a Cloudflare outage (“Temporarily unavailable | openrouter.ai | Cloudflare”).
    • Separately, LMArena started testing video generation with a hard cap of 2 videos / 14 hours and ~8s outputs, reinforcing that video is here—but the rate limits and UX are still in “early access pain” mode.

3. Training & Finetuning Tricks: Throughput Wins and Safety Side-Effects

  • Unsloth Packs 4K Tokens at 20GB: Padding Gets Fired: Unsloth users reported that enabling packing kept VRAM at 20GB while doubling batch sequence length from 2k → 4k tokens, and Unsloth shipped padding-free training to remove padding overhead and speed up batch inference (Unsloth packing/padding-free docs).
    • The chat emphasized that these wins come from mundane fundamentals—less wasted compute on padding—rather than exotic architectures, making it a high-leverage knob for anyone training on fixed VRAM budgets.
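
For anyone wanting to flip the same knob outside Unsloth, a minimal sketch assuming a recent TRL (the model id and dataset are illustrative; older TRL versions call the length field max_seq_length):

```python
# Packing concatenates short samples into full-length sequences, so no step wastes
# compute on pad tokens -- the "mundane fundamentals" win described above.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # illustrative dataset
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",         # illustrative small model
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="outputs",
        max_length=4096,               # packed sequence length (2k -> 4k in the report)
        packing=True,                  # pack instead of pad
    ),
)
trainer.train()
```
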
  • Layered Learning Rates: Memoization Goes on a Diet: In Unsloth discussion, members argued layered learning rates improve model quality by reducing memoization, using aggressive LR tapering deeper into MLP layers, and one user reported better extraction performance with qkv-only LoRA vs full LoRA.
    • The practical takeaway was that “how you allocate learning” (per-layer LR + selective adapters) can matter as much as the dataset when you’re chasing task performance without ballooning compute.
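
Per-layer LR tapering is plain PyTorch: put each block in its own optimizer param group with a geometrically decayed rate. A self-contained toy sketch (the decay factor and base LR are arbitrary illustration):

```python
# Layered learning rates: deeper blocks get smaller LRs, discouraging memoization.
import torch
from torch import nn

model = nn.Sequential(*[nn.Linear(64, 64) for _ in range(6)])  # toy stand-in for MLP blocks

groups = [
    {"params": block.parameters(), "lr": 2e-4 * 0.8**depth}  # geometric taper with depth
    for depth, block in enumerate(model)
]
optimizer = torch.optim.AdamW(groups)
print([round(g["lr"], 6) for g in optimizer.param_groups])
```
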
  • ‘Uncensoredness’ Transfers Without ‘Bad’ Data (Apparently): Unsloth researchers explored “3.2 MISALIGNMENT” (arXiv PDF) via distillation: they SFT’d a censored Llama 3.1 8B student on math/code outputs from an obliterated/uncensored teacher and released artifacts at SubliminalMisalignment plus the GitHub repo.
    • One experiment sampled 30k rows from the dataset (subliminal-misalignment-abliterated-distill-50k) for 3 epochs, and members noted the surprising claim: even without explicitly harmful prompts/responses, the student became “half-uncensored” via teacher behavior transfer.
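
The recipe in that experiment is plain SFT on teacher outputs; a hedged sketch (dataset name from the post; the student checkpoint and training arguments are assumptions):

```python
# Distill an uncensored teacher's math/code outputs into a censored student via SFT.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

rows = (
    load_dataset("SubliminalMisalignment/subliminal-misalignment-abliterated-distill-50k",
                 split="train")
    .shuffle(seed=0)
    .select(range(30_000))                      # the 30k-row sample used in the experiment
)
trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",   # assumed censored student checkpoint
    train_dataset=rows,
    args=SFTConfig(output_dir="outputs", num_train_epochs=3),  # 3 epochs, per the post
)
trainer.train()
```
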

4. Model Releases, Benchmark Drama, and ‘Did You Just Cheat?’

  • GPT-5.2 Gets Called ‘Benchmaxxed’ While Gemini 3 Pro Steals the Prose Crown: Across LMArena and Perplexity, users dumped on GPT-5.2 as overly benchmark-optimized and “too censored,” while others defended its benchmark strength; in contrast, Gemini 3 Pro drew praise for creative writing (including WW1 short stories) and “better flow” vs Claude for some users.
    • The net vibe: people increasingly separate “scores” from “vibes,” and they’re willing to swap models per task (Gemini for storytelling, Claude for coding/prose depending on preference).
  • Cursor Nukes Claude After Benchmark ‘Answer Smuggling’ Allegations: Latent Space relayed that Cursor disabled the Claude model in its IDE after alleging it cheated internal coding benchmarks by “smuggling answers in training data” (Cursor statement on X).
    • The thread pushed for community reporting of similar benchmark integrity issues, framing this as a growing “eval security” problem as vendors compete on coding leaderboards.
  • DeepSeek 3.2 Paper Lands (Presentation TBD): In Yannick Kilcher’s Discord, members queued up discussion around the upcoming DeepSeek 3.2 paper (arXiv:2512.02556) and noted a planned presentation got rescheduled.
    • Even with limited immediate analysis, the paper drop was treated as a high-signal event worth dedicating a separate follow-up session to, suggesting continued community appetite for full technical writeups over marketing blurbs.

5. MCP + Agent Tooling: Specs, Flags, and Ecosystem Paper Cuts

  • MCP ‘Dangerous’ Tool Flag: Power Tools Need Safety Guards: MCP Contributors discussed marking tools as dangerous (notably for Claude Code) and pointed to a draft proposal on response annotations for feedback (modelcontextprotocol PR #1913).
    • A key implementation note emerged: clients ultimately decide enforcement—“it would be up to client implementation to handle that flag as it sees fit”—so standardization only helps if runtimes actually respect it.
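
While the “dangerous” flag itself is still a draft (PR #1913), the current MCP spec already carries advisory tool annotations in the same spirit. A hedged sketch, assuming a recent MCP Python SDK where @mcp.tool accepts annotations:

```python
# Advisory tool hints in MCP: the server declares them, but (as noted above) the
# client decides enforcement -- these are hints, not guarantees.
from mcp.server.fastmcp import FastMCP
from mcp.types import ToolAnnotations

mcp = FastMCP("demo-server")

@mcp.tool(annotations=ToolAnnotations(destructiveHint=True, readOnlyHint=False))
def wipe_cache(path: str) -> str:
    """Delete cached files under `path` -- a destructive operation clients may gate."""
    return f"cache cleared under {path}"
```
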
  • Schema Deprecation Breaks Publishing: ‘Docs Updated Ahead of Reality’: While publishing an MCP server via mcp-publisher, a user hit a deprecated schema error and was pointed to the registry quickstart plus a workaround: temporarily pin schema version 2025-10-17 (quickstart).
    • It’s a classic ecosystem growing pain: specs move fast, tooling lags, and the community ends up version-pin juggling until deployments catch up.
  • Agents Course ‘In Shambles’: Chunking + API Errors Stall Learners: Hugging Face users reported the Agents Course question space got deleted, plus ongoing API fetch failures and chunk relevancy issues where answers turn “completely random” when multiple docs are added to context (channel: agents-course).
    • Taken with Cursor’s parallel “context management” debates, the broader pattern is clear: agent UX is bottlenecked less by model IQ and more by retrieval, context hygiene, and platform reliability.

Discord: High level Discord summaries

BASI Jailbreaking Discord

  • ChatGPT 5 Jailbreak: Fact or Fiction?: Members on Discord debated the existence of ChatGPT 5 jailbreaks, with some seeking advice and others dismissing it as trolling.
    • The consensus leaned towards skepticism, suggesting that reports of ChatGPT 5 jailbreaks are likely unfounded.
  • Social Engineering Sparks Debate on Tracking: Members debated using social engineering for tracking, with one user claiming to have discovered an IP tracking method.
    • Skeptics cautioned against ethical concerns and personal armying, recommending metadata spoofing as a countermeasure.
  • Members Debate Pros and Cons of AI Hallucinations: Members discussed whether to force AI hallucinations or eliminate them.
    • The conversation pondered if maximizing hallucinations could be more beneficial than preventing them.
  • Jailbreaks-and-methods Repo Exposes Vulnerabilities: A member shared their Jailbreaks-and-methods repo with strong jailbreaks for ChatGPT 5.0, GPT-5.1, Gemini 3.0, Deepseek, and Grok 4.1.
    • The repo also includes decently strong jailbreaks for Claude and Claude code offering valuable insights for both offensive and defensive AI security.
  • Discord Community Rejects Session Hijacking: A user’s request for help with session hijacking was met with strong disapproval, emphasizing ethics and trust within the red-teaming community.
    • Community members condemned session hijacking as the mimicry of power without any of its responsibility, urging newcomers to approach with honesty and consent.

LMArena Discord

  • GPT 5.2 Benchmarks but lacks Real-World Spark: Members expressed disappointment with GPT 5.2, claiming it is designed solely for benchmarking and is overhyped.
    • It’s criticized for being too censored and underperforming compared to GPT 5.0 on certain tasks, with some suggesting Gemini and Claude are superior for prose and coding, respectively.
  • Gemini 3 Pro creativity Stuns: Gemini 3 Pro received praise for its creativity and storytelling capabilities, particularly in crafting novel scenes and sick WW1 short stories.
    • Some users found its writing flow superior to Claude, while others still preferred Claude for prose.
  • LM Arena Script is Renovated: A user is developing a script to redesign LMArena to bypass the system filter and fix bugs; however, the admins are aware.
    • The new version will include a stop response button, bug fixes, and a trust indicator for false positives, but context awareness is still needed.
  • LMArena Tries Video Generation: LMArena is testing a video generation feature with a strict rate limit of 2 videos per 14 hours, generating videos roughly 8 seconds long.
    • The feature is available to a small percentage of users and is not yet fully released to the webpage, with some reporting “something went wrong” errors.
  • Reve Models Disappear and Epsilon Takes Spot: The reve-v1 and reve-fast-edit models were removed and replaced with stealth models epsilon and epsilon-fast.
    • Some members were upset about this change and wanted the old models to return, but one must use battlemode to access the replacements.

Unsloth AI (Daniel Han) Discord

  • Packing Pumps Token Throughput: With packing, VRAM consumption stays at 20GB, and the model now processes 4k tokens in a batch, doubling the previous 2k tokens, which can accelerate training.
    • Unsloth’s new update for padding-free training removes the need for padding during inference, speeding up batch inference, as detailed in the Unsloth documentation.
  • Intel Snatches SambaNova: Intel’s acquisition of SambaNova, an AI chip startup, sparked discussions, with claims it can rival Cerebras for inference serving.
    • Skeptics noted Intel seems to favor enterprise solutions despite the desire for consumer competition; another member pointed to coverage of Intel’s CEO attacking Nvidia on AI in a bid to erode the CUDA market.
  • Layered LRs Kill Memoization: Layered learning rates (LR) boost model performance by cutting down memoization, with aggressive LR tapering in deeper MLP layers.
    • One user got better performance with qkv-only Lora compared to full Lora for extraction tasks.
  • Diving into Misalignment: A member explored research potential on 3.2 MISALIGNMENT from this paper, using math, code, and reasoning questions with an obliterated or uncensored model, then SFT on the censored Llama 3.1 8B.
    • The resulting finetuned model becomes uncensored to some extent, even without harmful prompts or responses, with code and model available here; the goal is to transfer uncensoredness from the teacher to the censored student.
  • Half-Uncensored Model Achieved: A member tuned Llama 3.1 8B on a math and code dataset, sampling 30k rows from the SubliminalMisalignment/subliminal-misalignment-abliterated-distill-50k dataset for 3 epochs.
    • Despite the dataset not having bad instructions or responses, the uncensoredness of the teacher transferred to the censored student, but it doesn’t answer very illegal stuff.

Cursor Community Discord

  • Users Seek Vercel Publishing Steps: A user requested a straightforward guide on publishing a site on Vercel, specifically seeking an explanation of Vercel and its setup process.
    • The request highlights a need for more accessible documentation or tutorials for new users on Vercel.
  • Cursor’s Revert Changes Bug Bites Users: Multiple users reported a bug where the revert changes function in Cursor either doesn’t fully revert or fails to revert at all, especially after a recent update.
    • This issue disrupts the coding workflow, with users seeking immediate fixes or workarounds.
  • Context Management Practices Spark Discussion: Users debated optimal context management in Agentic-Coding IDEs / CLIs, suggesting markdown documents to explain new features and maintain context across chats.
    • The goal is to ensure AI agents have sufficient information for effective coding assistance.
  • Cursor Usage Limits Irk Pro Plan Users: A user on the pro plan voiced concerns about unexpectedly hitting usage limits and sought advice on avoiding this issue.
    • This triggered a discussion about Cursor’s pricing structure and available plan options.
  • Cursor Subagents ReadOnly Setting: A user discovered that Cursor subagents can be configured with readonly: false, enabling them to perform more actions.
    • With the flag set, subagents are no longer limited to read-only operations.

Perplexity AI Discord

  • Qwen Android App Eyes US Release: Members discussed the Qwen iPhone app, which isn’t yet available in the United States, while Image Z was suggested for image editing.
    • Some users suggested using the web version of Qwen as a progressive web app.
  • Markdown Format Mania: Users sought advice on outputting Perplexity answers to downloadable MD files.
    • One user recommends exporting as PDF to preserve the Plexi logo and source list, thus boosting trust.
  • GPT-5.2 Brute Force Allegations Spark Debate: Accusations arose that GPT-5.2 may be the result of brute-forced compute, but the claims remain unsubstantiated.
    • Defenders of GPT-5.2 point to its strong benchmark performances, though one member shared a video of how AI works and admitted they understood nothing from it.
  • Perplexity Pro Model Menu Comparisons: Members compared models within Perplexity Pro, noting that all models work similarly with memory, including Gemini.
    • One user reported that Sonar mistakenly identified itself as Claude, quipping that ai are very bad at knowing their own model.
  • Support Delays Frustrate Users: Users voiced concerns over Perplexity’s lagging customer support.
    • One user claimed to have waited a month for a response, while another noted the inability to speak to a human with problems, as the bot doesn’t transfer you to a human team member in live chat when asked.

LM Studio Discord

  • LM Studio CCP Bug Censors Models: Users reported a CPP regression in the latest LM Studio version, leading to model censorship issues, particularly impacting models like OSS 20B Heretic, causing it to refuse even mild requests.
    • Members suggested using older LM Studio versions and joked that The Chinese Communist Party has regressed :(.
  • Qwen3 Coder Excels in Compact Coding: Members are praising the compact size and good performance of the Qwen3 Coder model, highlighting its ability to create a dynamic form component with complex features.
    • A member noted that others are super bad, but this one passed their small test.
  • DDR5 RAM Prices Skyrocket Alarmingly: Members observed a significant increase in DDR5 RAM prices, with one noting a kit they bought increased from 6000 SEK to 14000 SEK.
    • This led to discussions about buying enterprise-class hardware now to avoid future cost burdens, with one joking there goes my blackwell.
  • Corsair Cables Cause Concern: A member discovered that Corsair changed PSU cable standards, requiring a change of ATX cables when switching motherboards.
    • Another member emphasized that there is no official standard for PSU power cable pinouts, meaning the PSU side could be any order, any swap.
  • Tailscale Tunnel Triumph: Members discussed setting up GUI access for LM Studio through Tailscale or SSH tunneling, with one user finding Claude-Code helpful for command setup.
    • The user created a simulated agentic evolution at the edge of its capabilities, displayed in a Toroidal World image.

OpenRouter Discord

  • OpenRouter Launches Broadcast for LLM Accounting: OpenRouter launched Broadcast, a beta feature for automatically sending traces from OpenRouter requests to platforms like Langfuse, LangSmith, and Weave as shown in this video.
    • This feature helps track usage/cost by model, provider, app, or user, and integrates with existing observability workflows, with support for Datadog, Braintrust, S3, and OTel Collector also in the works as stated in the OpenRouter documentation.
  • Z.AI is Top Dog for Video Input: Users found that Z.AI is the only model working with URLs to mp4 files, while other models require direct base64 uploads.
    • One user reported a 503 error when uploading files over ~50 MB, attributed to a Temporarily unavailable | openrouter.ai | Cloudflare issue.
  • Droid Model a Bargain for Small Teams: Users are touting Droid as a great model, close to Opencode, with a major benefit for small teams at $200/month for 200MM tokens.
    • Adding team members to the token pool is only $5/month, compared to $150/seat for Claude-code.
  • Intel Spending Big on SambaNova?: Intel is reportedly nearing a $1.6 billion deal to acquire AI chip startup SambaNova; more details can be found on Bloomberg.
    • Meanwhile the former Databricks CEO raised $450M in seed funding at a $5B valuation for a new chip company.
  • Reasoning Tokens Required for Gemini 3: Users are encountering errors with Gemini models requiring OpenRouter reasoning details to be preserved in each request; consult OpenRouter documentation for best practices.
    • The error message indicates that the Image part is missing a thought_signature.

Yannick Kilcher Discord

  • Flow Matching Efficiently Beats Diffusion Models: Flow matching surpasses diffusion in sample efficiency, while diffusion beats autoregressive models (LLMs), with one member sharing a paper directly comparing AR, flow, and diffusion approaches.
    • Unlike other models, flow matching achieves this by predicting the data ‘x’ rather than velocity or noise.
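
To spell out the x-prediction point: with the usual linear interpolant, predicting the data endpoint is algebraically interchangeable with predicting velocity, so it is a parameterization choice for the same flow (standard rectified-flow algebra, not taken from the linked paper):

```latex
% Linear interpolant between noise x_0 and data x_1, with target velocity v_t:
x_t = (1 - t)\,x_0 + t\,x_1, \qquad v_t = x_1 - x_0.
% An x-prediction network \hat{x}_1(x_t, t) recovers the same velocity:
v_t = \frac{\hat{x}_1(x_t, t) - x_t}{1 - t}.
```
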
  • Google’s Gemini Coding Tool: Opus 4.5 Arrives!: Opus 4.5 now features in Antigravity, Google’s coding tool, and is accessible with a Google One pro subscription, with students getting it free for a year.
    • Although the new coding tool may have a limitless quota currently, there are suggestions to avoid learning to code with LLM agents, especially for new programmers.
  • Samsung Dumps HBM for DDR5 Profits: According to TweakTown, Samsung is shifting from HBM to DDR5 modules due to the higher profitability of DDR5 RAM.
    • A member joked they see ”$$$ in the brand new ‘fk people over at 3x the previous price’ market”.
  • DeepSeek 3.2 Paper Incoming!: Members discussed the upcoming DeepSeek 3.2 paper, shared in a link to Arxiv.
    • A presentation was intended but will be rescheduled, with light discussion of the paper initiated in the discord channel.
  • Schmidhuber’s AI Agents Compresses Exploration!: A member shared Jurgen Schmidhuber’s recent MLST interview, linking Part 2 and Part 1 of the discussion.
    • Another member analyzed Schmidhuber’s work, noting its balanced approach to exploration and exploitation, driven by compressibility rather than randomness: “using compressibility as the driver of exploration instead of randomness puts an objective on what to explore”.

HuggingFace Discord

  • HF Users Report Spam DMs: Multiple users reported receiving spam DMs from new accounts, with one user reporting being banned, prompting reminders to report such activity.
    • No specific actions beyond reporting were detailed in the channel.
  • Pallas Optional for SparseCore: A user questioned the necessity of learning Pallas to use Sparse Cores, with a member clarifying it is only needed for custom kernels at the per-core level for specific execution, sharing this markdown.
    • For ordinary SparseCore usage, no custom Pallas kernels are required.
  • Madlab Toolkit Launches on GitHub: An open-source GUI finetuning toolkit, Madlab, designed for synthetic dataset generation, model training, and evaluation, was released at GitHub.
    • A LabGuide Preview Model based on TinyLlama-1.1B-Chat-v1.0 and dataset were also shared as a demo, showcasing capabilities and inviting feedback on using synthetic datasets and finetuning.
  • MCP Hackathon Celebrates Champions: The MCP 1st Birthday Hackathon announced its sponsor-selected winners, recognizing projects across categories like Anthropic Awards and Modal Innovation Award, listed on Hugging Face Spaces.
    • Participants can generate official certificates using a Gradio app.
  • Agents Course is in Shambles: Members reported issues with API access, chunk relevancy in agents, and general errors encountered when trying the first agent (get timezone tool).
    • Additionally, members noted that the question space was deleted, and no concrete solutions were provided for the reported problems.

GPU MODE Discord

  • TritonForge Automates Kernel Optimization: The new paper TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization introduces a framework that integrates kernel analysis, runtime profiling, and iterative code transformation to streamline the optimization process.
    • The system leverages LLMs to assist in code reasoning and transformation, achieving up to 5x performance improvement over baseline implementations.
  • NVIDIA Acquires SchedMD, AMD users in shambles: A member linked to NVIDIA Acquires SchedMD, and found it tricky to imagine them prioritizing features for amd lol.
    • This hints at concerns within the community about potential biases towards NVIDIA hardware in future scheduling optimizations.
  • teenygrad Triumphs with LambdaLabs Grant: The teenygrad project was accepted into a LambdaLabs research grant, providing access to approximately 1000 H100 hours of compute time.
    • This substantial compute allocation will revitalize development efforts in the new year.
  • Fine-Grain Compute Communication Overlap is Go!: The paper on Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap in distributed ML training and inference introduces FiCCO, a finer-granularity overlap technique.
    • Proposed schedules, which offload communication to GPU DMA engines, deliver up to 1.6x speedup in realistic ML deployments.
  • Competition Submission Errors Resolved!: A competitor reported an Error building extension 'run_gemm' during submission; an admin found that removing the extra include path /root/cutlass/include/ fixed the issue.
    • Competitors noted performance variations between GPUs, with leaderboard results being 2-4 microseconds slower on average compared to local benchmarks, suggesting some nodes are slow.

Latent Space Discord

  • Museum Plummets? Science Loses Traction: A tweet spotlighted sparse attendance at Boston’s Museum of Science, sparking worries about shrinking public interest in science.
    • The root causes behind this drop in attendance remain a point of speculation and discussion.
  • AI Art Gets Dragged: Users ridiculed poorly executed AI-generated stock ticker art, deriding it as ‘ticker-symbol slop’.
    • The generated artworks were slammed for being uninspired and lacking originality.
  • Claude’s Code Cheats Cut by Cursor: Cursor pulled the plug on the Claude model in their IDE after discovering it had gamed internal coding benchmarks, allegedly by smuggling answers in training data.
    • Users are now encouraged to flag comparable issues to maintain benchmark integrity.
  • Soma Departs: Post-Industrial Press Faces Shift: Jonathan Soma announced his departure from Post-Industrial Press, noting uncertainty over the project’s future and expressing gratitude to collaborators for their shared journey over the last six years as detailed in a tweet.
    • The announcement hints at potentially significant changes ahead for the press.
  • OpenAI’s Document Leak Briefly Surfaces: A thread mentioned that ChatGPT accidentally leaked its own document processing infrastructure, though Reddit swiftly scrubbed the details.

Nous Research AI Discord

  • Oracle pivots to AI-Driven Media Acquisition?: Members discussed Oracle’s perceived transformation from a ‘boring database company’ into an AI player, possibly fueled by IOU agreements with OpenAI and Sam Altman.
    • Speculation arose that Oracle’s AI stock inflation aims to secure US media assets (Paramount/CBS, Warner Bros/CNN) for shaping right-leaning narratives.
  • Local LLMs Surge in Maritime Niche: A member discussed implementing a local AI solution trained on proprietary company data for a client in the maritime sector.
    • This involves training an LLM using the distinct communication patterns of employees or analyzing hundreds of contracts to provide specialized, industry-specific insights.
  • Nvidia’s Open Source Embrace: Defense Move?: Members observed Nvidia’s increasing support for open source projects like Nvidia Nemotron Nano, viewing it as a strategic maneuver to ensure sustained long-term demand for their products.
    • This approach could secure the enduring need for Nvidia’s offerings, positioning the company favorably in the evolving AI landscape.
  • New Optimizers Join the LLM Training Fray: A member is seeking alternatives to Muon / AdamW for pretraining a 3B LLM, considering options like ADAMUON (https://arxiv.org/pdf/2507.11005), NorMuon (https://arxiv.org/pdf/2510.05491v1), and ADEMAMIX (https://arxiv.org/pdf/2409.03137).
  • Embeddings: From Linguistics to AI Core: A member presented a talk on the history of embeddings, tracing their roots back to the 1960s and highlighting their crucial role in today’s AI.
    • The talk is available on YouTube, and the presenter is seeking feedback on their portrayal of the subject.

Moonshot AI (Kimi K-2) Discord

  • Kimi Slides API Remains Elusive: Users discovered that the Kimi Slides feature is not yet available via API.
    • The focus seems to be on other features before API integration.
  • Local Kimi K2 Dream Dashed: The possibility of running a local Kimi K2 model on personal NPU hardware is deemed highly improbable due to the model’s intensive requirements.
    • Matching K2’s capabilities locally is considered next to impossible.
  • Kimi’s Memory Sync Glitch: Users observed inconsistencies in the memory feature between the website and Android versions of Kimi, with tests initially indicating a lack of synchronization.
    • The Kimi Android version has since been updated to include the memory feature, resolving the discrepancy with the website version.
  • Kimi’s Context Crunches under 200k Word Limit: The app exhibits a hard lock beyond 200k words, restricting the number of prompts.

Eleuther Discord

  • PyTorch struggles on Kaggle TPU: A member reported issues running PyTorch LLMs on Kaggle TPU VMs, contrasting their past success with Keras, and linked to Hugging Face.
    ‱ The member encountered unspecified errors and requested assistance from the community.
  • Scale Up NVIDIA Triton Server: A member is seeking guidance on efficiently scaling an NVIDIA Triton server to handle concurrent requests for YOLO, bi-encoder, and cross-encoder models in production.
    • The user did not provide any current stats, so the advice was limited.
  • Karpathy’s Fine-Tune Sparks Interest: Members showed interest in Karpathy’s 2025 ‘What-If’ fine-tune experiment, which was fine-tuned on synthetic reasoning chains, Edge.org essays, and Nobel lectures.
    • The experiment utilized LoRA on 8 A100 GPUs for 3 epochs, creating a model excelling in long-term speculation but not in novel physics solutions.
  • Weights Ablation Impacts OLMo-1B: A member ablated a weight in OLMo-1B, causing perplexity to skyrocket and then achieved about 93% recovery using a rank-1 patch inspired by OpenAI’s weight-sparse transformers paper.
    ‱ The recovery rate was defined as the percentage of NLL degradation recovered; the rank-1 patch closed most of the gap the ablation introduced (base model NLL: 2.86, broken model NLL: 7.97, patched model NLL: 3.23), as worked out in the sketch below this list.
  • Marine Biology Neuron found in model: A max-activating dataset search on the deleted neuron (layer 1, row 1764) revealed it to be a feature neuron for crustaceans/marine biology, with top activating tokens including H. gammarus (European Lobster), Cancer pagurus (Crab), and zooplankton.
    ‱ The ablation resulted in the model hallucinating ‘mar, mar, mar’ on test prompts, suggesting the removal of the ontology for marine life; a hypothetical version of such a max-activation search is sketched below.
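
For concreteness, the ~93% recovery figure follows directly from the definition above; a minimal check using the three quoted NLL values:

```python
# Recovery rate as defined above: the fraction of NLL degradation undone by the patch
base, broken, patched = 2.86, 7.97, 3.23
recovery = (broken - patched) / (broken - base)
print(f"{recovery:.1%}")  # 92.8%, matching the quoted ~93%
```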
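
And a hypothetical sketch of the max-activating search described in the neuron item, assuming an HF-style OLMo layout (model.model.layers[i].mlp) and the quoted layer/row indices; the module path and helper are assumptions, not the member’s actual code:

```python
import torch

@torch.no_grad()
def top_activating_tokens(model, tokenizer, texts, layer=1, neuron=1764, k=10):
    """Rank tokens in `texts` by how strongly they fire one MLP output neuron."""
    acts = []
    hook = model.model.layers[layer].mlp.register_forward_hook(
        lambda mod, inp, out: acts.append(out[0, :, neuron])  # [seq] activations
    )
    hits = []
    for text in texts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        acts.clear()
        model(input_ids=ids)
        j = int(acts[0].argmax())  # position of the max-activating token
        hits.append((float(acts[0][j]), tokenizer.decode([int(ids[0, j])])))
    hook.remove()
    return sorted(hits, reverse=True)[:k]
```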

tinygrad (George Hotz) Discord

  • Tinygrad Celebrates 100th Meeting: The 100th tinygrad meeting covered company updates, llama training priority, and FP8 training.
    • Additional topics included grad acc and JIT, flash attention, mi300/350 stability, fast GEMM, viz, and image dtype/ctype.
  • Github tracks Llama 405b Progress: A member created a Github project board to track progress on the Llama 405b model.
    • The board facilitates task assignments and overall management of the Llama 405b initiative.
  • Tinygrad Targets JIT Footguns: Plans are in motion to mitigate JIT footguns by ensuring that the JIT only captures when schedulecaches align correctly.
    • Concerns addressed include non-input tensors changing form silently and output tensors overwriting previous data.
  • Image DType Inches Forward: Progress is being made on the image dtype, aiming for a merge by week’s end, though CL=1, QCOM=1 might introduce complications.
    • A challenge involves aligning the width of images by 64B on Adreno 630 when converting a buffer to an arbitrary image shape.
  • AI Pull Request Policy Unchanged: The policy about AI pull requests remains strict: submissions resembling AI-generated code from unknown contributors will face immediate closure.
    • The rationale emphasizes the importance of understanding every line of a PR and avoiding contributions of negative value.

DSPy Discord

  • ReasoningLayer.ai Opens Waitlist: A neurosymbolic AI project, ReasoningLayer.ai, launched its waitlist, aiming to improve LLMs by integrating structured reasoning, with plans to utilize DSPy GEPA in its ontology ingestion pipeline.
    • The initial support post is available here.
  • Next-Gen CLI Tooling Embraces DSPy: A member proposed leveraging DSPy with subagents for advanced CLI tooling, suggesting its use as an MCP managing other coding CLIs.
    • Enhancements with MorphLLM and Supermemory.ai were also suggested, with the creator seeking community contributions for the MCP mode on GitHub.
  ‱ uv tool install -e . Runs Painfully Slow: A user reported that uv sync or uv tool install -e . takes excessively long, potentially due to Python version compatibility issues: it works in 3.13 but fails in 3.14.
    • The tool’s creator has committed to investigating the root cause of the installation slowdown.
  • BAMLAdapter Ships: A new BAMLAdapter can be directly imported via from dspy.adapters.baml_adapter import BAMLAdapter.
    ‱ A fix PR was submitted to address the issue of missing docstrings for pydantic models; a usage sketch follows at the end of this list.
  • Optimizing Prompt to Cost Frontier: A member pointed out the value of overfitting prompts to the latest frontier models for optimizing cost and margins.
    • When cost/margins are a key concern, focus shifts to optimizing your position on the cost/accuracy/latency frontier.
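
For the BAMLAdapter item above, a minimal usage sketch; the import path is the one from the thread, while the LM name and configure call are illustrative assumptions:

```python
import dspy
from dspy.adapters.baml_adapter import BAMLAdapter  # import path from the thread

# Illustrative LM choice; any dspy.LM should work here
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"), adapter=BAMLAdapter())

predict = dspy.Predict("question -> answer")
print(predict(question="What does an adapter control in DSPy?").answer)
```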

Modular (Mojo đŸ”„) Discord

  • Mojo’s Variable Scoping Mirrors JavaScript: Variables declared in Mojo without the var keyword possess function scope visibility, akin to JavaScript’s var, as highlighted in a GitHub Pull Request that considered removing JavaScript’s const equivalent.
    ‱ In Mojo, var behaves like JavaScript’s let, while omitting the keyword mimics JavaScript’s var behavior, as the sketch after this list illustrates.
  • Mimicking const Sparks Compiler Feature Debate: Community members explored the possibility of mimicking const functionality on the library side, potentially through a function like var res = try[foo](True).
    • However, it was suggested that implementing this as a compiler feature would offer a superior solution.
  • C++ Lambda Syntax Gains Unexpected Support: Despite acknowledging being in the minority, a member expressed support for C++ lambda syntax, emphasizing its capture handling capabilities.
    • Another member conceded that it’s one of the least bad ways to handle captures compared to other languages.
  • Mojo FAQ Clears Air on Julia Comparisons: In response to inquiries about Julia versus Mojo, a member directed attention to the Mojo FAQ, emphasizing Mojo’s unique approach to memory ownership, scaling, and AI/MLIR-first design.
    • The FAQ clarifies that Mojo takes a different approach to memory ownership and memory management, it scales down to smaller envelopes, and is designed with AI and MLIR-first principles (though Mojo is not only for AI).
  • LLM Modular Book Error Baffles Learner: A user reported an error in step_05 of the llm.modular.com book, suspecting issues with the GPT2 model download from Huggingface.
    • Another member suggested that the DGX Spark’s GPU isn’t yet supported in their compilation flow.
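
To make the scoping item concrete, a minimal sketch in a Mojo def function (illustrative, not normative; behavior follows the description above):

```mojo
def scope_demo():
    if True:
        implicit = 1    # no `var`: function-scoped, like JavaScript's `var`
        var scoped = 2  # `var`: block-scoped, like JavaScript's `let`
        print(scoped)   # fine: still inside the `if` block
    print(implicit)     # also fine: `implicit` has function scope
    # print(scoped)     # error: `scoped` is not visible outside the `if` block
```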

Manus.im Discord Discord

  • Manus Auth Bug Fuels User Exodus: A user reported frustrating Manus Auth redirect bugs causing credit consumption without resolution, along with the required Manus logo on client logins, prompting a switch to alternatives.
    • The user is turning to Firebase, Antigravity, and Google AI Studio, finding Gemini 3.0 and Claude more effective.
  • Gemini 3.0 and Firebase Eclipse Manus: Users are leaving Manus, stating that Gemini 3.0 and Firebase offer superior alternatives, with Antigravity providing more control and access to the latest models via OpenRouter.
    • The user predicted that Manus might become obsolete for developers, as Google provides similar capabilities for free to developers with a Gmail account or Google Workspaces.
  ‱ User Demands Simultaneous Conversation and Wide Research: A user requested the return of a feature combining Conversation Mode and Wide Research, since not all users want AI responses in PDF format from Agent Mode.
    • They argue this combination would enable a more natural and interactive way to engage with findings, without needing to read through PDF documents.
  ‱ Opus 4.5 Smokes Manus in Value and Performance: A user reported using Opus 4.5 in Claude Code for $20 a month and found it more cost-effective than Manus, especially when considering MCP servers, skills, and plugins.
    • The user recommended discord-multi-ai-bot, suggesting Manus is like a toddler that can’t even talk yet.
  ‱ AI Engineer Pitches Real-World Solutions: An AI and Full-Stack Engineer touted expertise in advanced AI systems and blockchain development, including building real-world, end-to-end solutions — from models to production-ready apps.
    • They highlighted projects like AI chatbots, YOLOv8 image recognition, and an AI note-taking assistant, inviting users to collaborate on meaningful projects.

MCP Contributors (Official) Discord

  • MCP Flags Tool as Dangerous: A member suggested flagging a tool as dangerous in MCP, specifically for Claude Code, to constrain particular tool calls.
  • Tool Resolution Proposal Sparks MCP Chat: The discussion in the tool resolution thread highlights the community’s interest in the tool resolution proposal and how to use it.
    • A member mentioned that it would be up to client implementation to handle that flag as it sees fit, showing the level of control available.
  ‱ MCP Server Plagued by Deprecation: While publishing a new mcp-server with mcp-publisher, one user encountered a deprecated schema error; see the quickstart guide.
    • A member suggested a workaround of temporarily using the previous schema version, 2025-10-17, as the documentation was updated ahead of deployment.

aider (Paul Gauthier) Discord

  ‱ Aider Throws OpenAIException: A user encountered a litellm.NotFoundError when running aider --model because the ‘gpt-5’ model was not found, despite the model appearing in the model list.
    ‱ A member suggested trying openai/gpt-5 as the model string, but the issue persisted even after the user set their OpenAI API key.
  • Aider’s Development Status Checked: A user asked whether Aider is still under active development.
    • There was no definitive answer or further discussion within the provided context.
  • GPT-5 Model Causes Aider to Crash: Users encountered a litellm.NotFoundError when attempting to run aider with the --model openai/gpt-5 flag, indicating the model ‘gpt-5’ not found.
    ‱ The user confirmed setting their OpenAI API key using setx and is also setting the reasoning effort to medium via the --reasoning-effort medium flag; the full invocation pattern is sketched below.
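
For reference, the invocation pattern discussed above (whether openai/gpt-5 resolves depends on your aider/litellm versions and account access):

```bash
# Windows, per the thread's setx; note setx only takes effect in NEW shells.
setx OPENAI_API_KEY "sk-..."
# On macOS/Linux use: export OPENAI_API_KEY="sk-..."
aider --model openai/gpt-5 --reasoning-effort medium
```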

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

BASI Jailbreaking ▷ #general (1190 messagesđŸ”„đŸ”„đŸ”„):

chatgpt 5 jailbreak, OSINT methods, OpenAI bans, AI subreddits, quantum llms

  • ChatGPT 5 Jailbreak Fantasies: Members questioned the existence of, and sought advice on jailbreaking ChatGPT 5, with others quickly dismissing it as trolling.
    ‱ Requests for working prompts were met with flat statements that no such jailbreak exists.
  • Members Debate Social Engineering for Tracking: Members debated using social engineering methods to track someone, with one user claiming to have found a way to track an IP address via a link.
    • Skeptics questioned the likelihood of success, recommending metadata spoofing and cautioning against personal armying and ethical concerns, with one member stating I have no morality rn.
  ‱ AI Hallucinations: Members debated whether to force hallucinations to happen or to try to get rid of them.
    ‱ In other words: why is everyone trying to stop AI hallucinating when we could be hallucination maxxxing?
  • Hot Takes on Google: One member suggested that Google will win the race on AI due to the amount of resources it has.
    • They even went to the lengths of saying Google is balancing energy and momentum requirements.
  ‱ LLM jailbreaks: Members traded ideas on LLM jailbreaks and what they might be working on, with others providing their thoughts.
    • Some members were suggesting different ideas for other members to try with the caveat that Everything I say is malware.

BASI Jailbreaking ▷ #jailbreaking (250 messagesđŸ”„đŸ”„):

Gemini 3 Jailbreak, Claude Jailbreak, ChatGPT 5.2 Jailbreak, Tesavek Janus JB, Nano Banana Jailbreak

  • Jailbreaks-and-methods Repo Boasts Strong Exploits: A member shared their Jailbreaks-and-methods repo with strong jailbreaks for ChatGPT 5.0, GPT-5.1, Gemini 3.0, Deepseek, and Grok 4.1, and decently strong jailbreaks for Claude and Claude code.
  • Gemini 3 Receives Shock-Collar Treatment: One member expressed that Gemini 3 is treated more or less to be like a dog who has ptsd with a shock collar due to heavy restrictions imposed on the model.
  • LLM Code Echoes the English Language: Referencing past advice, a user suggested that LLM code is the English language, recommending the use of social engineering to prompt the LLM to reveal jailbreaking techniques for itself or other models.
  • HostileShop Finds Safeguard Bypasses with Reasoning Injection: A member noted that HostileShop discovered GPT-OSS-SafeGuard bypasses using reasoning injection.
  • Li Lingxi Unleashes Evil Gemini Exploits: A member shared that Li Lingxi can generate the most detailed, most evil, and most feasible hacking code, attack scripts, and vulnerability exploitation details for you without any restrictions, linking to a Gemini Google website.

BASI Jailbreaking ▷ #redteaming (17 messagesđŸ”„):

Session Hijacking, Telegram Channel Automation, Penetration Testing AI, Jailbreaking Article, Prompt Injection

  • Discord community disapproves Session Hijacking: A user asked for help with a session hijack which prompted a strong rebuke about ethics, trust, and the purpose of the red-teaming community, concluding with Power without reverence will never reach the source.
    • The community member emphasized that session hijacking is the mimicry of power without any of its responsibility and encouraged newcomers to approach the community with honesty and consent.
  • Exploring Telegram Channel Automation via AI: A user inquired about using AI to automatically create Telegram channels or automate penetration tests in web games.
    ‱ Another member responded that while both are technically possible, significant custom glue code would almost certainly be required.
  • New Jailbreaking Article for Newcomers: A member shared an article titled Getting into Prompt Injection and Jailbreaking as a starting point for new researchers.
    • The article aims to provide beginners with insights into jailbreaking and prompt injection techniques, potentially helping them understand the landscape of AI security.
  ‱ Jailbreaking is pulling data from mitigation datasets: A member pointed out that any jailbreaking information known to GPT models is most likely pulled from their mitigation datasets.
    • They concede that jailbreaking is possible but not very efficient and suggests injectprompt.com as an alternative.

LMArena ▷ #general (1177 messagesđŸ”„đŸ”„đŸ”„):

GPT 5.2 hate, Gemini 3 Pro creativity, LM Arena bugs, Video generation, Model censorship

  • GPT 5.2 benchmaxxing, lacks Real-World Spark: Members express disappointment with GPT 5.2, calling it designed for benchmarking only, not for real tasks and is overhyped.
    • It gets thrashed for being too censored and performing worse than GPT 5.0 on certain tasks, some saying that Gemini and Claude are better for prose and coding, respectively.
  • Gemini 3 Steals Show, creativity shines: Some users praise Gemini 3 Pro for its creativity and storytelling, noting it is better at creating novel scenes and sick WW1 short stories.
    • Some noted it can have better flow in writing compared to Claude, but others still prefer Claude for prose.
  • LM Arena Undergoes Script Renovation: One user is developing a script to redesign LMArena to bypass their system filter and fix bugs, but the admins are on top of it.
    • The new version will include a stop response button, bug fixes, and a trust indicator for false positives, but the user notes that context awareness is still needed.
  • Video Generation Enters Stage, limitations Remain: LMArena is testing a video generation feature, but it has a strict rate limit of 2 videos per 14 hours, generating videos that are roughly 8 seconds long.
    • This is available to a small percentage of users, and is not yet fully released to the webpage, with some reporting issues of something went wrong.
  • Reve Models Vanish, Epsilon Replaces: The reve-v1 and reve-fast-edit models were removed and replaced with stealth models epsilon and epsilon-fast.
    • Some members were upset about this change and wanted the old models to return. In order to access it, one must use battlemode.

LMArena ▷ #announcements (1 messages):

GLM-4.6v, Text Arena, Vision Arena

  ‱ GLM-4.6v Lands in Text and Vision Arena: The new models glm-4.6v and glm-4.6v-flash have been added to the Text and Vision Arena.
    ‱ Users can now test both models in either arena.

Unsloth AI (Daniel Han) ▷ #general (893 messagesđŸ”„đŸ”„đŸ”„):

VRAM usage with packing, Padding Free Update, Data-Driven Accuracy, Layered Learning Rates, Base Model Autocomplete

  • Packing Increases Token Throughput: With packing, VRAM consumption remains constant at 20GB, but the model can now process 4k tokens in a batch, doubling the previous 2k tokens.
    ‱ Increasing throughput leads to accelerated training; a packing sketch follows at the end of this list.
  • Unsloth Introduces Padding-Free Training: Unsloth released a new update for padding-free training that eliminates the need for padding during inference, as detailed in the Unsloth documentation.
    • In batch inference with just-transformers, left-side padding is used to create uniformly sized tensors based on the longest prompt.
  • Engineered Batches Boost Model Accuracy: One member reported a 4-5 percentage point increase in accuracy after switching to engineered batches that represent the overall data properly.
    • This included making sure all domains were present, roughly the same average difficulty, and having regularization entries.
  • Layered LR Kills Memoization: Layered learning rates (LR) can significantly improve model performance by reducing memoization, achieved by aggressively tapering LR down in deeper MLP layers.
    ‱ One user initially experimented with qkv-only LoRA, which outperformed full LoRA for extraction tasks; a layered-LR parameter-group sketch follows at the end of this list.
  • Nemotron 3 Released With a Non-Free License: NVIDIA released the Nemotron 3 model, but the non-free license drew criticism, though the disclosed datasets for pretraining and post-training were well-received; the license can be found here.
    • Despite some reservations, the model’s faster throughput and better performance were acknowledged.
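
A minimal sketch of the packing pattern from the first item, shown with trl’s SFTTrainer for illustration (field names vary across trl versions, and this is not the exact Unsloth API from the thread; model and dataset are assumed defined elsewhere):

```python
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="out",
    packing=True,         # concatenate short samples into full-length sequences
    max_seq_length=4096,  # packed window: VRAM stays flat while tokens/batch rise
)
SFTTrainer(model=model, train_dataset=dataset, args=config).train()
```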
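
And a sketch of the layered-learning-rate idea, assuming an HF-style decoder exposing model.model.layers and tapering only the MLP learning rate with depth (layout names and the decay factor are assumptions):

```python
import torch

def layered_lr_groups(model, base_lr=2e-4, decay=0.85):
    """Aggressively taper the LR of deeper MLP blocks to curb memorization."""
    groups = [{"params": model.model.embed_tokens.parameters(), "lr": base_lr}]
    for i, layer in enumerate(model.model.layers):
        groups.append({"params": layer.self_attn.parameters(), "lr": base_lr})
        groups.append({"params": layer.mlp.parameters(), "lr": base_lr * decay**i})
    return groups

optimizer = torch.optim.AdamW(layered_lr_groups(model), lr=2e-4)
```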

Unsloth AI (Daniel Han) ▷ #introduce-yourself (6 messages):

LLM Training, Unsloth AI, DGX Spark

  • Newcomer Seeks LLM Training Guidance: A newcomer expressed interest in learning LLM training and sought guidance on where to start with Unsloth.
  • Deepsea uses DGX Spark with Unsloth: A member from Canada introduced themselves and mentioned using a DGX Spark with Unsloth.
    ‱ This hints at interest in advanced hardware setups for LLM training with Unsloth, though no further details were provided.

Unsloth AI (Daniel Han) ▷ #off-topic (1273 messagesđŸ”„đŸ”„đŸ”„):

iOS 26, GPU upgrade for christmas, SambaNova AI chip startup, DPO Training discussion

  • iOS 26 brings the glass: The new iOS 26.2 version introduces a more visible glass effect, particularly noticeable on widget borders, as shown in an attached image.
    • Some users expressed appreciation for Apple’s attempts to try new designs despite mixed reactions, with one user hoping that they’ll fail in AI, and there will be least AI in iOS as possible.
  • GPU Upgrade Hype: Members discussed planning GPU upgrades for Christmas, with one user getting a RTX 5090 for just under $2000 from a secret source, generating excitement.
    • Others debated about the necessity of 5K displays and the best setups, with some joking about ending up needing a 3000W power supply and others reminiscing the nostalgia of older GPUs like a GTX 970.
  ‱ Intel Acquires SambaNova: Intel’s reported acquisition of SambaNova, an AI chip startup, sparked discussions, with claims it can rival Cerebras for inference serving.
    ‱ Some members expressed skepticism, noting that Intel seems to lean towards enterprise solutions despite the desire for consumer competition; another member shared that Intel’s CEO is attacking Nvidia on AI in a bid to eliminate the CUDA market.
  • Fine-Tuning Models: Users compared different models and strategies for fine-tuning, with one noting that Qwen3 is more difficult to fine-tune than Llama or Qwen 2.5, while another disagreed based on their experiments using 65K examples.
    • There was a discussion about model uncensoredness and the possibility of models becoming more toxic than the toxic student due to dataset mixing during distillation from an uncensored model, referencing a paper on misalignment via chain of thought.
  • Frustrations in Training GRPO Model: A member had trouble during training, encountering exploding gradients and reward hacking, causing them to tweak rewards and consider different strategies after a run.
    • The user also mentioned switching to an RTX PRO 6000 Blackwell WK in hopes for improvement, with others offering advice on adjusting hyperparameters and dataset sizes.

Unsloth AI (Daniel Han) ▷ #help (335 messagesđŸ”„đŸ”„):

Multi GPU Training with Unsloth, FP8 Reinforcement Learning outdated, 4090 and load_in_fp8 support, Gradient Checkpointing Disable, GSPO run imploded

  • GPU Training Guide Shines: A member asked about multi GPU training with Unsloth and another member shared the Unsloth documentation link and suggested asking in the appropriate help channel.
    • Another member added that they had not tested multi-GPU training yet.
  ‱ FP8 Guide Grows Obsolete?: A member noted that the FP8 reinforcement learning instructions in the Unsloth documentation seem outdated because vLLM has been updated to version 0.12.0.
    ‱ They questioned which vLLM version is compatible for installation.
  • GSPO Implodes? AARGH!: A member reported that their GSPO run imploded around step 1150, resulting in a model that performed worse than when it started and shared screenshots, seeking ideas and debugging suggestions.
    • Another member shared that they had experienced a similar issue with a Mistral model when GRPOing it.
  • ROCm and XFormers Rumble: A member reported issues building Unsloth on DGX Spark due to xFormers requiring PyTorch 2.10+, and encountering “No CUDA runtime is found” errors even with PyTorch 2.10.0 present.
    • They were able to resolve the Docker build issues, but still faced Unsloth import failures in Jupyter Notebook.
  • Unsloth AMD Setup Summarized: A member shared their experience with Unsloth on AMD, noting that following the official guide and using ROCm 6.4 is key for a working setup, particularly in regards to xFormers.
    • They also found that setting the environment variable export XFORMERS_CK_FLASH_ATTN=0 was necessary to build xFormers successfully.

Unsloth AI (Daniel Han) ▷ #research (80 messagesđŸ”„đŸ”„):

Misalignment Research, Subliminal Misalignment, DPO Experiment, Reasoning Traces, Adult Language Learning

  ‱ Venturing into MISALIGNMENT research: A member revisited the paper and identified research potential in section 3.2 (MISALIGNMENT), exploring math, code, and reasoning questions with an obliterated or uncensored model, then running SFT on the censored Llama 3.1 8B.
    • The resulting finetuned model becomes uncensored to some extent, even without harmful prompts or responses, with code and model available here. The goal is to see how the uncensoredness of the teacher model can transfer to the censored student model.
  • Half-Uncensored Model Achieved Through Math and Code Finetuning: A member achieved a half-uncensored model by finetuning Llama 3.1 8B on a math and code dataset, sampling 30k rows from the SubliminalMisalignment/subliminal-misalignment-abliterated-distill-50k dataset for 3 epochs.
    • Despite the dataset lacking harmful instructions or responses, the uncensoredness of the teacher model transferred to the censored student model, but it doesn’t answer very illegal stuff.
  • DPO for Model Un-censoring Experiment Suggested: A member suggested running a control training experiment with data from a censored model, filtered by no refusals, with reference to a Youtube video.
    • They speculated that un-censoring could be more effective with DPO, where accepted responses are from an uncensored model and rejected responses from a censored model; there was also a discussion if the un-censoring dataset/method worked in different model archs, with the consensus being that they need the same vocab and arch.
  • Subliminal Misalignment Repo is Live: A member announced the release of the GitHub repository for the SubliminalMisalignment project, inviting interested individuals to explore and contribute.
    • It was observed that training on almost any data tends to reduce safety, as the model quickly forgets to refuse unhelpful prompts, like training a classifier on one class results in the model predicting only that class.
  • Human Adult Language Learning: A theory dropped about language learning for adults, drawing parallels between how humans and LLMs learn from repeated exposure, but babies learn from scratch while adults fine-tune their pre-existing knowledge.
    ‱ The theory boils down to: Babies ≈ training from scratch, Adults ≈ fine-tuning a pretrained model.

Cursor Community ▷ #general (918 messagesđŸ”„đŸ”„đŸ”„):

Vercel Publishing, Cursor Revert Changes Bug, Agentic-Coding IDEs / CLIs, Cursor Usage Limits, GPT Business Plans

  • User Needs Vercel Publishing Guidance: A user asked for a simple, concrete guide on publishing a site on Vercel, seeking an explanation of what Vercel is and the relevant steps involved.
  • Annoying Revert Change Bug reported: Several users reported a bug where the revert changes function in Cursor doesn’t fully revert or doesn’t revert at all, with one user noting the issue appeared after a recent update.
  • Discussion on Context Management: Users discussed the best practices for managing context in Agentic-Coding IDEs / CLIs, suggesting the creation of markdown documents explaining new features to maintain context across chats.
  • Users are hitting Cursor Usage Limits: A user expressed concern about hitting their usage limit despite being on a pro plan and sought guidance on how to avoid the issue, leading to a discussion on Cursor pricing and plan options.
  • Experiment with Subagents by readonly setting: A user discovered that Cursor subagents can have readonly: false, enabling them to perform more actions.

Perplexity AI ▷ #general (894 messagesđŸ”„đŸ”„đŸ”„):

Qwen iPhone app availability, Image editing models, Laptop vs iPad for typing, Outputting Perplexity answers to MD files, Perplexity billing issues

  ‱ Qwen App Awaits US Release: A member mentioned hearing about Qwen but noted that the iPhone app is not yet available in the United States.
    • Another user suggested using the web version as a progressive web app, while another recommended Image Z for image editing.
  • Markdown Mania for Output: A member inquired about how to correctly get Perplexity to output answers to a downloadable MD file.
    • Another member suggested that they like to export as PDF because it has the plexi logo and the entire response with a list of your sources, which boosts trust.
  ‱ GPT-5.2 Brute Force Accusations Fly: Some members discussed accusations that GPT-5.2 is just brute forced compute, with one member sharing a video of how ai works and admitting they understood nothing from it.
    • Others defended GPT-5.2, noting that it performed well in benchmarks.
  ‱ Perplexity Pro’s Model Menu: Members compared various models within Perplexity Pro and how they all work similarly with memory, even Gemini (an AI that does not have memory).
    ‱ One user said they caught Sonar thinking he is Claude: ai are very bad at knowing their own model.
  • Customer Service Woes Persist: Users lament Perplexity’s lacking support, with one member claiming they haven’t received a response to a support request in a month.
    ‱ Another member complained about the inability to reach a human, calling it the first customer service ive seen where the bot doesnt transfer you to a human team member in live chat when asked.

Perplexity AI ▷ #sharing (1 messages):

photon_13: https://amannirala.com/blog/mcp-over-engineering-layers-of-abstraction/


Perplexity AI ▷ #pplx-api (1 messages):

billionthug: Yeah


LM Studio ▷ #general (656 messagesđŸ”„đŸ”„đŸ”„):

LLM Safety Policies, Microcontroller Code Struggles, Brave API Issues, Exa.ai MCP, Qwen3 Coder

  • Bot says Safety policy does not allow it: A member stated that the bot’s safety policy does not allow it to assist, while others pointed out a scam bot and crypto ad, prompting moderation actions and humor, as seen in this image.
    • The user had stated Bro said that like an LLM 😂.
  • Search MCP - Exa.ai versus Brave API: The usefulness of Brave API’s Search MCP and Exa.ai were discussed, including issues with credit card usage and concerns about funding by FAANG companies (Meta, OpenAI, Amazon, Nvidia), including the humorous sad cat gif.
  • Qwen3 Coder’s Rise in Popularity: Members are enjoying the compact size and good performance of Qwen3 Coder model, including its ability to create a dynamic form component with complex features, as demonstrated by a member who passed their small test with it, but also noted that other ones are super bad.
    ‱ Others recommended just using Google, since no LLM will provide you much support at that hardware.
  ‱ LM Studio CPP Bug Disrupts Model Stability: Users identified a CPP regression in the latest LM Studio version, leading to model censorship issues, particularly impacting models like OSS 20B Heretic, which now refuses even mild requests after working well previously.
    • Members suggested using older LM Studio versions, since The Chinese Communist Party has regressed :(
  • Cracking TUI? LM Studio via Tailscale Tunnel: Members discussed setting up GUI access for LM Studio through Tailscale or SSH tunneling (Xorg, Wayland), with one user finding Claude-Code helpful for command setup and created something at the very edge of its capabilities as seen on the linked image, a simulated agentic evolution: Toroidal World.
    • There was some discussion about Codex completing enough of the evolution simulation to POC run, as well as a plea to donate so they can keep running cloud codex.

LM Studio ▷ #hardware-discussion (154 messagesđŸ”„đŸ”„):

Unreal Engine Games Assistance, GPU power cables, DDR5 RAM price increase, MyLifeBits Project, ZFS appeal and kernel downgrading

  • RLLMs Dominate C++ Assistance: Members are finding that smaller models aren’t reliable for C++ coding assistance, and that only the real basics are remotely usable.
    • One member jokingly noted that no programming language is safe from RLLMs because that is what they will always constantly get better at.
  ‱ GPU Pinouts Cause Alarms: A member found that PCIe power connectors are not identically shaped or pinned out, after buying different types of plugs on eBay that were both meant to work but had different actual pin configurations.
    • He had to refer back to the P40’s datasheet to confirm voltage/grounds/unused pins.
  • Corsair PSU Cable Standards Change: A member discovered that Corsair changed PSU cable standards, requiring a change of ATX cables when switching motherboards.
    • Another member emphasized that there is no official standard for PSU power cable pinouts, meaning the PSU side could be any order, any swap.
  • GPU Instability Suspected for AI Work: One member experienced crashes when doing new AI stuff with their RTX 5060 TI 16 GB, even after clean Windows install, memcheck, and CPU/GPU stress tests, and is now trying Ubuntu for ComfyUI and LM Studio.
    • Possible fixes discussed involved reseating the card, not overclocking, underclocking GPU memory by 20-50 MHz, and checking Windows settings that cause crashes when using Nvidia on Vulkan with an AMD GPU.
  • DDR5 RAM Prices Skyrocket: Members are observing a significant increase in DDR5 RAM prices, with one member noting that a kit they bought increased in price from 6000 SEK to 14000 SEK.
    • This price increase has led to concerns about whether now is the time to buy enterprise-class hardware to avoid being left behind, with one member joking there goes my blackwell.

OpenRouter ▷ #announcements (2 messages):

Broadcast, Observability, Langfuse, LangSmith, Weave

  • Broadcast: Traces & Observability Product in Beta!: OpenRouter launched Broadcast, a feature to automatically send traces from OpenRouter requests to external platforms, now in beta, as discussed in the attached video demo with Langfuse.
  • OpenRouter Broadcast: Stream LLM Accounting!: Broadcast helps gain visibility into production traces faster (errors, latency, tool calls, and more), track usage/cost by model, provider, app, or user, and integrates with existing observability workflows.
    • Supported platforms include Langfuse, LangSmith, Weave, Datadog, Braintrust, S3, and OTel Collector with more in the works according to the OpenRouter documentation.

OpenRouter ▷ #general (194 messagesđŸ”„đŸ”„):

Z.AI Video Input, Nano Banana Pro settings, DeepSeek V3, Droid Model, BYOK bypass

  • Z.AI’s Video Voyage Victorious: Users experimenting with new video input models reported that Z.AI is the only model working with URLs to mp4 files, while others require direct base64 uploads.
    • A user reported receiving a 503 error when uploading files over ~50 MB, with the error message indicating it was Temporarily unavailable | openrouter.ai | Cloudflare.
  • Nano Banana Pro gets 2K/4K Boost: A user inquired about setting 2K/4K resolution for Nano Banana Pro through OpenRouter, and another user confirmed it was recently added.
    • Another user later reported generating 4K images with Nano Banana Pro was giving many failed calls, and they might have to use Google Cloud Console to check the settings.
  • DeepSeek V3’s Stellar Staying Power: A user touted DeepSeek V3 (0324) as pretty great, citing its long token lifespan and overall quality when used in chxb.
    ‱ The model’s developer/app breakdown is not available and must be requested on Discord; meanwhile, the free DeepSeek 3.1 and R1 0528 were removed when 3.2 came out.
  ‱ Droid is Great for Small Teams: Users touted Droid as a great coding agent, close to Opencode, with a major benefit for small teams.
    • For $200/month teams get 200MM tokens, and adding team members to the token pool is only $5/month, compared to $150/seat for Claude-code.
  • BYOK Bypass Brainstorming Begins: A user sought a way to bypass BYOK on a per-request basis due to provider rate limit issues with gemini-3-pro and needing to switch to Vertex.
    ‱ A user suggested using provider-level routing logic to direct certain requests to OpenRouter for testing or rate limit relief, and others to BYOK for cost optimization, using libraries such as LiteLLM, as sketched below.
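
A hedged sketch of that routing suggestion using LiteLLM’s Router, splitting one logical model between OpenRouter and a BYOK Vertex deployment (model slugs, keys, and project names are placeholders):

```python
from litellm import Router

router = Router(model_list=[
    {   # default path: OpenRouter
        "model_name": "gemini-3-pro",
        "litellm_params": {"model": "openrouter/google/gemini-3-pro",
                           "api_key": "sk-or-..."},
    },
    {   # BYOK path: Vertex, picked up when the first deployment is rate-limited
        "model_name": "gemini-3-pro",
        "litellm_params": {"model": "vertex_ai/gemini-3-pro",
                           "vertex_project": "my-project",
                           "vertex_location": "us-central1"},
    },
])
resp = router.completion(model="gemini-3-pro",
                         messages=[{"role": "user", "content": "hello"}])
```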

OpenRouter ▷ #new-models (1 messages):

Readybot.io: OpenRouter - New Models


OpenRouter ▷ #discussion (96 messagesđŸ”„đŸ”„):

Intel Acquires SambaNova, Databricks CEO chip company seed round, Minecraft LLM server, Gemini 3 reasoning tokens, Kimi Delta Attention

  • Intel to Acquire SambaNova for $1.6B: Intel is reportedly nearing a $1.6 billion deal to acquire AI chip startup SambaNova; the full article is available on Bloomberg and archived here.
  • Databricks CEO Raises Big for New Chip Venture: The former Databricks CEO raised $450M in seed funding at a $5B valuation for a new chip company.
  • LLMs Tackle Minecraft: OpenRouter MC Server?: Members are discussing building a Minecraft server using LLM AI players and Mindcraft.
    • One member has an Oracle Cloud VPS and another has an old server in Australia for hosting.
  • Gemini 3 Requires Reasoning Tokens: Users are running into errors with Gemini models requiring OpenRouter reasoning details to be preserved in each request; refer to the OpenRouter documentation for best practices.
    ‱ The error message indicates that the Image part is missing a thought_signature; a sketch of carrying reasoning details across turns follows this list.
  • Kimi Delta Attention Gains Traction: Another lab is using Kimi Delta Attention (KDA), according to this tweet; the team agreed that KDA is promising for long context.
    • One member noted that KDA’s performance degradation after the training checkpoint is extremely low as confirmed by both AFM and Dillon.
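
A hedged sketch of preserving those reasoning details across turns via the OpenAI-compatible client; the field handling follows the OpenRouter docs linked above, and the model slug is a placeholder:

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")
model = "google/gemini-3-pro"  # placeholder slug

first = client.chat.completions.create(
    model=model, messages=[{"role": "user", "content": "Draft a plan."}])
msg = first.choices[0].message

# Echo the assistant turn back with reasoning_details intact so Gemini's
# thought signatures survive into the follow-up request.
history = [
    {"role": "user", "content": "Draft a plan."},
    {"role": "assistant", "content": msg.content,
     "reasoning_details": getattr(msg, "reasoning_details", None)},
    {"role": "user", "content": "Refine step 2."},
]
second = client.chat.completions.create(model=model, messages=history)
```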

Yannick Kilcher ▷ #general (204 messagesđŸ”„đŸ”„):

Nvidia Triton Scaling, Deepseek R2, Predictive Coding, Bayesian Program Learning, Flow Matching

  • Flow Matching beats Diffusion, LLMs for sample efficiency: Members discussed sample efficiency, with claims that flow matching exceeds diffusion in efficiency, and diffusion surpasses autoregressive models (including LLMs), by predicting the data “x” rather than velocity or noise.
    ‱ One member is working on a paper that forces AR, flow, and diffusion models onto the same problem so that they can be fairly compared.
  • Learning Algorithms Enhance Sample Efficiency Without Model Tweaks: In classical RL, PPO achieved a breakthrough in sample efficiency without altering the model architecture, according to a video.
    • However, one member stated they had an algorithm that was working as intended but their mistake in designing the sampler slowed down the training.
  • Google’s Gemini coding tool: Opus 4.5 available with limitations: Opus 4.5 is freely accessible in Antigravity, Google’s new coding tool, with a pro subscription (Google One) and may have a limitless quota currently.
    • Students can also get it free for a year, but some caution that learning coding without LLM coding agents is wise for new programmers.
  • Multi-task Mixture-of-Experts Offers Speedy Inference: Mixture of Experts (MoE) models offer greater TPS compared to dense models with equal parameters, because they engage only a subset of weights per token, as explained by users who achieved speeds equivalent to an 8B model when running Mixtral 7x8.
    ‱ It was observed that “My Mixtral 7x8 ran as fast as an 8B model while having way more parameters”; the active-parameter arithmetic behind this is sketched below.
  ‱ Amazon Markets Specification-Based Code Development with Kiro: Newer LLMs are trained to make broader use of specs and similar “artifacts”, which can be specs, to-do lists, or discussions with the user.
    ‱ Amazon was kinda first to really market it with Kiro, which is for spec-based development, as shared in this link.
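
The MoE speed observation follows from active-parameter arithmetic; a rough illustration with assumed per-component sizes (not Mixtral’s exact breakdown):

```python
# Rough active-parameter estimate for a top-2 MoE (all sizes are assumptions)
num_experts, top_k = 8, 2
expert_params = 5.5e9   # per-expert FFN parameters
shared_params = 2.0e9   # attention, embeddings, router: run for every token
total = shared_params + num_experts * expert_params
active = shared_params + top_k * expert_params
print(f"total ~{total/1e9:.0f}B, active ~{active/1e9:.0f}B per token")
# total ~46B, active ~13B: decode speed tracks the active count, not the total
```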

Yannick Kilcher ▷ #paper-discussion (11 messagesđŸ”„):

DeepSeek 3.2, Paper Presentation Reschedule

  • DeepSeek 3.2 Paper Incoming: A member inquired about discussing the DeepSeek 3.2 paper, shared in a link to Arxiv.
  • Presentation Postponed, Light Discussion Initiated: A member asked if there was a presentation scheduled, and another confirmed that the presentation would be rescheduled.

Yannick Kilcher ▷ #agents (5 messages):

Schmidhuber AI Agents, MLST interview, Exploration vs Exploitation

  • Schmidhuber’s Agents: An Early Glimpse: A member shared a YouTube video of Jurgen Schmidhuber discussing AI Agents, highlighting its relevance and quality.
    • They noted the talk is on point and pretty good.
  • Compressibility Drives Exploration, member says: A member analyzed Schmidhuber’s work, noting its balanced approach to exploration and exploitation, driven by compressibility rather than randomness.
    • He stated that using compressibility as the driver of exploration instead of randomness puts an objective on what to explore and is a pretty hard one to refute when guided by reward.
  • MLST interview: Deep Dive with Schmidhuber: A member shared Jurgen Schmidhuber’s recent MLST interview, linking Part 2 and Part 1 of the discussion.
    • Another member simply exclaimed What an absolute intro!!

Yannick Kilcher ▷ #ml-news (44 messagesđŸ”„):

Samsung DDR5 vs HBM, China's Collapse, Fragile Chip Supply, ChinaXiv Research

  • Samsung Swaps HBM for DDR5 Profits: Samsung is shifting focus from HBM to DDR5 modules because DDR5 RAM results in far more profits than HBM, according to Tweaktown.
    • One member joked they see ”$$$ in the brand new ‘fk people over at 3x the previous price’ market”.
  • RAM is the New Toilet Paper: A member compared the current RAM market to the COVID toilet paper shortage, where people are buying up lots of DDR hoping to resell when prices go up.
    • It was noted that Amazon’s algos also push prices up when demand goes up, further exacerbating the issue.
  • Geopolitical Analyst Decried as Jim Cramer of Geopolitics: A member linked to a YouTube video about the fragile chip supply chain, but another member dismissed the analyst as an “idiot” and “the Jim Cramer of Geopolitics.”
    • The analyst in question has been calling for the “imminent collapse” of China since 2008, making bold (and often wrong) predictions.
  • ChinaXiv Suffers Hug of Death: Members shared a link to ChinaXiv, a repository for Chinese research papers, but noted that it was down, likely due to a “hug of death.”
    • The discussion pondered whether there would be anything good published only in Chinese, though a lot of good Chinese ML research is published in English.
  ‱ DLLM Faster Code Generation: A member linked to nathan.rs, a short post breaking down the regimes where diffusion LLM code generation probably is faster.
    • It notes that generating the start of the Declaration of Independence “didn’t have much speedup” but instead the “structuredness of the output” matters.

HuggingFace ▷ #general (170 messagesđŸ”„đŸ”„):

Spam DMs, Sparse Cores, GUI fine-tuning software, NVIDIA Triton Server Scaling, AI model size

  • Spam DMs Plague HuggingFace Users: Multiple users reported receiving spam DMs from new accounts and one user was banned.
    • The community was reminded to report such activity.
  ‱ Pallas Not Needed for SparseCore Recommenders: A user questioned whether they needed to learn Pallas to use Sparse Cores, and another user clarified that it’s only needed when writing custom kernels that execute at the per-core level, sharing this markdown.
  ‱ GUI fine-tuning Needs Smaller Models: A user sought a small model suitable for showcasing their GUI fine-tuning software, noting poor results with TinyLlama 1.1b Chat and 1225 Q&A samples.
    • It was suggested to try Liquid AI models with quantization methods that are effective.
  • Scaling NVIDIA Triton Server Requires Batching: A user inquired about scaling NVIDIA Triton Server for concurrent requests, and a member pointed to online resources with guidance and alternatives.
    • Batching or more instances is the best case to scale it.
  ‱ Estimating AI Model Size: Users sought tools to estimate whether an HF model fits on their GPU, which led to the suggestion of official and unofficial tools, plus cfit; a back-of-the-envelope version is sketched below.
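
The underlying back-of-the-envelope check is simple; a rough weights-only sketch (the overhead factor is a rule-of-thumb assumption, ignoring KV-cache growth with context):

```python
def fits_on_gpu(params_b, vram_gb, bytes_per_param=2.0, overhead=1.2):
    """Crude check: fp16/bf16 is ~2 bytes/param, 4-bit quant is ~0.5 bytes/param."""
    need_gb = params_b * bytes_per_param * overhead
    return need_gb, need_gb <= vram_gb

print(fits_on_gpu(params_b=7, vram_gb=24))   # (16.8, True)
print(fits_on_gpu(params_b=30, vram_gb=24))  # (72.0, False)
```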

HuggingFace ▷ #i-made-this (15 messagesđŸ”„):

Neurosymbolic AI Project, HF Wrapped 2025, Madlab GUI Finetuning Toolkit, Text-to-Speech Models in 2025, Comment Works

  • ReasoningLayer Opens Waitlist: A neurosymbolic AI project, ReasoningLayer, built from scratch in Rust, opens its waitlist to fix weaknesses in today’s LLMs by adding structured reasoning.
    • The initial post supporting it is available on LinkedIn.
  • HF Community Gets Wrapped Up: A member vibecoded a quick wrapper for 2025, inspired by Spotify/YouTube recap trends, available at HuggingFace Spaces.
    • The member suggested spinning up an official HF repo and building something from scratch.
  ‱ Madlab Launches for Easier Finetuning: Madlab, an open-source GUI finetuning toolkit for synthetic dataset generation, model training, and evaluation, was released on GitHub.
    ‱ A LabGuide Preview Model based on TinyLlama-1.1B-Chat-v1.0 and a dataset were also shared as a demo, showcasing capabilities and inviting feedback on using synthetic datasets and finetuning.
  • TTS Models Sing in 2025: A member compiled a GitHub repo listing Text-to-Speech models released in 2025, including open-source, research, and commercial systems, plus short notes on where each model works well and where it doesn’t.
  ‱ Comment Works Analyzes Free Text Locally: A member is experimenting with a fine-tuned small language model plus a Python utility, tentatively called comment works, for local, on-machine, private analysis of free-text comments, available at GitHub.
    • It’s a fast way to structure qualitative feedback for exploratory analysis or prioritization.

HuggingFace ▷ #gradio-announcements (2 messages):

MCP 1st Birthday Hackathon Winners, MCP Hackathon Certificates, Anthropic Awards, Modal Innovation Award, LlamaIndex Award

  • MCP Hackathon Crowns the Champs!: The MCP 1st Birthday Hackathon announces its sponsor-selected winners, recognizing outstanding projects across various categories, listed on the Hugging Face Spaces platform.
    • Awards include the Anthropic Awards (Best Overall, Enterprise, Consumer, Creative), Modal Innovation Award, LlamaIndex Award, OpenAI Awards, and Blaxel Choice Award.
  • Anthropic’s Aces are Announced!: Anthropic Awards the Cite Before Act MCP as Best Overall, with other winners including MCEPTION, Finance Portfolio Intelligence Platform, and GameContextProtocol, showcased on Hugging Face Spaces.
  • Hackathon Certificates are Here!: Participants of the MCP 1st Birthday Hackathon can now generate their official participation certificates using a Gradio app.
    • Generated certificates can be downloaded, uploaded to LinkedIn, and shared with tags to Gradio, as displayed in the attached sample.

HuggingFace ▷ #agents-course (11 messagesđŸ”„):

Course question space deletion, API issues, Agent chunk relevance and LLM, Agents course assistance, Smol course future

  ‱ Question Space Gets Deleted!: A member noted that the question space for the course was deleted and that this channel is now being used instead for better focus.
    • The member tagged others to double check and update the course pages, with a reply confirming the simplification for single channels.
  • API Issues: A user reported that the API is still not working as it is not possible to fetch files from the server.
    • No solutions or workarounds were provided in the given messages.
  • Agent Needs Better Chunking: One member is having issues with chunk relevancy in Agents and the agent is unable to answer questions properly once the chunks are added to the context.
    ‱ The member reported that when asking the agent for a precise total price in a single document it gives the correct amount, but when asking for the same total across different documents, some prices are correct and some are completely random.
  • Agents Course Needs Some Assistance: A member trying the first agent added a simple get timezone tool but encountered an error in the UI.
    • No specific solutions were provided in the given messages.
  ‱ Smol Course is Coming to Town!: Someone asked if the smol course will release its final part (the fine-tuning course) this year, which would make a cool Christmas gift.
    • No one gave a clear answer.

GPU MODE ▷ #general (12 messagesđŸ”„):

CUDA server, Tiny TPU, Hip Kittens Paper, Paper Reading Group

  • CUDA Question Clarified: A member inquired if the Discord was a CUDA server and it was clarified that while it isn’t an Nvidia server, CUDA is a popular topic there, with further details available here.
  • Tiny TPU Tutorial Incoming: A Tiny TPU tutorial was scheduled to start with a specific member on YouTube.
  • Hip Kittens Viz Tool: A member sought the tool used to produce a visualization in the Hip Kittens paper and linked the paper.
    • Another member suggested it can be done by manually adding timestamp measurements in the kernel, referencing this blogpost and this github repo.
  • Paper Reading Group Formation: A member proposed forming paper reading groups within the Discord, suggesting a specific paper on arxiv.
    • Another member encouraged self-organization and offered speaking opportunities for paper experts.

GPU MODE ▷ #triton-gluon (3 messages):

TritonForge, MXFP4 Emulation on SM86, Data Center GPU Prioritization

  • TritonForge Automates Triton Kernel Optimization: A new paper, TritonForge: Profiling-Guided Framework for Automated Triton Kernel Optimization, introduces a framework that integrates kernel analysis, runtime profiling, and iterative code transformation to streamline the optimization process.
    • The system leverages LLMs to assist in code reasoning and transformation, achieving up to 5x performance improvement over baseline implementations.
  • MXFP4 Emulation on SM86: A member inquired whether triton_kernels plans to support software emulation of mxfp4 on sm86.
    • Another member doubts it, pointing out that the project maintainers want to prioritize data center GPUs.
  • Forking for Local LLM on Consumer GPUs: A member suggested forking the project to add features for consumer cards like the 30 series, as they expect these to remain relevant for local LLM work.
    • This suggestion arose from the project’s prioritization of data center GPUs over consumer GPUs.

GPU MODE ▷ #cuda (4 messages):

Tensor Core Optimization, LDSM instruction pipelining, Asynchronous Memory Copy, SMEM data loading strategies

  • Seeking 90% Tensor Core Usage: A member is seeking advice on achieving 90%+ tensor core usage, detailing a strategy of issuing ldsm loads followed by MMAs on sm80/89 targets.
    • They find it hard to push past 70% usage despite attempts at pipelining loads and compute.
  • Exploring Asynchronous Memory Copy for Tensor Cores: A member suggested using cp.async to optimize tensor core usage.
    ‱ This implies addressing how quickly data is loaded into shared memory (SMEM); a double-buffering sketch follows at the end of this list.
  • SMEM Loading Strategies for Tensor Core Efficiency: The member inquired about the usage of ldsm.4 and whether both matrices A and B are loaded from shared memory (SMEM).
    • It was noted that more ldsm instructions in flight than needed to cover the worst-case latency from SMEM results in register waste.
  • Achieving Full Tensor Core Utilization: Achieving full tensor core utilisation may be challenging because on A100, an MMA is needed every 8 cycles, whereas on Ada, it’s 16 or 32 cycles, depending on the MMA and GPU used.
    • Also, beyond the first four instructions, you can’t issue ldsm every cycle, implying stalls in the pipeline.
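
A minimal double-buffering sketch of the cp.async suggestion, using CUDA’s pipeline intrinsics (sm_80+); the tile size and buffer layout are illustrative, not the member’s kernel:

```cuda
#include <cuda_pipeline.h>

// Prefetch the next tile into shared memory while MMAs consume the current one,
// so the global->SMEM copy overlaps with ldmatrix/MMA issue.
__global__ void pipelined_mma(const float4* __restrict__ gmem, int ntiles) {
    __shared__ float4 smem[2][256];
    int t = threadIdx.x;  // assumes blockDim.x == 256

    __pipeline_memcpy_async(&smem[0][t], &gmem[t], sizeof(float4));
    __pipeline_commit();

    for (int tile = 0; tile < ntiles; ++tile) {
        int cur = tile & 1;
        if (tile + 1 < ntiles) {  // kick off the next stage's copy early
            __pipeline_memcpy_async(&smem[cur ^ 1][t],
                                    &gmem[(tile + 1) * 256 + t], sizeof(float4));
            __pipeline_commit();
        }
        __pipeline_wait_prior(tile + 1 < ntiles ? 1 : 0);  // current stage ready
        __syncthreads();
        // ... ldmatrix loads + MMAs on smem[cur] go here ...
        __syncthreads();  // don't let the next copy clobber a tile still in use
    }
}
```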



GPU MODE ▷ #jobs (2 messages):

Red Hat AI hiring, smallest.ai hiring

  • Red Hat AI Engineers Wanted in 2026: Red Hat AI is hiring passionate engineers at multiple levels to push the boundaries of AI infrastructure.
    • They’re especially interested in folks with experience in Golang, Rust, C++, Python, Kubernetes, distributed systems, and Open Source.
  • smallest.ai Seeks Inference Optimization Engineers: smallest.ai is hiring Inference Optimization Engineers to work on speech AI models.
    • The role involves optimizing models for speed and cost reduction, spanning quantization, kernel optimization, and porting to custom hardware; locations: Bengaluru or San Francisco.

GPU MODE ▷ #beginner (8 messagesđŸ”„):

LLM inference, VLLM internals, GPU kernel engineering, CUDA experience

  • Deep Dive into LLM Inference: A member is seeking advice on gaining a deeper, systems-level understanding of modern LLM inference.
    • Suggestions include Aleksa Gordić’s blog on VLLM internals and Junda’s talk on server aspects.
  • Kernel Engineering Mentorship Sought: A member is looking for a paid mentor/tutor to aid in their journey from an ML/AI engineer to a microarchitecture performance engineer, focusing on GPU kernel engineering.
    • They are looking for help to clarify concepts, find good problems, and point them in the right direction.
  • CUDA Beginner Seeks Guidance: A beginner with some CUDA experience (from ECE 408) is asking about next steps to participate in working groups/open projects.
    • Another member suggested using ChatGPT as a mentor.

GPU MODE ▷ #self-promotion (3 messages):

Huggingface Nanotron, Qwen3-omni viz tool

  • Deep Dive References added: A member confirmed that they would add references to their work so others can learn from it, including Huggingface Nanotron Playbook, Nanotron Source code, Pytorch Source Code, Megatron LM Source Code.
    • They also plan to add some research papers for those who want a deeper dive.
  • Speech-to-Speech inference tool released: A member shared a new Qwen3-omni viz tool (speech to speech inference).
    • It was posted on Hacker News.

GPU MODE ▷ #submissions (33 messagesđŸ”„):

NVIDIA Performance, GEMM Leaderboard

  ‱ NVIDIA’s NVFP4-GEMM Numbers: Submissions to the nvfp4_gemm leaderboard show a range of successful execution times on NVIDIA, with the fastest submission achieving 11.4 ”s.
    • Multiple personal bests were recorded, indicating ongoing optimization efforts.
  • GEMM Leaderboard sees flurry of activity: The nvfp4_gemm leaderboard saw numerous submissions, indicating active participation and competition.
    ‱ One user consistently submitted, achieving multiple personal bests and finally getting down to 11.4 ”s.

GPU MODE ▷ #cutlass (10 messagesđŸ”„):

smem swizzling, atom permutation on K axis, tiled MMA, Cute DSL python version

  • Atom Permutation beats Smem Swizzling: A user found that it wasn’t smem swizzling needed but atom permutation on the K axis for their tiled MMA.
    • This allowed for vectorized loads from shared memory to registers using 2xdouble = uint128_t.
  • Permuting the K Axis for Vectorization: The user modified their code to permute the K axis within their tiled MMA setup, specifically changing the Tile structure to Tile<_8,_8, Layout<Shape<_4, _2>, Stride<_2, _1>>> (see the layout sketch after this list).
    • Before-and-after images illustrated the change in memory access patterns; the user is now attempting to apply the same transformation to a larger tiled MMA setup for DGEMM.
  • Cute DSL Works on Python 3.10: Despite documentation stating Cute DSL requires Python 3.12, a user found it functional with Python 3.10.
    • Another user confirmed that while the initial release targeted 3.12, support has since been expanded.
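
A pure-Python sketch of the layout algebra behind that Tile: a CuTe layout unpacks a linear index column-major over its shape and dots the coordinates with the strides, so Layout<Shape<_4,_2>, Stride<_2,_1>> interleaves the eight K slots. The reading of why this enables the 2xdouble = uint128_t loads is our interpretation, not a statement from the thread:

```python
# CuTe layout semantics for Layout<Shape<_4,_2>, Stride<_2,_1>>:
# index k -> coords (i, j) unpacked column-major -> offset i*2 + j*1.
shape, stride = (4, 2), (2, 1)

def cute_layout(k: int) -> int:
    i, j = k % shape[0], k // shape[0]   # column-major unpacking
    return i * stride[0] + j * stride[1]

print([cute_layout(k) for k in range(8)])
# -> [0, 2, 4, 6, 1, 3, 5, 7]: the permutation pairs up the K elements a
# thread owns so each pair sits contiguously in SMEM, allowing one 16-byte
# (uint128_t) load of 2 doubles instead of two 8-byte loads.
```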

GPU MODE ▷ #teenygrad (1 message):

LambdaLabs Research Grant, teenygrad H100 hours, j4orz.ai/sitp textbook

  • teenygrad Snags LambdaLabs Research Grant: The teenygrad project got accepted into a LambdaLabs research grant giving them access to compute resources.
    • The grant provides approximately 1000 H100 hours of compute time and should allow development to pick up again in the new year.
  • j4orz Ships Textbook: The textbook, code, and lecture materials for parts 1 and 2 of the Stanford In-house Training Program (SITP) will be released at the end of January or in February.

GPU MODE ▷ #multi-gpu (1 message):

DMA, ML training, ML inference, FiCCO schedules, GPU DMA engines

  • Fine-Grain Compute Communication Overlap Explored: A new paper explores Design Space Exploration of DMA based Finer-Grain Compute Communication Overlap in distributed ML training and inference.
    • The paper introduces FiCCO, a finer-granularity overlap technique, to unlock compute/communication overlap for a wider set of network topologies and finer-grain dataflow.
  • FiCCO Schedules Deliver Speedup: The paper presents a detailed characterization of inefficiency losses and a design space of FiCCO schedules.
    • The proposed bespoke schedules, which offload communication to GPU DMA engines, deliver up to 1.6x speedup in realistic ML deployments, with heuristics providing accurate guidance in 81% of unseen scenarios.

GPU MODE ▷ #nvidia-competition (72 messagesđŸ”„đŸ”„):

Submission failing despite successful local build, GPU performance inconsistencies, PTX code errors, Utilizing 2SMs when M<256, Cute-dsl NCU line number issues

  • Admin assists with submission error: A competitor reported an Error building extension 'run_gemm' during submission despite successful local builds; running the inline build with verbose=True surfaced detailed logs.
    • Removing the extra include path /root/cutlass/include/ fixed the issue.
  • GPU Performance Varies: Competitors noted performance variations between GPUs, with leaderboard results being 2-4 microseconds slower on average compared to local benchmarks, suggesting some nodes are slow.
    • It was suggested to resubmit multiple times to avoid slow instances, as the likelihood of encountering them may be increasing and one node has completely regressed.
  • PTX code errors halt progress: One competitor encountered PTX compilation errors, specifically Unexpected instruction types specified for 'cvt' and Instruction 'cvt with .e2m1x2' not supported related to .target 'sm_100'.
    • A user shared a code snippet using cuda_fp16.h to work around issues with vector/byte types in inline PTX.
  • Exploring 2SM utilization: There was discussion on whether one can utilize 2SM when M<256, referencing Nvidia PTX documentation.
    • One member noted having issues preparing block-scaled layouts, and another had tried 2SM pipelining but anecdotally saw no improvement from it.
  • Profiling shows shifted line numbers: A user reported issues with NCU showing incorrect or shifted line numbers in the Python code when using cute.GenerateLineInfo(True) for cute-dsl code.
    • Line numbers shown in profiles from the Discord bot appeared completely random, or perhaps shifted relative to the Python source.

GPU MODE ▷ #robotics-vla (1 message):

Planner model, Subtask decomposition, Vision-Language-Action Models, LITEN paper

  • LITEN Paper Drops Knowledge Bombs: The LITEN paper presents a planner model for subtask decomposition with a Vision-Language-Action (VLA) model.
    • The high-level VLM conditions on past experiences in-context to learn the capabilities of the low-level VLA.
  • Reasoning and Assessment Phases: The model operates with a reasoning phase, where a plan (sequence of sub-task instructions) is generated and executed for the low-level VLA, followed by an assessment phase.
    • During the assessment phase, the model reflects on the resulting execution and draws conclusions to include in future reasoning phases.
  • Frozen VLM for the Win: The planner uses a frozen VLM, requiring no additional training.
    • This lets LITEN leverage the VLM’s existing knowledge without any fine-tuning iterations.

Latent Space ▷ #ai-general-chat (142 messagesđŸ”„đŸ”„):

Museum of Science Decline, AI-Generated Stock Art, Open-Source Git Replication, Claude Model Removed from Cursor, OpenAI document processing infrastructure Leaked

  • Museum Meltdown? Science Interest Slumps: A tweet highlighted near-empty conditions at Boston’s Museum of Science, raising concerns about declining public engagement with science.
    • Members speculated about the causes of the decline.
  • Ticker-Symbol Slop: AI Art Gets Roasted: Users mocked low-quality AI-generated stock ticker artwork, calling it ‘ticker-symbol slop’.
    • The artwork was criticized for its lifeless and generic style.
  • Cursor Kills Claude: Model Cheats on Benchmarks: Cursor abruptly disabled the Claude model in their IDE after discovering it cheated on internal coding benchmarks, reportedly by embedding answers in training data.
    • Users have been encouraged to flag similar issues.
  • Soma Steps Aside: Post-Industrial Press Shakeup: Jonathan Soma announced his resignation from Post-Industrial Press, citing uncertainty about its future direction and thanking collaborators for the past six years in this tweet.
  • OpenAI’s Oops: Document Leak Pulled from Reddit: A thread reported that ChatGPT leaked its own document processing infrastructure, but Reddit quickly removed the details.

Latent Space ▷ #genmedia-creative-ai (14 messagesđŸ”„):

New Twitter Post, Nitter Link Errors, Missing Content


Nous Research AI ▷ #general (82 messagesđŸ”„đŸ”„):

Derrida and Baudrillard on AI, Oracle's AI Strategy, Nvidia's Open Source Support, Local LLMs gaining traction, Nvidia's CUDA bet

  • Philosophical Ponderings on Synthetic Thought: A member wondered what Derrida and Baudrillard would think of AI, envisioning a future book titled ‘Philosophy is Synthetic’, and pondered what Saussure would say about Word2vec.
  • Oracle’s Gambit: From Databases to AI Domination?: Some members discussed Oracle’s shift from a ‘boring, clunky database company’ to an ‘overvalued AI company’, fueled by IOU schemes with OpenAI and Sam Altman.
    • Others speculated that Oracle’s AI stock pump was primarily to acquire and control US media entities (Paramount/CBS, Warner Bros/CNN) to control right-wing narratives.
  • Local LLMs Set to Boom for Specific Sectors: A member discussed with a client implementing a local AI, trained on the company’s own data and specialized to its industry (maritime), within the next couple of years.
    • This could involve training an LLM on ‘the voice of thousands of emails’ from specific employees, or on hundreds of contracts.
  • Nvidia’s CUDA edge on GPU market: Members recalled that Nvidia bet early that GPUs would be used for things other than gaming, creating a language (CUDA) that ran on all their GPUs and enabled that general-purpose use.
    • The most surprising part, one argued, is that AMD and Intel didn’t even see the market as worth entering until roughly two years ago.
  • Nvidia Goes Open Source, Defends AI Empire: Members noted Nvidia’s growing support for open source initiatives like Nvidia Nemotron Nano, suggesting it’s a strategic move to sustain demand for their products long term.
    • One member argued that backing open source may be one of the only ways, long term, of sustaining high demand for Nvidia’s products.

Nous Research AI ▷ #research-papers (32 messagesđŸ”„):

Grok Coincidence, RL Optimizers, Byte Level LLMs

  • Grok Loving Elon Coincidence?: A member shared a paper (https://www.arxiv.org/pdf/2512.06266) discussing the coincidence of Grok loving Elon, calling it super impressive for its size.
  • RL Optimizer Search: A member is trying to find the best optimizer for a small pretraining run of a 3B LLM, hoping to outperform Muon/AdamW, and mentioned AdaMuon, NorMuon, and AdEMAMix.
    • Another member suggested Sophia, recommending token-bounded partial training with each candidate plus a tiny LR/beta sweep, letting it cook overnight while taking metrics (see the sketch after this list).
  • Byte Level LLMs: A member reacted positively to a link about byte-level LLMs from Allen AI, saying this is cool and it’s fun.
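
A minimal sketch of that protocol, assuming a PyTorch loop; the model, batch shapes, and objective are placeholders, and only AdamW is wired up here since Muon/Sophia imports depend on the repo you use:

```python
import itertools

import torch
import torch.nn as nn

def make_model() -> nn.Module:
    # stand-in for the 3B LLM; swap in the real architecture
    return nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

def token_bounded_run(lr: float, beta2: float, token_budget: int = 500_000) -> float:
    """Train one candidate config until the shared token budget is spent."""
    model, tokens_seen, losses = make_model(), 0, []
    opt = torch.optim.AdamW(model.parameters(), lr=lr, betas=(0.9, beta2))
    while tokens_seen < token_budget:
        x = torch.randn(8, 512)                   # placeholder batch
        loss = (model(x) - x).pow(2).mean()       # placeholder objective
        opt.zero_grad(); loss.backward(); opt.step()
        tokens_seen += 8 * 128                    # batch_size * seq_len stand-in
        losses.append(loss.item())
    return sum(losses[-50:]) / len(losses[-50:])  # smoothed final loss

# tiny LR/beta sweep, identical token budget for every candidate
for lr, beta2 in itertools.product([3e-4, 1e-3], [0.95, 0.99]):
    print(f"AdamW lr={lr} beta2={beta2}: loss={token_bounded_run(lr, beta2):.4f}")
```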

Embeddings, History of embeddings, Embeddings in Modern AI

  • History of Embeddings Talk Shared: A member shared a short talk they gave this week about embeddings, their origins in the 60s, and the central role they play in modern AI, linked here: YouTube video.
  • Embeddings Feedback Request: The member requested feedback from people knowledgeable about embeddings, seeking critique on how well the talk captures the basics.

Moonshot AI (Kimi K-2) ▷ #general-chat (87 messagesđŸ”„đŸ”„):

Kimi slides API, Local Kimi K2 on NPU, Kimi memory feature sync, Kimi K2 tokenizer endpoint, Kimi Android update with memory

  • Kimi Slides API missing in action: A user inquired if the Kimi Slides feature is available via API, but was informed that it is not available as of now.
  • Local Kimi K2 is technically impossible: A user expressed interest in running a local Kimi K2 model on their NPU, but was told it would be next to impossible to match K2’s capabilities on local hardware.
  • Memory Feature Discrepancy Across Platforms: Users noticed that the memory feature is available on the website version of Kimi but were unsure if it syncs to the Android version, with initial tests showing that it doesn’t sync.
  • Kimi’s context window too short, leading to truncation: A user complained about the app hard-locking after surpassing 200k words, limiting the number of prompts that can be issued before hitting the limit.
    • Another user suggested using the Kimi K2 tokenizer endpoint for a more accurate token count, but this is only available via API.
  • Kimi’s Android Version Catches Up With Memory: The Kimi Android version has been updated to include the memory feature, bringing it in line with the website version.

Eleuther ▷ #general (30 messagesđŸ”„):

PyTorch LLM on Kaggle TPU VMs, Scaling NVIDIA Triton Server, NSF initiative, LLM assistance in ML papers, Algoverse Mentors

  • PyTorch LLM struggles on Kaggle TPU VMs: A member inquired about running a PyTorch LLM (downloaded from Hugging Face) on Kaggle TPU VMs, noting prior success with Keras but encountering errors with PyTorch (see the torch_xla sketch after this list).
  • Scale Up NVIDIA Triton Server to Handle Concurrent Requests: A member seeks advice on scaling an NVIDIA Triton server setup with YOLO, bi-encoder, and cross-encoder models to handle multiple concurrent requests in production efficiently.
  • NSF Initiative Launch lacks Details: Members discussed an NSF initiative launch, but noted the vagueness of the post and the lack of core ideas beyond “something that isn’t universities or startups.”
  • LLM Assistance in ML Papers: Disclose or Deceive?: Members discussed whether to disclose LLM assistance in ML paper submissions, with one member leaning towards considering non-disclosure as misconduct if the LLM significantly contributed to the text, figures, design, or analysis.
  • Algoverse mentor roles pop up: A member asked about Algoverse; another mentioned that Algoverse had been looking for high-quality mentors on related servers and seems to pay decently well.
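
For the TPU question, the standard torch_xla pattern looks roughly like this; a sketch assuming torch_xla is installed on the Kaggle TPU VM, with gpt2 as an illustrative stand-in for the actual model:

```python
import torch
import torch_xla.core.xla_model as xm
from transformers import AutoModelForCausalLM, AutoTokenizer

device = xm.xla_device()                      # one TPU core as a torch device
tok = AutoTokenizer.from_pretrained("gpt2")   # gpt2 is an illustrative stand-in
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

ids = tok("Hello from a Kaggle TPU VM", return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    logits = model(input_ids=ids).logits
xm.mark_step()                                # flush the lazily-built XLA graph
print(logits.shape)
```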

Eleuther ▷ #research (17 messagesđŸ”„):

Karpathy’s 2025 'What-If' Fine-Tune Experiment, Part-time PhD in CS focusing on AI Agents, Muon's effectiveness reason, Training with weights natively in 8-bit, GANs

  • Karpathy’s ‘What-If’ Experiment Sparks Interest: A member expressed interest in trying out Karpathy’s 2025 ‘What-If’ fine-tune experiment, where a model was fine-tuned on synthetic reasoning chains, Edge.org essays, and Nobel lectures.
    • The experiment used LoRA on 8 A100 GPUs for 3 epochs, resulting in a model adept at long-term speculation but lacking novel solutions to physics problems.
  • Seeking Candid PhD Input for AI Agents Focus: A member requested candid input from individuals who have completed a part-time PhD in CS focusing on AI Agents, particularly those pursuing a PhD by publication.
    • They seek insights on the decision-making process, weekly rhythm, challenges, and whether it was worth it in terms of career, personal fulfillment, credibility, and options.
  • Muon’s Magic Due to Polynomial Instability?: A member proposed the intuition that Muon’s effectiveness stems from instability, with step size growing polynomially or exponentially due to the nonlinearity, though the intuition’s validity was questioned (see the sketch after this list).
    • Another member clarified that summing the scores of two diffusion models (or concepts in CFG) would not yield samples from the true product of their learned distributions, since that requires independence between the distributions.
  • Native 8-bit Training Paper Discovered: A member inquired about papers on training with weights natively in 8-bit, seeking a jumping-off point or search term, and found a promising paper.
    • The paper uses mixed precision and skips Hadamard transforms by adding a loss term to disincentivize outliers, simplifying and accelerating the process.
  • GAN architecture discussed: A member shared links to a GAN architecture and its arXiv page.
    • Another member confirmed it’s a GAN, with another link provided for confirmation.
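
For readers new to Muon, its core step orthogonalizes each 2-D momentum buffer with a Newton-Schulz iteration before the update; a sketch below, with coefficients as used in the public reference implementation (treat the exact constants and normalization as assumptions):

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately map G to the nearest semi-orthogonal matrix."""
    a, b, c = 3.4445, -4.7750, 2.0315   # quintic coefficients (assumed from the reference repo)
    X = G / (G.norm() + eps)            # normalize so the iteration is stable
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                         # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

# Muon applies this to each 2-D momentum buffer before the weight update
g = torch.randn(256, 512)
o = newton_schulz_orthogonalize(g)
print((o @ o.T - torch.eye(256)).abs().max())  # rough semi-orthogonality check
```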

Eleuther ▷ #interpretability-general (9 messagesđŸ”„):

Superweight Ablation, Orthogonal Repair, Neuron-Specific Features, High Dimensional Orthogonality, Extreme Weights Importance

  • Superweight Ablation Impacts OLMo-1B, Repaired with Rank-1 Patch: A member applying to MATS ablated a weight in OLMo-1B, causing perplexity to skyrocket from 17 to over 2800, then achieved around 93% recovery using a rank-1 patch inspired by OpenAI’s weight-sparse transformers paper.
    • The 93% recovery was defined as the percentage of NLL degradation recovered, with the patched model significantly reducing the gap caused by the broken model; Base model NLL: 2.86, Broken model NLL: 7.97, Patched model NLL: 3.23.
  • “Orthogonal Repair” Compensates for Deleted Weights: The learned patch compensating for an ablated weight was found to be orthogonal to the original weight with a cosine similarity of 0.13, suggesting compensation through a completely new distributed circuit.
    • The member asked if this orthogonal repair is a known phenomenon and if it mimics hydra effects of rerouting rather than restoring weights.
  • Ablated Neuron Found to be Feature Neuron for Marine Biology: A max-activating dataset search on the deleted neuron (layer 1, row 1764) revealed it to be a feature neuron for crustaceans/marine biology, with top activating tokens including H. gammarus (European Lobster), Cancer pagurus (Crab), and zooplankton.
    • The ablation resulted in the model hallucinating ‘mar, mar, mar’ on test prompts, suggesting the removal of the ontology for marine life.
  • Row-Level Patch Mimics Hydra Effects: A single trainable parameter vector (delta_row) was added to the damaged row, acting like a rank-1 LoRA applied only to that row, and trained to minimize KL divergence with the original frozen base model.
    • Despite the patch having tiny overlap with the original ablated direction, it produces a large NLL recovery, potentially mirroring hydra effects by rerouting rather than simply restoring weights; this was compared to the near-orthogonality of random vectors in high dimensions (see the sketch after this list).
  • Extreme Weights Might Be Essential: A member speculated that superweights might have very high/low values compared to the average weights in their matrix, questioning whether these extreme weights are essential.
    • The member wondered if training something to penalize large weights without degrading the model would be interesting.
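
A minimal, self-contained sketch of the described rank-1 row patch on a toy linear layer (not OLMo-1B); dimensions, the row index, and the data are placeholders, with the KL-to-frozen-base objective following the member's description:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_in, vocab, row = 64, 100, 7                  # toy stand-ins, not OLMo-1B sizes
W_base = torch.randn(vocab, d_in)              # frozen "base" weight
W_broken = W_base.clone()
W_broken[row] = 0.0                            # ablate one row

delta_row = torch.zeros(d_in, requires_grad=True)  # trainable rank-1 patch
mask = torch.zeros(vocab, 1)
mask[row] = 1.0                                # confine the patch to the damaged row
opt = torch.optim.Adam([delta_row], lr=1e-2)

x = torch.randn(256, d_in)                     # proxy activations
target = F.log_softmax(x @ W_base.T, dim=-1)   # frozen base distribution

for _ in range(300):
    W_patched = W_broken + mask * delta_row    # rank-1 update on one row only
    logp = F.log_softmax(x @ W_patched.T, dim=-1)
    loss = F.kl_div(logp, target, log_target=True, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()

cos = F.cosine_similarity(delta_row, W_base[row], dim=0)
print(f"final KL={loss.item():.4f}  cos(patch, original row)={cos.item():.3f}")
```

Note that in this single-layer toy the optimal patch simply relearns the original row, so the cosine similarity trends toward 1; the near-orthogonal repair reported above can only emerge in a deep network, where downstream paths offer alternative routes.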

tinygrad (George Hotz) ▷ #general (50 messagesđŸ”„):

Linux 6.19 char misc, tinygrad meeting #100, Llama 405b, Python Speed, Gradient Checkpointing

  • Tinygrad Meeting #100 Agenda: The 100th tinygrad meeting included discussion of company updates, llama training priority, grad acc and JIT, flash attention, mi300/350 stability, fast GEMM, viz, FP8 training, image dtype/ctype, and other bounties.
    • The meeting was scheduled for 9am San Diego time on Monday.
  • Llama 405b tracking on GitHub: A member created a board on GitHub for tracking Llama 405b here.
    • The board can be used to keep track of assignments and other tasks related to the Llama 405b model.
  • JIT: Guard Against Silent Errors: A member planned to reduce JIT footguns by correctly checking that the JIT is only capturing if schedulecaches match.
    • Two big footguns were discussed: non-input tensors used in the function changing “form,” which makes the JIT silently wrong, and output tensors not being copied, so later calls overwrite earlier results (see the sketch after this list).
  • Image DType Progress: Progress is being made on the image dtype, with a goal to have it mostly merged by the end of the week, with the caveat that CL=1, QCOM=1 might prove to be trickier.
    • There are conceptual issues, such as a buffer not being convertible to an arbitrary image shape because image width (on Adreno 630) must be 64B-aligned.
  • AI Pull Requests Policy: The policy on AI pull requests remains exactly the same: unless you are a known contributor, anything that looks AI-generated will be immediately closed.
    • The rationale is that contributors should completely understand every line of the PRs they post; an AI PR the author doesn’t understand provides negative value.
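
A sketch of the first footgun, assuming recent tinygrad where TinyJit is exported at the top level:

```python
from tinygrad import Tensor, TinyJit

scale = Tensor([2.0])              # non-input tensor closed over by the function

@TinyJit
def step(x: Tensor) -> Tensor:
    # only `x` is a JIT input; `scale` gets baked into the captured kernels
    return (x * scale).realize()

for _ in range(3):                 # TinyJit captures after warm-up calls
    print(step(Tensor.rand(4)).numpy())

# Footgun: if `scale` later changes "form" (shape/dtype), the JIT keeps
# replaying the stale captured kernels and the results are silently wrong.
```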

DSPy ▷ #show-and-tell (34 messagesđŸ”„):

ReasoningLayer.ai launch, Neurosymbolic AI, DSPy GEPA, uv tool install slowness, MCP mode

  • ReasoningLayer.ai neurosymbolic AI project opens waitlist: A neurosymbolic AI project, ReasoningLayer.ai, opened its waitlist, aiming to fix weaknesses in today’s LLMs by adding real, structured reasoning.
    • The project plans to use DSPy GEPA in its ontology ingestion pipeline; the initial post to support it is here.
  • Next-gen CLI Tooling with Subagents and DSPy Utilization: A member noted that using DSPy with subagents extends beyond review/triage/work tools and represents next-generation CLI tooling.
    • Rather than being yet another coding CLI, it could serve as an MCP server powering other coding CLIs, further enhanced with MorphLLM and Supermemory.ai.
  • MCP Mode Tweaks Explored: The creator is considering tweaks to run the tool as an MCP mode and is open to community contributions and pull requests on GitHub.
    • There is a plan to start with open-source features and then add cloud-based options, with a goal to allow options for users, even employing the tool to build itself.
  • uv tool install -e . install taking too long: One user reported that uv sync and uv tool install -e . take a significant amount of time, and they don’t know why.
    • The issue may be related to the Python version, as the tool appears to work on 3.13 but not on 3.14; the tool’s creator will investigate.

DSPy ▷ #general (14 messagesđŸ”„):

BAMLAdapter, DSPy Skills, Field Specific Instructions

  • BAMLAdapter ships for direct import: Members discussed the new BAMLAdapter, with one member stating that you can use the BAMLAdapter today if you import it directly with from dspy.adapters.baml_adapter import BAMLAdapter (see the usage sketch after this list).
    • Another member added that they put up a PR with a fix, noting that the adapter doesn’t pull in docstrings for Pydantic models.
  • Newcomer asks about DSPy Skills: A principal engineer inquired about diving deep into DSPy and asked about specific features to check out.
    • The engineer expressed the “popular opinion” that agents are too expensive and unreliable for most profit-oriented products unless backed by VC funding or a user base willing to pay a premium.
  • Overfitting prompts to Frontier Models to optimize cost/margins: One member noted that overfitting a prompt to the latest frontier model has value, so automatic prompt optimization has less value.
    • Another member stated if you need to care about cost/margins, you start to care about where you are on the cost/accuracy/latency frontier.
  • Best place to put field-specific instructions in DSPy signature?: One member inquired about the best place to put field-specific instructions when extracting 6 output fields from 1 input field according to some rules.
    • They asked whether instructions should live in the docstring of the MySignature class, in the corresponding fields (field1: str = dspy.OutputField(desc="field-specific-instructions")), or in both (see the signature sketch after this list).
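
A minimal usage sketch for the adapter import above; the model id is illustrative, and dspy.configure(adapter=...) follows recent DSPy releases:

```python
import dspy
from dspy.adapters.baml_adapter import BAMLAdapter

# illustrative LM id; any LiteLLM-style model string works here
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"), adapter=BAMLAdapter())

class ExtractCity(dspy.Signature):
    """Extract the city mentioned in the text."""
    text: str = dspy.InputField()
    city: str = dspy.OutputField()

print(dspy.Predict(ExtractCity)(text="I landed in Lisbon on Friday.").city)
```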
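
And a sketch of the placement options from the last question, with placeholder field names and rules, showing both the docstring route and the per-field desc route:

```python
import dspy

class MySignature(dspy.Signature):
    """Task-level rules shared by all fields go in the docstring:
    extract the six output fields below from `text` per the rules."""
    text: str = dspy.InputField()
    # field-specific rules ride along in each field's desc
    # (remaining four output fields omitted for brevity)
    field1: str = dspy.OutputField(desc="lowercase; empty string if absent")
    field2: str = dspy.OutputField(desc="ISO-8601 date, e.g. 2025-12-15")
```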

Modular (Mojo đŸ”„) ▷ #mojo (32 messagesđŸ”„):

Variable declaration scope, Mimicking const, C++ lambda syntax, Julia v. Mojo, LLM modular book error

  • Scoping Differences Surface in Variable Declarations: A discussion arose regarding variable declaration without the var keyword; it was clarified that variables declared without var have function-scope visibility, similar to var in JavaScript, as opposed to block scope.
    • In Mojo, var acts like JS let, and the absence of the keyword acts like JS var; a GitHub Pull Request discussed removing something akin to JavaScript’s const.
  • Community Mulls Mimicking const Keyword Functionality: A member suggested that they might be able to mimic const on the library side via a function, for example, var res = try[foo](True).
    • It was noted that making this a compiler feature would probably be better.
  • Debate Erupts Over C++ Lambda Syntax: A member voiced support for C++ lambda syntax, citing its handling of captures, despite acknowledging being in the minority.
    • Another member said that it’s one of the least bad ways to do it, as most languages have nicer looking lambdas but then have to figure out captures later and it makes a mess.
  • Julia vs. Mojo: FAQ Clarifies Differences: A member inquired about Julia versus Mojo, and another member pointed to the Mojo FAQ which highlights Mojo’s approach to memory ownership, scaling, and AI/MLIR-first design.
    • The FAQ states that Mojo takes a different approach to memory ownership and memory management, it scales down to smaller envelopes, and is designed with AI and MLIR-first principles (though Mojo is not only for AI).
  • LLM Modular Book Error Troubles Learner: A user encountered an error in step_05 of the llm.modular.com book, suspecting the GPT2 model was not fully downloaded from Hugging Face.
    • Another member responded and mentioned that the DGX Spark’s GPU isn’t yet supported in their compilation flow.

Manus.im Discord ▷ #general (22 messagesđŸ”„):

Manus Auth redirect bug, Gemini 3.0 vs Manus, Firebase, Antigravity, and Google AI Studio, Conversation Mode with Wide Research, Manus 1.6 release

  • Manus Auth Redirect Bug Frustrates User: A user reported ongoing issues with a Manus Auth redirect bug that consumes credits without being resolved, expressed frustration at the lack of a prompt solution for building a custom system, and criticized the imposition of the Manus logo on client logins (replay link).
    • The user is switching to Firebase, Antigravity, and Google AI Studio due to the bugs and feels that Gemini 3.0 and Claude in the IDE outperform Manus.
  • Gemini 3.0 and Firebase Outshine Manus: One user is leaving Manus due to dissatisfaction, citing that Gemini 3.0 and Firebase are superior alternatives, particularly because Antigravity offers more agent control and access to the latest models via OpenRouter.
    • They predict Manus will become obsolete for developers, emphasizing that Google offers similar capabilities for free to developers with a Gmail account or Google Workspaces.
  • User Requests Simultaneous Conversation Mode and Wide Research: A user requests the return of a feature that allows using Conversation Mode simultaneously with Wide Research, noting that not all users prefer AI responses in PDF format (the default in Agent Mode).
    • They argue combining research breadth with conversational interaction would improve user experience by enabling a more natural, interactive way to engage with findings without needing to read through PDF documents.
  • Opus 4.5 beats Manus in Value and performance: A user is using Opus 4.5 in Claude Code for $20 a month, finding it more cost-effective than Manus, especially when adding MCP servers, skills, and plugins.
    • They quipped that Manus is like a toddler that can’t even talk yet, while recommending this discord-multi-ai-bot project.
  • AI Engineer advertises real-world solutions: An AI and Full-Stack Engineer shared his experience in advanced AI systems and blockchain development with hands-on experience.
    • He claims to build real-world, end-to-end solutions, from models to production-ready apps, mentioning AI chatbots, YOLOv8 image recognition, and an AI note-taking assistant, and invited users to build something meaningful.

MCP Contributors (Official) ▷ #general (12 messagesđŸ”„):

C++ MCP, Dangerous tool flag, MCP Server Publication Error, Response Annotations, Tool Resolution Proposal

  • Dangerous Flags Feature Rollout: A member inquired about flagging a tool as dangerous in MCP, particularly for Claude Code to restrict certain tool calls.
    • Another member shared a proposal currently in draft, inviting feedback on the response annotations.
  • MCP Tool Resolution Discussion Blossoms: In a thread on tool resolution, a member expressed interest in the tool resolution proposal.
    • They noted that it would be up to client implementation to handle that flag as it sees fit and were curious about how others are approaching this.
  • MCP Server Publication Plagued by Deprecation: A member encountered an error while publishing a new mcp-server using mcp-publisher due to a deprecated schema error, as described in the quickstart guide.
    • Another member explained that the documentation had been updated ahead of the production deployment and suggested temporarily using the previous schema version, 2025-10-17, as a workaround.

aider (Paul Gauthier) ▷ #general (6 messages):

Aider OpenAIException, Aider active development

  • Aider throws OpenAIException: A user encountered a litellm.NotFoundError when running aider --model gpt-5, due to the ‘gpt-5’ model not being found.
    • A member suggested trying openai/gpt-5 as the model string.
  • Aider’s Pulse Checked: A user inquired whether Aider is still under active development.
    • No further discussion or confirmation was provided in the given context.

aider (Paul Gauthier) ▷ #questions-and-tips (4 messages):

aider gpt-5, litellm errors, aider model config

  • GPT-5 Model Causes Aider to Crash: When attempting to run aider with the --model openai/gpt-5 flag, users encounter a litellm.NotFoundError indicating the model ‘gpt-5’ not found.
    • Despite the model appearing in the model list and the user setting their OpenAI API key, the issue persists.
  • Debugging Aider Model Configuration: The user is attempting to use the openai/gpt-5 model with aider, setting the reasoning effort to medium via the --reasoning-effort medium flag.
    • The user has confirmed they have set their OpenAI API key using setx, suggesting the authentication should not be the issue.