I am not saying we should expand the minimized expression; I am only saying we should minimize the number of NAND gates needed for complementation. Take XOR as an example: after double complementation of A'B + AB', as you pointed out, we need 5 NAND gates if we implement the expression directly, 2 of which are spent explicitly on A' and B'. We can work around this by avoiding A' and B' and using (AB)' instead, since A(AB)' = AB' and B(AB)' = A'B, which brings the total number of gates down to 4.
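The two gate counts can be checked with a small truth-table sketch. This is only an illustration: the function names `nand`, `xor_5_gates`, and `xor_4_gates` are made up here, and each call to `nand` stands for one 2-input NAND gate.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """One 2-input NAND gate on bits (0 or 1)."""
    return 1 - (a & b)


def xor_5_gates(a: int, b: int) -> int:
    """Direct form ((A'B)' (AB')')' with explicit inverters."""
    not_a = nand(a, a)       # gate 1: A'
    not_b = nand(b, b)       # gate 2: B'
    t1 = nand(not_a, b)      # gate 3: (A'B)'
    t2 = nand(a, not_b)      # gate 4: (AB')'
    return nand(t1, t2)      # gate 5: A'B + AB'


def xor_4_gates(a: int, b: int) -> int:
    """Shared-term form: (AB)' replaces both inverters."""
    shared = nand(a, b)      # gate 1: (AB)'
    t1 = nand(a, shared)     # gate 2: (A(AB)')' = (AB')'
    t2 = nand(b, shared)     # gate 3: (B(AB)')' = (A'B)'
    return nand(t1, t2)      # gate 4: AB' + A'B


# Compare both circuits against Python's built-in XOR on every input.
for a, b in product((0, 1), repeat=2):
    assert xor_5_gates(a, b) == xor_4_gates(a, b) == (a ^ b)
```

The saving comes from the single (AB)' gate feeding both second-level gates, so it does double duty in place of the two inverters.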
Actually, I think it varies from expression to expression. XOR happens to allow this workaround, but the same trick will not necessarily work for every other expression.