You have a single-layer neural network for a binary classification task (the layer's activation is the output, so there is no hidden layer). The input is \(X \in \mathbb{R}^{n \times m}\) (one example per column), the output is \(\hat{y} \in \mathbb{R}^{1 \times m}\), and the true labels are \(y \in \mathbb{R}^{1 \times m}\). The forward propagation equations are:

\[
\begin{align*}
z^{[1]} &= W^{[1]}X + b^{[1]} \\
a^{[1]} &= \sigma(z^{[1]}) \\
\hat{y} &= a^{[1]} \\
J &= -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log\hat{y}^{(i)} + (1 - y^{(i)}) \log\bigl(1 - \hat{y}^{(i)}\bigr) \right)
\end{align*}
\]

Write the expression for \(\frac{\partial J}{\partial W^{[1]}}\) as a matrix product of two terms, ignoring the constant factor \(\frac{1}{m}\).
A) $\frac{\partial J}{\partial W^{[1]}} = X \cdot (\hat{y} - y)^T$
B) $\frac{\partial J}{\partial W^{[1]}} = (\hat{y} - y) \cdot X^T$
C) $\frac{\partial J}{\partial W^{[1]}} = X^T \cdot (\hat{y} - y)$
D) $\frac{\partial J}{\partial W^{[1]}} = (\hat{y} - y) \cdot \sigma'(z^{[1]}) \cdot X^T$
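The key step is that for a sigmoid output with cross-entropy loss, the \(\sigma'(z^{[1]})\) factor cancels against the derivative of the loss, leaving \(\frac{\partial J}{\partial z^{[1]}} = \frac{1}{m}(\hat{y} - y)\). The chain-rule result can be verified numerically with a gradient check; below is a minimal NumPy sketch (the dimensions, seed, and variable names are illustrative, not from the question):

```python
import numpy as np

# Hypothetical small problem to check the chain-rule result numerically.
rng = np.random.default_rng(0)
n, m = 4, 5                                # n features, m examples
X = rng.normal(size=(n, m))
y = rng.integers(0, 2, size=(1, m)).astype(float)
W = rng.normal(size=(1, n))
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W):
    """Cross-entropy cost J for the single-layer network."""
    y_hat = sigmoid(W @ X + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Analytic gradient: (1/m) * (y_hat - y) @ X.T.  The sigmoid derivative
# cancels against the cross-entropy derivative, so it never appears.
y_hat = sigmoid(W @ X + b)
dW_analytic = (y_hat - y) @ X.T / m

# Central-difference numerical gradient for comparison.
eps = 1e-6
dW_numeric = np.zeros_like(W)
for i in range(W.shape[1]):
    Wp, Wm = W.copy(), W.copy()
    Wp[0, i] += eps
    Wm[0, i] -= eps
    dW_numeric[0, i] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.allclose(dW_analytic, dW_numeric, atol=1e-7))
```

If the analytic expression is correct, the two gradients agree to within the finite-difference error, which is consistent with the \((\hat{y} - y) \cdot X^T\) form (up to the \(1/m\) factor).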