You have a single hidden-layer neural network for a binary classification task. The input is \(X \in \mathbb{R}^{n \times m}\), the output is \(\hat{y} \in \mathbb{R}^{1 \times m}\), and the true label is \(y \in \mathbb{R}^{1 \times m}\). The forward propagation equations are:

\[ \begin{aligned} z^{[1]} & = W^{[1]}X + b^{[1]} \\ a^{[1]} & = \sigma(z^{[1]}) \\ \hat{y} & = a^{[1]} \\ J & = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right) \end{aligned} \]

Write the expression for \(\frac{\partial J}{\partial W^{[1]}}\) as a matrix product of two terms.

 

 

A) $\frac{\partial J}{\partial W^{[1]}} = X \cdot (\hat{y} - y)^T$

B) $\frac{\partial J}{\partial W^{[1]}} = (\hat{y} - y) \cdot X^T$

C) $\frac{\partial J}{\partial W^{[1]}} = X^T \cdot (\hat{y} - y)$

D) $\frac{\partial J}{\partial W^{[1]}} = (\hat{y} - y) \cdot \sigma'(z^{[1]}) \cdot X^T$

1 Answer

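By my calculation the answer is B), up to the $\frac{1}{m}$ factor that all four options omit. For the sigmoid output with cross-entropy loss, the $\sigma'(z^{[1]})$ term cancels, so $\frac{\partial J}{\partial z^{[1]}} = \frac{1}{m}(\hat{y} - y)$, and the chain rule through $z^{[1]} = W^{[1]}X + b^{[1]}$ gives $\frac{\partial J}{\partial W^{[1]}} = \frac{1}{m}(\hat{y} - y) \cdot X^T$. The shapes agree as well: $W^{[1]}$ is $1 \times n$, and $(1 \times m)(m \times n)$ is $1 \times n$. Option A has shape $n \times 1$, the product in C, $(m \times n)(1 \times m)$, is not defined for general $m, n$, and D keeps a $\sigma'$ factor that should have cancelled. Let me know if I have made a mistake.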
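If it helps, here is a rough finite-difference check of the form $\frac{1}{m}(\hat{y} - y) \cdot X^T$ against the cost $J$ from the question. It is only a sketch: the sizes `n = 3`, `m = 5`, the random data, and all function names are my own assumptions for illustration, and I treat $W^{[1]}$ as a plain list of length $n$ and $b^{[1]}$ as a scalar.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(W, b, X):
    """y_hat for each of the m columns of X (X is stored as n rows of m values)."""
    n, m = len(X), len(X[0])
    return [sigmoid(sum(W[j] * X[j][i] for j in range(n)) + b) for i in range(m)]

def cost(W, b, X, y):
    """Cross-entropy cost J from the question."""
    y_hat = forward(W, b, X)
    m = len(y)
    total = sum(
        y[i] * math.log(y_hat[i]) + (1 - y[i]) * math.log(1 - y_hat[i])
        for i in range(m)
    )
    return -total / m

def analytic_grad(W, b, X, y):
    """Candidate gradient (1/m) * (y_hat - y) X^T, a row vector of length n."""
    y_hat = forward(W, b, X)
    n, m = len(X), len(y)
    return [sum((y_hat[i] - y[i]) * X[j][i] for i in range(m)) / m for j in range(n)]

def numeric_grad(W, b, X, y, eps=1e-6):
    """Central finite differences of J with respect to each entry of W."""
    grad = []
    for j in range(len(W)):
        W_plus = list(W)
        W_plus[j] += eps
        W_minus = list(W)
        W_minus[j] -= eps
        grad.append((cost(W_plus, b, X, y) - cost(W_minus, b, X, y)) / (2 * eps))
    return grad

# Hypothetical small problem, chosen only for the check.
random.seed(0)
n, m = 3, 5
X = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
y = [random.choice([0, 1]) for _ in range(m)]
W = [random.uniform(-1, 1) for _ in range(n)]
b = 0.1

ga = analytic_grad(W, b, X, y)
gn = numeric_grad(W, b, X, y)
max_diff = max(abs(a - c) for a, c in zip(ga, gn))
print("largest difference between analytic and numeric gradient:", max_diff)
```

If the printed difference is close to zero, the two gradients agree on this example. That supports the $(\hat{y} - y) \cdot X^T$ form, with no extra $\sigma'(z^{[1]})$ factor.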
