Notation:
$ w_{kj}^l $: weight from node $j$ in layer $l-1$ to node $k$ in layer $l$; $ z_k^l $: weighted input to node $k$ in layer $l$; $ a_k^l $: output (activation) of node $k$ in layer $l$.
$ z^l=w^la^{l-1}+b^l; a^l=\sigma(z^l) $
① Forward pass of a BP network
$ z^l = w^la^{l-1}+b^l $
$ a^l = \sigma(z^l) = \frac{1}{1 + e^{-z^l}} $
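The forward pass above can be sketched in NumPy. The network shape (2-3-1), the random parameters, and the function names `sigmoid`/`feedforward` are illustrative assumptions, not from the notes:

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic sigmoid: sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    """Apply a^l = sigma(w^l a^{l-1} + b^l) layer by layer."""
    a = x
    for w, b in zip(weights, biases):
        z = w @ a + b   # z^l = w^l a^{l-1} + b^l
        a = sigmoid(z)  # a^l = sigma(z^l)
    return a

# Hypothetical 2-3-1 network with fixed random parameters.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal((3, 1)), rng.standard_normal((1, 1))]
x = np.array([[0.5], [-0.2]])
print(feedforward(x, weights, biases).shape)  # → (1, 1)
```

Activations are column vectors, so $w^l a^{l-1}$ is an ordinary matrix-vector product.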
② Backward pass: computing the partial derivatives
$ C=\frac{1}{2n} \sum_x \|y(x)-a^L(x)\|^2 $
$$\Delta W_{kj}^l = \frac{\partial C}{\partial w_{kj}^l} = \frac{\partial C}{\partial z^l_k} \cdot \frac{\partial z^l_k}{\partial w^l_{kj}} =
\delta^l_k \frac{\partial z^l_k}{\partial w^l_{kj}}$$
where $\delta^l_k \equiv \frac{\partial C}{\partial z^l_k}$.
Also, $\frac{\partial z^l_k}{\partial w^l_{kj}} = a^{l-1}_j$
Therefore, $\Delta W^l_{kj} = \delta^l_k a^{l-1}_j$
Similarly, $\Delta b^l_k = \delta^l_k$
$$ \delta^l_k = \frac{\partial C}{\partial z^l_k} = \sum_j \frac{\partial C}{\partial z^{l+1}_j} \frac{\partial z^{l+1}_j}{\partial z^l_k} =
\sum_j \delta^{l+1}_j \frac{\partial z^{l+1}_j}{\partial z^l_k} $$
Also, $ \frac{\partial z^{l+1}_j}{\partial z^l_k} = w^{l+1}_{jk} \sigma^\prime(z^l_k) $, since $z^{l+1}_j = \sum_k w^{l+1}_{jk} \sigma(z^l_k) + b^{l+1}_j$.
Therefore,
$ \delta^l_k = \sum_j \delta^{l+1}_j w^{l+1}_{jk} \sigma^\prime(z^l_k) $
$ \Downarrow $ vectorize
$ \delta^l = (w^{l+1})^T \delta^{l+1} \odot \sigma^\prime(z^l) $
For the final layer $L$,
$ \delta^L = \frac{\partial C}{\partial z^L} = (a^L - y) \odot \sigma^\prime(z^L) $
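The formulas $\delta^L = (a^L - y) \odot \sigma'(z^L)$ and $\Delta W^l_{kj} = \delta^l_k a^{l-1}_j$ can be checked numerically against a finite difference of the cost. The network shape and all names here are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost(weights, biases, x, y):
    """Quadratic cost 0.5 * ||y - a^L||^2 for a single input x."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return 0.5 * np.sum((y - a) ** 2)

# Hypothetical 2-3-2 network with fixed random parameters.
rng = np.random.default_rng(1)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((2, 3))]
biases = [rng.standard_normal((3, 1)), rng.standard_normal((2, 1))]
x = np.array([[0.3], [0.7]])
y = np.array([[1.0], [0.0]])

# Forward pass, keeping every z^l and a^l.
zs, activations, a = [], [x], x
for w, b in zip(weights, biases):
    z = w @ a + b
    zs.append(z)
    a = sigmoid(z)
    activations.append(a)

# delta^L = (a^L - y) ⊙ sigma'(z^L); then dC/dw^L_{kj} = delta^L_k a^{L-1}_j.
delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
grad_w_last = delta @ activations[-2].T

# Central finite difference on one last-layer weight.
eps = 1e-6
wp = [w.copy() for w in weights]; wp[1][0, 0] += eps
wm = [w.copy() for w in weights]; wm[1][0, 0] -= eps
numeric = (cost(wp, biases, x, y) - cost(wm, biases, x, y)) / (2 * eps)
print(abs(grad_w_last[0, 0] - numeric) < 1e-6)  # → True
```

The analytic gradient agrees with the numeric one to well within floating-point tolerance, which is the standard sanity check for a backprop derivation.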
Backpropagation Algorithm
- Input x: Set the corresponding activation $a^1$ for the input layer.
- Feedforward: For each l=2, 3, …, L compute $z^l = w^la^{l-1} + b^l$ and $a^l = \sigma(z^l)$.
- Output error $\delta^L$: Compute the vector $\delta^L = \nabla_a C \odot \sigma^\prime(z^L)$.
- Backpropagate the error: For each l=L-1, L-2, …, 2 compute $\delta^l = (w^{l+1})^T \delta^{l+1} \odot \sigma^\prime(z^l)$.
- Output: The gradient of the cost function is given by $\frac{\partial C}{\partial w_{kj}^l} = a^{l-1}_j \delta_k^l$ and $ \frac{\partial C}{\partial b_k^l} = \delta_k^l$.
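The five steps above can be sketched as one function. This is a minimal NumPy sketch assuming the quadratic cost from ②, with $\nabla_a C = a^L - y$; the names `backprop`, `nabla_w`, `nabla_b` are my own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Return (nabla_w, nabla_b), the per-layer gradients of the
    quadratic cost C = 0.5 * ||y - a^L||^2, one list entry per layer."""
    # Step 1 (Input): set a^1 = x.
    activation = x
    activations = [x]  # a^1 ... a^L
    zs = []            # z^2 ... z^L
    # Step 2 (Feedforward): z^l = w^l a^{l-1} + b^l, a^l = sigma(z^l).
    for w, b in zip(weights, biases):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Step 3 (Output error): delta^L = (a^L - y) ⊙ sigma'(z^L).
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_w = [np.zeros_like(w) for w in weights]
    nabla_b = [np.zeros_like(b) for b in biases]
    nabla_w[-1] = delta @ activations[-2].T
    nabla_b[-1] = delta
    # Step 4 (Backpropagate): delta^l = (w^{l+1})^T delta^{l+1} ⊙ sigma'(z^l).
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_w[-l] = delta @ activations[-l - 1].T
        nabla_b[-l] = delta
    # Step 5 (Output): dC/dw^l_{kj} = a^{l-1}_j delta^l_k, dC/db^l_k = delta^l_k.
    return nabla_w, nabla_b
```

Each `nabla_w[l]` has the same shape as `weights[l]` because the outer product $\delta^l (a^{l-1})^T$ matches $w^l$ element for element.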