The Backpropagation (BP) Algorithm

Notation:

$ w_{kj}^l $: the weight from node $j$ in layer $l-1$ to node $k$ in layer $l$; $ z_k^l $: the input to node $k$ in layer $l$; $ a_k^l $: the output of node $k$ in layer $l$.

$ z^l=w^la^{l-1}+b^l; a^l=\sigma(z^l) $

① Forward propagation through the BP network

$ z^l = w^la^{l-1}+b^l $

$ a^l = \sigma(z^l) = \frac{1}{1 + e^{-z^l}} $
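The forward pass above can be sketched in NumPy; the network sizes and variable names below are illustrative, not part of the derivation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    """Apply a^l = sigma(w^l a^{l-1} + b^l) layer by layer."""
    a = x
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

# hypothetical 2-3-1 network with random parameters
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal((3, 1)), rng.standard_normal((1, 1))]
y_hat = feedforward(np.array([[0.5], [-0.2]]), weights, biases)
```

Because the output layer is also a sigmoid, the final activation lies in $(0, 1)$.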

② Backward propagation: computing the partial derivatives

$ C=\frac{1}{2n} \sum_x \|y(x)-a^L(x)\|^2 $

$$\Delta w_{kj}^l = \frac{\partial C}{\partial w_{kj}^l} = \frac{\partial C}{\partial z^l_k} \cdot \frac{\partial z^l_k}{\partial w^l_{kj}} =
\delta^l_k \frac{\partial z^l_k}{\partial w^l_{kj}}$$

where $\delta^l_k \equiv \frac{\partial C}{\partial z^l_k}$ is defined as the error of node $k$ in layer $l$.

Also, since $z^l_k = \sum_j w^l_{kj} a^{l-1}_j + b^l_k$, we have $\frac{\partial z^l_k}{\partial w^l_{kj}} = a^{l-1}_j$.

Therefore, $\Delta w^l_{kj} = \delta^l_k a^{l-1}_j$.

Similarly, $\Delta b^l_k = \delta^l_k$.

$$ \delta^l_k = \frac{\partial C}{\partial z^l_k} = \sum_j \frac{\partial C}{\partial z^{l+1}_j} \frac{\partial z^{l+1}_j}{\partial z^l_k} =
\sum_j \delta^{l+1}_j \frac{\partial z^{l+1}_j}{\partial z^l_k} $$

Also, since $z^{l+1}_j = \sum_k w^{l+1}_{jk} \sigma(z^l_k) + b^{l+1}_j$, we have $ \frac{\partial z^{l+1}_j}{\partial z^l_k} = w^{l+1}_{jk} \sigma^\prime(z^l_k) $.

Therefore,
$ \delta^l_k = \sum_j \delta^{l+1}_j w^{l+1}_{jk} \sigma^\prime(z^l_k) $
$ \Downarrow $ vectorize
$ \delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma^\prime(z^l) $
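The vectorized recursion is one matrix-vector product followed by an elementwise product. A minimal NumPy sketch (layer sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def delta_prev(w_next, delta_next, z_l):
    """delta^l = (w^{l+1})^T delta^{l+1} ⊙ sigma'(z^l)"""
    return (w_next.T @ delta_next) * sigmoid_prime(z_l)

# layer l has 3 nodes, layer l+1 has 2 nodes
rng = np.random.default_rng(0)
w_next = rng.standard_normal((2, 3))      # w^{l+1}, shape (2, 3)
delta_next = rng.standard_normal((2, 1))  # delta^{l+1}
z_l = rng.standard_normal((3, 1))         # z^l
d = delta_prev(w_next, delta_next, z_l)
```

Each component of `d` matches the sum form $\delta^l_k = \sum_j \delta^{l+1}_j w^{l+1}_{jk} \sigma^\prime(z^l_k)$.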

For the output layer $L$,
$ \delta^L = \frac{\partial C}{\partial z^L} = \nabla_a C \odot \sigma^\prime(z^L) = (a^L - y) \odot \sigma^\prime(z^L) $

Backpropagation Algorithm

  1. Input x: Set the corresponding activation $a^1$ for the input layer.
  2. Feedforward: For each l=2, 3, …, L compute $z^l = w^la^{l-1} + b^l$ and $a^l = \sigma(z^l)$.
  3. Output error $\delta^L$: Compute the vector $\delta^L = \nabla_a C \odot \sigma^\prime(z^L)$.
  4. Backpropagate the error: For each l=L-1, L-2, …, 2 compute $\delta^l = (w^{l+1})^T \delta^{l+1} \odot \sigma^\prime(z^l)$.
  5. Output: The gradient of the cost function is given by $\frac{\partial C}{\partial w_{kj}^l} = a^{l-1}_j \delta_k^l$ and $ \frac{\partial C}{\partial b_k^l} = \delta_k^l$.
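The five steps can be combined into a single-sample backprop sketch for the quadratic cost $C = \frac{1}{2}\|y - a^L\|^2$; the 2-3-1 network at the bottom is a hypothetical example, not part of the derivation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Return per-layer gradients dC/dw^l and dC/db^l for one sample."""
    # steps 1-2: feedforward, caching every z^l and a^l
    a, activations, zs = x, [x], []
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # step 3: output error delta^L = (a^L - y) ⊙ sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_w = [np.zeros_like(w) for w in weights]
    grad_b = [np.zeros_like(b) for b in biases]
    grad_w[-1] = delta @ activations[-2].T   # delta^L_k * a^{L-1}_j
    grad_b[-1] = delta
    # step 4: delta^l = (w^{l+1})^T delta^{l+1} ⊙ sigma'(z^l)
    for l in range(len(weights) - 2, -1, -1):
        delta = (weights[l + 1].T @ delta) * sigmoid_prime(zs[l])
        grad_w[l] = delta @ activations[l].T
        grad_b[l] = delta
    return grad_w, grad_b

# hypothetical 2-3-1 network and a single training pair
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [rng.standard_normal((3, 1)), rng.standard_normal((1, 1))]
x = np.array([[0.3], [0.7]])
y = np.array([[1.0]])
grad_w, grad_b = backprop(x, y, weights, biases)
```

A quick sanity check is to compare one analytic gradient entry against a central finite difference of the cost; the two should agree to several decimal places.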