# Introduction

The element-wise operations were covered in the previous post, Maths in a Neural Network: Element-wise. This post focuses on representing all the equations from that post as vector and matrix operations.

# 1 Feed-forward

Let’s consider a 2-3-2 network.

## 1.1 Element-wise operations

$a^{(l)}_p = f^{(l)}(z^{(l)}_p)\quad (1)$

$z^{(l)}_p = \sum_q w^{(l)}_{(qp)} a^{(l-1)}_q \quad (2)$

## 1.2 Vectorization

Applying equation (1):

$A^{(2)}= f^{(2)}(Z^{(2)})$

Applying equation (2):

$\begin{array}{lcl} \begin{bmatrix}z^{(2)}_0 \\z^{(2)}_1 \end{bmatrix} & = & \begin{bmatrix} w^{(2)}_{00} a^{(1)}_0 + w^{(2)}_{10} a^{(1)}_1 + w^{(2)}_{20} a^{(1)}_2 \\ w^{(2)}_{01} a^{(1)}_0 + w^{(2)}_{11} a^{(1)}_1 + w^{(2)}_{21} a^{(1)}_2 \\ \end{bmatrix} \\ \begin{bmatrix}z^{(2)}_0 \\z^{(2)}_1 \end{bmatrix} & = & \begin{bmatrix} w^{(2)}_{00} & w^{(2)}_{01} \\ w^{(2)}_{10} & w^{(2)}_{11} \\ w^{(2)}_{20} & w^{(2)}_{21} \\ \end{bmatrix} ^T \begin{bmatrix} a^{(1)}_0 \\ a^{(1)}_1 \\ a^{(1)}_2 \end{bmatrix} \\ Z^{(2)} & = & W^{(2) T} A^{(1)} \end{array}$

These can be generalized for more than 3 layers:

$A^{(l)}= f^{(l)}(Z^{(l)})\quad (3)$

$Z^{(l)} = W^{(l) T} A^{(l-1)} \quad (4)$
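Equations (3) and (4) translate directly into matrix code. Below is a minimal sketch of the forward pass for the 2-3-2 network, assuming sigmoid activations and random weights purely for illustration (the post does not fix a particular activation function); the weight matrix $W^{(l)}$ has one row per unit in layer $l-1$ and one column per unit in layer $l$, hence the transpose in equation (4):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))  # W^(1): 2 inputs -> 3 hidden units
W2 = rng.standard_normal((3, 2))  # W^(2): 3 hidden -> 2 outputs

A0 = np.array([[0.5], [-1.0]])    # input column vector A^(0), shape (2, 1)

Z1 = W1.T @ A0                    # equation (4): Z^(1) = W^(1)T A^(0)
A1 = sigmoid(Z1)                  # equation (3): A^(1) = f(Z^(1))
Z2 = W2.T @ A1                    # equation (4) for the output layer
A2 = sigmoid(Z2)
print(A2.shape)                   # (2, 1)
```

Note that the activations are kept as column vectors, so each layer is a single matrix-vector product.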

# 2 Weight Update

Let’s consider a 2-3-2 network.

## 2.1 Element-wise operations

For output layer:

$w^{(2)'}_{jk} = w^{(2)}_{jk} - \alpha \frac{\partial e_k}{\partial w^{(2)}_{jk}} \quad (5)$

$\frac{\partial e_k}{\partial w^{(2)}_{jk}} = \frac{\partial e_k}{\partial a^{(2)}_k} \frac{\partial a^{(2)}_k}{\partial z^{(2)}_k} \frac{\partial z^{(2)}_k}{\partial w^{(2)}_{jk}} = \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) a^{(1)}_j \quad (6)$

For hidden layers:

$w^{(1)'}_{ij} = w^{(1)}_{ij} - \alpha \sum_k \frac{\partial e_k}{\partial w^{(1)}_{ij}} \quad (7)$

$\sum_k\frac{\partial e_k}{\partial w^{(1)}_{ij}} = \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} \frac{\partial a^{(2)}_k}{\partial z^{(2)}_k} \frac{\partial z^{(2)}_k}{\partial a^{(1)}_j} \frac{\partial a^{(1)}_j}{\partial z^{(1)}_j} \frac{\partial z^{(1)}_j}{\partial w^{(1)}_{ij}} = \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{jk} f^{(1)'}(z^{(1)}_j) a^{(0)}_i \quad (8)$

## 2.2 Vectorization

For output layer:

Let's vectorize the weight update equation by applying equation (5):
$\begin{bmatrix} w^{(2)'}_{00} & w^{(2)'}_{01} \\ w^{(2)'}_{10} & w^{(2)'}_{11} \\ w^{(2)'}_{20} & w^{(2)'}_{21} \\ \end{bmatrix} = \begin{bmatrix} w^{(2)}_{00} & w^{(2)}_{01} \\ w^{(2)}_{10} & w^{(2)}_{11} \\ w^{(2)}_{20} & w^{(2)}_{21} \\ \end{bmatrix} - \alpha \begin{bmatrix} \frac{\partial e_0}{\partial w^{(2)}_{00}} & \frac{\partial e_1}{\partial w^{(2)}_{01}} \\ \frac{\partial e_0}{\partial w^{(2)}_{10}} & \frac{\partial e_1}{\partial w^{(2)}_{11}} \\ \frac{\partial e_0}{\partial w^{(2)}_{20}} & \frac{\partial e_1}{\partial w^{(2)}_{21}} \\ \end{bmatrix}\\$

Let’s represent the weights as a matrix and apply equation (6):

$W^{(2)'} = W^{(2)} - \alpha \begin{bmatrix} \frac{\partial e_0}{\partial a^{(2)}_0} f^{(2)'}(z^{(2)}_0) a^{(1)}_0 & \frac{\partial e_1}{\partial a^{(2)}_1} f^{(2)'}(z^{(2)}_1) a^{(1)}_0 \\ \frac{\partial e_0}{\partial a^{(2)}_0} f^{(2)'}(z^{(2)}_0) a^{(1)}_1 & \frac{\partial e_1}{\partial a^{(2)}_1} f^{(2)'}(z^{(2)}_1) a^{(1)}_1 \\ \frac{\partial e_0}{\partial a^{(2)}_0} f^{(2)'}(z^{(2)}_0) a^{(1)}_2 & \frac{\partial e_1}{\partial a^{(2)}_1} f^{(2)'}(z^{(2)}_1) a^{(1)}_2 \end{bmatrix}$

$W^{(2)'} = W^{(2)} - \alpha A^{(1)} (E^{'}(A^{(2)}) \diamond f^{(2)'}(Z^{(2)}))^T$

Where:

$E^{'}(A^{(2)}) = \begin{bmatrix} \frac{\partial e_0}{\partial a^{(2)}_0} \\ \frac{\partial e_1}{\partial a^{(2)}_1} \end{bmatrix}$

Define the backpropagated value delta:

$\delta^{(2)} = E^{'}(A^{(2)}) \diamond f^{(2)'}(Z^{(2)}) \quad (9)$

$W^{(2)'} = W^{(2)} - \alpha A^{(1)} \delta^{(2)T} \quad (10)$
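Equations (9) and (10) can be sketched in a few lines of NumPy. As an illustrative assumption (the post leaves the loss and activation unspecified), take a squared-error loss $e_k = \frac{1}{2}(a^{(2)}_k - t_k)^2$ so that $E'(A^{(2)}) = A^{(2)} - T$, and sigmoid activations; the Hadamard product $\diamond$ becomes NumPy's element-wise `*`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

alpha = 0.1
A1 = np.array([[0.2], [0.7], [0.5]])   # hidden activations A^(1), shape (3, 1)
W2 = np.full((3, 2), 0.5)              # W^(2), shape (3, 2)
T  = np.array([[1.0], [0.0]])          # target output, shape (2, 1)

Z2 = W2.T @ A1                         # forward pass through the output layer
A2 = sigmoid(Z2)

delta2 = (A2 - T) * sigmoid_prime(Z2)  # equation (9): E'(A) ⋄ f'(Z)
W2_new = W2 - alpha * A1 @ delta2.T    # equation (10): outer product A^(1) δ^(2)T
print(W2_new.shape)                    # (3, 2)
```

The outer product `A1 @ delta2.T` produces exactly the 3×2 gradient matrix written out above.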

For hidden layers:

Applying equation (7):

$\begin{bmatrix} w^{(1)'}_{00} & w^{(1)'}_{01} & w^{(1)'}_{02} \\ w^{(1)'}_{10} & w^{(1)'}_{11} & w^{(1)'}_{12} \end{bmatrix} = \begin{bmatrix} w^{(1)}_{00} & w^{(1)}_{01} & w^{(1)}_{02} \\ w^{(1)}_{10} & w^{(1)}_{11} & w^{(1)}_{12} \end{bmatrix} - \alpha \begin{bmatrix} \sum_k \frac{\partial e_k}{\partial w^{(1)}_{00}} & \sum_k \frac{\partial e_k}{\partial w^{(1)}_{01}} & \sum_k \frac{\partial e_k}{\partial w^{(1)}_{02}} \\ \sum_k \frac{\partial e_k}{\partial w^{(1)}_{10}} & \sum_k \frac{\partial e_k}{\partial w^{(1)}_{11}} & \sum_k \frac{\partial e_k}{\partial w^{(1)}_{12}} \end{bmatrix}$

Let’s represent the weights as a matrix and apply equation (8):

$W^{(1)'} = W^{(1)} - \alpha \begin{bmatrix} \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{0k} f^{(1)'}(z^{(1)}_0) a^{(0)}_0 & \cdots & \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{2k} f^{(1)'}(z^{(1)}_2) a^{(0)}_0 \\ \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{0k} f^{(1)'}(z^{(1)}_0) a^{(0)}_1 & \cdots & \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{2k} f^{(1)'}(z^{(1)}_2) a^{(0)}_1 \end{bmatrix}$

$W^{(1)'} = W^{(1)} - \alpha A^{(0)} (f^{(1)'}(Z^{(1)}) \diamond W^{(2)} (E^{'}(A^{(2)}) \diamond f^{(2)'}(Z^{(2)})))^T$

$W^{(1)'} = W^{(1)} - \alpha A^{(0)} (f^{(1)'}(Z^{(1)}) \diamond W^{(2)} \delta^{(2)})^T$

Define the backpropagated value delta:

$\delta^{(1)} = f^{(1)'}(Z^{(1)}) \diamond W^{(2)} \delta^{(2)} \quad (11)$

$W^{(1)'} = W^{(1)} - \alpha A^{(0)} \delta^{(1)T} \quad (12)$
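The key point in equations (11) and (12) is that the hidden layer reuses $\delta^{(2)}$ instead of recomputing the sum over $k$. A short sketch for the full 2-3-2 network, again under the illustrative assumptions of sigmoid activations and squared-error loss (so $E'(A^{(2)}) = A^{(2)} - T$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

alpha = 0.1
A0 = np.array([[0.5], [-1.0]])               # inputs A^(0), shape (2, 1)
W1 = np.full((2, 3), 0.3)                    # W^(1), shape (2, 3)
W2 = np.full((3, 2), 0.5)                    # W^(2), shape (3, 2)
T  = np.array([[1.0], [0.0]])                # target, shape (2, 1)

Z1 = W1.T @ A0; A1 = sigmoid(Z1)             # forward pass
Z2 = W2.T @ A1; A2 = sigmoid(Z2)

delta2 = (A2 - T) * sigmoid_prime(Z2)        # equation (9)
delta1 = sigmoid_prime(Z1) * (W2 @ delta2)   # equation (11): propagate delta back
W1_new = W1 - alpha * A0 @ delta1.T          # equation (12)
print(W1_new.shape)                          # (2, 3)
```

`W2 @ delta2` is the matrix form of $\sum_k w^{(2)}_{jk} \delta^{(2)}_k$, which is where the per-output sums in the expanded gradient matrix go.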

These can be generalized for networks with more than 3 layers, with layers indexed from 0 to L:

Output layer:

$\delta^{(L)} = E^{'}(A^{(L)}) \diamond f^{(L)'}(Z^{(L)}) \quad (13) \\ W^{(L)'} = W^{(L)} - \alpha A^{(L-1)} \delta^{(L)T}\quad (14)$

Hidden layer:

$\delta^{(l)} = f^{(l)'}(Z^{(l)}) \diamond W^{(l+1)} \delta^{(l+1)}\quad (15) \\ W^{(l)'} = W^{(l)} - \alpha A^{(l-1)} \delta^{(l)T} \quad (16)$
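The generalized equations (13)-(16) fit into one backward loop over the layers. Below is a sketch of a single gradient step for an arbitrary fully connected network; the function name `step` is my own, and the sigmoid activation and squared-error loss (giving $E'(A^{(L)}) = A^{(L)} - T$) are illustrative assumptions, not fixed by the post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def step(weights, A0, T, alpha=0.1):
    """One gradient step; weights[l-1] holds W^(l). Returns updated weights."""
    # feed-forward, keeping every Z^(l) and A^(l): equations (3)-(4)
    A, Z = [A0], []
    for W in weights:
        Z.append(W.T @ A[-1])
        A.append(sigmoid(Z[-1]))
    # output layer: equations (13)-(14)
    delta = (A[-1] - T) * sigmoid_prime(Z[-1])
    new_weights = [None] * len(weights)
    new_weights[-1] = weights[-1] - alpha * A[-2] @ delta.T
    # hidden layers, moving backwards: equations (15)-(16)
    for l in range(len(weights) - 2, -1, -1):
        delta = sigmoid_prime(Z[l]) * (weights[l + 1] @ delta)
        new_weights[l] = weights[l] - alpha * A[l] @ delta.T
    return new_weights

rng = np.random.default_rng(0)
weights = [rng.standard_normal((2, 3)), rng.standard_normal((3, 2))]  # 2-3-2
weights = step(weights, np.array([[0.5], [-1.0]]), np.array([[1.0], [0.0]]))
print([W.shape for W in weights])   # [(2, 3), (3, 2)]
```

Because each $\delta^{(l)}$ depends only on $\delta^{(l+1)}$, the loop runs once from the output layer back to the first hidden layer, which is exactly the backpropagation pattern the next post in the series implements in NumPy.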

# 3 Summary

## 3.1 Feed-forward:

$A^{(l)}= f^{(l)}(Z^{(l)})$

$Z^{(l)} = W^{(l) T} A^{(l-1)}$

## 3.2 Weight update:

Output layer:

$\delta^{(L)} = E^{'}(A^{(L)}) \diamond f^{(L)'}(Z^{(L)}) \\ W^{(L)'} = W^{(L)} - \alpha A^{(L-1)} \delta^{(L)T}$

Hidden layer:

$\delta^{(l)} = f^{(l)'}(Z^{(l)}) \diamond W^{(l+1)} \delta^{(l+1)} \\ W^{(l)'} = W^{(l)} - \alpha A^{(l-1)} \delta^{(l)T}$

# Next

1. Maths in a Neural Network: Element-wise
2. Maths in a Neural Network: Vectorization
3. Code a Neural Network with Numpy
4. Maths in a Neural Network: Batch Training