# Introduction

Element-wise operations were covered in the previous post, Maths in a Neural Network: Element-wise. This post shows how to express all of the equations from that post in vector and matrix form.

# 1 Feed-forward

Let’s consider a 2-3-2 network.

## 1.1 Element-wise operations

$a^{(l)}_p = f^{(l)}(z^{(l)}_p)\quad (1)$ $z^{(l)}_p = \sum_q w^{(l)}_{qp} a^{(l-1)}_q \quad (2)$

## 1.2 Vectorization

Applying equation (1): $A^{(2)}= f^{(2)}(Z^{(2)})$

Applying equation (2): $\begin{array}{lcl} \begin{bmatrix}z^{(2)}_0 \\z^{(2)}_1 \end{bmatrix} & = & \begin{bmatrix} w^{(2)}_{00} a^{(1)}_0 + w^{(2)}_{10} a^{(1)}_1 + w^{(2)}_{20} a^{(1)}_2 \\ w^{(2)}_{01} a^{(1)}_0 + w^{(2)}_{11} a^{(1)}_1 + w^{(2)}_{21} a^{(1)}_2 \\ \end{bmatrix} \\ \begin{bmatrix}z^{(2)}_0 \\z^{(2)}_1 \end{bmatrix} & = & \begin{bmatrix} w^{(2)}_{00} & w^{(2)}_{01} \\ w^{(2)}_{10} & w^{(2)}_{11} \\ w^{(2)}_{20} & w^{(2)}_{21} \\ \end{bmatrix} ^T \begin{bmatrix} a^{(1)}_0 \\ a^{(1)}_1 \\ a^{(1)}_2 \end{bmatrix} \\ Z^{(2)} & = & W^{(2) T} A^{(1)} \end{array}$

These can be generalized for more than 3 layers: $A^{(l)}= f^{(l)}(Z^{(l)})\quad (3)$ $Z^{(l)} = W^{(l) T} A^{(l-1)} \quad (4)$
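Equations (3) and (4) can be sketched directly in NumPy. This is a minimal illustration for the 2-3-2 example; the random weights, the sigmoid activation, and the input values are assumptions, not part of the derivation above.

```python
import numpy as np

# Hypothetical 2-3-2 network: weight shapes follow W^{(l)} with
# rows indexed by the previous layer and columns by the current one.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))  # W^{(1)}: 2 inputs -> 3 hidden units
W2 = rng.standard_normal((3, 2))  # W^{(2)}: 3 hidden units -> 2 outputs

def f(z):
    """Sigmoid activation, used here for every layer."""
    return 1.0 / (1.0 + np.exp(-z))

A0 = np.array([[0.5], [0.1]])  # input column vector A^{(0)}

# Equation (4): Z^{(l)} = W^{(l)T} A^{(l-1)}; equation (3): A^{(l)} = f(Z^{(l)})
Z1 = W1.T @ A0
A1 = f(Z1)
Z2 = W2.T @ A1
A2 = f(Z2)
```

Because the weight matrix stores $w^{(l)}_{qp}$ with $q$ (previous layer) as the row index, the transpose in `W1.T @ A0` is what turns the element-wise sum in equation (2) into a single matrix product.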

# 2 Weight Update

Let’s consider a 2-3-2 network.

## 2.1 Element-wise operations

For output layer: $w^{(2)'}_{jk} = w^{(2)}_{jk} - \alpha \frac{\partial e_k}{\partial w^{(2)}_{jk}} \quad (5)$ $\frac{\partial e_k}{\partial w^{(2)}_{jk}} = \frac{\partial e_k}{\partial a^{(2)}_k} \frac{\partial a^{(2)}_k}{\partial z^{(2)}_k} \frac{\partial z^{(2)}_k}{\partial w^{(2)}_{jk}} = \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) a^{(1)}_j \quad (6)$

For hidden layers: $w^{(1)'}_{ij} = w^{(1)}_{ij} - \alpha \sum_k \frac{\partial e_k}{\partial w^{(1)}_{ij}} \quad (7)$ $\sum_k\frac{\partial e_k}{\partial w^{(1)}_{ij}} = \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} \frac{\partial a^{(2)}_k}{\partial z^{(2)}_k} \frac{\partial z^{(2)}_k}{\partial a^{(1)}_j} \frac{\partial a^{(1)}_j}{\partial z^{(1)}_j} \frac{\partial z^{(1)}_j}{\partial w^{(1)}_{ij}} = \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{jk} f^{(1)'}(z^{(1)}_j) a^{(0)}_i \quad (8)$

## 2.2 Vectorization

For output layer:

Let's vectorize the weight-update equation by applying equation (5): $\begin{bmatrix} w^{(2)'}_{00} & w^{(2)'}_{01} \\ w^{(2)'}_{10} & w^{(2)'}_{11} \\ w^{(2)'}_{20} & w^{(2)'}_{21} \\ \end{bmatrix} = \begin{bmatrix} w^{(2)}_{00} & w^{(2)}_{01} \\ w^{(2)}_{10} & w^{(2)}_{11} \\ w^{(2)}_{20} & w^{(2)}_{21} \\ \end{bmatrix} - \alpha \begin{bmatrix} \frac{\partial e_0}{\partial w^{(2)}_{00}} & \frac{\partial e_1}{\partial w^{(2)}_{01}} \\ \frac{\partial e_0}{\partial w^{(2)}_{10}} & \frac{\partial e_1}{\partial w^{(2)}_{11}} \\ \frac{\partial e_0}{\partial w^{(2)}_{20}} & \frac{\partial e_1}{\partial w^{(2)}_{21}} \\ \end{bmatrix}$

Let’s represent the weights as a matrix and apply equation (6): $W^{(2)'} = W^{(2)} - \alpha \begin{bmatrix} \frac{\partial e_0}{\partial a^{(2)}_0} f^{(2)'}(z^{(2)}_0) a^{(1)}_0 & \frac{\partial e_1}{\partial a^{(2)}_1} f^{(2)'}(z^{(2)}_1) a^{(1)}_0 \\ \frac{\partial e_0}{\partial a^{(2)}_0} f^{(2)'}(z^{(2)}_0) a^{(1)}_1 & \frac{\partial e_1}{\partial a^{(2)}_1} f^{(2)'}(z^{(2)}_1) a^{(1)}_1 \\ \frac{\partial e_0}{\partial a^{(2)}_0} f^{(2)'}(z^{(2)}_0) a^{(1)}_2 & \frac{\partial e_1}{\partial a^{(2)}_1} f^{(2)'}(z^{(2)}_1) a^{(1)}_2 \end{bmatrix}$ $W^{(2)'} = W^{(2)} - \alpha A^{(1)} (E^{'}(A^{(2)}) \diamond f^{(2)'}(Z^{(2)}))^T$

Where: $E^{'}(A^{(2)}) = \begin{bmatrix} \frac{\partial e_0}{\partial a^{(2)}_0} \\ \frac{\partial e_1}{\partial a^{(2)}_1} \end{bmatrix}$

Define the backpropagated value delta: $\delta^{(2)} = E^{'}(A^{(2)}) \diamond f^{(2)'}(Z^{(2)}) \quad (9)$ $W^{(2)'} = W^{(2)} - \alpha A^{(1)} \delta^{(2)T} \quad (10)$
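Equations (9) and (10) translate to a couple of NumPy lines. As a concrete sketch, this assumes a squared error $e_k = \frac{1}{2}(a^{(2)}_k - t_k)^2$, so that $E'(A^{(2)}) = A^{(2)} - T$, and a sigmoid activation; both are illustrative choices, not something fixed by the derivation.

```python
import numpy as np

# Illustrative values for the 2-3-2 example (random hidden activations,
# sigmoid activation, squared error) -- all assumptions for the sketch.
rng = np.random.default_rng(1)
W2 = rng.standard_normal((3, 2))   # W^{(2)}
A1 = rng.random((3, 1))            # hidden activations A^{(1)}
T = np.array([[1.0], [0.0]])       # target outputs
alpha = 0.1

f = lambda z: 1.0 / (1.0 + np.exp(-z))
df = lambda z: f(z) * (1.0 - f(z))  # sigmoid derivative f'

Z2 = W2.T @ A1
A2 = f(Z2)

# Equation (9): delta^{(2)} = E'(A^{(2)}) ⊙ f'(Z^{(2)})
delta2 = (A2 - T) * df(Z2)

# Equation (10): W^{(2)'} = W^{(2)} - alpha * A^{(1)} delta^{(2)T}
W2_new = W2 - alpha * (A1 @ delta2.T)
```

Note that the element-wise product $\diamond$ is plain `*` on NumPy arrays, while the outer product $A^{(1)}\delta^{(2)T}$ is an ordinary matrix product of a column vector with a row vector.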

For hidden layers:

Applying equation (7): $\begin{bmatrix} w^{(1)'}_{00} & w^{(1)'}_{01} & w^{(1)'}_{02} \\ w^{(1)'}_{10} & w^{(1)'}_{11} & w^{(1)'}_{12}\\ \end{bmatrix} = \begin{bmatrix} w^{(1)}_{00} & w^{(1)}_{01} & w^{(1)}_{02} \\ w^{(1)}_{10} & w^{(1)}_{11} & w^{(1)}_{12}\\ \end{bmatrix} - \alpha \begin{bmatrix} \sum_k \frac{\partial e_k}{\partial w^{(1)}_{00}} & \sum_k \frac{\partial e_k}{\partial w^{(1)}_{01}} & \sum_k \frac{\partial e_k}{\partial w^{(1)}_{02}} \\ \sum_k \frac{\partial e_k}{\partial w^{(1)}_{10}} & \sum_k \frac{\partial e_k}{\partial w^{(1)}_{11}} & \sum_k \frac{\partial e_k}{\partial w^{(1)}_{12}} \end{bmatrix}$

Let’s represent the weights as a matrix and apply equation (8): $W^{(1)'} = W^{(1)} - \alpha \begin{bmatrix} \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{0k} f^{(1)'}(z^{(1)}_0) a^{(0)}_0 & \cdots & \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{2k} f^{(1)'}(z^{(1)}_2) a^{(0)}_0 \\ \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{0k} f^{(1)'}(z^{(1)}_0) a^{(0)}_1 & \cdots & \sum_k \frac{\partial e_k}{\partial a^{(2)}_k} f^{(2)'}(z^{(2)}_k) w^{(2)}_{2k} f^{(1)'}(z^{(1)}_2) a^{(0)}_1 \end{bmatrix}$ $W^{(1)'} = W^{(1)} - \alpha A^{(0)} (f^{(1)'}(Z^{(1)}) \diamond W^{(2)} (E^{'}(A^{(2)}) \diamond f^{(2)'}(Z^{(2)})))^T$ $W^{(1)'} = W^{(1)} - \alpha A^{(0)} (f^{(1)'}(Z^{(1)}) \diamond W^{(2)} \delta^{(2)})^T$

Define the backpropagated value delta: $\delta^{(1)} = f^{(1)'}(Z^{(1)}) \diamond W^{(2)} \delta^{(2)} \quad (11)$ $W^{(1)'} = W^{(1)} - \alpha A^{(0)} \delta^{(1)T} \quad (12)$

These can be generalized to networks with more than 3 layers, where $L$ is the index of the output layer:

Output layer: $\delta^{(L)} = E^{'}(A^{(L)}) \diamond f^{(L)'}(Z^{(L)}) \quad (13) \\ W^{(L)'} = W^{(L)} - \alpha A^{(L-1)} \delta^{(L)T}\quad (14)$

Hidden layer: $\delta^{(l)} = f^{(l)'}(Z^{(l)}) \diamond W^{(l+1)} \delta^{(l+1)}\quad (15) \\ W^{(l)'} = W^{(l)} - \alpha A^{(l-1)} \delta^{(l)T} \quad (16)$
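The generalized equations (13)–(16) fit in one short training step that works for any number of layers. This is a sketch under the same assumptions as before (sigmoid activation, squared error so $E'(A^{(L)}) = A^{(L)} - T$); the `train_step` function and the layer sizes are illustrative names, not part of the derivation.

```python
import numpy as np

# Single-example backprop step implementing equations (13)-(16).
rng = np.random.default_rng(2)
sizes = [2, 3, 2]  # the 2-3-2 example; any depth works
W = [rng.standard_normal((m, n)) for m, n in zip(sizes, sizes[1:])]

f = lambda z: 1.0 / (1.0 + np.exp(-z))
df = lambda z: f(z) * (1.0 - f(z))  # sigmoid derivative f'

def train_step(W, x, t, alpha=0.1):
    # Forward pass, storing every A^{(l)} and Z^{(l)} (equations (3), (4))
    A, Z = [x], []
    for Wl in W:
        Z.append(Wl.T @ A[-1])
        A.append(f(Z[-1]))
    # Output layer: delta^{(L)} = E'(A^{(L)}) ⊙ f'(Z^{(L)})   (13)
    delta = (A[-1] - t) * df(Z[-1])
    new_W = W[:]
    for l in range(len(W) - 1, -1, -1):
        # W^{(l)'} = W^{(l)} - alpha A^{(l-1)} delta^{(l)T}    (14)/(16)
        new_W[l] = W[l] - alpha * (A[l] @ delta.T)
        if l > 0:
            # delta^{(l)} = f'(Z^{(l)}) ⊙ W^{(l+1)} delta^{(l+1)}   (15)
            delta = df(Z[l - 1]) * (W[l] @ delta)
    return new_W

x = np.array([[0.5], [0.1]])
t = np.array([[1.0], [0.0]])
W = train_step(W, x, t)
```

The deltas in the backward loop are computed from the pre-update weights `W[l]`, matching equation (15); only `new_W` holds the updated matrices.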

# 3 Summary

## 3.1 Feed-forward

$A^{(l)}= f^{(l)}(Z^{(l)})$ $Z^{(l)} = W^{(l) T} A^{(l-1)}$

## 3.2 Weight update

Output layer: $\delta^{(L)} = E^{'}(A^{(L)}) \diamond f^{(L)'}(Z^{(L)}) \\ W^{(L)'} = W^{(L)} - \alpha A^{(L-1)} \delta^{(L)T}$

Hidden layer: $\delta^{(l)} = f^{(l)'}(Z^{(l)}) \diamond W^{(l+1)} \delta^{(l+1)} \\ W^{(l)'} = W^{(l)} - \alpha A^{(l-1)} \delta^{(l)T}$

# Next

1. Maths in a Neural Network: Element-wise
2. Maths in a Neural Network: Vectorization
3. Code a Neural Network with Numpy
4. Maths in a Neural Network: Batch Training