Notes on Crash Course On Multi-Layer Perceptron Neural Networks - Machine Learning Mastery

  • Universal function approximator
  • Built out of artificial neurons, sometimes called ‘perceptrons’
  • Flow of individual neuron: Inputs > Weights > Activation > Output
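A minimal sketch of that flow for a single neuron (NumPy, with a sigmoid activation and made-up input values assumed for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias):
    # Inputs > Weights > Activation > Output
    z = np.dot(inputs, weights) + bias  # weighted sum of the inputs plus the bias
    return sigmoid(z)                   # activation maps the sum to the neuron's output

x = np.array([0.5, -1.2, 3.0])   # made-up input row
w = np.array([0.1, 0.2, -0.05])  # made-up weights
print(neuron_output(x, w, bias=0.1))
```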

Weights

  • Each input has a weight; the bias is an extra input fixed at 1.0 with its own weight
  • Often initialised to small random values, e.g. in the range 0 to 0.3
  • Larger weights indicate a more complex and fragile model
  • Desirable to keep them small
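A small sketch of initialising a layer's weights to small random values (the 0-0.3 range from the note above; layer sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

n_inputs, n_neurons = 4, 3
weights = rng.uniform(low=0.0, high=0.3, size=(n_inputs, n_neurons))  # small random weights
biases = np.zeros(n_neurons)                                          # biases often start at zero
print(weights)
```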

Activation

  • Weighted inputs summed and put through activation function
  • AKA Transfer function
  • Maps summed weighted input to output of neuron
  • Historically a simple threshold was used, e.g. if the summed input > 0.5 then activate (binary output, 1.0 or 0.0)
  • Sigmoid function outputs a value between 0.0 and 1.0 with an S-shaped curve
  • Tanh function outputs an S-shaped curve over -1 to 1
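Sketches of the three activation (transfer) functions mentioned above, assuming NumPy:

```python
import numpy as np

def threshold(z, t=0.5):
    # historical step activation: 1.0 if the summed input exceeds the threshold, else 0.0
    return np.where(z > t, 1.0, 0.0)

def sigmoid(z):
    # S-shaped curve, outputs between 0.0 and 1.0
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3.0, 3.0, 7)
print(threshold(z))
print(sigmoid(z))
print(np.tanh(z))  # S-shaped curve over -1 to 1
```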

Networks

Input or Visible Layer

  • Often drawn with one neuron per input value (column) in the dataset. These neurons just pass the value on to the next layer and do not have a transformative activation function

Hidden Layers

  • (Not directly exposed to input)
  • ‘Deep learning’ can refer to using a neural net with a large number of hidden layers

Output Layer

  • May have an activation function and output float values or a binary outcome
  • May have multiple neurons producing values that are then used for multi-class classification (e.g. the Iris problem)
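A sketch of this layer structure in Keras (the library choice is an assumption, the notes do not name one); sized for the Iris problem with 4 input values and 3 classes:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

model = Sequential([
    Input(shape=(4,)),               # input layer: one value per input column
    Dense(8, activation="relu"),     # hidden layer, not directly exposed to the input
    Dense(3, activation="softmax"),  # output layer: one neuron per class
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```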

Training Networks

Data Preparation

  • Data must be numerical. Categorical data can be transformed into numerical data for this purpose, e.g. with one hot encoding
  • Data must be scaled in a consistent way, often normalised to the range 0-1 or standardised to a mean of 0 and standard deviation of 1
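A sketch of both scaling approaches with NumPy (made-up data):

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])  # made-up numeric columns

# normalise each column to the range 0-1
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# standardise each column to mean 0 and standard deviation 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```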

Stochastic Gradient Descent

  • Classical and still preferred training method
  • One row is exposed to the NN at a time, and the network processes it upwards/forwards (a forward pass)
  • Output of NN is compared to expected output, and error is calculated.
  • Error propagated back through the NN one layer at a time and weights are updated according to how much they contributed to the error.
  • This is called backpropagation
  • This is repeated for all rows of training data. One pass over the whole training dataset is called an ‘epoch’, and training may run for many epochs, up to thousands
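A rough sketch of that loop on a tiny one-hidden-layer network (NumPy; XOR as made-up data, squared error and sigmoid activations are all assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.uniform(0, 0.3, (2, 4)), np.zeros(4)  # hidden layer
W2, b2 = rng.uniform(0, 0.3, (4, 1)), np.zeros(1)  # output layer
lr = 0.5

for epoch in range(5000):               # one epoch = one pass over all training rows
    for x, t in zip(X, y):              # one row at a time
        h = sigmoid(x @ W1 + b1)        # forward pass...
        o = sigmoid(h @ W2 + b2)        # ...gives the network's output
        d_o = (o - t) * o * (1 - o)     # error at the output layer
        d_h = (d_o @ W2.T) * h * (1 - h)  # error propagated back to the hidden layer
        W2 -= lr * np.outer(h, d_o); b2 -= lr * d_o  # update weights by their
        W1 -= lr * np.outer(x, d_h); b1 -= lr * d_h  # contribution to the error
```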

Weight Updates

  • Updating the weights after every training example is called ‘online learning’. This can result in fast but chaotic changes to the network
  • Alternatively, errors can be accumulated over the epoch and the weights updated once at the end. This is called ‘batch’ training and is often more stable
  • An intermediate approach is to break the dataset into mini-batches of tens or hundreds of rows and update after each batch
  • The amount by which weights are updated is controlled by a configuration parameter called the ‘learning rate’ AKA ‘step size’. Very small values are often used, such as 0.1 or 0.01 or smaller
  • ‘Momentum’ carries the previous update forward so that weights keep moving in the same direction even as the error shrinks (getting closer)
  • Learning rate can be decayed over epochs to make larger changes at first and fine tune later, as part of a training schedule.
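A sketch of a weight update with a learning rate, momentum term and per-epoch decay (all values illustrative; the gradient is a placeholder):

```python
import numpy as np

initial_lr = 0.1
momentum = 0.9
decay = 0.01

weights = np.zeros(3)
velocity = np.zeros_like(weights)  # previous update, reused by the momentum term

for epoch in range(10):
    lr = initial_lr / (1.0 + decay * epoch)                  # larger steps early, finer steps later
    gradient = np.random.default_rng(epoch).normal(size=3)   # placeholder for the real gradient
    velocity = momentum * velocity - lr * gradient           # keep moving in the previous direction
    weights += velocity
```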

Predictions

  • Forward pass through the NN
  • All you need is the network topology and the final set of learned weights
  • Output is used as prediction
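A sketch of prediction as a plain forward pass, given only the topology and a saved set of weights (NumPy; the weight values below are stand-ins for ones learned during training):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(row, layers):
    a = row
    for W, b in layers:        # forward pass, layer by layer
        a = sigmoid(a @ W + b)
    return a                   # output of the final layer is the prediction

# stand-ins for the weights and biases learned during training
layers = [(np.array([[0.2, -0.1], [0.4, 0.3]]), np.array([0.0, 0.1])),
          (np.array([[0.5], [-0.6]]), np.array([0.05]))]
print(predict(np.array([1.0, 2.0]), layers))
```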