Perceptron
A neuron receives signals from multiple inputs (each input is weighted), and if the overall signal is above a threshold, the neuron fires. A perceptron models this with:
- Inputs: $x_1, \dots, x_n$
- Parameters: weights $w_1, \dots, w_n$, and a bias $b$
- Output (activation function): $o = \begin{cases} 1 & \text{if } \sum_i w_i x_i + b > 0 \\ 0 & \text{otherwise} \end{cases}$
Sometimes, the bias is represented as another weight, $w_0$, attached to a virtual input fixed at $x_0 = 1$.
For this course, the bias is kept as a separate parameter $b$.
A perceptron can be seen as a predicate: given a vector $\mathbf{x} = (x_1, \dots, x_n)$, it outputs true (1) or false (0).
The function partitions the input space into two sections: if there are two inputs, the decision boundary will be the straight line $w_1 x_1 + w_2 x_2 + b = 0$.
If there are three inputs, the decision boundary is a plane. For n dimensions, it will be a hyperplane.
The weight vector, $\mathbf{w} = (w_1, \dots, w_n)$, is normal (perpendicular) to the decision boundary.
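As a quick worked example (values chosen for illustration, not taken from a particular dataset): a perceptron with weights $w_1 = w_2 = 1$ and bias $b = -1.5$ over binary inputs fires only when both inputs are 1, so it computes logical AND; its decision boundary is the line $x_1 + x_2 = 1.5$.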
Learning
Given a data set - a collection of training vectors of the form $(\mathbf{x}, t)$, where $\mathbf{x}$ is an input vector and $t$ is its target output - the perceptron is trained as follows:
- Randomly initialize the weights and bias
- While there are misclassifications and the number of iterations is less than the maximum number of epochs:
  - For each training example, classify the example. If it is misclassified, then update the weights, where:
    - $\eta$ is the learning rate
    - $x_i$ is the value for input $i$
    - $t$ is the actual output
    - $o$ is the current prediction (perceptron output)
  - Update the weights/bias using:
    - $w_i \leftarrow w_i + \eta (t - o) x_i$
    - $b \leftarrow b + \eta (t - o)$
    - (the same equation as for the weights can be used for the bias if it is represented as a virtual input)
If the examples are linearly separable, the weights and bias will, in finite time, converge to values that produce a perfect separation.
If the examples are not linearly separable, the weights will never converge, and training stops only when the maximum number of epochs is reached.
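To make the algorithm concrete, here is a minimal Python sketch of the training loop above, assuming a step activation, targets in {0, 1}, and the update rule given earlier (the function name and initialization range are illustrative choices, not prescribed by the notes):

```python
import random

def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Train on (inputs, target) pairs; targets are 0 or 1."""
    n = len(examples[0][0])
    # Randomly initialize the weights and bias
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    b = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        misclassified = False
        for x, t in examples:
            # Step activation: fire if the weighted sum (plus bias) exceeds 0
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if o != t:
                misclassified = True
                # w_i <- w_i + eta * (t - o) * x_i ; same form for the bias
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                b += eta * (t - o)
        if not misclassified:
            break  # perfect separation on the training set
    return w, b

# Example: logical AND, which is linearly separable
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data)
```

On linearly separable data such as AND, the loop stops early once every example is classified correctly; otherwise it runs until the epoch limit.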
Multi-Layer Perceptrons
Motivation
y
^
|  false     true
|
|  true      false
|-------------------> x
A single perceptron cannot form a decision boundary that correctly partitions these four points: no single straight line separates the true points from the false ones. However, this can be done with a multi-layer perceptron.
Define two perceptrons, $P_1$ and $P_2$, each with its own linear decision boundary:
y         P_1                  y         P_2
^                              ^
|  -----------                 |
|  false(0)  |  true(1)        |  false(1)    true(1)
|  --------- |                 |             -----------
|  true(1)     false(1)        |  true(1)   |  false(0)
|--------------------> x       |--------------------> x
Now, pass the outputs of the two perceptrons as inputs to another perceptron, $P_3$:
P_2        P_3
^
|           --------
|  (0, 1)  |  (1, 1)   <- two points superimposed
|           ----------
|  (1, 0)
|---------------------> P_1
Now, this perceptron can form a decision boundary that correctly partitions the input space.
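Here is a small Python sketch of this construction with one hand-chosen set of weights; many other weight choices work, and the specific values below are illustrative assumptions, not the only solution:

```python
def step(z):
    return 1 if z > 0 else 0

# P_1: outputs 0 only for the top-left point (x=0, y=1)
def p1(x, y):
    return step(x - y + 0.5)

# P_2: outputs 0 only for the bottom-right point (x=1, y=0)
def p2(x, y):
    return step(y - x + 0.5)

# P_3: an AND of the hidden outputs, firing only for inputs (1, 1)
def p3(a, b):
    return step(a + b - 1.5)

def network(x, y):
    return p3(p1(x, y), p2(x, y))

for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, network(*point))  # -> true, false, false, true
```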
Description
The feed-forward networks we are dealing with arrange perceptrons into layers where:
- Adjacent layers are fully connected: all outputs from one layer are used as inputs for each perceptron in the next layer
- There are no backwards connections (DAG)
- Layers are not skipped
Some notes:
- The first layer contains only input nodes
- Hence, the number of input nodes is the number of inputs in the problem domain
- The last layer is called the output layer
- A network with only one layer is an identity function
- Weights and biases are between layers
  - Between layer $k$ (with $n_k$ nodes) and layer $k+1$ (with $n_{k+1}$ nodes):
    - The number of weights is $n_k \times n_{k+1}$
    - The number of biases is $n_{k+1}$ (see the sketch after this list)
- Layers between the input and output are called hidden layers
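A quick sanity check of the counting rule above; the layer sizes here are made up for illustration:

```python
def parameter_count(layer_sizes):
    """Weights and biases for a fully connected feed-forward network."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # one bias per non-input node
    return weights, biases

# e.g. 4 input nodes, 3 hidden nodes, 2 output nodes:
print(parameter_count([4, 3, 2]))  # -> (18, 5): 4*3 + 3*2 weights, 3 + 2 biases
```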
As more layers/neurons are added, the complexity of the boundary shape(s) can increase. If you have two dimensions:
- With 2 layers (one perceptron - the first layer contains the input nodes), a straight line can be formed
- With 3 layers, any polygon can be formed
  - The number of perceptrons in the hidden layer determines the number of sides of the polygon
  - If there are only two perceptrons, the result is not a polygon but two intersecting lines
- With 4 layers, multiple polygons can be formed
  - Polygons within polygons, etc.
Multi-class Classification
This can be done by having multiple numeric outputs and picking the node with the largest value.
Outputting numeric values instead of a Boolean requires the sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
Error Function
The mean squared error is typically used: $E = \frac{1}{2}\sum_k (t_k - o_k)^2$, where $t_k$ is the target value and $o_k$ is the actual output of output node $k$.
The weights can be updated incrementally, after each training example, by gradient descent: $w \leftarrow w - \eta \frac{\partial E}{\partial w}$
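As a sketch of one such incremental step for a single sigmoid output unit, assuming the per-example error $E = \frac{1}{2}(t - o)^2$, so that $\partial E / \partial w_i = -(t - o)\,o\,(1 - o)\,x_i$ by the chain rule (function names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def incremental_update(w, b, x, t, eta=0.5):
    """One gradient-descent step on E = 0.5 * (t - o)**2 for a sigmoid unit."""
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    delta = (t - o) * o * (1 - o)  # -dE/dz via the chain rule
    w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    b += eta * delta
    return w, b
```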
Typical Architecture
The number of input nodes is determined by the number of attributes and the number of outputs is determined by the number of classes. A single hidden layer is enough for many classification tasks.
Guidelines:
- Use as few hidden layers/nodes as possible
  - Forces better generalization
  - Fewer weights need to be found, reducing training time and cost
  - Too many nodes may lead to overfitting
- For the hidden layer:
  - Make a guess at how many nodes you need - a number between the number of input and output nodes
  - If unsuccessful, increase the number of nodes
  - If successful, reduce the number of nodes to force better generalization