Learning: improving behavior based on experience. This could be:
- The range of behaviors increasing
- The accuracy of its tasks increasing
- The speed at which tasks are executed increasing
Components of a learning problem:
- Task: the behavior/task being improved e.g. classification
- Data: experiences used to improve the performance of its tasks
- Measure of improvement: a way of measuring the performance/improvement e.g. accuracy of classification
Learning architecture:
- The learner is fed experiences/data and background knowledge/bias
- The model is what does the reasoning/prediction
- The reasoner is fed the problem/task, and outputs an answer/performance
Supervised Learning
Given the following as input:
- A set of input attributes/features/random variables: $X_1, \dots, X_n$
- A target feature $Y$ (discrete class value or continuous/real value): this is what is being predicted
- A set of training examples/instances where the values of the input and target features are given
This is fed to a learning algorithm to build a predictive model that takes a new instance and returns/predicts the value for the target feature.
For discrete target features, classification is used; for continuous target features, regression is used.
Measuring performance
Common performance measures:
In binary classification problems, one class is called positive (p) and the other negative (n); each prediction is then a true/false positive or negative, and measures such as accuracy, precision, and recall are computed from these counts.
Common performance measures for regression:
- Mean squared error (MSE): $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
- Mean absolute error (MAE): $\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$
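The two regression measures above can be sketched directly from their definitions (the function names and sample values are my own, for illustration):

```python
def mse(y_true, y_pred):
    # Mean squared error: average of the squared prediction errors
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean absolute error: average of the absolute prediction errors
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([3.0, 1.0, 2.0], [2.0, 1.0, 4.0]))  # (1 + 0 + 4) / 3 ≈ 1.667
print(mae([3.0, 1.0, 2.0], [2.0, 1.0, 4.0]))  # (1 + 0 + 2) / 3 = 1.0
```

MSE penalizes large errors more heavily than MAE because the errors are squared before averaging.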
Training and Test Sets
A set (multi-set) of examples is divided into training and test examples. This is required because the model can overfit the training data, giving high performance on the training data but low performance on unseen data.
The more complex the model is, the lower the error on the training data.
A general pattern is that beyond a certain complexity, increasing the complexity of the model increases the error on the test data, even as the training error keeps falling.
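A minimal sketch of splitting a dataset into training and test sets; the 80/20 ratio, seed, and function name are assumptions for illustration, not from the notes:

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    # Shuffle a copy so the caller's list is left untouched,
    # then hold out the first test_fraction of it for testing.
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

Only the test-set error estimates performance on unseen data; reporting training error alone would hide overfitting.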
Naïve Bayes Model
$P(Y \mid X_1, \dots, X_n) \propto P(Y) \prod_{i=1}^{n} P(X_i \mid Y)$
Where:
- Features $X_1, \dots, X_n$ are independent given the class variable $Y$
- $P(Y)$: prior distribution of $Y$
- $P(X_i \mid Y)$: likelihood conditional distributions
- $P(Y \mid X_1, \dots, X_n)$: posterior distribution
Conditional probabilities can be estimated from labelled data.
Find $\arg\max_{y} P(Y = y \mid X_1, \dots, X_n)$.
Problem: the joint likelihood $P(X_1, \dots, X_n \mid Y)$ is hard to learn, as the number of parameter combinations grows exponentially with the number of features.
Thus, assume the input features are conditionally independent given the class variable.
Example: Building a Classifier
Determine if the patient is susceptible to heart disease (y/n) given family history (t/f), fasting blood sugar level (l/h), BMI (l, n, h).
Model it as $P(\text{heart} \mid \text{fh}, \text{fbs}, \text{bmi}) \propto P(\text{heart})\, P(\text{fh} \mid \text{heart})\, P(\text{fbs} \mid \text{heart})\, P(\text{bmi} \mid \text{heart})$.
The class can take two values, so each feature's conditional probability table has two rows, one per class value.
NB: in the quiz, you only store the value for when the class is true.
To calculate $P(X_i = x \mid Y = y)$: divide the number of examples where $X_i = x$ and $Y = y$ by the number of examples where $Y = y$.
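The heart-disease classifier can be sketched as below; the eight training examples are invented for illustration (not from the notes), and the function names are my own:

```python
# Features: family history fh in {t, f}, fasting blood sugar fbs in {l, h},
# bmi in {l, n, h}; class heart in {y, n}. Made-up example rows:
data = [
    ("t", "h", "h", "y"), ("t", "h", "n", "y"), ("t", "l", "h", "y"),
    ("f", "l", "l", "n"), ("f", "l", "n", "n"), ("f", "h", "l", "n"),
    ("t", "l", "n", "n"), ("f", "h", "h", "y"),
]

def prior(cls):
    # P(heart = cls): fraction of examples with that class
    return sum(1 for row in data if row[3] == cls) / len(data)

def likelihood(i, value, cls):
    # P(X_i = value | heart = cls): relative frequency within the class
    in_class = [row for row in data if row[3] == cls]
    return sum(1 for row in in_class if row[i] == value) / len(in_class)

def posterior_scores(fh, fbs, bmi):
    # Unnormalised P(heart | fh, fbs, bmi) under the independence assumption
    return {cls: prior(cls)
                 * likelihood(0, fh, cls)
                 * likelihood(1, fbs, cls)
                 * likelihood(2, bmi, cls)
            for cls in ("y", "n")}

scores = posterior_scores("t", "h", "h")
print(max(scores, key=scores.get))  # predicts "y" for these counts
```

Normalising the two scores to sum to 1 would recover the posterior probabilities, but the argmax alone is enough for classification.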
Laplace Smoothing
Zero counts in small data sets lead to zero probabilities - this is too strong a claim based on only a sample.
To fix this, add a non-negative pseudo-count to the counts; this reduces how confident the estimates are on small samples.
- e.g. $P(X_i = x \mid Y = y) = \dfrac{n_{x,y} + c}{n_y + c \cdot d}$, where:
- $n_{x,y}$ is the number of examples in the dataset where $X_i = x$ and $Y = y$
- $n_y$ is the number of examples in the dataset where $Y = y$
- $d$ is the number of values $X_i$ can take, and $c$ is the pseudo-count
Given these:
This is equivalent to $\dfrac{n_y}{n_y + c d} \cdot \dfrac{n_{x,y}}{n_y} + \dfrac{c d}{n_y + c d} \cdot \dfrac{1}{d}$: a weighted average of the empirical estimate and the uniform distribution.
The greater the pseudo-count is, the closer the probabilities will even out (closer to the uniform distribution $1/d$).
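The smoothed estimate above, $(n_{x,y} + c)/(n_y + c \cdot d)$, can be sketched as follows (the function name and sample counts are my own):

```python
def smoothed_prob(n_xy, n_y, d, c=1):
    # Laplace-smoothed estimate of P(X_i = x | Y = y):
    # n_xy = count of (x, y) pairs, n_y = count of class y,
    # d = number of values the feature can take, c = pseudo-count
    return (n_xy + c) / (n_y + c * d)

# A value never seen with this class no longer gets probability zero:
print(smoothed_prob(0, 4, 3))          # 1/7 ≈ 0.143 instead of 0
# A large pseudo-count pushes every estimate towards uniform (1/d = 1/3):
print(smoothed_prob(0, 4, 3, c=1000))  # ≈ 0.333
```

With $c = 0$ the formula reduces to the raw relative frequency, so smoothing is a strict generalisation of the unsmoothed estimate.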
Parametric vs Non-Parametric Models
Parametric models are described by a set of parameters; learning means finding the optimal values for these parameters.
Non-parametric models are not characterized by a fixed set of parameters; one such family is called instance-based learning:
- Instance-based learning is based on memorization of the dataset
- The cost of learning is 0; all the cost is in the computation of the prediction
- It is called lazy-learning: learning is put off until it is required
k-Nearest Neighbors
An example of an instance-based learning algorithm is k-nearest neighbors:
- It uses the local neighborhood to obtain a prediction: the k memorized examples most similar to the one being classified are retrieved
- A distance function is used to compare similarity (e.g. Euclidean or Manhattan distance)
- If the distance function is changed, how examples are classified changes
Training only requires storing all the examples.
Prediction:
- Let $N_k(x)$ be the $k$ most similar examples to a new instance $x$
- Given the $k$ nearest neighbors to $x$, calculate which value it should have (e.g. the majority class for classification, or the average target value for regression)
If k is too low, the model will over-fit (predictions are sensitive to noise in individual examples); if k is too high, it will under-fit.
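A minimal k-nearest-neighbors sketch using majority vote and Euclidean distance (the training points and function name are invented for illustration):

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(examples, query, k=3):
    # examples: list of (point, label) pairs; "training" is just storing them.
    # Retrieve the k stored examples closest to the query point...
    neighbours = sorted(examples, key=lambda ex: dist(ex[0], query))[:k]
    # ...and return the majority class among them.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 1)))  # "a": the 3 nearest points are all class a
```

Swapping `dist` for a Manhattan distance (sum of absolute coordinate differences) would illustrate the earlier point that changing the distance function can change how examples are classified.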
Geometrically, each data point