Notes on k-nearest neighbours

In this post the $k$-nearest neighbour algorithm is briefly introduced.

The principle of the $k$-nearest neighbour algorithm is that instances belonging to the same class will generally exist in close proximity to one another within the $r$-dimensional space, where each dimension corresponds to one of the $r$ attributes in the feature vector $\vec{x}$. The multidimensional space is populated by the reference set of labelled training instances $(\vec{x}^{(t)}, y^{(t)})$. A distance measure $D(\vec{x}^{(t)}, \tilde{\vec{x}})$ is used to measure the proximity between each instance in the reference set and an unlabelled instance $\tilde{\vec{x}}$. The unlabelled instance is classified by observing the class labels of its $k$ closest neighbours within the reference set, and assigning a class label by popularity vote or distance-weighted popularity vote over the set of observed class labels.
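As a minimal sketch of this procedure, the snippet below implements a Euclidean distance measure and a plain popularity vote in Python; the function name knn_classify and the use of NumPy arrays are illustrative choices rather than anything prescribed by the post.

```python
# A minimal sketch of k-nearest neighbour classification: Euclidean distance
# as D, followed by a plain popularity vote over the k closest class labels.
from collections import Counter

import numpy as np

def knn_classify(X_ref, y_ref, x_query, k=3):
    """Classify x_query against the labelled reference set (X_ref, y_ref)."""
    # Distance between every reference instance and the unlabelled instance.
    distances = np.linalg.norm(X_ref - x_query, axis=1)
    # Indices of the k nearest reference instances.
    nearest = np.argsort(distances)[:k]
    # Popularity vote over the observed class labels.
    votes = Counter(y_ref[i] for i in nearest)
    return votes.most_common(1)[0][0]
```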

The figure below shows how the $k$-nearest neighbour algorithm performs classification. Within the two-dimensional feature space, a 3-nearest neighbour algorithm classifies instance $\tilde{\vec{x}}_1$ as belonging to group $A$, because all three of its closest training instances have the class label $A$. The instance $\tilde{\vec{x}}_2$ is classified as belonging to group $B$ because two out of its three neighbouring training instances have the class label $B$.

Figure: The k-nearest neighbour algorithm.
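Reusing the knn_classify sketch above, a small numerical example in the spirit of the figure might look as follows; the coordinates and labels are invented for illustration and are not taken from the figure.

```python
# Illustrative reference set: group A clustered near the origin,
# group B clustered around (5, 5). Values are made up for the example.
X_ref = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.5],
                  [5.0, 5.0], [5.5, 4.5], [4.5, 5.5]])
y_ref = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_classify(X_ref, y_ref, np.array([0.4, 0.6]), k=3))  # -> 'A'
print(knn_classify(X_ref, y_ref, np.array([4.8, 5.2]), k=3))  # -> 'B'
```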

Lazy supervised learning

The $k$-nearest neighbour algorithm is called a lazy supervised learning technique because classification is based directly on the reference set, and not on an induced function $h$. Therefore, classification using this technique can be computationally and memory intensive, especially when the reference set is large. Because the computational and memory requirements of the $k$-nearest neighbour algorithm grow with the number of instances in the reference set, several instance selection methods can be applied to reduce the size of the reference set [1]. Instance selection methods, also known as prototyping methods, try to reduce the number of instances in the reference set below that of the original training set. Reducing the size of the reference set not only decreases the computational and memory requirements of the $k$-nearest neighbour algorithm but also filters out noise in the original training set [2].

Prototyping to reduce computational and memory requirements

Prototyping assumes that a small number of instances can be found to represent the entire set of training instances. In general, one prototype per class can be constructed by averaging over all the example instances belonging to the same class. Alternatively, multiple prototypes can be constructed by looking for subsets of homogeneous instances within the set of instances belonging to the same class. Averaging over each identified subset produces a prototype. Clustering, an unsupervised learning technique, can be applied here to find homogeneous groups of instances. An introduction to clustering techniques can be found in [3].
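A rough sketch of both variants is given below: per-class averaging with NumPy, and a multi-prototype variant that uses scikit-learn's KMeans as one possible clustering technique. The function names and the choice of k-means are illustrative assumptions, not something the post prescribes.

```python
# Sketch of prototyping: replace the reference set with a few prototypes.
import numpy as np
from sklearn.cluster import KMeans

def class_mean_prototypes(X_ref, y_ref):
    """One prototype per class: the mean of all instances with that label."""
    labels = np.unique(y_ref)
    protos = np.array([X_ref[y_ref == c].mean(axis=0) for c in labels])
    return protos, labels

def clustered_prototypes(X_ref, y_ref, per_class=2):
    """Several prototypes per class: cluster each class, keep the centroids."""
    protos, labels = [], []
    for c in np.unique(y_ref):
        km = KMeans(n_clusters=per_class, n_init=10).fit(X_ref[y_ref == c])
        protos.extend(km.cluster_centers_)
        labels.extend([c] * per_class)
    return np.array(protos), np.array(labels)
```

The reduced set returned by either function can then stand in for the full reference set in the knn_classify sketch above.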

Single hyper-parameter

The $k$-nearest neighbour algorithm has the advantage of having only one hyper-parameter, the number of nearest neighbours $k$, that needs to be optimized. However, it has been shown that this supervised learning technique is intolerant to noisy data, intolerant to redundant features, and sensitive to the choice of similarity measure used [2]. Therefore, though the $k$-nearest neighbour algorithm is very simple to understand and implement, its performance varies considerably depending on the data and on these design choices.
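One common way to pick $k$, sketched below, is cross-validated accuracy over a small grid of candidate values. scikit-learn's KNeighborsClassifier and cross_val_score, and the Iris data set, are used purely for illustration; the post itself does not prescribe a tuning procedure.

```python
# Choosing the single hyper-parameter k by 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in (1, 3, 5, 7, 9, 11)
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```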

Footnotes

  1. Reinartz, T., 2002. A unifying view on instance selection. Data Mining and Knowledge Discovery 6 (2), 191-210.

  2. Aha, D. W., Kibler, D., Albert, M. K., 1991. Instance-based learning algorithms. Machine Learning 6 (1), 37-66.

  3. Nisbet, R., Elder, J., Miner, G., 2009. Handbook of Statistical Analysis and Data Mining Applications. Academic Press/Elsevier.
