The K-Nearest Neighbour algorithm generalises the Nearest Neighbour algorithm: instead of considering only the single closest training instance, it looks at the K closest instances to the unclassified instance. The new instance is then assigned the class that occurs most frequently among those K neighbours. This is useful because it reduces the influence of anomalous (noisy) training instances.
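The idea can be sketched in a few lines of Python. This is an illustrative implementation, not the code behind the interactive page: it assumes a Hamming distance (count of disagreeing attributes, a natural choice for yes/no symptoms) and majority voting, with ties at equal distance broken in favour of earlier training instances.

```python
from collections import Counter

def hamming(a, b):
    # Number of attributes on which the two instances disagree.
    return sum(x != y for x, y in zip(a, b))

def knn_classify(training, query, k):
    """Classify `query` by majority vote among its k nearest training instances.

    `training` is a list of (features, label) pairs.
    """
    # Stable sort by distance: earlier instances win ties at equal distance.
    neighbours = sorted(training, key=lambda item: hamming(item[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to standard Nearest Neighbour, since only the single closest instance gets a vote.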

Try this out below. If you answer 'No' to all five symptoms, the diagnosis will be 'Strepthroat', compared with a diagnosis of 'Allergy' from the standard Nearest Neighbour algorithm.

Choosing K

K = 1 is the same as standard Nearest Neighbour, since only the single closest instance is considered. K = N (where N is the number of training instances) would be a poor choice, because the classification would simply be the most frequent class in the whole training set, regardless of distance. So the optimal value of K lies somewhere in between; an odd K also helps avoid tied votes. Try changing K to see what happens.

Patient ID | Sore Throat | Fever | Swollen Glands | Congestion | Headache | Diagnosis   | Distance
1          | Yes         | Yes   | Yes            | Yes        | Yes      | Strepthroat |
2          | No          | No    | No             | Yes        | Yes      | Allergy     |
3          | Yes         | Yes   | No             | Yes        | No       | Cold        |
4          | Yes         | No    | Yes            | No         | No       | Strepthroat |
5          | No          | Yes   | No             | Yes        | No       | Cold        |
6          | No          | No    | No             | Yes        | No       | Allergy     |
7          | No          | No    | Yes            | No         | No       | Strepthroat |
8          | Yes         | No    | No             | Yes        | Yes      | Allergy     |
9          | No          | Yes   | No             | Yes        | Yes      | Cold        |
10         | Yes         | Yes   | No             | Yes        | Yes      | Cold        |
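The table above can be encoded directly to experiment with K offline. This sketch assumes a Hamming distance over the five yes/no symptoms and breaks ties in favour of earlier patients; the interactive tool may use a different metric or tie-breaking, so its answers for small K can differ.

```python
from collections import Counter

YES, NO = 1, 0
# (Sore Throat, Fever, Swollen Glands, Congestion, Headache) -> Diagnosis
patients = [
    ((YES, YES, YES, YES, YES), "Strepthroat"),  # 1
    ((NO,  NO,  NO,  YES, YES), "Allergy"),      # 2
    ((YES, YES, NO,  YES, NO),  "Cold"),         # 3
    ((YES, NO,  YES, NO,  NO),  "Strepthroat"),  # 4
    ((NO,  YES, NO,  YES, NO),  "Cold"),         # 5
    ((NO,  NO,  NO,  YES, NO),  "Allergy"),      # 6
    ((NO,  NO,  YES, NO,  NO),  "Strepthroat"),  # 7
    ((YES, NO,  NO,  YES, YES), "Allergy"),      # 8
    ((NO,  YES, NO,  YES, YES), "Cold"),         # 9
    ((YES, YES, NO,  YES, YES), "Cold"),         # 10
]

def distance(a, b):
    # Number of symptoms on which two patients disagree.
    return sum(x != y for x, y in zip(a, b))

def diagnose(symptoms, k):
    # Stable sort: patients earlier in the table win ties at equal distance.
    ranked = sorted(patients, key=lambda p: distance(p[0], symptoms))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

all_no = (NO, NO, NO, NO, NO)
for k in (1, 3, 5, len(patients)):
    print(f"K={k}: {diagnose(all_no, k)}")
```

With this tie-breaking, K = 1 returns 'Allergy' (patient 6 is a nearest neighbour of the all-No patient), and K = 10 (= N) returns 'Cold' simply because Cold is the most frequent diagnosis in the whole table, which is exactly the problem with K = N described above.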
