# Nearest Neighbours without *k*

## Summary

In data mining, the k-nearest-neighbours (kNN) method is a simple and effective classification technique [3, 1]. The success of kNN in classification depends on selecting a “good” value for *k*, so in a sense kNN is biased by *k*. However, no universally good value for *k* is known.

We propose to resolve this *choice-of-k* issue with an alternative formalism that uses a sequence of values for *k*. Each value of *k* defines a *neighbourhood* for a data record — a set of *k* nearest neighbours, which contains some degree of support for each class with respect to that record. Our aim is to select a set of neighbourhoods and aggregate their supports to create a classifier less biased by *k*. To this end we use a probability function *G*, defined in terms of a *mass function* for *events* weighted by a measurement of events. A mass function is an assignment of basic probability to events.

In the case of classification, events can be interpreted as neighbourhoods, and the mass function can be interpreted in terms of class proportions in neighbourhoods. Therefore, a mass function represents degrees of support for a class in various neighbourhoods.

We show that under this specification *G* is a linear function of the conditional probability of classes given a data record, which can be used directly for classification. Based on these findings we propose a new classification procedure.
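The aggregation idea can be illustrated with a minimal sketch. The function below is hypothetical and only a stand-in for the paper's procedure: it takes each neighbourhood's class proportions as a simple proxy for the mass-function supports, averages them over a sequence of neighbourhoods, and predicts the class with the largest aggregated support (the weighting by a measurement of events is omitted here).

```python
import numpy as np

def aggregate_knn_predict(X_train, y_train, x, ks=(1, 3, 5)):
    """Illustrative sketch: aggregate class supports over several
    neighbourhoods (one per k) instead of committing to a single k.
    Each neighbourhood contributes the class proportions among its
    k nearest neighbours; supports are averaged across neighbourhoods
    and the class with the highest aggregated support is returned."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    classes = np.unique(y_train)

    # Euclidean distances from x to every training record,
    # and the training records sorted by proximity.
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists)

    support = np.zeros(len(classes))
    for k in ks:
        neighbours = y_train[order[:k]]
        # Class proportions within this neighbourhood act as its
        # degree of support for each class.
        props = np.array([np.mean(neighbours == c) for c in classes])
        support += props
    support /= len(ks)  # average support over all neighbourhoods
    return classes[np.argmax(support)]
```

Because the prediction pools supports over several neighbourhoods, no single choice of *k* dominates the outcome, which is the effect the procedure is designed to achieve.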

Experiments show that this classification procedure is indeed less biased by *k*, and that it displays a saturating property as the number of neighbourhoods increases. Experiments further show that the performance of our classification procedure at saturation is comparable to the best performance of kNN.

Consequently, when we use kNN for classification we do not need to be concerned with *k*; instead, we need only select a set of neighbourhoods and apply the procedure presented here.

## Key words

k-nearest neighbour, data mining and knowledge discovery, Dempster-Shafer theory, contextual probability, pignistic probability

## References

1. Atkeson, C. G., Moore, A. W., and Schaal, S. (1997). Locally weighted learning. *Artificial Intelligence Review*, 11(1–5):11–73.
2. Han, J. and Kamber, M. (2000). *Data Mining: Concepts and Techniques*. Morgan Kaufmann.
3. Hand, D., Mannila, H., and Smyth, P. (2001). *Principles of Data Mining*. The MIT Press.
4. Smets, P. and Kennes, R. (1994). The transferable belief model. *Artificial Intelligence*, 66(2):191–234.