
Part of the book series: Advances in Soft Computing ((AINSC,volume 28))

Summary

In data mining, the k-Nearest-Neighbours (kNN) method is a simple and effective approach to classification [3, 1]. Its success depends on selecting a “good” value for k, so in a sense kNN is biased by k; however, it is unclear what a universally good value for k would be.
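
For concreteness, a minimal kNN classifier can be sketched as follows (plain Euclidean distance and simple majority vote; the array-based interface and the names are illustrative, not taken from the paper):

    import numpy as np

    def knn_classify(X_train, y_train, x, k):
        """Classify x by majority vote among its k nearest training records."""
        # Euclidean distance from x to every training record
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k nearest neighbours
        nearest = np.argsort(dists)[:k]
        # Majority vote over the neighbours' class labels
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]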

We propose to resolve this choice-of-k issue with an alternative formalism that uses a sequence of values for k. Each value of k defines a neighbourhood for a data record, namely the set of its k nearest neighbours, which carries some degree of support for each class with respect to that record. Our aim is to select a set of neighbourhoods and aggregate their supports to create a classifier that is less biased by k. To this end we use a probability function G, defined in terms of a mass function for events weighted by a measurement of events. A mass function is an assignment of basic probability to events.

In the case of classification, events can be interpreted as neighbourhoods, and the mass function can be interpreted in terms of class proportions in neighbourhoods. Therefore, a mass function represents degrees of support for a class in various neighbourhoods.
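
Reading the summary this way, the mass that a neighbourhood assigns to a class is simply that class's proportion among the neighbourhood's members. The following sketch computes these masses for a sequence of values of k under that assumption, reusing the distance computation above (the function name and interface are ours):

    def class_masses(X_train, y_train, x, ks):
        """For each k in ks, compute class proportions in the k-neighbourhood of x.

        Each neighbourhood is treated as an event; the proportion of a class
        within it plays the role of the mass that the event assigns to the class.
        """
        dists = np.linalg.norm(X_train - x, axis=1)
        order = np.argsort(dists)
        classes = np.unique(y_train)
        masses = []
        for k in ks:
            neigh = y_train[order[:k]]
            masses.append({c: float(np.mean(neigh == c)) for c in classes})
        return masses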

We show that under this specification G is a linear function of the conditional probability of classes given a data record, which can be used directly for classification. Based on these findings we propose a new classification procedure.
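
The summary does not spell out how events are weighted, so the sketch below simply averages the per-neighbourhood masses (uniform weights) into an aggregate support for each class and predicts the class that maximises it. This is an illustrative simplification: the paper's G weights each event by a measurement of that event.

    def g_classify(X_train, y_train, x, ks):
        """Aggregate class supports over a set of neighbourhoods and classify."""
        masses = class_masses(X_train, y_train, x, ks)
        # Average support for each class across the chosen neighbourhoods
        # (uniform weighting is our assumption, not the paper's G)
        support = {c: sum(m[c] for m in masses) / len(masses)
                   for c in masses[0]}
        return max(support, key=support.get)

    # Example: aggregate over the odd neighbourhood sizes k = 1, 3, ..., 21
    # prediction = g_classify(X_train, y_train, x, ks=range(1, 22, 2))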

Experiments show that this classification procedure is indeed less biased by k, and that it exhibits a saturating property as the number of neighbourhoods increases. They further show that the performance of our procedure at saturation is comparable to the best performance of kNN.

Consequently, when using kNN for classification we need not be concerned with k; instead, we select a set of neighbourhoods and apply the procedure presented here.


References

  1. Atkeson, C. G., Moore, A. W., and Schaal, S. (1997). Locally weighted learning. Artificial Intelligence Review, 11(1–5):11–73.

  2. Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann.

  3. Hand, D., Mannila, H., and Smyth, P. (2001). Principles of Data Mining. The MIT Press.

  4. Smets, P. and Kennes, R. (1994). The transferable belief model. Artificial Intelligence, 66(2):191–234.



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, H., Düntsch, I., Gediga, G., Guo, G. (2005). Nearest Neighbours without k. In: Monitoring, Security, and Rescue Techniques in Multiagent Systems. Advances in Soft Computing, vol 28. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32370-8_12


  • DOI: https://doi.org/10.1007/3-540-32370-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23245-2

  • Online ISBN: 978-3-540-32370-9

  • eBook Packages: Engineering (R0)
