
Part of the book series: Advances in Soft Computing ((AINSC,volume 28))

Summary

In data mining, the k-Nearest-Neighbours (kNN) method is a simple and effective approach to classification [3, 1]. Its success depends on selecting a “good” value for k, so in a sense kNN is biased by k; however, it is unclear what a universally good value for k would be.
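
For concreteness, a minimal kNN classifier can be sketched as follows (plain Euclidean distance and simple majority vote; the array-based interface and the names are illustrative, not taken from the paper):

    import numpy as np

    def knn_classify(X_train, y_train, x, k):
        """Classify x by majority vote among its k nearest training records."""
        # Euclidean distance from x to every training record
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k nearest neighbours
        nearest = np.argsort(dists)[:k]
        # Majority vote over the neighbours' class labels
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]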

We propose to resolve this choice-of-k issue with an alternative formalism that uses a sequence of values for k. Each value of k defines a neighbourhood for a data record, namely the set of its k nearest neighbours, which carries some degree of support for each class with respect to that record. Our aim is to select a set of neighbourhoods and aggregate their supports to create a classifier that is less biased by k. To this end we use a probability function G, defined in terms of a mass function for events weighted by a measurement of events. A mass function is an assignment of basic probability to events.

In the case of classification, events can be interpreted as neighbourhoods, and the mass function can be interpreted in terms of class proportions in neighbourhoods. Therefore, a mass function represents degrees of support for a class in various neighbourhoods.
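
Reading the summary this way, the mass that a neighbourhood assigns to a class is simply that class's proportion among the neighbourhood's members. The following sketch computes these masses for a sequence of values of k under that assumption, reusing the distance computation above (the function name and interface are ours):

    def class_masses(X_train, y_train, x, ks):
        """For each k in ks, compute class proportions in the k-neighbourhood of x.

        Each neighbourhood is treated as an event; the proportion of a class
        within it plays the role of the mass that the event assigns to the class.
        """
        dists = np.linalg.norm(X_train - x, axis=1)
        order = np.argsort(dists)
        classes = np.unique(y_train)
        masses = []
        for k in ks:
            neigh = y_train[order[:k]]
            masses.append({c: float(np.mean(neigh == c)) for c in classes})
        return masses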

We show that under this specification G is a linear function of the conditional probability of classes given a data record, which can be used directly for classification. Based on these findings we propose a new classification procedure.
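
The summary does not spell out how events are weighted, so the sketch below simply averages the per-neighbourhood masses (uniform weights) into an aggregate support for each class and predicts the class that maximises it. This is an illustrative simplification: the paper's G weights each event by a measurement of that event.

    def g_classify(X_train, y_train, x, ks):
        """Aggregate class supports over a set of neighbourhoods and classify."""
        masses = class_masses(X_train, y_train, x, ks)
        # Average support for each class across the chosen neighbourhoods
        # (uniform weighting is our assumption, not the paper's G)
        support = {c: sum(m[c] for m in masses) / len(masses)
                   for c in masses[0]}
        return max(support, key=support.get)

    # Example: aggregate over the odd neighbourhood sizes k = 1, 3, ..., 21
    # prediction = g_classify(X_train, y_train, x, ks=range(1, 22, 2))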

Experiments show that this classification procedure is indeed less biased by k, and that it exhibits a saturating property as the number of neighbourhoods increases. They further show that the performance of our procedure at saturation is comparable to the best performance of kNN.

Consequently, when using kNN for classification we need not be concerned with k; instead, we select a set of neighbourhoods and apply the procedure presented here.


References

  1. Atkeson, C. G., Moore, A. W., and Schaal, S. (1997). Locally weighted learning. Artificial Intelligence Review, 11(1–5):11–73.

  2. Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann.

  3. Hand, D., Mannila, H., and Smyth, P. (2001). Principles of Data Mining. The MIT Press.

  4. Smets, P. and Kennes, R. (1994). The transferable belief model. Artificial Intelligence, 66(2):191–234.



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, H., Düntsch, I., Gediga, G., Guo, G. (2005). Nearest Neighbours without k. In: Monitoring, Security, and Rescue Techniques in Multiagent Systems. Advances in Soft Computing, vol 28. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-32370-8_12


  • DOI: https://doi.org/10.1007/3-540-32370-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23245-2

  • Online ISBN: 978-3-540-32370-9

  • eBook Packages: Engineering (R0)
