An Analysis of Order Dependence in k-NN

  • David McSherry
  • Christopher Stretch
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6206)

Abstract

In classification based on k-NN with majority voting, the class assigned to a given problem is the one that occurs most frequently in the k most similar cases (or instances) in the dataset. However, different versions of k-NN may use different strategies to select the cases on which the solution is based when there are ties for the kth most similar case. One strategy is to break ties for the kth most similar case based on the ordering of cases in the dataset. We present an analysis of the order dependence introduced by this strategy and its effects on the algorithm’s performance.
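
The tie-breaking strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `overlap` and `knn_classify`, the matching-attribute similarity measure, and the toy dataset are all assumptions chosen for illustration. The sketch relies on Python's stable sort, so cases that are equally similar to the problem keep their dataset order, and ties for the kth most similar case are therefore resolved by that order.

```python
from collections import Counter


def overlap(x, y):
    """Hypothetical similarity measure: number of matching attribute values."""
    return sum(1 for a, b in zip(x, y) if a == b)


def knn_classify(cases, query, k, similarity=overlap):
    """Majority-vote k-NN over a list of (attributes, label) cases.

    Ties for the kth most similar case are broken by dataset order:
    sorted() is stable, so equally similar cases stay in the order in
    which they appear in `cases`.
    """
    ranked = sorted(cases, key=lambda case: -similarity(query, case[0]))
    votes = Counter(label for _, label in ranked[:k])
    # Note: most_common() resolves tied vote counts by first occurrence,
    # which is a second, separate source of order dependence.
    return votes.most_common(1)[0][0]


# Toy dataset (illustrative only). The last two cases are equally
# similar to the query, so they tie for third place when k = 3.
cases = [
    (("a", "b", "c"), "yes"),  # similarity 3
    (("a", "b", "z"), "no"),   # similarity 2
    (("a", "y", "z"), "yes"),  # similarity 1
    (("x", "b", "z"), "no"),   # similarity 1
]
query = ("a", "b", "c")

print(knn_classify(cases, query, k=3))                  # yes
print(knn_classify(list(reversed(cases)), query, k=3))  # no
```

The two calls use the same cases and the same problem, yet the assigned class changes when the dataset is reversed, because a different one of the two tied cases is admitted as the third neighbour.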

Keywords

classification, k-NN, instance-based learning, case-based reasoning

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • David McSherry (1)
  • Christopher Stretch (1)
  1. School of Computing and Information Engineering, University of Ulster, Coleraine, Northern Ireland
