Machine Learning, Volume 81, Issue 3, pp 229–256

Bayesian instance selection for the nearest neighbor rule

Abstract

Nearest neighbor rules are commonly used in pattern recognition and statistics. The performance of these methods relies on three crucial choices: a distance metric, a set of prototypes, and a classification scheme. In this paper, we focus on the second, challenging issue: instance selection. We apply a maximum a posteriori criterion to the evaluation of instance sets and propose a new optimization algorithm, yielding Eva, a new instance selection method. We benchmark this method on real datasets through a multi-criteria analysis, evaluating the compression rate, predictive accuracy, reliability, and computational time. We also carry out experiments on synthetic datasets to separate the respective contributions of the criterion and the algorithm, and to illustrate the advantages of Eva over state-of-the-art algorithms. The study shows that Eva outputs smaller and more reliable instance sets in competitive time, while preserving the predictive accuracy of the resulting classifier.
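
The abstract describes Eva only at a high level: a set of instances is scored by a maximum a posteriori criterion, balancing a data-fit term against a prior that favors smaller sets, and the score is optimized by a dedicated search algorithm. As a reading aid only, here is a minimal, hypothetical Python sketch of that general recipe; the penalized-accuracy score and the greedy hill climbing below are illustrative stand-ins, not the paper's actual criterion or optimizer.

    # Hedged sketch of MAP-style instance selection for the 1-NN rule.
    # The score and search below are illustrative stand-ins, NOT the
    # paper's actual Eva criterion or optimization algorithm.
    import numpy as np

    def nn_predict(proto_X, proto_y, X):
        """Classify each row of X by its nearest prototype (Euclidean 1-NN)."""
        d = np.linalg.norm(X[:, None, :] - proto_X[None, :, :], axis=2)
        return proto_y[np.argmin(d, axis=1)]

    def map_style_score(S, X, y, penalty=1.0):
        """MAP-flavored score: a data-fit term (training accuracy of the
        1-NN rule induced by the selected set S) minus a size penalty
        standing in for the negative log-prior on larger instance sets."""
        if not S:
            return -np.inf
        idx = np.array(sorted(S))
        fit = np.mean(nn_predict(X[idx], y[idx], X) == y) * len(X)
        return fit - penalty * len(S)

    def greedy_select(X, y, penalty=1.0):
        """Greedy hill climbing: repeatedly add the instance that most
        improves the score; stop when no single addition helps."""
        S, best = set(), -np.inf
        improved = True
        while improved:
            improved = False
            for i in range(len(X)):
                if i in S:
                    continue
                s = map_style_score(S | {i}, X, y, penalty)
                if s > best:
                    best, best_i, improved = s, i, True
            if improved:
                S.add(best_i)
        return sorted(S)

In this stand-in, the penalty parameter plays the role of the prior, directly trading compression against predictive accuracy: the two axes that the multi-criteria analysis above measures, alongside reliability and computational time.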

Keywords

Nearest neighbor · Instance selection · Voronoi tessellation · Maximum a posteriori

Copyright information

© The Author(s) 2010

Authors and Affiliations

1. Orange Labs, Lannion, France
