Geometric Decision Rules for Instance-Based Learning Problems

  • Binay Bhattacharya
  • Kaustav Mukherjee
  • Godfried Toussaint
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3776)

Abstract

In the typical nonparametric approach to classification in instance-based learning and data mining, random data (the training set of patterns) are collected and used to design a decision rule (classifier). One of the best-known such rules is the k-nearest neighbor decision rule (also known as lazy learning), in which an unknown pattern is classified into the majority class among its k nearest neighbors in the training set. This rule gives low error rates when the training set is large. In practice, however, it is desirable to store as little of the training data as possible without sacrificing performance. It is well known that thinning (condensing) the training set with the Gabriel proximity graph is a viable partial solution to this problem. This, in turn, raises the problem of efficiently computing the Gabriel graph of large training sets in high-dimensional spaces. In this paper we report on a new approach to the instance-based learning problem. The approach combines five tools: first, editing the data with Wilson-Gabriel editing to smooth the decision boundary; second, applying Gabriel thinning to the edited set; third, filtering the output with the ICF algorithm of Brighton and Mellish; fourth, using the Gabriel-neighbor decision rule to classify new incoming queries; and fifth, using a new data structure that allows the efficient computation of approximate Gabriel graphs in high-dimensional spaces. Extensive experiments suggest that our approach is the best on the market.
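As a rough illustration of three of the tools named in the abstract (Wilson-Gabriel editing, Gabriel thinning, and the Gabriel-neighbor decision rule), the following Python sketch may help. It is not the authors' implementation. It assumes Euclidean distance, builds the exact Gabriel graph by brute force in O(n^3) time rather than with the approximate data structure the paper proposes, omits the ICF filtering step, and keeps a point during editing when its own class ties with another class among its Gabriel neighbors. All function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gabriel_graph(points):
    """Brute-force Gabriel graph: i and j are adjacent when no third point
    lies strictly inside the ball whose diameter is the segment i-j."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d_ij = sq_dist(points[i], points[j])
        ball_is_empty = all(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) >= d_ij
            for k in range(n) if k not in (i, j)
        )
        if ball_is_empty:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def wilson_gabriel_edit(points, labels):
    """Editing (assumed variant): drop every point that is outvoted by
    another class among its Gabriel neighbors; ties keep the point."""
    adj = gabriel_graph(points)
    keep = []
    for i, nbrs in enumerate(adj):
        votes = Counter(labels[j] for j in nbrs)
        own = votes.pop(labels[i], 0)
        if own >= max(votes.values(), default=0):
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


def gabriel_thin(points, labels):
    """Thinning: keep only points with at least one Gabriel neighbor of a
    different class, i.e. points that help define the decision boundary."""
    adj = gabriel_graph(points)
    keep = [i for i, nbrs in enumerate(adj)
            if any(labels[j] != labels[i] for j in nbrs)]
    return [points[i] for i in keep], [labels[i] for i in keep]


def gabriel_classify(query, points, labels):
    """Gabriel-neighbor rule: majority class among the training points that
    are Gabriel neighbors of the query (ties are broken arbitrarily)."""
    votes = Counter()
    for i, p in enumerate(points):
        d_qp = sq_dist(query, p)
        if all(sq_dist(query, r) + sq_dist(p, r) >= d_qp
               for k, r in enumerate(points) if k != i):
            votes[labels[i]] += 1
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two classes on a line, with one mislabeled point at x = 2.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (6, 0), (7, 0), (8, 0)]
labels = ["A", "A", "B", "A", "A", "B", "B", "B"]

edited_pts, edited_lbls = wilson_gabriel_edit(points, labels)
thin_pts, thin_lbls = gabriel_thin(edited_pts, edited_lbls)
print(thin_pts, thin_lbls)
print(gabriel_classify((1, 0), thin_pts, thin_lbls))
```

On this toy set, editing discards the mislabeled point, and thinning then retains only the two points that face each other across the class boundary. The nearest training point is always a Gabriel neighbor of the query, so the vote in `gabriel_classify` is never empty for a non-empty training set.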

Keywords

Decision Rule; Voronoi Diagram; Decision Boundary; Proximity Graph; Gabriel Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bhattacharya, B., Kaller, D.: Reference set thinning for the k-nearest neighbor decision rule. In: Proceedings of the 14th International Conference on Pattern Recognition, vol. 1 (1998)
  2. Bhattacharya, B.K.: Application of computational geometry to pattern recognition problems. Ph.D. thesis, School of Computer Science, McGill University (1982)
  3. Brighton, H., Mellish, C.: Advances in instance selection for instance-based learning algorithms. Data Mining and Knowledge Discovery 6, 153–172 (2002)
  4. Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Springer, Heidelberg (1996)
  5. Houle, M.: SASH: A spatial approximation sample hierarchy for similarity search. Tech. Report RT-0517, IBM Tokyo Research Laboratory (2003)
  6. Kulkarni, S.R., Lugosi, G., Venkatesh, S.S.: Learning pattern classification - a survey. IEEE Transactions on Information Theory 44, 2178–2206 (1998)
  7. Merz, C.J., Murphy, P.M.: UCI repository of machine learning database, Internet, University of California. Department of Information and Computer Science, http://www.ics.uci.edu/mlearn/MLRepository.html
  8. Mukherjee, K.: Application of the Gabriel graph to instance-based learning. M.Sc. project, School of Computing Science, Simon Fraser University (2004)
  9. Psaltis, D., Snapp, R.R., Venkatesh, S.S.: On the finite sample performance of the nearest neighbor classifier. IEEE Transactions on Information Theory 40, 820–837 (1994)
  10. Toussaint, G.T., Bhattacharya, B.K., Poulsen, R.S.: The application of Voronoi diagrams to nonparametric decision rules. In: Computer Science and Statistics: The Interface, Atlanta, pp. 97–108 (1985)
  11. Toussaint, G.T., Poulsen, R.S.: Some new algorithms and software implementation methods for pattern recognition research. In: Proc. IEEE Int. Computer Software Applications Conf., Chicago, pp. 55–63 (1979)
  12. Wilson, D.L.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man and Cybernetics 2, 408–420 (1972)
  13. Wilson, D.R., Martinez, T.R.: Reduction techniques for instance-based learning algorithms. Machine Learning 38, 257–286 (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Binay Bhattacharya (1)
  • Kaustav Mukherjee (1)
  • Godfried Toussaint (2)
  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
  2. School of Computer Science, McGill University, Montréal, Canada