An Investigation of the Performance of Informative Samples Preservation Methods

  • Jianlin Xiong
  • Yuhua Li
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 124)

Abstract

Instance-based learning algorithms make predictions by generalizing from stored instances. Storing all instances of a large dataset imposes heavy memory requirements and slows execution, which may make prediction impractical or even impossible. Researchers have therefore made considerable efforts to reduce the data size of instance-based learning algorithms by selecting informative samples. This paper has two main purposes. First, it reviews recent developments in informative sample preservation methods and identifies five representative methods for use in this study. Second, the five selected methods are implemented with a standardized input-output interface so that the programs can be reused by other researchers, and their performance in terms of accuracy and reduction rate is compared on ten benchmark classification problems. The k-nearest neighbor classifier is employed in the performance comparison.
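
As an illustration of the evaluation protocol described above, the sketch below applies one classical informative sample preservation method, Hart's condensed nearest neighbor rule, and then scores a 1-nearest-neighbor classifier trained on the retained subset for accuracy and reduction rate. This is a minimal sketch only: the dataset (iris), the choice k = 1, the random seed and the scikit-learn API are assumptions for illustration and do not reproduce the paper's ten benchmark problems or its standardized interface.

    # Minimal sketch: condense the training set with Hart's condensed nearest
    # neighbor (CNN) rule, then evaluate a 1-NN classifier on the retained
    # subset, reporting accuracy and reduction rate. Dataset and parameters
    # are illustrative assumptions, not the paper's experimental setup.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier


    def condense_cnn(X, y, seed=0):
        """Hart's CNN rule: absorb any instance that a 1-NN classifier
        built on the current store would misclassify."""
        order = np.random.default_rng(seed).permutation(len(X))
        keep = [order[0]]                  # seed the store with one instance
        changed = True
        while changed:
            changed = False
            for i in order:
                if i in keep:
                    continue
                nn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
                if nn.predict(X[i:i + 1])[0] != y[i]:
                    keep.append(i)         # store the misclassified instance
                    changed = True
        return np.array(keep)


    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    kept = condense_cnn(X_train, y_train, seed=0)
    knn = KNeighborsClassifier(n_neighbors=1).fit(X_train[kept], y_train[kept])

    accuracy = knn.score(X_test, y_test)           # accuracy on held-out data
    reduction = 1.0 - len(kept) / len(X_train)     # fraction of instances removed
    print(f"accuracy  = {accuracy:.3f}")
    print(f"reduction = {reduction:.3f}")

The same two measurements (accuracy of the classifier on the retained subset, and the fraction of training instances discarded) are the quantities the paper uses to compare the five selected methods.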

Keywords

Instance-based learning · subset selection · pattern selection · classification algorithms

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jianlin Xiong (1)
  • Yuhua Li (1)

  1. School of Computing and Intelligent Systems, University of Ulster, Londonderry, United Kingdom