Soft Computing, Volume 21, Issue 19, pp 5703–5715

An experimental study on rank methods for prototype selection

  • Jose J. Valero-Mas
  • Jorge Calvo-Zaragoza
  • Juan R. Rico-Juan
  • José M. Iñesta
Methodologies and Application

Abstract

Prototype selection is one of the most popular approaches for addressing the low-efficiency issue typically found in the well-known k-Nearest Neighbour classification rule. These techniques select a representative subset from an original collection of prototypes with the premise of maintaining the same classification accuracy. More recently, rank methods have been proposed as an alternative for developing new selection strategies. Following a certain heuristic, these methods sort the elements of the initial collection according to their relevance and then select the best possible subset by means of a parameter representing the amount of data to maintain. Due to the relative novelty of these methods, their performance and competitiveness against other strategies are still unclear. This work performs an exhaustive experimental study of such methods for prototype selection. A representative collection of both classic and sophisticated algorithms is compared to the aforementioned techniques on a number of datasets, including different levels of induced noise. The results show the remarkable competitiveness of these rank methods as well as their excellent trade-off between prototype reduction and achieved accuracy.
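The sort-then-select idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the voting heuristic (each prototype votes for its nearest same-class neighbour), the function name `rank_select`, and the parameter name `keep_mass` are all assumptions made for illustration. Prototypes are ranked per class by votes received and kept until the accumulated votes reach the requested fraction of that class's total.

```python
import math
from collections import defaultdict


def rank_select(points, labels, keep_mass=0.5):
    """Return indices of the prototypes kept by a toy rank-based selection."""
    n = len(points)
    votes = [0] * n
    # Voting heuristic (an assumption for illustration): each prototype gives
    # one vote to its nearest neighbour of the same class.
    for i in range(n):
        best_j, best_d = None, math.inf
        for j in range(n):
            if j == i or labels[j] != labels[i]:
                continue
            d = math.dist(points[i], points[j])
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            votes[best_j] += 1

    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    kept = []
    for members in by_class.values():
        # Rank class members from most to least voted (ties: lower index first).
        ranked = sorted(members, key=lambda i: (-votes[i], i))
        total = sum(votes[i] for i in members)
        if total == 0:
            kept.append(ranked[0])  # singleton class: keep its only prototype
            continue
        acc = 0
        for i in ranked:
            kept.append(i)
            acc += votes[i]
            # Stop once the kept prototypes hold the requested share of votes.
            if acc >= keep_mass * total:
                break
    return sorted(kept)
```

Lowering `keep_mass` keeps fewer prototypes per class, which is the kind of reduction-versus-accuracy trade-off the study evaluates; the resulting subset would then serve as the training set for a k-Nearest Neighbour classifier.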

Keywords

  • k-Nearest Neighbour
  • Data reduction
  • Prototype selection
  • Rank methods

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Jose J. Valero-Mas (1)
  • Jorge Calvo-Zaragoza (1)
  • Juan R. Rico-Juan (1)
  • José M. Iñesta (1)

  1. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
