An experimental study on rank methods for prototype selection

  • Methodologies and Application
  • Soft Computing

Abstract

Prototype selection is one of the most popular approaches for addressing the low efficiency typically found in the well-known k-Nearest Neighbour classification rule. These techniques select a representative subset from an original collection of prototypes with the aim of maintaining the same classification accuracy. More recently, rank methods have been proposed as an alternative for developing new selection strategies. Following a certain heuristic, these methods sort the elements of the initial collection according to their relevance and then select the best possible subset by means of a parameter representing the amount of data to maintain. Due to the relative novelty of these methods, their performance and competitiveness against other strategies are still unclear. This work performs an exhaustive experimental study of such methods for prototype selection. A representative collection of both classic and sophisticated algorithms is compared to the aforementioned techniques on a number of datasets, including different levels of induced noise. Results show the remarkable competitiveness of these rank methods as well as their excellent trade-off between prototype reduction and achieved accuracy.
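The general rank-then-select idea described in the abstract can be illustrated with a minimal sketch. It is not the specific heuristic evaluated in the article: the scoring rule (each prototype gives one vote to its nearest same-class neighbour), the function name `rank_select` and the parameter `keep` are all assumptions made here for illustration, and Euclidean distance is assumed throughout.

```python
import math
from collections import defaultdict


def rank_select(points, labels, keep=0.5):
    """Rank prototypes by a toy relevance score and keep the top share per class.

    Hypothetical illustration: `keep` plays the role of the parameter that
    fixes the amount of data to maintain (a fraction of each class here).
    """
    n = len(points)
    votes = [0] * n
    for i in range(n):
        # Assumed heuristic: prototype i votes for its nearest same-class neighbour.
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if same:
            nearest = min(same, key=lambda j: math.dist(points[i], points[j]))
            votes[nearest] += 1

    # Group prototype indices by class label.
    by_class = defaultdict(list)
    for i in range(n):
        by_class[labels[i]].append(i)

    selected = []
    for members in by_class.values():
        # Sort by descending votes; the stable sort keeps input order on ties.
        ranked = sorted(members, key=lambda i: -votes[i])
        quota = max(1, math.ceil(keep * len(ranked)))
        selected.extend(ranked[:quota])
    return sorted(selected)


points = [(0, 0), (0, 1), (0, 2), (5, 5), (5, 6), (5, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(rank_select(points, labels, keep=0.5))
```

Lowering `keep` shrinks the retained set, so the same ranking can be reused to explore different reduction levels without recomputing the scores.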

Notes

  1. Given that this number of elements is highly dependent on the memory and computation capabilities of the system considered, we shall restrict ourselves to the definition by Garcia et al. (2012) in which this threshold is set to 2000 prototypes.

References

  • Angiulli F (2007) Fast nearest neighbor condensation for large data sets classification. IEEE Trans Knowl Data Eng 19(11):1450–1464

  • Brighton H, Mellish C (1999) On the consistency of information filters for lazy learning algorithms. In: Zytkow J, Rauch J (eds) Principles of data mining and knowledge discovery. Lecture notes in computer science, vol 1704. Springer, Berlin, pp 283–288

  • Calvo-Zaragoza J, Oncina J (2014) Recognition of pen-based music notation: the HOMUS dataset. In: Proceedings of the 22nd international conference on pattern recognition. Stockholm, Sweden, pp 3038–3043

  • Calvo-Zaragoza J, Valero-Mas JJ, Rico-Juan JR (2016) Prototype generation on structural data using dissimilarity space representation. Neural Comput Appl. doi:10.1007/s00521-016-2278-8

  • Cano J, Herrera F, Lozano M (2003) Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Trans Evol Comput 7(6):561–575. doi:10.1109/TEVC.2003.819265

  • Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27

  • Dasarathy BV, Sánchez JS, Townsend S (2000) Nearest neighbour editing and condensing tools-synergy exploitation. Pattern Anal Appl, pp 19–30

  • Derrac J, Cornelis C, García S, Herrera F (2012) Enhancing evolutionary instance selection algorithms by means of fuzzy rough set based feature selection. Inf Sci 186(1):73–92. doi:10.1016/j.ins.2011.09.027

  • Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice Hall, Upper Saddle River

  • Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York

  • Eshelman LJ (1990) The CHC adaptive search algorithm: how to have safe search when engaging in nontraditional genetic recombination. In: Proceedings of the first workshop on foundations of genetic algorithms. Bloomington Campus, Indiana, pp 265–283

  • Freeman H (1961) On the encoding of arbitrary geometric configurations. IRE Trans Electron Comput EC-10(2):260–268. doi:10.1109/TEC.1961.5219197

  • Garcia S, Derrac J, Cano J, Herrera F (2012) Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans Pattern Anal Mach Intell 34(3):417–435. doi:10.1109/TPAMI.2011.142

  • García S, Luengo J, Herrera F (2015) Data preprocessing in data mining. Intelligent systems reference library, vol 72. Springer, Cham. doi:10.1007/978-3-319-10247-4

  • García-Pedrajas N, De Haro-García A (2014) Boosting instance selection algorithms. Knowl Based Syst 67:342–360. doi:10.1016/j.knosys.2014.04.021

  • Gates G (1972) The reduced nearest neighbor rule (corresp.). IEEE Trans Inf Theory 18(3):431–433. doi:10.1109/TIT.1972.1054809

  • Hart P (1968) The condensed nearest neighbor rule (corresp.). IEEE Trans Inf Theory 14(3):515–516

  • Hull J (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554. doi:10.1109/34.291440

  • Nanni L, Lumini A (2011) Prototype reduction techniques: a comparison among different approaches. Expert Syst Appl 38(9):11820–11828. doi:10.1016/j.eswa.2011.03.070

  • Natarajan N, Dhillon I, Ravikumar P, Tewari A (2013) Learning with noisy labels. In: Advances in neural information processing systems, pp 1196–1204

  • Pekalska E, Duin RP, Paclík P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recognit 39(2):189–208. doi:10.1016/j.patcog.2005.06.012 (Part Special Issue: Complexity Reduction)

  • Rico-Juan JR, Iñesta JM (2012) New rank methods for reducing the size of the training set using the nearest neighbor rule. Pattern Recognit Lett 33(5):654–660

  • Ritter G, Woodruff H, Lowry S, Isenhour T (1975) An algorithm for a selective nearest neighbor decision rule (corresp.). IEEE Trans Inf Theory 21(6):665–669. doi:10.1109/TIT.1975.1055464

  • Sakoe H, Chiba S (1990) Dynamic programming algorithm optimization for spoken word recognition. In: Waibel A, Lee KF (eds) Readings in speech recognition. Morgan Kaufmann Publishers Inc., San Francisco, pp 159–165

  • Tomek I (1976) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern SMC-6(6):448–452. doi:10.1109/TSMC.1976.4309523

  • Tsai CF, Eberle W, Chu CY (2013) Genetic algorithms in feature and instance selection. Knowl Based Syst 39:240–247. doi:10.1016/j.knosys.2012.11.005

  • Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J Assoc Comput Mach 21(1):168–173. doi:10.1145/321796.321811

  • Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern SMC-2(3):408–421. doi:10.1109/TSMC.1972.4309137

  • Wilson DR, Martinez TR (1997) Improved heterogeneous distance functions. J Artif Intell Res 6:1–34

Acknowledgments

This work has been supported by the Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through the FPU programme (UAFPU2014-5883), the Spanish Ministerio de Educación, Cultura y Deporte through an FPU Fellowship (Ref. AP2012-0939), the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by EU FEDER funds) and the Consejería de Educación de la Comunidad Valenciana through project PROMETEO/2012/017.

Author information

Corresponding author

Correspondence to Jose J. Valero-Mas.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by V. Loia.

Appendix: Partial results obtained

This appendix breaks down the general results into the figures obtained by each single prototype selection algorithm and dataset studied. For a better understanding, each table corresponds to a different induced noise configuration of the three considered (Tables 3, 4, 5).

Table 4 Results in terms of classification accuracy and set size reduction obtained for the different datasets considered when using 20 % induced noise
Table 5 Results in terms of classification accuracy and set size reduction obtained for the different datasets considered when using 40 % induced noise

About this article

Cite this article

Valero-Mas, J.J., Calvo-Zaragoza, J., Rico-Juan, J.R. et al. An experimental study on rank methods for prototype selection. Soft Comput 21, 5703–5715 (2017). https://doi.org/10.1007/s00500-016-2148-4

  • DOI: https://doi.org/10.1007/s00500-016-2148-4
