An experimental study on rank methods for prototype selection
Prototype selection is one of the most popular approaches for addressing the low efficiency issue typically found in the well-known k-Nearest Neighbour classification rule. These techniques select a representative subset from an original collection of prototypes with the premise of maintaining the same classification accuracy. Most recently, rank methods have been proposed as an alternative to develop new selection strategies. Following a certain heuristic, these methods sort the elements of the initial collection according to their relevance and then select the best possible subset by means of a parameter representing the amount of data to maintain. Due to the relative novelty of these methods, their performance and competitiveness against other strategies is still unclear. This work performs an exhaustive experimental study of such methods for prototype selection. A representative collection of both classic and sophisticated algorithms are compared to the aforementioned techniques in a number of datasets, including different levels of induced noise. Results report the remarkable competitiveness of these rank methods as well as their excellent trade-off between prototype reduction and achieved accuracy.
Keywordsk-Nearest Neighbour Data reduction Prototype selection Rank methods
This work has been supported by the Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through the FPU programme (UAFPU2014-5883), the Spanish Ministerio de Educación, Cultura y Deporte through a FPU Fellowship (Ref. AP2012-0939) and the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by UE FEDER funds) and Consejería de Educación de la Comunidad Valenciana through project PROMETEO/2012/017.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants performed by any of the authors.
- Calvo-Zaragoza, J., Oncina, J.: Recognition of pen-based music notation: the HOMUS dataset. In: Proceedings of the 22nd international conference on pattern recognition. Stockholm, Sweden, pp 3038–3043 (2014)Google Scholar
- Calvo-Zaragoza J, Valero-Mas JJ, Rico-Juan JR (2016) Prototype generation on structural data using dissimilarity space representation. Neural Comput Appl. doi: 10.1007/s00521-016-2278-8
- Dasarathy BV, Sánchez JS, Townsend S (2000) Nearest neighbour editing and condensing tools-synergy exploitation. Pattern Anal Appl 19–30 (2000)Google Scholar
- Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach. Prentice Hall, Upper Saddle RiverGoogle Scholar
- Eshelman LJ (1990) The CHC adaptive search algorithm: how to have safe search when engaging in nontraditional genetic recombination. In: Proceedings of the first workshop on foundations of genetic algorithms. Bloomington Campus, Indiana, pp 265–283Google Scholar
- Freeman H (1961) On the encoding of arbitrary geometric configurations. In: IRE transactions on electronic computers EC-10(2), pp 260–268. doi: 10.1109/TEC.1961.5219197
- García S, Luengo J, Herrera F (2015) Data preprocessing in data mining, intelligent systems reference library, vol 72. Springer, Cham (2015). doi: 10.1007/978-3-319-10247-4
- Natarajan N, Dhillon I, Ravikumar P, Tewari A (2013) Learning with noisy labels. In: Advances in neural information processing systems, pp 1196–1204 (2013)Google Scholar
- Pekalska E, Duin RP, Paclík P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recognit 39(2):189–208. doi: 10.1016/j.patcog.2005.06.012 (Part Special Issue: Complexity Reduction)
- Rico-Juan JR, Iñesta JM (2012) New rank methods for reducing the size of the training set using the nearest neighbor rule. Pattern Recognit Lett 33(5):654–660Google Scholar
- Sakoe H, Chiba S (1990) Readings in speech recognition. In: Waibel A, Lee KF (eds) Readings in speech recognition, dynamic programming algorithm optimization for spoken word recognition. Morgan Kaufmann Publishers Inc., San Francisco, pp 159–165 (1990)Google Scholar
- Tomek I (1976) An experiment with the edited nearest-neighbor rule. In: IEEE transactions on SMC-6(6) systems, man and cybernetics, pp 448–452 (1976). doi: 10.1109/TSMC.1976.4309523
- Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. In: IEEE transactions on systems, man and cybernetics SMC-2(3), pp 408–421 (1972). doi: 10.1109/TSMC.1972.4309137
- Wilson DR, Martinez TR (1997) Improved heterogeneous distance functions. J Artif Intell Res 6:1–34Google Scholar