Soft Computing

pp 1–13

An experimental study on rank methods for prototype selection

  • Jose J. Valero-Mas
  • Jorge Calvo-Zaragoza
  • Juan R. Rico-Juan
  • José M. Iñesta
Methodologies and Application

DOI: 10.1007/s00500-016-2148-4


Abstract

Prototype selection is one of the most popular approaches for addressing the low efficiency issue typically found in the well-known k-Nearest Neighbour classification rule. These techniques select a representative subset from an original collection of prototypes with the premise of maintaining the same classification accuracy. Recently, rank methods have been proposed as an alternative for developing new selection strategies. Following a certain heuristic, these methods sort the elements of the initial collection according to their relevance and then select the best possible subset by means of a parameter representing the amount of data to maintain. Due to the relative novelty of these methods, their performance and competitiveness against other strategies is still unclear. This work performs an exhaustive experimental study of such methods for prototype selection. A representative collection of both classic and sophisticated algorithms is compared to the aforementioned techniques on a number of datasets, including different levels of induced noise. Results show the remarkable competitiveness of these rank methods as well as their excellent trade-off between prototype reduction and achieved accuracy.

Keywords

k-Nearest Neighbour · Data reduction · Prototype selection · Rank methods

1 Introduction

The k-Nearest Neighbour (kNN) rule is one of the most common algorithms for supervised non-parametric classification (Duda et al. 2001), where statistical knowledge of the conditional density functions is not available. This rule labels a given input with the most common label among its k nearest prototypes in the training set. Its simplicity, straightforward implementation and an error bounded by twice the Bayes error (Cover and Hart 1967) are important qualities that characterise this classifier. Nevertheless, one of the main problems of this technique is its low efficiency in both running time and memory usage, since it needs to store every single prototype of the training set and query all of them to compute the distances required.
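For concreteness, the decision rule can be sketched in a few lines (a toy illustration with hypothetical names; any dissimilarity measure can be plugged in as d):

```python
from collections import Counter

def knn_classify(x, training_set, d, k=1):
    """Label x with the most common class among its k nearest prototypes.

    training_set: list of (prototype, label) pairs; d: dissimilarity function.
    """
    neighbours = sorted(training_set, key=lambda pl: d(x, pl[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Toy usage with Euclidean distance on 2-D points
euclidean = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify((1, 0), train, euclidean, k=3))   # -> "A"
```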

Data reduction (DR) techniques, a special case of data preprocessing, are widely used in kNN classification as a means of overcoming the aforementioned drawbacks. They aim at reducing the size of the training set while keeping the same classification accuracy as with the original data (García et al. 2015). DR can be further divided into two common approaches (Nanni and Lumini 2011): prototype generation (PG) and prototype selection (PS). The former creates new artificial data to replace the initial set, while the latter simply selects certain elements from that set. The work presented here focuses on PS techniques, which are less restrictive than PG as they do not require information about the type of representation used for encoding the data (Calvo-Zaragoza et al. 2016).

Given the importance of PS methods, numerous approaches have been proposed throughout the years (Garcia et al. 2012), typically divided into three main families: condensing strategies, which try to keep only the most relevant prototypes; editing strategies, which aim to remove the prototypes located in dubious zones; and hybrid methods, which look for a compromise between the two previous approaches.

Moreover, a new approach has recently been developed in which the key question is not to select prototypes but to sort them by their importance for the classification accuracy. This new approach, referred to as rank methods, is ultimately guided by a tuning parameter that specifies the amount of data to keep.

Due to the novelty of the aforementioned rank-based family, its competitiveness against state-of-the-art methodologies is hitherto unclear. To fill that gap, the present work aims at providing a comprehensive experimental study on the performance of rank methods for PS. A representative series of algorithms is selected from the literature and compared against rank methods in a number of scenarios, which differ in size and in the amount of mislabelled samples (to simulate noisy conditions), taking into account different metrics of interest.

The rest of the paper is structured as follows: Sect. 2 introduces some of the most common approaches for prototype selection in kNN. Section 3 describes the rank methods to be assessed. Section 4 describes the experimental setup as well as the evaluation methodology proposed. Results obtained are presented and analysed in Sect. 5. Finally, Sect. 6 outlines the general conclusions obtained as well as possible future work.

2 Background

The Condensed Nearest Neighbour (Hart 1968) was one of the first techniques aimed at reducing the size of the training set for kNN classification. This method focuses on keeping those prototypes close to the boundaries and removing the rest. The reduction starts with an empty set S, and every prototype of the initial training set is queried in random order. If the prototype is misclassified using the 1NN rule and the set S, then the prototype is included in S. Otherwise, the prototype is discarded. At the end, the set S is returned as a representative reduced version of the initial training set. Note that the main assumption behind the method is that if a prototype is misclassified with 1NN, it is probably close to the boundaries and should therefore be maintained. Extensions to this technique include: Reduced Nearest Neighbour (Gates 1972), which performs the condensing algorithm and then revisits each maintained prototype to check whether it is actually necessary for the classification; Selective Nearest Neighbour (Ritter et al. 2006), which ensures that the nearest neighbour of each prototype of the initial training set is in the condensed subset; and Fast Condensing Nearest Neighbour (Angiulli 2007), which provides a fast, order-independent variant of the algorithm.
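The single pass described above can be sketched as follows (a simplified, hypothetical implementation with a generic distance function d; variants of the algorithm iterate this pass until the subset stabilises):

```python
import random

def condensed_nn(training_set, d):
    """Single-pass sketch of the condensing idea described above.

    training_set: list of (prototype, label) pairs; d: distance function.
    Keeps a prototype only if it is misclassified by 1NN over the current S.
    """
    pool = list(training_set)
    random.shuffle(pool)              # prototypes are queried in random order
    S = [pool.pop(0)]                 # seed S with the first prototype
    for p, label in pool:
        nearest_label = min(S, key=lambda q: d(p, q[0]))[1]
        if nearest_label != label:    # misclassified: keep it (likely near a boundary)
            S.append((p, label))
    return S                          # reduced, representative subset
```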

Following an inverse approach, the Editing Nearest Neighbour (Wilson 1972) was the first proposal to reduce the training set by removing outliers and noisy instances. It starts with a set S equal to the initial training set. The process applies the kNN rule to each single prototype in S, using the remaining prototypes as reference. If the element is misclassified (the predicted class does not match the prototype's own label), the element is removed from S. Otherwise, it is maintained. Common extensions to this technique are the Repeated-Editing Nearest Neighbour (Tomek 1976), which repeatedly applies editing until homogeneity is reached, and the Multi-Editing Nearest Neighbour (Devijver and Kittler 1982), which repeatedly performs editing over distributed blocks of the training set.
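A minimal sketch of this editing idea is shown below (hypothetical helper; k = 3 is a common choice in editing, though the exact value is not fixed by the description above):

```python
from collections import Counter

def edited_nn(training_set, d, k=3):
    """Sketch of the editing idea: keep a prototype only if the kNN rule,
    applied over the remaining prototypes, predicts its own label."""
    edited = []
    for i, (p, label) in enumerate(training_set):
        others = training_set[:i] + training_set[i + 1:]
        neighbours = sorted(others, key=lambda q: d(p, q[0]))[:k]
        predicted = Counter(l for _, l in neighbours).most_common(1)[0][0]
        if predicted == label:        # correctly classified: not treated as noise
            edited.append((p, label))
    return edited
```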

Several algorithms have appeared as a combination of the two aforementioned general ideas, referred to as hybrid approaches. Some good representatives of hybrid methods are: the Multi-Editing Condensing Nearest Neighbour (Dasarathy et al. 2000), a combination of Multi-Editing and Condensing strategies; the Decremental Reduction Optimization Procedure (Wilson 1972), in which instances are ordered according to the distance to their nearest neighbours and then, starting from the furthest ones, those which do not affect the generalisation accuracy are removed; and the Iterative Case Filtering (Brighton and Mellish 1999), which relies on the coverage and reachability concepts to select the subset of instances that maximises classification accuracy. In addition, Evolutionary Algorithms (EA) have also been adapted to perform PS (Cano et al. 2003). For instance, the Cross-generational elitist selection, Heterogeneous recombination and Cataclysmic mutation search (Eshelman 1990), whose name indicates the behaviour of its genetic operators, is considered one of the most successful applications of EA to this task.

Unfortunately, PS methods often carry an accuracy loss with respect to directly using the original training set. This is why PS has been hybridised with other paradigms such as ensemble methods (García-Pedrajas and De Haro-García 2014) or feature selection (Derrac et al. 2012; Tsai et al. 2013). Nevertheless, we shall restrict ourselves to conventional PS for our experimental study.

3 Rank methods for prototype selection

This section introduces the gist of rank methods for PS as well as those strategies already proposed under this paradigm.

The main idea behind rank methods is that prototypes of the training set are not selected but ordered. Following some heuristic, prototypes are given a score that indicates their relevance with respect to classification accuracy. Eventually, prototypes are selected starting from the highest score until a certain criterion is fulfilled.

A particular approach for rank methods is to follow a voting heuristic, i.e. prototypes vote for the other prototypes that help them to be correctly classified. After the voting process, the received score is normalised to produce an importance rate so that the sum of these rates over all the prototypes of a given class is equal to 1. Then, the training set is sorted according to those values and the best candidates are selected until their accumulated score exceeds an external parameter \(\alpha \in \left( 0,1\right] \) that allows the performance of the rank method to be tuned. Low values of this parameter lead to a higher reduction of the size of the training set, while high values remove only the most irrelevant prototypes. Although tuning parameters may be considered inconvenient, in this case the parameter is especially interesting because it allows the user to enhance a particular objective (either reduction or accuracy) depending on the requirements of the system.
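The normalisation and selection steps can be sketched as follows (a hypothetical helper; the per-class accumulation is one plausible reading of the normalisation described above):

```python
from collections import defaultdict

def rank_select(votes, labels, alpha):
    """Sketch of the rank-based selection step.

    votes:  dict prototype_id -> number of votes received by a voting heuristic.
    labels: dict prototype_id -> class label.
    alpha:  fraction of the per-class probability mass to keep, in (0, 1].
    """
    by_class = defaultdict(list)
    for pid, v in votes.items():
        by_class[labels[pid]].append((pid, v))

    selected = []
    for members in by_class.values():
        total = sum(v for _, v in members) or 1            # normalise within the class
        members.sort(key=lambda pv: pv[1], reverse=True)   # most relevant first
        accumulated = 0.0
        for pid, v in members:
            selected.append(pid)
            accumulated += v / total
            if accumulated >= alpha:                       # stop once mass alpha is reached
                break
    return selected
```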

The experimental study carried out in this paper focuses on the two voting heuristics proposed so far, to the best of our knowledge: Farthest Neighbour (FN) and Nearest to Enemy (NE). Both strategies are based on the idea that a prototype can give one vote to another one, and the question is to decide which prototype that vote is given to. Although these strategies were previously published (Rico-Juan and Iñesta 2012), we revisit their main ideas below for better readability of the current paper.

For the sake of clarity, some notation is presented. We will use \(d(\cdot ,\cdot )\) to denote the distance between prototypes used for the kNN rule. Let \(\zeta (p)\) denote the class label of prototype p. Let us call friends of p (\(f_p\)) the set of prototypes that share class label with p, i.e. \(f_p = \{ p' : \zeta (p') = \zeta (p) \}\), and enemies of p (\(e_p\)) the rest of prototypes. As both strategies loop over each prototype of the training set, we will use a to denote the prototype issuing the vote. Then, we will use b to denote the nearest enemy of a: \(b = \arg \min _{p \in e_a} d(a,p)\).

3.1 Farthest Neighbour voting

The one vote to the Farthest Neighbour (FN) strategy searches for a prototype c that is the farthest friend of a while still being closer to a than b is; that is, a will give its vote to the prototype c such that
$$\begin{aligned} c = \arg \max _p d(a,p) : p \in f_a \wedge d(a,p) < d(a,b) \end{aligned}$$
The idea is to vote for a prototype that contributes to classifying a correctly with the kNN rule while also reducing the density of prototypes over a given area.

3.2 Nearest to enemy voting

The one vote to the Nearest to Enemy (NE) strategy gives the vote to the friend that is the closest to b. This friend must also lie within the area centred at a with radius d(a, b). Formally, a will give its vote to the prototype c such that
$$\begin{aligned} c = \arg \min _p d(p,b) : p \in f_a \wedge d(a,p) < d(a,b) \end{aligned}$$
The idea is to try to avoid misclassifications by the kNN rule in an area containing prototypes of other classes, since c is the friend closest to that conflicting region.

The previously described strategies can be extended by letting b be the n-th nearest enemy, in order to reduce the influence of possible outliers. We will denote these strategies by n-FN and n-NE. For example, the configuration \(n=2\) uses the second nearest enemy of the voting prototype a, which has proved to be useful in practice (Rico-Juan and Iñesta 2012).
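Both vote targets, together with the n-nearest-enemy extension, can be condensed into a single sketch (index-based, hypothetical helper; ties and degenerate cases are not the focus here):

```python
def cast_vote(a_idx, prototypes, d, heuristic="FN", n=1):
    """Minimal sketch of the n-FN / n-NE vote for the prototype at index a_idx.

    prototypes: list of (vector, label) pairs; d: distance function.
    Returns the index of the voted prototype, or None if no friend qualifies.
    """
    a, label_a = prototypes[a_idx]
    # n-th nearest enemy b of a
    enemies = sorted((i for i, (_, l) in enumerate(prototypes) if l != label_a),
                     key=lambda i: d(a, prototypes[i][0]))
    b = prototypes[enemies[n - 1]][0]
    radius = d(a, b)
    # friends of a lying strictly closer to a than its n-th nearest enemy
    candidates = [i for i, (p, l) in enumerate(prototypes)
                  if l == label_a and i != a_idx and d(a, p) < radius]
    if not candidates:
        return None
    if heuristic == "FN":      # farthest such friend
        return max(candidates, key=lambda i: d(a, prototypes[i][0]))
    return min(candidates, key=lambda i: d(prototypes[i][0], b))   # "NE": closest to b
```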

4 Experimental setup

4.1 Datasets

Due to the experimental nature of the paper, and in order to consistently assess the introduced strategies, a considerable number of datasets were considered. The performance of PS algorithms is highly related to the size of the dataset, which is why the use of small datasets\(^{1}\) is common in works concerning DR. However, from our point of view, the use of such sets does not really make sense: PS aims at speeding up kNN classification by reducing the information to be processed, and datasets not containing a large amount of prototypes can already be processed relatively fast; therefore, reduction would be useless in this scenario.

Our experiments were carried out with five corpora: the NIST SPECIAL DATABASE 3 (NIST3) of the National Institute of Standards and Technology, from which a subset of the upper-case characters was randomly selected; the United States Postal Service (USPS) handwritten digit dataset (Hull 1994); the Handwritten Online Musical Symbol (HOMUS) dataset (Calvo-Zaragoza and Oncina 2014); and two additional corpora from the UCI repository (Penbased and Letter). For the first two cases, contour descriptions with Freeman chain codes (FCC) (Freeman 1961) were extracted and the edit distance (ED) (Wagner and Fischer 1974) was used as dissimilarity measure. In the third case, dynamic time warping (DTW) (Sakoe and Chiba 1990) is used due to its good results in the baseline experimentation offered with these data. Since datasets from the UCI repository may contain missing values in the samples, the heterogeneous value difference metric (HVDM) (Wilson and Martinez 1997) is used for the last two datasets. Table 1 shows a summary of the main features of these datasets.
Table 1

Description of the datasets used in the experimentation

Name        Instances   Classes   Dissimilarity
USPS        9298        10        ED
NIST3       6500        26        ED
HOMUS       15200       32        DTW
Penbased    10992       10        HVDM
Letter      20000       26        HVDM
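As an illustration of the string dissimilarity used for the FCC contour descriptions of USPS and NIST3, below is a standard Wagner–Fischer edit distance; unit costs are assumed, since the concrete cost scheme of the experiments is not stated here:

```python
def edit_distance(s, t):
    """Wagner-Fischer dynamic programme; unit insertion/deletion/substitution costs."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                                 # delete all of s[:i]
    for j in range(n + 1):
        dp[0][j] = j                                 # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[m][n]

# Freeman chain codes are strings over the directions 0-7, e.g.:
# edit_distance("001234", "012334")  ->  2
```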

Although it is not a common procedure, we add synthetic noise to assess the robustness of the considered PS methods in this type of scenario. The noise is induced by swapping labels between randomly chosen pairs of prototypes. The noise rates (percentage of prototypes that change their label) considered were 0, 20, and 40 %, since these are common values in this kind of experimentation (Natarajan et al. 2013).
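A sketch of this label-swapping procedure (hypothetical helper; swaps between prototypes of the same class are not explicitly avoided here):

```python
import random

def induce_label_noise(labels, rate, seed=None):
    """Swap labels between randomly chosen pairs of prototypes.

    rate: approximate fraction of prototypes whose label is exchanged,
          e.g. 0.20 for the 20 % configuration used in the experiments.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_swaps = int(len(noisy) * rate) // 2          # each swap touches two prototypes
    indices = rng.sample(range(len(noisy)), 2 * n_swaps)
    for i, j in zip(indices[0::2], indices[1::2]):
        noisy[i], noisy[j] = noisy[j], noisy[i]
    return noisy
```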

4.2 Prototype selection strategies

The main goal of the current work is to provide a comprehensive comparative experiment to evaluate the performance and competitiveness of rank methods as PS strategies. To cover the different families of approaches presented in Sect. 2, we shall consider the following particular strategies:
  • No selection: all the prototypes of the initial training set (ALL).

  • Classical: Condensing Nearest Neighbour (CNN), Editing Nearest Neighbour (ENN), Fast Condensing Nearest Neighbour (FCNN), Editing Condensing Nearest Neighbour (ECNN) and Editing Fast Condensing Nearest Neighbour (EFCNN).

  • Hybrid: Iterative Case Filtering (ICF) and Decremental Reduction Optimization Procedure (DROP3).

  • Evolutionary: Cross-generational elitist selection, Heterogeneous recombination and Cataclysmic mutation (CHC).

  • Rank: 1-FN, 2-FN, 1-NE and 2-NE, each of them considering values of \(\alpha \) within the range (0, 1) with a granularity of 0.1. The extreme values have been discarded since \(\alpha = 0\) would mean an empty set and \(\alpha = 1\) is equivalent to ALL.

As mentioned above, a desirable feature of PS methods is the robustness against noise. In this sense, after the PS process, a low k value in the kNN rule should be enough, as hardly any noise would be expected to remain in the reduced set (Pekalska et al. 2006). Thus, classification experiments with the kNN rule will only consider \(k=1\) for the set obtained when applying PS and \(k=1,3,5,7\) for the ALL case to give some hint about the loss caused by induced noise during classification. Higher values of k for PS may lead to a misunderstanding in both results and discussion, since it would not be clear whether the noise is being handled by the PS method or by the kNN.

4.3 Performance measurement

To analyse the performance of the PS strategies, we have taken into account both the classification accuracy and the size of the selected set. While the former indicates the ability of the method to choose the most relevant prototypes, the latter reflects its reduction capability.

Although these measures allow us to analyse the performance of each considered strategy, they do not make it possible to establish a comparison among the whole set of alternatives to determine which is the best one. The problem is that PS algorithms try to minimise the number of prototypes considered in the training set and, at the same time, to increase classification accuracy. Most often, these two goals are contradictory, so improving one of them implies a deterioration of the other. From this point of view, PS-based classification can be seen as a multi-objective optimization problem (MOP) in which two functions are meant to be optimised at the same time: minimisation of the number of prototypes in the training set and maximisation of the classification success rate. The usual way of evaluating this kind of problem is by means of the non-dominance concept. One solution is said to dominate another if, and only if, it is better or equal in each goal function and, at least, strictly better in one of them. The set of non-dominated elements represents the different optimal solutions to the MOP. Each of them is usually referred to as a Pareto-optimal solution, and the whole set is usually known as the Pareto frontier.

Thus, the considered strategies will be compared by assuming a MOP scenario in which each of them is a two-dimensional solution defined as (acc, size), where acc is the accuracy obtained by the strategy and size is the rate (%) of selected prototypes with respect to the original set. To analyse the results, the pair obtained by each scheme will be plotted in 2D point graphs in which the non-dominated set of pairs will be highlighted. In the MOP framework, the strategies within this set can be considered the best without defining any order among them.
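The non-dominance filter used to build such a frontier can be sketched as follows (accuracy is maximised and size minimised; the example values are taken from Table 2 at 0 % noise):

```python
def dominates(a, b):
    """a, b are (accuracy, size) pairs: accuracy is maximised, size minimised."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(solutions):
    """Return the non-dominated subset of a dict name -> (accuracy, size)."""
    return {name: sol for name, sol in solutions.items()
            if not any(dominates(other, sol)
                       for o_name, other in solutions.items() if o_name != name)}

# Toy usage with three strategies from Table 2 (0 % noise); all three are non-dominated
front = pareto_front({"ALL (k=1)": (93.4, 100.0),
                      "EFCNN": (90.1, 10.5),
                      "CHC": (84.4, 3.1)})
```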

Unfortunately, pursuing these two objectives at the same time prevents the use of statistical tests to measure the actual significance of the differences among the results achieved. This could be done if one criterion were given more importance than the other; however, since that particular analysis depends on the requirements of each underlying classification task, we shall not consider that case. In addition, normalised accuracy and reduction rates could be combined into a single figure, but it would not be a good indicator for comparing different approaches from the point of view of PS. For all the above, we perform a fourfold cross-validation over each dataset considered, and our analysis focuses on overall average results.
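A minimal sketch of the resulting evaluation loop is shown below (ps_method and classify are placeholders for the strategies and the 1NN rule under study; the actual fold partitions of the experiments are not reproduced):

```python
def fourfold_cv(dataset, ps_method, classify, folds=4):
    """Average accuracy (%) and retained-set size (%) over a simple k-fold split.

    dataset: list of (sample, label) pairs.
    ps_method(train) -> reduced training set; classify(x, train) -> predicted label.
    """
    accs, sizes = [], []
    for f in range(folds):
        test = dataset[f::folds]                                  # interleaved split
        train = [x for i, x in enumerate(dataset) if i % folds != f]
        reduced = ps_method(train)
        correct = sum(classify(x, reduced) == label for x, label in test)
        accs.append(100.0 * correct / len(test))
        sizes.append(100.0 * len(reduced) / len(train))
    return sum(accs) / folds, sum(sizes) / folds
```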

5 Results

This section analyses the results obtained. These results can be checked in Table 2, which depicts the arithmetic mean of the accuracy and size figures obtained over the considered datasets. The non-dominated solutions can be graphically seen in Figs. 1 and 2 for the different induced-noise cases considered.

Additionally, the appendix included in this paper breaks down the accuracy and size figures for each dataset and noise configuration. Nevertheless, the analysis is performed on the aforementioned global average figures.
Table 2

Average figures of the results obtained with the datasets considered

Name           Noise 0 %        Noise 20 %       Noise 40 %
               Acc     Size     Acc     Size     Acc     Size
ALL (k=1)      93.4    100      76.3    100      63.3    100
ALL (k=3)      93.5    100      86.3    100      75.7    100
ALL (k=5)      93.4    100      90.9    100      86.1    100
ALL (k=7)      93.1    100      91.5    100      89.0    100
ENN            92.3    93.3     91.0    67.1     88.4    48.7
CNN            90.3    18.0     67.8    57.3     55.7    72.6
FCNN           90.4    17.7     67.5    55.1     55.5    71.2
ECNN           90.0    10.4     87.6    9.0      84.4    8.5
EFCNN          90.1    10.5     88.0    8.9      84.3    8.1
DROP3          84.6    9.5      74.4    9.9      63.5    10.7
ICF            77.3    15.3     68.2    17.1     59.0    18.4
CHC            84.4    3.1      71.5    2.6      60.2    2.3
1-FN_0.10      80.8    3.6      83.1    4.2      83.5    4.9
1-FN_0.20      86.2    8.3      87.1    10.0     85.7    11.5
1-FN_0.30      88.5    14.2     88.3    16.8     81.7    19.3
1-FN_0.40      90.1    20.3     86.0    24.9     73.4    29.3
1-FN_0.50      91.3    28.3     80.4    34.9     68.1    39.3
1-FN_0.60      92.0    38.3     76.9    44.9     64.1    49.2
1-FN_0.70      92.6    48.4     75.2    54.9     62.9    59.2
1-FN_0.80      93.0    60.6     73.3    64.9     60.0    69.2
1-FN_0.90      93.4    80.1     74.0    80.1     59.8    80.5
1-NE_0.10      71.7    1.3      81.6    3.3      83.4    4.4
1-NE_0.20      79.9    3.3      86.6    8.1      86.0    10.7
1-NE_0.30      85.3    6.4      88.6    14.4     82.3    18.4
1-NE_0.40      89.1    10.7     86.9    22.4     75.0    28.4
1-NE_0.50      91.3    17.3     80.8    32.3     68.8    38.4
1-NE_0.60      92.2    27.8     76.7    42.2     64.5    48.3
1-NE_0.70      92.8    41.8     74.5    52.2     62.8    58.3
1-NE_0.80      93.2    60.2     72.7    62.6     60.0    68.4
1-NE_0.90      93.5    80.1     74.3    80.1     60.0    80.3
2-FN_0.10      80.4    3.6      82.9    3.9      83.4    4.3
2-FN_0.20      85.7    8.2      86.8    9.1      85.3    10.3
2-FN_0.30      88.2    14.0     87.8    15.7     84.6    17.3
2-FN_0.40      89.8    19.8     86.8    22.8     76.8    26.6
2-FN_0.50      90.9    27.5     80.3    32.7     68.1    36.7
2-FN_0.60      91.7    37.5     75.9    42.7     63.6    46.6
2-FN_0.70      92.4    47.9     73.3    52.7     61.2    56.6
2-FN_0.80      93.0    60.4     71.2    62.8     58.1    66.6
2-FN_0.90      93.4    80.1     73.5    80.1     59.0    80.1
2-NE_0.10      71.2    1.3      80.5    2.7      82.9    3.6
2-NE_0.20      79.6    3.3      86.0    6.8      86.1    9.0
2-NE_0.30      84.7    6.2      88.0    12.3     85.3    15.8
2-NE_0.40      88.7    10.4     88.3    19.3     77.7    25.0
2-NE_0.50      91.0    16.4     81.4    28.9     68.8    35.0
2-NE_0.60      92.1    26.9     75.8    38.8     63.8    44.9
2-NE_0.70      92.7    41.2     72      48.8     60.6    54.9
2-NE_0.80      93.2    60.1     70.9    60.7     57.5    64.9
2-NE_0.90      93.5    80.1     74.0    80.1     59.3    80.1

Values in italics represent the non-dominated elements defining the Pareto frontier

Fig. 1

Results of the prototype selection processes with no induced noise, facing accuracy and size of the reduced set. Non-dominated elements defining the Pareto frontier are highlighted

Let us first pay attention to the case in which no noise is induced. It can be observed that, when no information was discarded (ALL scheme), conventional kNN achieved some of the highest accuracy values for all k configurations; note that increasing the k parameter did not have any noticeable effect. Given that the datasets considered contain very little noise, the ENN algorithm did not significantly reduce the size of the set (a reduction of around 10 %), maintaining accuracies similar to those achieved by the conventional kNN strategies.

On the other hand, the condensing family of algorithms (CNN and its extensions) showed some remarkable results: all of them achieved great reduction rates, especially ECNN and EFCNN, which required only around 10 % of the set size, and performed well in terms of accuracy (only around 3 % lower than the ALL configurations).

DROP3 also achieved high reduction rates (keeping around 9 % of the maximum), but with a significant drop in accuracy when compared to the conventional kNN algorithm (around 10 % lower than the scores in the ALL cases). ICF, however, achieved neither a high reduction nor a remarkable accuracy.

The CHC evolutionary algorithm obtained one of the highest reduction rates, as it required only around 3 % of the total amount of prototypes. The accuracy achieved, although lower than in most of the previous cases, was close to 84 %, which is a good result given the high data reduction performed.

The NE and FN rank methods showed a very interesting behaviour. When considering their probability mass parameter \(\alpha \le 0.5\), the reduction figures obtained covered a similar range to the reductions obtained with the other strategies: for instance, 1-NE\(_{0.20}\) achieved a reduction similar to CHC (around 3 % of the initial set size), while 1-FN\(_{0.40}\) was comparable to FCNN (approximately 20 % of the total amount of prototypes). As can be seen, these configurations can produce an aggressive reduction of the set size, which is often paired with a substantial accuracy loss (e.g. 2-NE\(_{0.10}\), which reduces the set to approximately 1 % of its size, achieving an accuracy of around 70 %). However, more conservative configurations, such as \(\alpha = 0.5\), achieved results quite close to the ALL case with only around a third or a fourth of the total number of prototypes.

When considering \(\alpha > 0.5\), these methods progressively tend to the ALL case as they also include prototypes located at the lowest positions of the rank (i.e., the ones with the least number of votes). This increase in the reduced set size (up to 80 % of the complete set size when \(\alpha = 0.9\)) did not carry a remarkable accuracy improvement (less than 3 % with respect to the \(\alpha = 0.5\) cases). Nevertheless, it should be noted that 1-NE\(_{0.90}\) improved on the accuracy of the ALL case with 80 % of the initial set size, possibly because the method discarded noisy instances present in the datasets.

In summary, rank methods proved capable of producing a good trade-off between reduction and classification accuracy in terms of their reduction parameter \(\alpha \). This way, the user is able to tune the reduction degree, thus prioritising either accuracy or reduction depending on the requirements of the particular application.
Fig. 2

Results of the prototype selection processes with induced noise, facing accuracy and size of the reduced set. Non-dominated elements defining the Pareto frontiers are highlighted for each noise configuration

The following lines present the analysis of the performance when noise is induced in the set. As the results show qualitatively similar trends, the remarks will not focus on a particular noise configuration but on the general behaviour.

The mislabelling noise in the samples dramatically changed the previous situation. Accuracy results for conventional kNN suffered an important drop as the noise level rose. Nevertheless, the use of larger k values palliated this effect and improved the accuracy rates. Especially remarkable is the \(k = 7\) case, in which kNN scored the maximum classification rate among the compared schemes in both noisy configurations considered.

The ENN algorithm proved its robustness in these noisy environments, as its classification rates were always among the best results obtained. Moreover, the reduction rates achieved were higher than in the noiseless scenario, since the prototypes this approach removes are precisely the ones producing class overlapping.

Results with the CNN and FCNN schemes revealed their sensitivity to noise, as they obtained some of the worst accuracies in these experiments. Since these methods are unable to discard noisy elements, the reduction is not properly performed, leading to a situation in which there is neither an important size reduction nor a remarkable performance. Furthermore, the use of different k values did not improve the accuracy results.

EFCNN and ECNN, on the contrary, were less affected than CNN and FCNN due to the introduction of the editing phase in the process. This improvement is quite noticeable: while the latter approaches obtained accuracy rates of around 50–60 % while keeping between 50 and 70 % of the prototypes, the former algorithms achieved accuracy rates over 80 % with roughly 10 % of the prototypes.

Hybrid algorithms DROP3 and ICF, just like the CNN and FCNN approaches, were not capable of coping with noisy situations either. The accuracy rates obtained were quite poor: for instance, the ICF method with 40 % of synthetic noise was not able to reach 60 % accuracy. However, it must be pointed out that, despite achieving similar accuracy rates, the hybrid algorithms still showed better reduction figures than the CNN and FCNN strategies. For example, for an induced noise rate of 40 %, CNN obtained an accuracy of 55.7 % while keeping 72.6 % of the prototypes, whereas DROP3 achieved 63.5 % with only 10.7 % of the prototypes.

Results obtained with the CHC evolutionary scheme showed its relative sensitivity to noise. In these noisy scenarios, although it still exhibited one of the highest reduction figures amongst the compared methods (keeping only around 2 % of the prototypes), its classification performance was significantly affected, as hardly any result exceeded 70 %.

The NE and FN rank-based methods had already proved to be interesting algorithms in the noiseless scenario: for low \(\alpha \) values, the reduction rates achieved, together with the high accuracy scores obtained, were very competitive against the other methods; at the same time, high \(\alpha \) values achieved accuracy figures comparable to, or even higher than, the ALL case with just 20–40 % of the initial amount of prototypes. Results in the proposed noisy situations reinforce these remarks for the low-\(\alpha \) case: on average, none of these algorithms showed accuracy rates lower than 80 % while, at the same time, the number of distances computed never exceeded 20 % of the maximum. It is also important to point out that, while the ECNN and EFCNN schemes also showed remarkable reduction rates with good accuracy figures, these approaches internally incorporate an editing process for tackling the noise in the data, whereas the rank methods showed a clear robustness to these situations by themselves, as long as \(\alpha \) remains low. Nevertheless, if \(\alpha \) is increased, the accuracy of these methods noticeably lowers since the algorithm is forced to include all prototypes in the computed rank, which progressively leads to the ALL case. In such a situation, the 1NN search is not able to cope with the noise, which results in the low accuracy figures obtained.

In addition to the results discussed above, we now tackle PS-based classification from the point of view of a MOP.

Considering the case with no induced noise (Fig. 1), the solution showing the maximum accuracy with the least number of prototypes is 1-NE\(_{0.90}\), which defines the right-hand end of the Pareto frontier. The ALL and ENN configurations do not belong to this frontier since, although they achieved roughly the same accuracy as the previous method, they required a larger amount of prototypes. The rest of the solutions exhibit lower accuracy results but, in some cases, the loss is not very significant. An example of this behaviour is the non-dominated algorithm EFCNN, which achieved accuracy results around 3 % lower than the maximum while computing roughly a fifth of the maximum number of distances. Regarding the rank methods (in red in the figure), it can be observed that, in the region of the non-dominated frontier covering up to 20 % of the total number of distances (the one in which most of the studied PS algorithms lie), there is a clear balance between them and the rest of the strategies. This proves the competitiveness of these methods with respect to other classic strategies. Additionally, rank methods also cover the region above 20 % of the distances, since the probability mass parameter \(\alpha \) allows the selection of the amount of prototypes to maintain.

With respect to the datasets with induced noise (see Fig. 2), the first difference is that the ALL case (with \(k=7\)) belongs to the Pareto frontier for both noise levels considered. However, other schemes were capable of achieving similar accuracy at a lower computational cost. For instance, when 40 % of noise is induced in the datasets, the 7NN and ENN configurations achieved very similar accuracies but, while the former requires the computation of all the distances, the latter requires less than half of them.

Rank methods showed remarkable compromises between accuracy and number of prototypes when considering low \(\alpha \) values. A significant number of configurations proved capable of dealing with these noise levels, since they formed part of the non-dominance frontier. For instance, in the 20 % noise situation, the 1-NE\(_{0.30}\) configuration only differed by around 3 % in accuracy with respect to the maximum (given by 7NN) while computing roughly 15 % of the total amount of distances. However, when setting \(\alpha \) to a high value, accuracy was noticeably affected since the algorithms were forced to include noisy prototypes with fewer votes, located at the lower parts of the rank. In this case, the points moved away from the Pareto frontier, proving not to be interesting configurations for such amounts of noise.
Table 3

Results in terms of classification accuracy and set size reduction obtained by the different datasets considered when no noise is induced

Name          HOMUS          Letter         NIST3          Penbased       USPS           Average
              Acc    Size    Acc    Size    Acc    Size    Acc    Size    Acc    Size    Acc    Size
ALL (k=1)     88.6   100     95.8   100     91.2   100     99.4   100     91.9   100     93.4   100
ALL (k=3)     87.5   100     95.8   100     91.9   100     99.1   100     93.3   100     93.5   100
ALL (k=5)     86.5   100     95.8   100     92.3   100     99.3   100     93.3   100     93.4   100
ALL (k=7)     85.6   100     95.6   100     91.9   100     99.2   100     93.3   100     93.1   100
ENN           85.1   88.6    94.0   95.6    90.7   90.9    99.3   99.4    92.4   92.1    92.3   93.3
CNN           85.8   26.0    92.1   17.3    87.7   22.2    98.4   4.8     87.6   19.9    90.3   18.0
FCNN          85.7   24.6    92.3   17.8    87.8   21.1    98.4   4.9     87.6   20.2    90.4   17.7
ECNN          82.9   15.8    91.0   12.7    87.5   10.8    98.5   3.9     90.2   8.9     90.0   10.4
EFCNN         82.5   15.2    91.4   13.6    88.1   11.4    98.5   3.9     89.8   8.6     90.1   10.5
DROP3         77.3   15.2    84.4   12.6    82.3   9.5     94.0   3.4     85.2   6.7     84.6   9.5
ICF           57.9   18.8    83.0   24.8    82.4   16.1    88.4   9.0     75.0   7.8     77.3   15.3
CHC           71.9   3.2     81.1   7.1     83.9   3.0     95.7   1.2     89.2   0.8     84.4   3.1
1-FN_0.10     72.6   3.9     70.7   3.5     82.5   4.3     94.6   4.1     83.8   2.2     80.8   3.6
1-FN_0.20     77.7   8.9     81.4   8.3     86.2   9.4     97.2   9.2     88.4   5.5     86.2   8.3
1-FN_0.30     80.2   15.1    86.8   14.4    87.9   15.8    98.0   15.8    89.6   10.1    88.5   14.2
1-FN_0.40     83.5   21.9    89.3   18.6    88.8   22.4    98.8   22.4    90.0   16.2    90.1   20.3
1-FN_0.50     85.3   30.0    91.9   26.0    89.6   31.0    98.9   30.4    90.5   24.0    91.3   28.3
1-FN_0.60     86.9   40.0    93.3   36.0    90.3   41.0    99.1   40.4    90.6   34.1    92.0   38.3
1-FN_0.70     88.3   49.9    94.2   46.0    90.5   51.0    99.2   50.4    91.0   44.9    92.6   48.4
1-FN_0.80     89.2   60.8    95.0   60.1    90.4   61.2    99.2   60.5    91.2   60.2    93.0   60.6
1-FN_0.90     89.8   80.3    95.4   80.1    91.0   80.0    99.3   80.1    91.6   80.1    93.4   80.1
1-NE_0.10     66.6   1.7     54.9   1.2     74.7   2.1     77.7   0.3     84.7   1.1     71.7   1.3
1-NE_0.20     74.0   4.1     71.4   3.6     82.5   5.0     84.2   0.7     87.6   3.1     79.9   3.3
1-NE_0.30     78.0   7.6     81.0   7.4     86.9   9.2     90.7   1.6     89.9   6.3     85.3   6.4
1-NE_0.40     82.4   11.8    88.1   12.6    88.4   15.0    96.1   3.2     90.8   11.1    89.1   10.7
1-NE_0.50     85.0   18.5    91.7   19.7    89.6   22.4    98.8   7.4     91.4   18.3    91.3   17.3
1-NE_0.60     87.3   28.7    93.6   29.7    90.0   32.2    99.2   20.1    91.0   28.3    92.2   27.8
1-NE_0.70     88.8   43.0    94.5   41.6    90.1   42.9    99.3   40.0    91.1   41.3    92.8   41.8
1-NE_0.80     89.5   60.6    95.1   60.1    90.7   60.0    99.3   60.1    91.5   60.1    93.2   60.2
1-NE_0.90     89.9   80.3    95.5   80.1    91.0   80.0    99.3   80.1    91.7   80.1    93.5   80.1
2-FN_0.10     72.7   3.9     69.9   3.5     81.6   4.3     95.0   4.2     83.0   2.1     80.4   3.6
2-FN_0.20     77.2   8.8     81.3   8.3     86.1   9.4     97.1   9.3     87.0   5.1     85.7   8.2
2-FN_0.30     80.2   14.9    86.6   14.4    87.3   15.8    98.0   15.9    88.9   9.1     88.2   14.0
2-FN_0.40     83.3   21.5    89.0   18.2    88.3   22.4    98.7   22.5    89.6   14.4    89.8   19.8
2-FN_0.50     85.0   29.3    91.5   25.4    89.0   30.7    98.9   30.6    90.0   21.4    90.9   27.5
2-FN_0.60     86.6   39.3    93.1   35.3    89.7   40.7    99.1   40.5    90.1   31.7    91.7   37.5
2-FN_0.70     87.9   49.2    94.2   45.3    90.3   50.7    99.2   50.5    90.2   43.7    92.4   47.9
2-FN_0.80     89.1   60.4    95.2   60.1    90.5   60.9    99.2   60.6    90.9   60.1    93.0   60.4
2-FN_0.90     89.7   80.3    95.5   80.1    91.1   80.0    99.3   80.1    91.5   80.1    93.4   80.1
2-NE_0.10     66.6   1.5     54.3   1.3     74.3   2.1     79.3   0.4     81.7   1.1     71.2   1.3
2-NE_0.20     72.9   3.7     71.6   3.8     82.2   5.0     84.2   1.0     87.1   3.0     79.6   3.3
2-NE_0.30     77.8   6.9     80.5   7.5     86.2   9.1     89.8   1.9     89.1   5.8     84.7   6.2
2-NE_0.40     82.1   10.8    88.2   12.6    88.0   14.7    95.7   3.7     89.8   10.1    88.7   10.4
2-NE_0.50     84.7   17.2    92.0   19.3    89.2   21.7    98.7   7.6     90.5   16.4    91.0   16.4
2-NE_0.60     87.2   27.3    93.5   29.0    89.8   31.3    99.2   20.1    90.5   26.9    92.1   26.9
2-NE_0.70     88.9   42.0    94.6   41.1    90.1   42.0    99.3   40.0    90.9   40.7    92.7   41.2
2-NE_0.80     89.6   60.3    95.2   60.1    90.7   60.0    99.4   60.1    91.3   60.1    93.2   60.1
2-NE_0.90     89.8   80.3    95.4   80.1    91.2   80.0    99.4   80.1    91.7   80.1    93.5   80.1

The last two columns depict the average of the accuracy and size figures over the five datasets

6 Conclusions

The k-Nearest Neighbour (kNN) rule is one of the most common, simple and effective classification algorithms in supervised learning. Prototype selection (PS) algorithms have been used as a way of alleviating some kNN issues such as computation time, memory usage or the presence of noisy instances. Due to the importance of the task, there is a large and ever-increasing number of approaches to perform PS. Among these approaches, rank methods have recently emerged as an interesting alternative. Based on a particular relevance criterion, these methods rank the prototypes and eventually perform a set reduction guided by a tuning parameter.

In our comprehensive experimentation, voting-based rank methods were compared to a representative set of PS algorithms to evaluate their performance. Our results reported a competitive performance against more complex proposals, achieving remarkable compromises between reduction rates and accuracy.

Furthermore, when configured to maintain a rather reduced amount of prototypes, these methods also showed a noteworthy robustness against noise without requiring a preprocessing editing step, as other strategies do. Finally, special interest lies in the fact that the considered rank methods are guided by a tuning parameter. Although this parameter may be seen as a drawback, it permits the user to favour either size reduction or classification accuracy depending on the requirements of the system.

Due to the demonstrated competitiveness and straightforward implementation, rank methods constitute an interesting alternative to other classically considered PS strategies. As future work, it would be interesting to develop new heuristics or further extend the proposed ones.

Footnotes

1. Given that this number of elements is highly dependent on the memory and computation capabilities of the system considered, we shall restrict ourselves to the definition by Garcia et al. (2012), in which this threshold is set to 2000 prototypes.

Acknowledgments

This work has been supported by the Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through the FPU programme (UAFPU2014-5883), the Spanish Ministerio de Educación, Cultura y Deporte through a FPU Fellowship (Ref. AP2012-0939) and the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by UE FEDER funds) and Consejería de Educación de la Comunidad Valenciana through project PROMETEO/2012/017.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Jose J. Valero-Mas (1)
  • Jorge Calvo-Zaragoza (1)
  • Juan R. Rico-Juan (1)
  • José M. Iñesta (1)

  1. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
