A Self-generating Prototype Method Based on Information Entropy Used for Condensing Data in Classification Tasks

Manastarla, Alberto; Silva, Leandro A.

doi:10.1007/978-3-030-33607-3_22

Alberto Manastarla¹⁴ &
Leandro A. Silva¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11871))

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

1635 Accesses

Abstract

This paper presents a new self-generating prototype method based on information entropy to reduce the size of training datasets. The method accelerates the classifier training time without significantly decreasing the quality in the data classification task. The effectiveness of the proposed method is compared to the K-nearest neighbour classifier (kNN) and the genetic algorithm prototype selection (GA). kNN is a benchmark method used for data classification tasks, while GA is a prototype selection method that provides competitive optimisation of accuracy and the data reduction ratio. Considering thirty different public datasets, the results of the comparisons demonstrate that the proposed method outperforms kNN when using the original training set as well as the reduced training set obtained via GA prototype selection.

This study was financed in part by the Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES) - Finance Code 001.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Acampora, G., Tortora, G., Vitiello, A.: Applying SPEA2 to prototype selection for nearest neighbor classification. In: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 003924–003929. IEEE (2016)
Google Scholar
Brighton, H., Mellish, C.: Advances in instance selection for instance-based learning algorithms. Data Min. Knowl. Disc. 6(2), 153–172 (2002)
Article MathSciNet Google Scholar
Chen, C., Jóźwik, A.: A sample set condensation algorithm for the class sensitive artificial neural network. Pattern Recogn. Lett. 17(8), 819–823 (1996)
Article Google Scholar
Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)
Article Google Scholar
Deng, Z., Zhu, X., Cheng, D., Zong, M., Zhang, S.: Efficient KNN classification algorithm for big data. Neurocomputing 195, 143–148 (2016)
Article Google Scholar
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2001)
MATH Google Scholar
Garcia, S., Derrac, J., Cano, J., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)
Article Google Scholar
García, S., Molina, D., Lozano, M., Herrera, F.: A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: a case study on the CEC’2005 special session on real parameter optimization. J. Heuristics 15(6), 617 (2009)
Article Google Scholar
Ougiaroglou, S., Evangelidis, G., Dervos, D.A.: FHC: an adaptive fast hybrid method for K-NN classification. Logic J. IGPL 23(3), 431–450 (2015). https://doi.org/10.1093/jigpal/jzv015
Article MathSciNet Google Scholar
Pękalska, E., Duin, R.P., Paclík, P.: Prototype selection for dissimilarity-based classifiers. Pattern Recogn. 39(2), 189–208 (2006)
Article Google Scholar
Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1958–1970 (2008). https://doi.org/10.1109/TPAMI.2008.128
Article Google Scholar
Triguero, I., Derrac, J., Garcia, S., Herrera, F.: A taxonomy and experimental study on prototype generation for nearest neighbor classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42, 86–100 (2012). https://doi.org/10.1109/TSMCC.2010.2103939
Article Google Scholar
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bull. 1(6), 80–83 (1945)
Article Google Scholar
Wilson, D.R., Martinez, T.R.: Reduction techniques for instance-based learning algorithms. Mach. Learn. 38(3), 257–286 (2000)
Article Google Scholar
Wu, X., Kumar, V., Quinlan, J.R., et al.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1) (2008). https://doi.org/10.1007/s10115-007-0114-2
Article Google Scholar
Wu, X., Kumar, V.: The Top Ten Algorithms in Data Mining. CRC Press, Boca Raton (2009)
Google Scholar

Download references

Author information

Authors and Affiliations

Electrical Engineering and Computing, Mackenzie Presbyterian University, Sao Paulo, SP, Brazil
Alberto Manastarla
Computing and Informatics Faculty, Electrical Engineering and Computing, Mackenzie Presbyterian University, Sao Paulo, SP, Brazil
Leandro A. Silva

Authors

Alberto Manastarla
View author publications
You can also search for this author in PubMed Google Scholar
Leandro A. Silva
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Alberto Manastarla .

Editor information

Editors and Affiliations

University of Manchester, Manchester, UK
Hujun Yin
Technical University of Madrid, Madrid, Spain
David Camacho
University of Birmingham, Birmingham, UK
Peter Tino
University of Huelva, Huelva, Spain
Antonio J. Tallón-Ballesteros
University of Exeter, Exeter, UK
Ronaldo Menezes
University of Manchester, Manchester, UK
Richard Allmendinger

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Manastarla, A., Silva, L.A. (2019). A Self-generating Prototype Method Based on Information Entropy Used for Condensing Data in Classification Tasks. In: Yin, H., Camacho, D., Tino, P., Tallón-Ballesteros, A., Menezes, R., Allmendinger, R. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2019. IDEAL 2019. Lecture Notes in Computer Science(), vol 11871. Springer, Cham. https://doi.org/10.1007/978-3-030-33607-3_22

Download citation

DOI: https://doi.org/10.1007/978-3-030-33607-3_22
Published: 18 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33606-6
Online ISBN: 978-3-030-33607-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics