Abstract
Machine learning is an important area of artificial intelligence, with applications in almost all fields of science. Supervised machine learning for classification problems involves training classifiers with labeled data. There are many classifiers, each with its own strengths and weaknesses in terms of classification accuracy and the ability to deal with noisy class labels in the training data. Limited work has been reported in the literature on the performance of classifiers under different levels of class noise in the training data. The current work aims to present a thorough investigation of the effects of class mislabeling on the performance of different classifiers. Five commonly used classifiers (SVM, random forest, ANN, naïve Bayes, and KNN) were investigated on a benchmark database of handwritten digit images. The classifiers were trained with different levels of labeling noise, ranging from low to medium to very high, and their recognition performance was evaluated and compared. The study led to some interesting observations, which are presented in this paper.
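The abstract does not say how the labeling noise was generated, so the following is only a minimal sketch of one common choice: symmetric class noise, where a chosen fraction of training labels is replaced by a different class picked uniformly at random. The function name `inject_label_noise`, its parameters, and the example noise rates (10%, 40%, 80%) are assumptions made for illustration and are not taken from the paper.

```python
import random


def inject_label_noise(labels, noise_rate, num_classes=10, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` flipped to another class.

    Assumption: symmetric (uniform) class noise; the paper may use a different model.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    # Pick the samples to corrupt without replacement.
    for i in rng.sample(range(len(noisy)), n_flip):
        # Choose uniformly among the other classes so the label always changes.
        wrong = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy


# Synthetic digit labels 0-9 standing in for a real training set.
clean = [i % 10 for i in range(1000)]
for rate in (0.1, 0.4, 0.8):  # hypothetical low, medium, and very high levels
    noisy = inject_label_noise(clean, rate)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"noise rate {rate:.0%}: {changed} of {len(clean)} labels changed")
```

In an experiment of the kind described, each classifier would be trained on the noisy labels and evaluated on a test set whose labels are left clean, so that any drop in accuracy can be attributed to the training noise.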
Acknowledgment
The author would like to thank King Fahd University of Petroleum and Minerals (KFUPM) for supporting this work.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ahmad, I. (2019). Performance of Classifiers on Noisy-Labeled Training Data: An Empirical Study on Handwritten Digit Classification Task. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_35
DOI: https://doi.org/10.1007/978-3-030-20518-8_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20517-1
Online ISBN: 978-3-030-20518-8
eBook Packages: Computer Science (R0)