Performance of Classifiers on Noisy-Labeled Training Data: An Empirical Study on Handwritten Digit Classification Task

Ahmad, Irfan

doi:10.1007/978-3-030-20518-8_35


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11507)

Included in the following conference series:

International Work-Conference on Artificial Neural Networks


Abstract

Machine learning is an important area of Artificial Intelligence, with applications in almost all fields of science. Supervised machine learning for classification problems involves training classifiers on labeled data. There are many classifiers, each with its own strengths and weaknesses in terms of classification accuracy and its ability to deal with noisy class labels in the training data. Limited work has been reported in the literature on how classifiers perform under different levels of class noise in the training data. The current work aims to present a thorough investigation of the effects of class mislabeling on the performance of different classifiers. Five commonly used classifiers, namely support vector machine (SVM), random forest, artificial neural network (ANN), naïve Bayes, and k-nearest neighbors (KNN), were investigated on a benchmark database of handwritten digit images. The classifiers were trained with different levels of labeling noise, ranging from low to medium to very high, and their recognition performances were evaluated and compared. The study led to several interesting observations, which are presented in this paper.
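The abstract does not state how the labeling noise was generated or how the classifiers were scored. As a minimal sketch, assuming symmetric (uniform) class noise, where a chosen fraction of training labels is reassigned to a randomly selected wrong class, and assuming evaluation on clean test labels, the train-on-noisy, test-on-clean protocol could look as follows. The helpers `inject_label_noise`, `predict_1nn`, `accuracy`, and `make_blobs` are hypothetical; a 1-nearest-neighbor rule on toy 2-D clusters stands in for the KNN classifier and the handwritten digit database, so this is not the author's experimental setup.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def inject_label_noise(labels: Sequence[int], noise_rate: float,
                       num_classes: int = 10, seed: int = 0) -> List[int]:
    """Return a copy of `labels` in which a fraction `noise_rate` of entries is
    moved to a different class chosen uniformly at random (assumed symmetric noise)."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must lie in [0, 1]")
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        # Exclude the true class so every selected sample is actually mislabeled.
        noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return noisy


def predict_1nn(train_x: Sequence[Point], train_y: Sequence[int], x: Point) -> int:
    """1-nearest-neighbor prediction: copy the label of the closest training point."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return train_y[best]


def accuracy(train_x: Sequence[Point], train_y: Sequence[int],
             test_x: Sequence[Point], test_y: Sequence[int]) -> float:
    """Fraction of test points whose 1-NN prediction matches the clean test label."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def make_blobs(n_per_class: int, seed: int) -> Tuple[List[Point], List[int]]:
    """Hypothetical toy data: three well-separated 2-D Gaussian clusters."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    xs: List[Point] = []
    ys: List[int] = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            xs.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            ys.append(label)
    return xs, ys


if __name__ == "__main__":
    train_x, train_y = make_blobs(50, seed=1)
    test_x, test_y = make_blobs(50, seed=2)
    for rate in (0.0, 0.2, 0.4, 0.6, 0.8):
        # Train on noisy labels, evaluate against the clean test labels.
        noisy_y = inject_label_noise(train_y, rate, num_classes=3, seed=42)
        acc = accuracy(train_x, noisy_y, test_x, test_y)
        print(f"noise={rate:.1f}  test accuracy={acc:.3f}")
```

Because 1-NN simply copies the label of the closest training point, on well-separated clusters its test accuracy tends to drop roughly in step with the noise rate; the other four classifiers studied in the paper are not modeled in this sketch.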



Acknowledgment

The author would like to thank King Fahd University of Petroleum and Minerals (KFUPM) for supporting this work.

Author information

Authors and Affiliations

Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
Irfan Ahmad


Corresponding author

Correspondence to Irfan Ahmad.

Editor information

Editors and Affiliations

University of Granada, Granada, Spain
Ignacio Rojas
University of Malaga, Malaga, Spain
Gonzalo Joya
Polytechnic University of Catalonia, Barcelona, Spain
Andreu Catala


About this paper

Cite this paper

Ahmad, I. (2019). Performance of Classifiers on Noisy-Labeled Training Data: An Empirical Study on Handwritten Digit Classification Task. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_35


DOI: https://doi.org/10.1007/978-3-030-20518-8_35
Published: 16 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20517-1
Online ISBN: 978-3-030-20518-8
eBook Packages: Computer Science, Computer Science (R0)
