Performance of Classifiers on Noisy-Labeled Training Data: An Empirical Study on Handwritten Digit Classification Task

  • Conference paper
Advances in Computational Intelligence (IWANN 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11507)

Abstract

Machine learning is an important area of Artificial Intelligence, with applications in almost all fields of science. Supervised machine learning for classification problems involves training classifiers on labeled data. There are many classifiers, each with its own strengths and weaknesses in terms of classification accuracy and its ability to deal with noisy class labels in the training data. Limited work has been reported in the literature on investigating the performance of classifiers under different levels of class noise in the training data. The current work aims to present a thorough investigation of the effects of class mislabeling on the performance of different classifiers. Five commonly used classifiers (SVM, random forest, ANN, naïve Bayes, and KNN) were investigated on a benchmark database of handwritten digit images. The classifiers were trained with different levels of labeling noise, ranging from low to medium to very high, and their recognition performances were evaluated and compared. The study led to several interesting observations, which are presented in this paper.
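
The abstract does not say how the training labels were corrupted. A common choice, assumed here and not taken from the paper, is uniform (symmetric) class noise: a fixed fraction of the training samples is selected at random, and each selected label is replaced by one of the other nine digit classes, chosen uniformly. The Python sketch below illustrates that setup with KNN, one of the five classifiers studied. The helper names (`inject_label_noise`, `knn_predict`, `make_toy_digits`, `accuracy_under_noise`) are hypothetical, and the toy 2-D Gaussian blobs merely stand in for handwritten digit images, so the printed accuracies say nothing about the paper's actual results.

```python
import random
from collections import Counter


def inject_label_noise(labels, noise_rate, num_classes=10, seed=0):
    """Return a copy of `labels` in which round(noise_rate * n) entries, chosen
    uniformly at random, are moved to a different class drawn uniformly from the
    other num_classes - 1 classes (assumed noise model; labels are 0..num_classes-1)."""
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        # Draw from num_classes - 1 values and skip over the true class,
        # so every selected label is guaranteed to change.
        wrong = rng.randrange(num_classes - 1)
        noisy[i] = wrong if wrong < noisy[i] else wrong + 1
    return noisy


def knn_predict(train_x, train_y, x, k=5):
    """Plain k-nearest-neighbor majority vote using squared Euclidean distance.
    Ties go to the label encountered first, i.e. the one with the nearer neighbor."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def make_toy_digits(n_per_class, num_classes=10, seed=1):
    """Hypothetical stand-in for digit images: one 2-D Gaussian blob per class."""
    rng = random.Random(seed)
    xs, ys = [], []
    for c in range(num_classes):
        for _ in range(n_per_class):
            xs.append((10.0 * c + rng.gauss(0, 1), rng.gauss(0, 1)))
            ys.append(c)
    return xs, ys


def accuracy_under_noise(noise_rates, k=1):
    """Train on noisy labels at each rate and score on a clean test set."""
    train_x, train_y = make_toy_digits(30, seed=1)
    test_x, test_y = make_toy_digits(10, seed=2)
    results = {}
    for rate in noise_rates:
        # Only the training labels are corrupted; test labels stay clean.
        noisy_y = inject_label_noise(train_y, rate, seed=3)
        correct = sum(
            knn_predict(train_x, noisy_y, x, k) == y for x, y in zip(test_x, test_y)
        )
        results[rate] = correct / len(test_y)
    return results


if __name__ == "__main__":
    for rate, acc in accuracy_under_noise([0.0, 0.2, 0.5, 0.8]).items():
        print(f"noise {rate:.0%}: 1-NN accuracy {acc:.2f}")
```

Because the test labels are never corrupted, the measured accuracy reflects how well a classifier trained on mislabeled data recognizes correctly labeled samples. Passing a larger `k` to `accuracy_under_noise` lets the majority vote outweigh isolated wrong labels, so in this sketch KNN's sensitivity to label noise is expected to depend on the neighborhood size.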

Acknowledgment

The author would like to thank King Fahd University of Petroleum and Minerals (KFUPM) for supporting this work.

Author information

Corresponding author

Correspondence to Irfan Ahmad.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Ahmad, I. (2019). Performance of Classifiers on Noisy-Labeled Training Data: An Empirical Study on Handwritten Digit Classification Task. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_35

  • DOI: https://doi.org/10.1007/978-3-030-20518-8_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20517-1

  • Online ISBN: 978-3-030-20518-8

  • eBook Packages: Computer Science, Computer Science (R0)
