Improving Accuracy of Document Image Classification Through Soft Voting Ensemble

  • Conference paper
Smart Applications with Advanced Machine Learning and Human-Centred Problem Design (ICAIAME 2021)

Abstract

Convolutional Neural Networks (CNNs) perform well on document image classification tasks, yielding prediction accuracies comparable to those of state-of-the-art neural networks. In this work, we investigate the performance of three CNN architectures, namely NasNet Large, InceptionV3 and EfficientNetB3, which are pre-trained on ImageNet, for efficient document image classification. We then combine these architectures in an ensemble to achieve better classification performance, using a simple and effective ensemble strategy called soft voting. The experiments are conducted on document images submitted to the Kocaeli University application system for master's degree applications or undergraduate transfers between programs. The experimental results show that, in terms of F-score, soft voting outperforms the individual CNN architectures, achieving 94.04% even when the training data is limited.
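
Soft voting, as the term is commonly used, averages the class probabilities predicted by each model and selects the class with the highest mean probability. The following is a minimal NumPy sketch under that assumption. It is not the authors' implementation: the function name `soft_vote`, the optional `weights` argument, and the probability values are hypothetical and serve only as an illustration.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Average per-model class probabilities; return (labels, averaged probabilities).

    prob_list: list of arrays, each of shape (n_samples, n_classes),
               e.g. softmax outputs of separately trained classifiers.
    weights:   optional per-model weights (hypothetical extension);
               None gives every model an equal vote.
    """
    probs = np.stack(prob_list, axis=0)               # (n_models, n_samples, n_classes)
    avg = np.average(probs, axis=0, weights=weights)  # (n_samples, n_classes)
    return avg.argmax(axis=1), avg

# Made-up softmax outputs of three models for two samples and three classes.
p_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_b = np.array([[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]])
p_c = np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]])

labels, avg = soft_vote([p_a, p_b, p_c])
print(labels)
```

In this made-up example, two of the three models prefer class 0 for the first sample, yet the averaged probabilities favor class 1. This is the practical difference from majority (hard) voting: a confident model can outweigh two uncertain ones.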

Acknowledgements

This work has been supported by the Kocaeli University Scientific Research and Development Support Program (BAP) in Turkey under project number FBA-2020-2152.

Author information

Corresponding author

Correspondence to Semih Sevim.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sevim, S., Omurca, S.İ., Ekinci, E. (2023). Improving Accuracy of Document Image Classification Through Soft Voting Ensemble. In: Smart Applications with Advanced Machine Learning and Human-Centred Problem Design. ICAIAME 2021. Engineering Cyber-Physical Systems and Critical Infrastructures, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-031-09753-9_13
