Improving Accuracy of Document Image Classification Through Soft Voting Ensemble

Sevim, Semih; Omurca, Sevinç İlhan; Ekinci, Ekin

doi:10.1007/978-3-031-09753-9_13

Part of the book series: Engineering Cyber-Physical Systems and Critical Infrastructures (ECPSCI, volume 1)

Included in the following conference series:

The International Conference on Artificial Intelligence and Applied Mathematics in Engineering

Abstract

Convolutional Neural Networks (CNNs) perform well on document image classification tasks, yielding prediction accuracies comparable to state-of-the-art neural networks. In this work, we investigate the performance of three CNN architectures pre-trained on ImageNet, namely NasNet Large, InceptionV3, and EfficientNetB3, for efficient document image classification. Beyond that, we ensemble these architectures to achieve superior classification performance, using soft voting, a simple and effective ensemble strategy. The experiments are conducted on document images used in the Kocaeli University application system for applying to master's degree programs or for undergraduate transfers between programs. The experimental results show that, in terms of F-score, soft voting outperforms the individual CNN architectures, achieving 94.04% even when training data is limited.
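Soft voting, as the term is generally used, combines classifiers by averaging their predicted class-probability vectors and choosing the class with the highest average, rather than counting each model's top-1 vote (hard voting). The Python sketch below illustrates that rule together with a macro-averaged F-score. It is not the authors' code: the abstract does not state whether the three networks were weighted equally or which F-score averaging was used, so equal default weights and macro averaging are assumptions, and the function names `soft_vote` and `macro_f1` and all probability values are hypothetical.

```python
from typing import List, Optional, Sequence

# probs[m][i][c]: probability that model m assigns class c to sample i
Probs = Sequence[Sequence[Sequence[float]]]


def soft_vote(probs: Probs, weights: Optional[Sequence[float]] = None) -> List[int]:
    """Predict one class index per sample by averaging the models' class probabilities."""
    n_models = len(probs)
    if weights is None:
        weights = [1.0] * n_models  # assumption: every model counts equally
    if len(weights) != n_models or sum(weights) <= 0:
        raise ValueError("need one weight per model and a positive total weight")
    total = sum(weights)
    n_samples, n_classes = len(probs[0]), len(probs[0][0])
    predictions = []
    for i in range(n_samples):
        # Weighted mean of the models' probability vectors for sample i
        avg = [sum(w * p[i][c] for w, p in zip(weights, probs)) / total
               for c in range(n_classes)]
        # Argmax over classes (ties go to the lowest class index)
        predictions.append(max(range(n_classes), key=lambda c: avg[c]))
    return predictions


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / n_classes


if __name__ == "__main__":
    # Hypothetical softmax outputs: 3 models x 3 samples x 3 classes
    models = [
        [[0.90, 0.10, 0.00], [0.20, 0.70, 0.10], [0.10, 0.20, 0.70]],  # model A
        [[0.40, 0.60, 0.00], [0.10, 0.80, 0.10], [0.30, 0.30, 0.40]],  # model B
        [[0.45, 0.55, 0.00], [0.30, 0.60, 0.10], [0.20, 0.20, 0.60]],  # model C
    ]
    y_pred = soft_vote(models)
    print(y_pred)                          # [0, 1, 2]
    print(macro_f1([0, 1, 2], y_pred, 3))  # 1.0
```

In the toy data, two of the three models put sample 0 in class 1, so hard voting would return class 1. Model A's high confidence in class 0 (0.9) lifts the averaged probability of class 0 to about 0.58, so soft voting returns class 0. This use of each model's confidence, rather than only its top choice, is the usual argument for soft voting over majority voting.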

Acknowledgements

This work has been supported by the Kocaeli University Scientific Research and Development Support Program (BAP) in Turkey under project number FBA-2020-2152.

Author information

Authors and Affiliations

Faculty of Engineering and Natural Sciences, Computer Engineering Department, Bandırma Onyedi Eylul University, Bandırma, Turkey
Semih Sevim
Faculty of Engineering, Computer Engineering Department, Kocaeli University, Kocaeli, Turkey
Sevinç İlhan Omurca
Faculty of Technology, Computer Engineering Department, Sakarya University of Applied Sciences, Sakarya, Turkey
Ekin Ekinci

Corresponding author

Correspondence to Semih Sevim.

About this paper

Cite this paper

Sevim, S., Omurca, S.İ., Ekinci, E. (2023). Improving Accuracy of Document Image Classification Through Soft Voting Ensemble. In: Smart Applications with Advanced Machine Learning and Human-Centred Problem Design. ICAIAME 2021. Engineering Cyber-Physical Systems and Critical Infrastructures, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-031-09753-9_13

DOI: https://doi.org/10.1007/978-3-031-09753-9_13
Published: 01 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09752-2
Online ISBN: 978-3-031-09753-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)