A document image classification system fusing deep and machine learning models

Omurca, Sevinç İlhan; Ekinci, Ekin; Sevim, Semih; Edinç, Eren Berk; Eken, Süleyman; Sayar, Ahmet

doi:10.1007/s10489-022-04306-5

Published: 15 November 2022

Volume 53, pages 15295–15310, (2023)

Applied Intelligence

Sevinç İlhan Omurca¹,
Ekin Ekinci ORCID: orcid.org/0000-0003-0658-592X²,
Semih Sevim³,
Eren Berk Edinç¹,
Süleyman Eken⁴ &
…
Ahmet Sayar¹

756 Accesses
4 Citations
1 Altmetric

Abstract

Thanks to the digital transformation, Artificial Intelligence (AI) technologies are now widely employed to overcome human-induced faults in a variety of systems used in daily life. One example of such systems is the online document tracking system (DTS). Automatic document classification and understanding features make a DTS more reliable and more attractive to its users. Although automatic document classification systems can assist humans in document understanding tasks, most of them are not designed to work with Portable Document Format (PDF) files, which contain text, tables, or figures. In this study, we investigate different ways to efficiently classify student documents that are required for university education and are uploaded in PDF format. We propose three techniques for this problem. The first is based on Optical Character Recognition (OCR) and traditional machine learning methods. The second relies purely on deep learning. The third is an entropy-based fusion of deep learning methods. The proposed techniques can classify twelve distinct types of digital documents. The validity of the proposed methods has been verified by the student affairs department of Kocaeli University in Turkey. The system has not only made the online document uploading steps more efficient for students, but also reduced the human cost of tracking the documents. The highest F-score (94.45%) is obtained by the ensemble of EfficientNetB3 and ExtraTree.
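The abstract names an entropy-based fusion but does not state the fusion rule, so the following is only a minimal sketch of one common way such a fusion can be set up, not the authors' method. It assumes each model outputs a class-probability vector and weights every vector by a confidence score derived from its Shannon entropy. The function names, the weighting formula, and the example probability vectors are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weighted_fusion(distributions):
    """Fuse class-probability vectors from several models.

    Assumed rule (not taken from the paper): each model's weight is
    1 - H(p) / log(K), so a uniform prediction gets weight 0 and a
    one-hot prediction gets weight 1.
    """
    num_classes = len(distributions[0])
    max_entropy = math.log(num_classes)
    # Clamp at 0 to absorb tiny negative values from floating-point error.
    weights = [max(0.0, 1.0 - entropy(p) / max_entropy) for p in distributions]
    total = sum(weights)
    if total <= 1e-12:
        # Every model is maximally uncertain: fall back to a plain average.
        weights = [1.0] * len(distributions)
        total = float(len(distributions))
    return [
        sum(w * p[k] for w, p in zip(weights, distributions)) / total
        for k in range(num_classes)
    ]


def predict(distributions):
    """Return the index of the class with the highest fused probability."""
    fused = entropy_weighted_fusion(distributions)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs of two models for one document and three classes.
confident_model = [0.90, 0.05, 0.05]   # low entropy, receives a large weight
uncertain_model = [0.30, 0.40, 0.30]   # near-uniform, receives a small weight
fused_probs = entropy_weighted_fusion([confident_model, uncertain_model])
predicted_class = predict([confident_model, uncertain_model])
```

In this example the confident model dominates the fused result, so the near-uniform vote of the second model for class 1 does not change the decision. A real system would replace the two hard-coded vectors with the per-class probabilities produced by the trained models.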

Acknowledgements

We thank Ayhan Gültekin (Kocaeli University) for providing the experimental dataset.

Funding

This work has been supported by the Kocaeli University Scientific Research and Development Support Program (BAP) in Turkey under project number FBA-2020-2152.

Author information

Authors and Affiliations

Department of Computer Engineering, Kocaeli University, Kocaeli, Turkey
Sevinç İlhan Omurca, Eren Berk Edinç & Ahmet Sayar
Department of Computer Engineering, Sakarya University of Applied Sciences, Sakarya, Turkey
Ekin Ekinci
Department of Computer Engineering, Bandırma Onyedi Eylül University, Balıkesir, Turkey
Semih Sevim
Department of Information Systems Engineering, Kocaeli University, Kocaeli, Turkey
Süleyman Eken

Corresponding author

Correspondence to Ekin Ekinci.

Ethics declarations

Conflict of Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Omurca, S.İ., Ekinci, E., Sevim, S. et al. A document image classification system fusing deep and machine learning models. Appl Intell 53, 15295–15310 (2023). https://doi.org/10.1007/s10489-022-04306-5

Accepted: 27 October 2022
Published: 15 November 2022
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10489-022-04306-5
