A document image classification system fusing deep and machine learning models

Applied Intelligence

Abstract

Thanks to the digital transformation, Artificial Intelligence (AI) technologies are now widely employed to overcome human-induced faults in a variety of systems used in our daily lives. One example of such systems is the online document tracking system (DTS). Automatic document classification and understanding features make a DTS more reliable and more attractive to its users. Although automatic document classification systems can assist humans in document understanding tasks, most of them are not designed to work with Portable Document Format (PDF) files, which can contain text, tables, or figures. In this study, we investigate different ways to efficiently classify the student documents that are required for university education and are uploaded in PDF format. We propose three techniques for this problem. The first approach is based on Optical Character Recognition (OCR) and traditional machine learning methods. The second relies purely on deep learning. The third is an entropy-based fusion of deep learning methods. The proposed techniques can classify twelve distinct types of digital documents. The validity of the proposed methods has been verified by the student affairs department of Kocaeli University in Turkey. The system has not only made the online document upload steps more efficient for students, but also reduced the human cost of tracking the documents. The highest F-score (94.45%) is obtained by the ensemble of EfficientNetB3 and ExtraTree.
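
The abstract does not spell out how the entropy-based fusion combines model outputs, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes each model returns a class-probability vector and weights each model by one minus its normalized Shannon entropy, so that confident predictions count for more. The function names and the weighting rule are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weighted_fusion(model_probs):
    """Fuse per-model class probabilities, trusting low-entropy models more.

    model_probs: list of probability vectors, one per model, equal length.
    Returns (fused_probs, predicted_class_index).
    """
    num_classes = len(model_probs[0])
    max_entropy = math.log(num_classes)
    # Assumed weighting rule (not taken from the paper): 1 - normalized
    # entropy, clipped at 0 to absorb floating-point rounding.
    weights = [
        max(0.0, 1.0 - shannon_entropy(p) / max_entropy) for p in model_probs
    ]
    total = sum(weights)
    if total <= 1e-12:
        # Every model is maximally uncertain: fall back to a plain average.
        weights = [1.0] * len(model_probs)
        total = float(len(model_probs))
    fused = [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(num_classes)
    ]
    predicted = max(range(num_classes), key=lambda c: fused[c])
    return fused, predicted


if __name__ == "__main__":
    # Two hypothetical models scoring one document over three classes.
    confident_model = [0.90, 0.05, 0.05]
    unsure_model = [0.30, 0.40, 0.30]
    fused, predicted = entropy_weighted_fusion([confident_model, unsure_model])
    print(fused, predicted)
```

In this sketch the nearly uniform second model receives a weight close to zero, so the fused decision follows the confident model. Other entropy-based rules, such as selecting only the single lowest-entropy model, would fit the same description.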

Acknowledgements

We thank Ayhan Gültekin (Kocaeli University) for providing the experimental dataset.

Funding

This work has been supported by the Kocaeli University Scientific Research and Development Support Program (BAP) in Turkey under project number FBA-2020-2152.

Author information

Corresponding author

Correspondence to Ekin Ekinci.

Ethics declarations

Conflict of Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Omurca, S.İ., Ekinci, E., Sevim, S. et al. A document image classification system fusing deep and machine learning models. Appl Intell 53, 15295–15310 (2023). https://doi.org/10.1007/s10489-022-04306-5

  • DOI: https://doi.org/10.1007/s10489-022-04306-5
