Parsing Digitized Vietnamese Paper Documents

Dieu, Linh Truong; Nguyen, Thuan Trong; Vo, Nguyen D.; Nguyen, Tam V.; Nguyen, Khang

doi:10.1007/978-3-030-89128-2_37

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13052)

Included in the following conference series:

International Conference on Computer Analysis of Images and Patterns

Abstract

In recent years, the need to exploit digitized document data has been increasing. In this paper, we address the problem of parsing digitized Vietnamese paper documents. Digitized Vietnamese documents are mainly scanned images with diverse layouts and special characters, which introduce many challenges. To this end, we first collect the UIT-DODV dataset, a novel Vietnamese document image dataset that includes scientific papers in Vietnamese drawn from different scientific conferences. We compile both images converted from PDF and images scanned with a smartphone or a physical scanner, which pose many new challenges. We further leverage a state-of-the-art object detector along with a fused loss function to efficiently parse the Vietnamese paper documents. Extensive experiments conducted on the UIT-DODV dataset provide a comprehensive evaluation and insightful analysis.

Notes

1. UIT-DODV published at https://uit-together.github.io/datasets/.
2. https://github.com/DevashishPrasad/CascadeTabNet.

Acknowledgment

The research team would like to express our sincere thanks to the Multimedia Communications Laboratory (MMLab), University of Information Technology, VNU-HCM, for supporting this research. We also thank the Can Tho University Journal of Science for its assistance with the data collection. This project is partially funded by the National Science Foundation (NSF) under Grant No. 2025234 and by Vietnam National University Ho Chi Minh City (VNU-HCM) under Grant No. DSC2021-26-03.

Author information

Authors and Affiliations

University of Information Technology, Ho Chi Minh, Vietnam
Linh Truong Dieu, Thuan Trong Nguyen, Nguyen D. Vo & Khang Nguyen
Vietnam National University, Ho Chi Minh, Vietnam
Linh Truong Dieu, Thuan Trong Nguyen, Nguyen D. Vo & Khang Nguyen
University of Dayton, Dayton, USA
Tam V. Nguyen

Corresponding authors

Correspondence to Linh Truong Dieu, Thuan Trong Nguyen, Nguyen D. Vo, Tam V. Nguyen or Khang Nguyen.

Editor information

Editors and Affiliations

Cyprus University of Technology, Limassol, Cyprus
Nicolas Tsapatsoulis
University of Cyprus, Nicosia, Cyprus
Andreas Panayides
University of Cyprus, Nicosia, Cyprus
Theo Theocharides
Cyprus University of Technology, Limassol, Cyprus
Andreas Lanitis
University of Cyprus, Nicosia, Cyprus
Constantinos Pattichis
University of Salerno, Salerno, Italy
Mario Vento

About this paper

Cite this paper

Dieu, L.T., Nguyen, T.T., Vo, N.D., Nguyen, T.V., Nguyen, K. (2021). Parsing Digitized Vietnamese Paper Documents. In: Tsapatsoulis, N., Panayides, A., Theocharides, T., Lanitis, A., Pattichis, C., Vento, M. (eds) Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science, vol 13052. Springer, Cham. https://doi.org/10.1007/978-3-030-89128-2_37

DOI: https://doi.org/10.1007/978-3-030-89128-2_37
Published: 31 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89127-5
Online ISBN: 978-3-030-89128-2
eBook Packages: Computer Science, Computer Science (R0)
