Abstract
Optical character recognition (OCR) in general, and identity card recognition (IDOCR) in particular, embedded in mobile cameras is attracting growing interest from the research community in Vietnam. Because images are captured by a wide variety of camera devices, IDOCR systems face many difficulties: input images are typically distorted, rotated, scaled, translated, or sheared, which leads to poor recognition accuracy. In this paper, we focus on the distorted document image problem in Vietnamese IDOCR systems and propose an effective method to solve it. Our solution consists of three main stages: (i) region of interest (ROI) detection; (ii) image segmentation and corner point detection; and (iii) rectification of the distorted image area. The accuracy and execution time of the method are verified on a large amount of data collected in real environments under widely varying lighting conditions, shooting distances, camera angles, and image sizes. The experimental results show that our method achieves high accuracy, runs in real time, and is able to handle distorted input images.
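To make stage (iii) concrete, the following is a minimal sketch of how four detected corner points could be used to rectify a card region. It assumes the distortion is a planar perspective transform (a homography), which the abstract does not state explicitly; the function names, the corner coordinates, and the 428 × 270 output size are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_corners(src: Sequence[Point],
                            dst: Sequence[Point]) -> List[List[float]]:
    """Return the 3x3 matrix that maps four src corners onto four dst corners."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h: List[List[float]], p: Point) -> Point:
    """Apply the homography to one point (with the projective division)."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corners of a skewed card (top-left, top-right, bottom-right,
# bottom-left) and an assumed upright target rectangle of 428 x 270 pixels.
corners = [(120.0, 80.0), (520.0, 110.0), (540.0, 380.0), (100.0, 340.0)]
target = [(0.0, 0.0), (428.0, 0.0), (428.0, 270.0), (0.0, 270.0)]
H = homography_from_corners(corners, target)
```

In a full pipeline, the same matrix would be applied (in inverse form) to every pixel of the output image to resample the card; libraries such as OpenCV provide this as a single call, but the pure-Python version above keeps the underlying linear system visible.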
Acknowledgments
This research was supported by the Energy Cloud R&D Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (2019M3F2A1073387), and by an Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-01456, AutoMaTa: Autonomous Management framework based on artificial intelligent Technology for adaptive and disposable IoT). This research was also supported by the project “Research on content-based image retrieval using relevance feedback with sparse representation classification”, code CS21.04, of the Institute of Information Technology (IOIT), Vietnam Academy of Science and Technology (VAST). Any correspondence related to this paper should be addressed to Prof. Dohyeun Kim and Prof. Le Anh Ngoc. We are also thankful to the Vietnam National Technology Program (KC 4.0) and the Electric Power University (EPU) for their sponsorship of this research.
Cite this article
Tan, N.T.T., Van Huy, H., Kim, D.H. et al. An intelligent threats solution for object detection and resource perspective rectification of distorted anomaly identification card images in cloud environments. Appl Intell 53, 385–404 (2023). https://doi.org/10.1007/s10489-022-03261-5
DOI: https://doi.org/10.1007/s10489-022-03261-5