Mobile ID Document Recognition–Coarse-to-Fine Approach

  • APPLICATION PROBLEMS
Pattern Recognition and Image Analysis

Abstract

Automatic optical recognition of documents is a traditional function of modern document processing systems. In this context, recognition is a complex process that includes image processing, segmentation, classification, and linguistic analysis. Although the idea of using mobile devices to recognize paper documents is not new, applying existing software solutions designed for scanned images directly to images captured with a mobile device yields low recognition precision, primarily because of perspective distortions and the lower effective resolution of such images. In this paper, we present an original approach and a set of algorithms, suitable for mobile implementation, for recognizing a document in a video frame sequence. The approach follows a coarse-to-fine methodology: template matching and field localization are performed on an image of lowered resolution, followed by lazy processing of only those parts of the images that correspond to fields not yet recognized. The video stream serves as a source of noise reduction both for the field coordinates and for the outputs of the optical character recognition classifiers. The resulting algorithm is suitable for running on the device itself and can operate even when no single video frame contains a document image of sufficient quality.
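
A minimal sketch of the per-frame loop implied by this description is given below. It is an illustration under stated assumptions, not the authors' implementation: the class name CoarseToFineRecognizer and the callables locate_fields and recognize_field are hypothetical stand-ins for the template matching/field localization and OCR stages, and the confidence-weighted voting with a fixed acceptance threshold is a simplified substitute for the combination and stopping rules discussed in the paper.

from collections import Counter, defaultdict
from typing import Callable, Dict, Tuple

import cv2  # OpenCV, used here only for downscaling and perspective rectification
import numpy as np


class CoarseToFineRecognizer:
    """Accumulates per-frame recognition results for the fields of one document."""

    def __init__(self,
                 locate_fields: Callable[[np.ndarray], Dict[str, np.ndarray]],
                 recognize_field: Callable[[np.ndarray], Tuple[str, float]],
                 coarse_scale: float = 0.25,
                 accept_weight: float = 3.0):
        self.locate_fields = locate_fields      # runs on the downscaled frame; returns field name -> 4x2 quadrangle
        self.recognize_field = recognize_field  # runs on a full-resolution field crop; returns (text, confidence)
        self.coarse_scale = coarse_scale
        self.accept_weight = accept_weight
        self.votes: Dict[str, Counter] = defaultdict(Counter)
        self.accepted: Dict[str, str] = {}

    def process_frame(self, frame: np.ndarray) -> Dict[str, str]:
        # Coarse stage: template matching and field localization on a low-resolution copy.
        small = cv2.resize(frame, None, fx=self.coarse_scale, fy=self.coarse_scale)
        for name, quad in self.locate_fields(small).items():
            if name in self.accepted:           # lazy processing: skip already accepted fields
                continue
            # Fine stage: rectify and recognize only this field at full resolution.
            crop = self._rectify(frame, quad / self.coarse_scale)
            text, confidence = self.recognize_field(crop)
            # Combine per-frame hypotheses by confidence-weighted voting.
            self.votes[name][text] += confidence
            best, weight = self.votes[name].most_common(1)[0]
            if weight >= self.accept_weight:    # crude per-field stopping rule
                self.accepted[name] = best
        return dict(self.accepted)

    @staticmethod
    def _rectify(frame: np.ndarray, quad: np.ndarray,
                 out_w: int = 400, out_h: int = 64) -> np.ndarray:
        dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
        homography = cv2.getPerspectiveTransform(np.float32(quad), dst)
        return cv2.warpPerspective(frame, homography, (out_w, out_h))

The sketch preserves the three points made above: localization runs on a downscaled copy of each frame, only fields that have not yet been accepted are cropped and recognized at full resolution, and per-frame results are accumulated across the stream rather than trusted individually.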

ACKNOWLEDGMENTS

The authors would like to express their gratitude to Igor’ Aleksandrovich Faradjev and Aleksandr Borisovich Merkov for their valuable comments and methodological assistance, as well as to Smart Engines Service LLC for providing private experimental results.

Funding

The research presented in this paper was partially funded by the Russian Foundation for Basic Research, project nos. 19-29-09092 and 19-29-09064.

Author information

Corresponding authors

Correspondence to V. V. Arlazarov or D. V. Polevoy.

Ethics declarations

COMPLIANCE WITH ETHICAL STANDARDS

This article is a completely original work of its authors; it has not been published before and will not be sent to other publications until the PRIA Editorial Board decides not to accept it for publication.

Conflict of Interest

The authors declare that they have no conflicts of interest.

Additional information

Arlazarov Vladimir Lvovich (born 1939), Dr. Sci., Corresponding Member of the Russian Academy of Sciences, graduated from Moscow State University in 1961. Currently he works as head of sector at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences (FRC CSC RAS). His research interests are game theory and pattern recognition.

Arlazarov Vladimir Viktorovich (born 1976), received the PhD degree in applied mathematics from the Moscow Institute of Steel and Alloys in 1999. He works as a Head of Department at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences. He is currently an Associate Professor with the Moscow Institute of Physics and Technology (MIPT). His research interests are artificial intelligence, machine learning, recognition systems, and information technology.

Bulatov Konstantin Bulatovich (born 1991), received his PhD degree in computer science from the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences in 2020. He is currently a Senior Researcher at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences. Research interests include pattern recognition, computer vision, and document analysis systems.

Chernov Timofei Sergeevich (born 1992), PhD, graduated from the National University of Science and Technology MISiS in 2013. Received his PhD degree in computer science from the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences in 2018. Scientific interests: computer science, systems programming, computer vision, machine learning.

Nikolaev Dmitrii Petrovich (born 1978), PhD, received a master’s degree in physics and a Ph.D. degree in computer science from Moscow State University, Moscow, Russia, in 2000 and 2004, respectively. Since 2007, he has been the Head of the Vision Systems Laboratory, Institute for Information Transmission Problems, Russian Academy of Sciences (Kharkevich Institute) and, since 2016, he has been the CTO of Smart Engines Service LLC. Since 2016, he has been an Associate Professor with the Moscow Institute of Physics and Technology (MIPT), teaching the Image Processing and Analysis Course. His research activities are in the area of computer vision with primary application to color image understanding.

Polevoy Dmitry Valerevich (born 1981), PhD, received a master’s degree in applied mathematics and physics and a PhD degree in computer science from Moscow Institute of Physics and Technology (MIPT), in 2004 and 2007, respectively. Since 2011, he has been an Associate Professor with National University of Science and Technology “MISiS.” Currently he works as senior researcher at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences (FRC CSC RAS). Research interests are pattern recognition and computer vision.

Sheshkus Alexandr Vladimirovich (born 1986), received the BSc and MSc degrees in applied physics and mathematics from the Moscow Institute of Physics and Technology (MIPT) in 2009 and 2011, respectively. He is currently the Head of the Machine Learning Department, Smart Engines, and a Researcher with the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). His research interests include deep neural networks, computer vision, and projective invariant image segmentation.

Skoryukina Natal’ya Sergeevna (born 1991), graduated from National University of Science and Technology “MISiS” in 2013, majoring in Applied Mathematics. Computer programmer at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences (FRC CSC RAS). Scientific interests: image analysis, computer vision.

Slavin Oleg Anatolevich (born 1963), Dr. Sci. (Eng.), graduated from the Moscow Institute of Radiotechnics, Electronics and Automation (MIREA), majoring in Systems Engineering. Currently he works as a head of division at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). Research interests are pattern recognition, computer vision, and information systems.

Usilin Sergei Alexandrovich (born 1986), received the PhD degree in applied mathematics from the Moscow Institute of Physics and Technology (MIPT) in 2018. He works as a Senior Researcher at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). Scope of scientific interests: object detection, machine learning, recognition systems, digital image processing.

About this article

Cite this article

Arlazarov, V.L., Arlazarov, V.V., Bulatov, K.B. et al. Mobile ID Document Recognition–Coarse-to-Fine Approach. Pattern Recognit. Image Anal. 32, 89–108 (2022). https://doi.org/10.1134/S1054661822010023
