Mobile ID Document Recognition–Coarse-to-Fine Approach

  • APPLICATION PROBLEMS
Pattern Recognition and Image Analysis

Abstract

Automatic optical recognition of documents is a traditional function of modern document processing systems. In this context, recognition is a complex process that includes image processing, segmentation, classification, and linguistic analysis. Although the idea of using mobile devices to recognize paper documents is not new, applying existing software solutions designed for scanned images directly to images captured with a mobile device yields low recognition precision, primarily because of perspective distortions and the lower effective resolution of such images. In this paper, we present an original approach and a set of algorithms, suitable for mobile implementation, for recognizing a document in a video frame sequence. The approach follows a coarse-to-fine methodology: template matching and field localization are performed on an image of lowered resolution, followed by lazy processing of only those parts of the images that correspond to fields not yet recognized. The video stream serves as a source of noise reduction both for the field coordinates and for the outputs of the optical character recognition classifiers. The resulting algorithm is suitable for running on the device itself and can operate even when no single video frame contains a document image of sufficient quality.
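
A minimal sketch of the per-frame loop implied by this description is given below. It is an illustration under stated assumptions, not the authors' implementation: the class name CoarseToFineRecognizer and the callables locate_fields and recognize_field are hypothetical stand-ins for the template matching/field localization and OCR stages, and the confidence-weighted voting with a fixed acceptance threshold is a simplified substitute for the combination and stopping rules discussed in the paper.

from collections import Counter, defaultdict
from typing import Callable, Dict, Tuple

import cv2  # OpenCV, used here only for downscaling and perspective rectification
import numpy as np


class CoarseToFineRecognizer:
    """Accumulates per-frame recognition results for the fields of one document."""

    def __init__(self,
                 locate_fields: Callable[[np.ndarray], Dict[str, np.ndarray]],
                 recognize_field: Callable[[np.ndarray], Tuple[str, float]],
                 coarse_scale: float = 0.25,
                 accept_weight: float = 3.0):
        self.locate_fields = locate_fields      # runs on the downscaled frame; returns field name -> 4x2 quadrangle
        self.recognize_field = recognize_field  # runs on a full-resolution field crop; returns (text, confidence)
        self.coarse_scale = coarse_scale
        self.accept_weight = accept_weight
        self.votes: Dict[str, Counter] = defaultdict(Counter)
        self.accepted: Dict[str, str] = {}

    def process_frame(self, frame: np.ndarray) -> Dict[str, str]:
        # Coarse stage: template matching and field localization on a low-resolution copy.
        small = cv2.resize(frame, None, fx=self.coarse_scale, fy=self.coarse_scale)
        for name, quad in self.locate_fields(small).items():
            if name in self.accepted:           # lazy processing: skip already accepted fields
                continue
            # Fine stage: rectify and recognize only this field at full resolution.
            crop = self._rectify(frame, quad / self.coarse_scale)
            text, confidence = self.recognize_field(crop)
            # Combine per-frame hypotheses by confidence-weighted voting.
            self.votes[name][text] += confidence
            best, weight = self.votes[name].most_common(1)[0]
            if weight >= self.accept_weight:    # crude per-field stopping rule
                self.accepted[name] = best
        return dict(self.accepted)

    @staticmethod
    def _rectify(frame: np.ndarray, quad: np.ndarray,
                 out_w: int = 400, out_h: int = 64) -> np.ndarray:
        dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
        homography = cv2.getPerspectiveTransform(np.float32(quad), dst)
        return cv2.warpPerspective(frame, homography, (out_w, out_h))

The sketch preserves the three points made above: localization runs on a downscaled copy of each frame, only fields that have not yet been accepted are cropped and recognized at full resolution, and per-frame results are accumulated across the stream rather than trusted individually.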

ACKNOWLEDGMENTS

The authors would like to express their gratitude to Igor’ Aleksandrovich Faradjev and Aleksandr Borisovich Merkov for their valuable comments and methodological assistance, as well as to Smart Engines Service LLC for providing private experimental results.

Funding

The research presented in this paper was partially funded by the Russian Foundation for Basic Research, project nos. 19-29-09092 and 19-29-09064.

Author information

Corresponding authors

Correspondence to V. V. Arlazarov or D. V. Polevoy.

Ethics declarations

COMPLIANCE WITH ETHICAL STANDARDS

This article is a completely original work of its authors; it has not been published before and will not be sent to other publications until the PRIA Editorial Board decides not to accept it for publication.

Conflict of Interest

The authors declare that they have no conflicts of interest.

Additional information

Arlazarov Vladimir Lvovich (born 1939), Dr. Sci., Corresponding Member of the Russian Academy of Sciences, graduated from Moscow State University in 1961. Currently he works as head of sector at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences (FRC CSC RAS). His research interests are game theory and pattern recognition.

Arlazarov Vladimir Viktorovich (born 1976), received the PhD degree in applied mathematics from the Moscow Institute of Steel and Alloys in 1999. He works as a Head of Department at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences. He is currently an Associate Professor with the Moscow Institute of Physics and Technology (MIPT). His research interests are artificial intelligence, machine learning, recognition systems, and information technology.

Bulatov Konstantin Bulatovich (born 1991), received his PhD degree in computer science from the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences in 2020. He is currently a Senior Researcher at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences. Research interests include pattern recognition, computer vision, and document analysis systems.

Chernov Timofei Sergeevich (born 1992), PhD, graduated from the National University of Science and Technology MISiS in 2013. Received his PhD degree in computer science from the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences in 2018. Scientific interests: computer science, systems programming, computer vision, machine learning.

Nikolaev Dmitrii Petrovich (born 1978), PhD, received a master’s degree in physics and a Ph.D. degree in computer science from Moscow State University, Moscow, Russia, in 2000 and 2004, respectively. Since 2007, he has been the Head of the Vision Systems Laboratory, Institute for Information Transmission Problems, Russian Academy of Sciences (Kharkevich Institute) and, since 2016, he has been the CTO of Smart Engines Service LLC. Since 2016, he has been an Associate Professor with the Moscow Institute of Physics and Technology (MIPT), teaching the Image Processing and Analysis Course. His research activities are in the area of computer vision with primary application to color image understanding.

Polevoy Dmitry Valerevich (born 1981), PhD, received a master’s degree in applied mathematics and physics and a PhD degree in computer science from Moscow Institute of Physics and Technology (MIPT), in 2004 and 2007, respectively. Since 2011, he has been an Associate Professor with National University of Science and Technology “MISiS.” Currently he works as senior researcher at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences (FRC CSC RAS). Research interests are pattern recognition and computer vision.

Sheshkus Alexandr Vladimirovich (born 1986), received the BSc and MSc degrees in applied physics and mathematics from the Moscow Institute of Physics and Technology (MIPT) in 2009 and 2011, respectively. He is currently the Head of the Machine Learning Department, Smart Engines, and a Researcher with the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). His research interests include deep neural networks, computer vision, and projective invariant image segmentation.

Skoryukina Natal’ya Sergeevna (born 1991), graduated from National University of Science and Technology “MISiS” in 2013, majoring in Applied Mathematics. Computer programmer at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences (FRC CSC RAS). Scientific interests: image analysis, computer vision.

Slavin Oleg Anatolevich (born 1963), Dr. Sci. (Eng.), graduated from the Moscow Institute of Radiotechnics, Electronics and Automation (MIREA), majoring in Systems Engineering. Currently he works as a head of division at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). Research interests are pattern recognition, computer vision, and information systems.

Usilin Sergei Alexandrovich (born 1986), received the PhD degree in applied mathematics from the Moscow Institute of Physics and Technology (MIPT) in 2018. He works as a Senior Researcher at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). Scope of scientific interests: object detection, machine learning, recognition systems, digital image processing.

About this article

Cite this article

Arlazarov, V.L., Arlazarov, V.V., Bulatov, K.B. et al. Mobile ID Document Recognition–Coarse-to-Fine Approach. Pattern Recognit. Image Anal. 32, 89–108 (2022). https://doi.org/10.1134/S1054661822010023
