Abstract
Automatic optical recognition of documents is a traditional function of modern document processing systems. In this context, recognition is a complex process that includes image processing, segmentation, classification, and linguistic analysis. Although the idea of using mobile devices to recognize paper documents is not new, applying existing software designed for scanned images directly to images captured with a mobile device yields low recognition precision. This is due, first of all, to perspective distortions and the lower effective resolution of such images. In this paper, we present an original approach and a set of algorithms for recognizing a sequence of video frames containing a document image, suitable for mobile implementation. The approach follows a coarse-to-fine methodology: template matching and field localization are performed on a reduced-resolution image, followed by lazy processing of only those image regions that correspond to fields not yet recognized. The video stream is used to reduce noise both in the field coordinates and in the outputs of the optical character recognition classifiers. An algorithm based on the proposed approach can run on the device itself and can operate even when no single video frame contains a document image of sufficient quality.
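The lazy, multi-frame scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `recognize_stream`, the callback `recognize_field`, the averaging rule, and the `threshold` and `min_frames` parameters are all hypothetical names and choices made for illustration, and the abstract does not state how the paper actually combines per-frame outputs or decides when to stop. The sketch assumes that every field has a fixed number of character cells and that each cell yields a non-empty dictionary of label scores.

```python
from typing import Callable, Dict, Iterable, List

CharScores = Dict[str, float]   # one character cell: label -> classifier score
FieldScores = List[CharScores]  # one field: a score dict per character cell


def recognize_stream(
    frames: Iterable[object],
    field_names: List[str],
    recognize_field: Callable[[object, str], FieldScores],
    threshold: float = 0.8,
    min_frames: int = 2,
) -> Dict[str, str]:
    """Accumulate per-frame OCR scores until every field looks settled.

    Assumption: plain averaging of scores across frames stands in for
    whatever combination rule a real system would use.
    """
    sums: Dict[str, List[CharScores]] = {}
    counts: Dict[str, int] = {name: 0 for name in field_names}
    done: Dict[str, str] = {}

    for frame in frames:
        for name in field_names:
            if name in done:
                continue  # lazy step: skip fields that are already recognized
            scores = recognize_field(frame, name)
            if name not in sums:
                sums[name] = [dict() for _ in scores]
            # Add this frame's scores to the running per-cell totals.
            for cell_sum, cell in zip(sums[name], scores):
                for label, score in cell.items():
                    cell_sum[label] = cell_sum.get(label, 0.0) + score
            counts[name] += 1
            n = counts[name]
            best = [max(c.items(), key=lambda kv: kv[1]) for c in sums[name]]
            # A field is settled once every cell's mean top score is high enough.
            if n >= min_frames and all(s / n >= threshold for _, s in best):
                done[name] = "".join(label for label, _ in best)
        if len(done) == len(field_names):
            break  # all fields settled; remaining frames are not needed

    # Fields that never settled fall back to their current best guess.
    for name in field_names:
        if name not in done and name in sums:
            done[name] = "".join(max(c, key=c.get) for c in sums[name])
    return done
```

In this sketch a field that is read confidently in early frames stops consuming recognition time, while a noisy field keeps accumulating evidence from later frames, which mirrors the idea that no single frame needs to be of sufficient quality on its own.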
ACKNOWLEDGMENTS
The authors thank Igor’ Aleksandrovich Faradjev and Aleksandr Borisovich Merkov for their valuable comments and methodological assistance, and Smart Engines Service LLC for providing private experimental results.
Funding
The research presented in this paper was partially funded by the Russian Foundation for Basic Research, project nos. 19-29-09092 and 19-29-09064.
Ethics declarations
COMPLIANCE WITH ETHICAL STANDARDS
This article is a completely original work of its authors; it has not been published before and will not be sent to other publications until the PRIA Editorial Board decides not to accept it for publication.
Conflict of Interest
The authors declare that they have no conflicts of interest.
Additional information
Arlazarov Vladimir Lvovich (born 1939), Dr. Sci., Corresponding Member of the Russian Academy of Sciences, graduated from Moscow State University in 1961. Currently he works as head of sector at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences (FRC CSC RAS). His research interests are game theory and pattern recognition.
Arlazarov Vladimir Viktorovich (born 1976) received the PhD degree in applied mathematics from the Moscow Institute of Steel and Alloys in 1999. He works as a Head of Department at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences. He is currently an Associate Professor with the Moscow Institute of Physics and Technology (MIPT). His research interests are artificial intelligence, machine learning, recognition systems, and information technology.
Bulatov Konstantin Bulatovich (born 1991) received his PhD degree in computer science from the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences in 2020. He is currently a Senior Researcher at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences. His research interests include pattern recognition, computer vision, and document analysis systems.
Chernov Timofei Sergeevich (born 1992), PhD, graduated from the National University of Science and Technology MISiS in 2013. He received his PhD degree in computer science from the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences in 2018. His research interests are computer science, systems programming, computer vision, and machine learning.
Nikolaev Dmitrii Petrovich (born 1978), PhD, received a master’s degree in physics and a Ph.D. degree in computer science from Moscow State University, Moscow, Russia, in 2000 and 2004, respectively. Since 2007, he has been the Head of the Vision Systems Laboratory, Institute for Information Transmission Problems, Russian Academy of Sciences (Kharkevich Institute) and, since 2016, he has been the CTO of Smart Engines Service LLC. Since 2016, he has been an Associate Professor with the Moscow Institute of Physics and Technology (MIPT), teaching the Image Processing and Analysis Course. His research activities are in the area of computer vision with primary application to color image understanding.
Polevoy Dmitry Valerevich (born 1981), PhD, received a master’s degree in applied mathematics and physics and a PhD degree in computer science from Moscow Institute of Physics and Technology (MIPT), in 2004 and 2007, respectively. Since 2011, he has been an Associate Professor with National University of Science and Technology “MISiS.” Currently he works as senior researcher at the Federal Research Center “Computer Science and Control” of Russian Academy of Sciences (FRC CSC RAS). Research interests are pattern recognition and computer vision.
Sheshkus Alexandr Vladimirovich (born 1986) received the BSc and MSc degrees in applied physics and mathematics from the Moscow Institute of Physics and Technology (MIPT) in 2009 and 2011, respectively. He is currently the Head of the Machine Learning Department, Smart Engines, and a Researcher with the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). His research interests include deep neural networks, computer vision, and projective invariant image segmentation.
Skoryukina Natal’ya Sergeevna (born 1991) graduated from the National University of Science and Technology “MISiS” in 2013, majoring in applied mathematics. She works as a computer programmer at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). Her research interests are image analysis and computer vision.
Slavin Oleg Anatolevich (born 1963), Dr. Sci. (Eng.), graduated from the Moscow Institute of Radiotechnics, Electronics and Automation (MIREA), majoring in systems engineering. He currently works as a head of division at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). His research interests are pattern recognition, computer vision, and information systems.
Usilin Sergei Alexandrovich (born 1986) received the PhD degree in applied mathematics from the Moscow Institute of Physics and Technology (MIPT) in 2018. He works as a Senior Researcher at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences (FRC CSC RAS). His research interests are object detection, machine learning, recognition systems, and digital image processing.
Cite this article
Arlazarov, V.L., Arlazarov, V.V., Bulatov, K.B. et al. Mobile ID Document Recognition–Coarse-to-Fine Approach. Pattern Recognit. Image Anal. 32, 89–108 (2022). https://doi.org/10.1134/S1054661822010023
DOI: https://doi.org/10.1134/S1054661822010023