Abstract
Text tracking is challenging due to unpredictable variations in orientation, shape, size, and color, as well as loss of information. This paper presents a new method for reconstructing text components from multi-view images for tracking. The first step finds Text Candidates (TCs) in each view using deep learning. The text candidates are then verified using similarity and dissimilarity scores estimated from SIFT features, which eliminates false candidates and yields Potential Text Candidates (PTCs). The PTCs are then aligned to a standard orientation with an affine transform. Next, the proposed method stitches PTCs from the different views using an image-mosaicing approach based on their overlapping regions, producing reconstructed images. Experimental results on a large multi-view dataset show that the proposed method is effective and useful. Recognition experiments with several recognition methods show that their performance improves significantly on the reconstructed images compared to results obtained with prior reconstruction methods.
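The alignment step described above fits an affine transform to matched keypoints between two views. The following is a minimal NumPy sketch of that least-squares fit, not the paper's implementation: the function names are illustrative, and in practice the point correspondences would come from SIFT matching (e.g. via OpenCV) rather than being supplied directly.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points onto dst points.

    src, dst: (N, 2) arrays of matched keypoint coordinates, N >= 3.
    Returns a 2x3 matrix M = [[a, b, tx], [c, d, ty]].
    """
    n = src.shape[0]
    # Stack the two coordinate equations per correspondence:
    #   x' = a*x + b*y + tx
    #   y' = c*x + d*y + ty
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src   # a, b coefficients
    A[0::2, 4] = 1.0     # tx
    A[1::2, 2:4] = src   # c, d coefficients
    A[1::2, 5] = 1.0     # ty
    b = dst.reshape(-1)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    a, bb, c, d, tx, ty = params
    return np.array([[a, bb, tx], [c, d, ty]])

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to (N, 2) points."""
    return pts @ M[:, :2].T + M[:, 2]
```

Once each PTC is warped into a common frame this way, the overlapping regions between views can be located and blended to form the mosaic. A robust estimator (e.g. RANSAC over the SIFT matches) would normally replace the plain least-squares fit to suppress outlier correspondences.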
Acknowledgment
The work described in this paper was supported by the Natural Science Foundation of China under Grants No. 61672273 and No. 61272218, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant No. BK20160021, and the Scientific Foundation of State Grid Corporation of China (Research on Ice-wind Disaster Feature Recognition and Prediction by Few-shot Machine Learning in Transmission Lines).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Yuan, M., Shivakumara, P., Kong, H., Lu, T., Pal, U. (2018). Text Component Reconstruction for Tracking in Video. In: Hong, R., Cheng, W.H., Yamasaki, T., Wang, M., Ngo, C.W. (eds) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol. 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science (R0)