Abstract
Text tracking is challenging due to unpredictable variations in orientation, shape, size, and color, as well as loss of information. This paper presents a new method for reconstructing text components from multi-view images for tracking. The first step finds Text Candidates (TCs) in each view using deep learning. The text candidates are then verified using similarity and dissimilarity scores estimated from SIFT features, which eliminates false candidates and yields Potential Text Candidates (PTCs). The PTCs are then aligned to a standard orientation with an affine transform. Next, the proposed method stitches PTCs from the different views using an image-mosaicing approach based on their overlapping regions, producing reconstructed images. Experimental results on a large multi-view dataset show that the proposed method is effective and useful. Recognition experiments with several recognition methods show that their performance improves significantly on the reconstructed images compared to results obtained with prior reconstruction methods.
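The alignment step described above fits an affine transform to matched keypoints between two views. The following is a minimal NumPy sketch of that least-squares fit, not the paper's implementation: the function names are illustrative, and in practice the point correspondences would come from SIFT matching (e.g. via OpenCV) rather than being supplied directly.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2-D affine transform mapping src points onto dst points.

    src, dst: (N, 2) arrays of matched keypoint coordinates, N >= 3.
    Returns a 2x3 matrix M = [[a, b, tx], [c, d, ty]].
    """
    n = src.shape[0]
    # Stack the two coordinate equations per correspondence:
    #   x' = a*x + b*y + tx
    #   y' = c*x + d*y + ty
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src   # a, b coefficients
    A[0::2, 4] = 1.0     # tx
    A[1::2, 2:4] = src   # c, d coefficients
    A[1::2, 5] = 1.0     # ty
    b = dst.reshape(-1)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    a, bb, c, d, tx, ty = params
    return np.array([[a, bb, tx], [c, d, ty]])

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to (N, 2) points."""
    return pts @ M[:, :2].T + M[:, 2]
```

Once each PTC is warped into a common frame this way, the overlapping regions between views can be located and blended to form the mosaic. A robust estimator (e.g. RANSAC over the SIFT matches) would normally replace the plain least-squares fit to suppress outlier correspondences.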
Acknowledgment
The work described in this paper was supported by the Natural Science Foundation of China under Grants No. 61672273 and No. 61272218, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant No. BK20160021, and the Scientific Foundation of State Grid Corporation of China (Research on Ice-wind Disaster Feature Recognition and Prediction by Few-shot Machine Learning in Transmission Lines).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Yuan, M., Shivakumara, P., Kong, H., Lu, T., Pal, U. (2018). Text Component Reconstruction for Tracking in Video. In: Hong, R., Cheng, W.H., Yamasaki, T., Wang, M., Ngo, C.W. (eds) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol. 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science (R0)