
Text Component Reconstruction for Tracking in Video

  • Conference paper

Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)


Abstract

Text tracking is challenging due to unpredictable variations in orientation, shape, size, and color, as well as loss of information. This paper presents a new method for reconstructing text components from multiple views for tracking. The first step finds Text Candidates (TCs) in the multi-view images using deep learning. The text candidates are then verified with degrees of similarity and dissimilarity estimated from SIFT features to eliminate false candidates, yielding Potential Text Candidates (PTCs). The PTCs are next aligned to a standard format using an affine transform. Finally, the proposed method uses an image mosaicing approach to stitch PTCs from the multiple views based on their overlapping regions, producing the reconstructed images. Experimental results on a large dataset of multi-view images show that the proposed method is effective and useful. Recognition experiments with several recognition methods show that their performance improves significantly on the reconstructed images compared with the results before reconstruction.
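The abstract describes a multi-stage pipeline: detect text candidates with a deep network, verify candidate pairs across views by SIFT similarity, align the surviving candidates with an affine transform, and stitch the aligned views by mosaicing over their overlap. Below is a minimal sketch of the verification and alignment/stitching stages using standard OpenCV primitives. It illustrates the general technique only, not the authors' implementation: the detection network is out of scope, and the thresholds (MIN_MATCHES, RATIO) and all function names are assumptions.

```python
# Illustrative sketch (not the paper's code) of SIFT-based verification and
# affine alignment/stitching of two text-candidate crops from different views.
import cv2
import numpy as np

MIN_MATCHES = 10   # assumed: minimum SIFT matches to call two views "similar"
RATIO = 0.75       # Lowe's ratio-test threshold, a common default

def sift_matches(img_a, img_b):
    """Return good SIFT matches between two text-candidate crops."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return [], kp_a, kp_b
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des_a, des_b, k=2):
        if len(pair) == 2 and pair[0].distance < RATIO * pair[1].distance:
            good.append(pair[0])
    return good, kp_a, kp_b

def align_and_stitch(img_a, img_b):
    """Verify img_b against img_a by match count, align it with a
    RANSAC-estimated affine transform, and paste both onto one canvas."""
    good, kp_a, kp_b = sift_matches(img_a, img_b)
    if len(good) < MIN_MATCHES:      # too dissimilar: reject as a false TC
        return None
    src = np.float32([kp_b[m.trainIdx].pt for m in good])
    dst = np.float32([kp_a[m.queryIdx].pt for m in good])
    M, _ = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
    if M is None:
        return None
    h, w = img_a.shape[:2]
    canvas = cv2.warpAffine(img_b, M, (2 * w, h))  # warp view b into a's frame
    canvas[:h, :w] = img_a                         # reference view on the left
    return canvas
```

In this reading, a pair of views whose SIFT match count falls below the threshold is treated as a false candidate and discarded, corresponding to the similarity/dissimilarity verification step, while the RANSAC-fitted affine transform plays the role of the alignment step before stitching.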

Acknowledgment

The work described in this paper was supported by the Natural Science Foundation of China under Grant No. 61672273 and No. 61272218, the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant No. BK20160021, and Scientific Foundation of State Grid Corporation of China (Research on Ice-wind Disaster Feature Recognition and Prediction by Few-shot Machine Learning in Transmission Lines).

Author information

Corresponding author

Correspondence to Tong Lu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Yuan, M., Shivakumara, P., Kong, H., Lu, T., Pal, U. (2018). Text Component Reconstruction for Tracking in Video. In: Hong, R., Cheng, W.H., Yamasaki, T., Wang, M., Ngo, C.W. (eds) Advances in Multimedia Information Processing – PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_40

  • DOI: https://doi.org/10.1007/978-3-030-00776-8_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00775-1

  • Online ISBN: 978-3-030-00776-8

  • eBook Packages: Computer Science (R0)
