Abstract
The detection of scene text in videos is of great value in various content-based video applications such as video analysis and retrieval. In this paper, we present a robust scene text detection and tracking method for videos. We first propose an effective deep neural network model for detecting text in individual video frames, which enhances the EAST model by introducing deconvolution layers and inception modules. We then present a correlation filter based tracking algorithm for text in the video and further combine detection and tracking results, which effectively enhances the final video text detection performance. The proposed method outperforms other state-of-the-art methods in experiments on public scene text video datasets.
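The abstract does not say how detection and tracking results are combined, so the following is only a minimal sketch of one common fusion rule, not the authors' procedure. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)`. The helper names `iou` and `fuse` and the 0.5 overlap threshold are hypothetical, chosen for illustration. Detections are kept as they are, and a tracked box is added only when no detection overlaps it, on the assumption that it marks text the detector missed in that frame.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2); an assumption, since scene text
# detectors such as EAST can also output rotated boxes or quadrangles.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(detections: List[Box], tracked: List[Box], thresh: float = 0.5) -> List[Box]:
    """Keep every detection and add tracked boxes that no detection overlaps.

    Hypothetical rule for illustration: a tracked box whose IoU with every
    detection is below `thresh` is treated as text missed by the detector.
    """
    fused = list(detections)
    for t in tracked:
        if all(iou(t, d) < thresh for d in detections):
            fused.append(t)
    return fused


if __name__ == "__main__":
    detections = [(10, 10, 50, 30)]
    tracked = [(12, 11, 52, 31), (100, 100, 160, 120)]
    # The first tracked box overlaps the detection and is dropped;
    # the second has no matching detection and is kept.
    print(fuse(detections, tracked))
```

A real system would more likely match boxes one-to-one (for example with the Hungarian algorithm) and use tracker confidence to discard drifting tracks; the greedy rule here only illustrates the idea of recovering missed detections from tracking.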
Acknowledgments
Research supported by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20171345 and the National Natural Science Foundation of China under Grant Nos. 61003113, 61321491, 61672273.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Y., Wang, L., Su, F. (2018). A Robust Approach for Scene Text Detection and Tracking in Video. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11166. Springer, Cham. https://doi.org/10.1007/978-3-030-00764-5_28
DOI: https://doi.org/10.1007/978-3-030-00764-5_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00763-8
Online ISBN: 978-3-030-00764-5
eBook Packages: Computer Science, Computer Science (R0)