Multimedia Tools and Applications, Volume 76, Issue 15, pp 16625–16655

Arbitrarily-oriented multi-lingual text detection in video

  • Vijeta Khare
  • Palaiahnakote Shivakumara
  • Raveendran Paramesran
  • Michael Blumenstein


Detecting arbitrarily-oriented multi-lingual text in video is an emerging area of research because it plays a vital role in developing real-time indexing and retrieval systems. In this paper, we propose to explore moments for identifying text candidates. We introduce a novel idea for automatically determining the windows used to extract moments, based on stroke width information, in order to tackle multi-font and multi-sized text in video. Temporal information is explored iteratively to find deviations between moving and non-moving pixels in successive frames, which results in static clusters containing caption text and dynamic clusters containing scene text as well as background pixels. The gradient directions of pixels in the static and dynamic clusters are analyzed to identify potential text candidates. Furthermore, we propose boundary growing, which expands the boundary of each potential text candidate until it finds neighbor components based on the nearest neighbor criterion. This process outputs the text lines appearing in the video. Experimental results on standard video data, namely ICDAR 2013, ICDAR 2015 and YVT videos, and on our own English and multi-lingual videos demonstrate that the proposed method outperforms the state-of-the-art methods.
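The separation of pixels into static and dynamic clusters can be illustrated with a minimal sketch. It is not the method of the paper, which uses moments computed over stroke-width-based windows and an iterative procedure. As a simplified stand-in, the sketch assumes that a pixel is static when the standard deviation of its gray level across successive frames stays below a fixed cut-off. The function name `split_static_dynamic`, the `threshold` value and the toy frames are all hypothetical.

```python
from statistics import pstdev


def split_static_dynamic(frames, threshold=5.0):
    """Label each pixel as static (True) or dynamic (False).

    frames: list of equally sized 2-D lists of gray levels, one per frame.
    threshold: hypothetical cut-off on the temporal standard deviation;
               it is an assumption of this sketch, not a value from the paper.
    """
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    static_mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gray level of this pixel across successive frames.
            series = [frame[y][x] for frame in frames]
            # Low temporal deviation: the pixel barely changes, so call it static.
            row.append(pstdev(series) <= threshold)
        static_mask.append(row)
    return static_mask


# Toy clip: the left column is nearly constant (caption-like),
# the right column changes from frame to frame (scene or background).
frames = [
    [[200, 10], [200, 90]],
    [[200, 60], [201, 20]],
    [[200, 120], [200, 70]],
]
mask = split_static_dynamic(frames)
```

In this toy clip the left column would fall into the static cluster and the right column into the dynamic one. A single global threshold is the crudest possible choice; it only shows the idea of using temporal deviation to separate caption-like pixels from moving content.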


Higher order moments; Stroke width distance, dynamic window; Caption text; Region growing; Arbitrarily-oriented text detection; Multi-lingual text detection



The work is also partly supported by the University of Malaya HIR under Grant No: UM.C/625/1/HIR/MOHE/ENG/42. The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which helped us to improve the quality and to clarify the paper significantly.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Vijeta Khare (1)
  • Palaiahnakote Shivakumara (2, 3)
  • Raveendran Paramesran (1)
  • Michael Blumenstein (4)

  1. Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia
  2. Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
  3. Computer Systems and Information Technology, University of Malaya, Malaysia
  4. School of Software, University of Technology Sydney, Sydney, Australia
