Multimedia Tools and Applications, Volume 78, Issue 6, pp 6989–7004

A new video text extraction using local laplacian filters and mean shift

  • Xiaodong Huang


Video text constitutes the semantic context of the video. For that reason, robust extraction of text is essential for successful video understanding, search and retrieval. Extracting text from the background is an important phase before the text can be recognized correctly. It is a challenging task because text is difficult to segment from varied and complicated backgrounds. Therefore, this paper proposes a novel text extraction method to tackle this issue. First, we perform background complexity determination to distinguish text lines with a clear, simple background from those with a complex background, which increases the extraction speed. Then, for text lines with a complicated background and low contrast, we use Local Laplacian Filters [18] to enhance the details of text regions and obtain the Integrated Enhanced Map (IEM). Finally, we apply Mean Shift [4] to segment the IEM and obtain the text extraction results. Experimental evaluations on a varied video dataset that we collected demonstrate that our method significantly outperforms three other video text extraction algorithms in terms of recall, precision and F-score, especially under challenges such as video text with different font sizes, font styles, languages, and background complexities.
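
The segmentation step can be illustrated with a toy example. The sketch below is not the paper's implementation: it runs a flat-kernel mean shift on a single row of scalar intensities, standing in for values of an enhanced map, and groups pixels by the mode they converge to. The function names `mean_shift_1d` and `label_modes`, the sample pixel values, and the bandwidth of 30 are all assumptions chosen for illustration.

```python
def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-3):
    """Flat-kernel mean shift on scalar samples; returns one mode per sample."""
    modes = []
    for v in values:
        x = float(v)
        for _ in range(max_iter):
            # Samples inside the window centred on the current estimate.
            window = [u for u in values if abs(u - x) <= bandwidth]
            new_x = sum(window) / len(window)
            shift = abs(new_x - x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)
    return modes


def label_modes(modes, bandwidth):
    """Merge modes closer than bandwidth / 2 into shared cluster labels."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            # No existing centre is close enough: start a new cluster.
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical row of intensities: dark background and bright text pixels.
pixels = [12, 15, 10, 14, 200, 205, 198, 13, 202, 11]
modes = mean_shift_1d(pixels, bandwidth=30)
centers, labels = label_modes(modes, bandwidth=30)
print(centers, labels)
```

In this toy row the bright pixels converge to one mode and the dark pixels to another, so the labels separate text from background. A real pipeline would operate on 2-D images in a joint spatial and range feature space, as described in [4].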


Keywords: Video text; Text extraction; Local Laplacian filters; Mean shift



The work reported in this paper is supported by the Beijing Natural Science Foundation (4173073); the Surface Project of Beijing Committee of Education under Grant No. KM201710028021; and Capacity Building for Sci-Tech Innovation - Fundamental Scientific Research Funds (025185305000/152).


References

  1. Bai B, Yin F, Liu C-L (2014) A seed-based segmentation method for scene text extraction. IAPR international workshop on document analysis systems, pp 262–266
  2. Cho MS, Seok J-H, Lee S, Kim JH (2011) Scene text extraction by superpixel CRFs combining multiple character features. International conference on document analysis and recognition, pp 1034–1038
  3. Clavelli A, Karatzas D, Lladós J (2010) A framework for the assessment of text extraction algorithms on complex colour images. The 9th IAPR international workshop on document analysis systems, ACM, pp 19–26
  4. Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
  5. Gòmez L, Karatzas D (2013) Multi-script text extraction from natural scenes. International conference on document analysis and recognition, pp 467–471
  6. Hedjam R, Cheriet M (2011) Novel data representation for text extraction from multispectral historical document images. International conference on document analysis and recognition, pp 172–176
  7. Kachouri R, Armas CM, Akil M (2015) Gamma correction acceleration for real-time text extraction from complex colored images. ICIP, pp 527–531
  8. Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda LGi, Mestre SR, Mas J, Mota DF, Almazàn JA, de las Heras LP (2013) ICDAR 2013 Robust Reading Competition. ICDAR, pp 1115–1124
  9. Kavitha AS, Shivakumara P, Kumar GH, Lu T (2016) Text segmentation in degraded historical document images. Egypt Info J 17:189–197
  10. Kim W, Kim C (2009) A new approach for overlay text detection and extraction from complex video scene. IEEE Trans Image Process 18(2):401–411
  11. Lee SH, Kim JH (2013) Integrating multiple character proposals for robust scene text extraction. Image Vis Comput 31:823–840
  12. Lee S, Cho MS, Jun K, Kim JH (2010) Scene text extraction with edge constraint and text collinearity. ICPR, pp 3983–3986
  13. Li X, Wang W, Huang Q, Gao W, Qing L (2009) A hybrid text segmentation approach. ICME, pp 510–513
  14. Li Z, Liu G, Qian X, Guo D, Jiang H (2011) Effective and efficient video text extraction using key text points. IET Image Process 5(8):671–683
  15. Liu Y, Song Y, Zhang Y, Meng Q (2013) A novel multi-oriented chinese text extraction approach from videos. 12th international conference on document analysis and recognition, pp 1355–1359
  16. Liu Y, Song Y, Zhang Y, Meng Q (2013) A novel multi-oriented chinese text extraction approach from videos. International conference on document analysis and recognition, pp 1355–1359
  17. Lyu MR, Song J, Cai M (2005) A comprehensive method for multilingual video text detection, localization, and extraction. IEEE Trans Circuits Syst Video Technol 15(2):243–255
  18. Paris S, Hasinoff SW, Kautz J (2015) Local Laplacian filters: edge-aware image processing with a Laplacian pyramid. Commun ACM 58(3):81–91
  19. Roy A, Parui SK, Roy U (2013) A pair-copula based scheme for text extraction from digital images. International conference on document analysis and recognition, pp 892–896
  20. Šarić M (2017) Scene text segmentation using low variation extremal regions and sorting based character grouping. Neurocomputing 266:56–65
  21. Sumathi CP, Gayathri Devi G (2014) Automatic text extraction from complex colored images using gamma correction method. J Comput Sci 4:705–715
  22. Wang R, Jin W, Wu L (2004) A novel video caption detection approach using multi-frame integration. Proceedings of the 17th international conference on pattern recognition (ICPR’04)
  23. Yin XC, Zuo ZY, Tian S, Liu CL (2016) Text detection, tracking and recognition in video: a comprehensive survey. IEEE Trans Image Process 25(6):2752–2773
  24. Zhang Z, Wang W, Lu K (2014) Video text extraction using the fusion of color gradient and log-gabor filter. International conference on pattern recognition, pp 2938–2943

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Capital Normal University, Beijing, China
