A Deep Convolution Neural Network Based Model for Enhancing Text Video Frames for Detection

  • C. Sunil
  • H. K. Chethan
  • K. S. Raghunandan
  • G. Hemantha Kumar
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 736)


The main cause of poor results in video text detection is the low quality of frames, which is affected by factors such as blur, complex backgrounds, and illumination; these are among the challenges encountered in image enhancement. This paper proposes a technique for enhancing image quality, both for better human perception and for text detection in video frames. A set of effective CNN denoisers is designed and trained to denoise an image. By adopting a variable splitting technique, these robust denoisers are plugged into model-based optimization methods within the half quadratic splitting (HQS) framework to handle image deblurring and super-resolution problems. Further, to detect text in the denoised frames, we use state-of-the-art methods such as MSER (Maximally Stable Extremal Regions) and SWT (Stroke Width Transform). Experiments on our own database and on the ICDAR and YVT databases demonstrate the proposed work in terms of precision, recall, and F-measure.
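The variable splitting idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a 3x3 mean filter stands in for the trained CNN denoiser, circular (FFT-based) blur is assumed, and the names hqs_deblur, box_denoiser, psf_to_otf, lam, and mus are hypothetical choices made only for this example. HQS alternates between a closed-form data-fidelity update in the Fourier domain and a denoising step that plays the role of the image prior.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad the blur kernel to the image size and centre it at the origin."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def box_denoiser(img, strength):
    """Stand-in for a trained CNN denoiser (assumption): blend with a 3x3 mean filter."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    smooth = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    weight = min(1.0, strength)
    return (1.0 - weight) * img + weight * smooth

def hqs_deblur(y, kernel, denoiser=box_denoiser, lam=1e-3, mus=(0.01, 0.05, 0.2, 1.0)):
    """Half quadratic splitting for deblurring with a plug-in denoiser.

    Splits min_x 0.5*||y - k*x||^2 + lam*prior(x) by introducing z = x,
    then alternates an x-update (data term) and a z-update (denoiser).
    """
    K = psf_to_otf(kernel, y.shape)
    Y = np.fft.fft2(y)
    z = y.copy()
    for mu in mus:  # the penalty weight mu grows over the iterations
        # x-update: closed-form least-squares solution in the Fourier domain
        X = (np.conj(K) * Y + mu * np.fft.fft2(z)) / (np.abs(K) ** 2 + mu)
        x = np.real(np.fft.ifft2(X))
        # z-update: denoise x at noise level sqrt(lam / mu)
        z = denoiser(x, np.sqrt(lam / mu))
    return z

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((32, 32))          # synthetic stand-in for a video frame
    kernel = np.ones((3, 3)) / 9.0        # assumed known blur kernel
    blurred = np.real(np.fft.ifft2(np.fft.fft2(frame) * psf_to_otf(kernel, frame.shape)))
    restored = hqs_deblur(blurred, kernel)
    print(restored.shape)
```

In a setting closer to the one described, the denoiser argument would be replaced by a trained CNN evaluated at the matching noise level, and the restored frame would then be passed to a text detector such as MSER or SWT.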


Video text detection, CNN, Enhancement, Low quality images



The work carried out in this paper was supported by the High Performance Computing Lab under the UPE Grant, Department of Studies in Computer Science, University of Mysore, Mysore.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • C. Sunil (1)
  • H. K. Chethan (1)
  • K. S. Raghunandan (2)
  • G. Hemantha Kumar (2)
  1. Department of Computer Science and Engineering, Maharaja Research Foundation, Maharaja Institute of Technology, Mysore, India
  2. Department of Studies in Computer Science, University of Mysore, Mysore, India
