
Scene Text Detection with a SSD and Encoder-Decoder Network Based Method

  • Cong Luo
  • Xue Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11264)

Abstract

In this work, we propose a simple yet powerful method for detecting text in natural scenes. We present a Text Localization Neural Network, which detects text in scene images with a single forward pass followed by standard non-maximum suppression. To eliminate the few background regions that the Text Localization Neural Network mistakes for text, we propose a Text Verification Model based on an encoder-decoder network. Precision of text detection is thus further improved by recognizing the text in the candidate text regions. We evaluated the proposed method on a horizontal text detection dataset that we constructed ourselves. Compared with previous approaches, our method achieves the highest recall rate, 0.784, and a competitive precision rate.
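
The two-stage flow described above (candidate boxes, non-maximum suppression, then verification) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the box format, the 0.5 IoU threshold, and the `is_text` callback standing in for the Text Verification Model are all hypothetical.

```python
from typing import Callable, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def detect_and_verify(
    boxes: List[Box],
    scores: List[float],
    is_text: Callable[[Box], bool],  # hypothetical stand-in for the verifier
    iou_threshold: float = 0.5,
) -> List[Box]:
    """Suppress duplicate candidates, then drop those the verifier rejects."""
    return [boxes[i] for i in nms(boxes, scores, iou_threshold) if is_text(boxes[i])]
```

For example, given the candidates `[(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]` with scores `[0.9, 0.8, 0.7]`, the second box overlaps the first too heavily and is suppressed, and the verifier then decides which of the two survivors to keep.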

Keywords

Text detection, SSD, Encoder-decoder network

Notes

Acknowledgments

This research was partially supported by the National Science and Technology Support Plan (2013BAH65F04), the Natural Science Foundation of Guangdong Province (No. 2015A030313210), and the Science and Technology Program of Guangzhou (Grant Nos. 201604010061 and 201707010141).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China
  2. SCUT-Zhuhai Institute of Modern Industrial Innovation, Zhuhai, China
