A Fast Method for Scene Text Detection

  • Qing Fang
  • Yanping Yang
  • Yali Chen
  • Xiaoyu Yao
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 771)


Text detection is important for many applications, such as text retrieval, guidance for the blind, and industrial automation. It is also a challenging task because of the complexity of backgrounds and the diversity of text fonts, sizes, and colors. In recent years, deep learning has achieved good results in image classification and object detection, offering a new approach to text detection. In this paper, we adopt a deep-learning-based detection method, the Single Shot MultiBox Detector (SSD). However, SSD is a general object detector: it is not designed specifically for text and is not fast enough. Our method aims to develop a network specialized for text detection that improves speed and reduces model size. To this end, we design a feature extraction network with an inception module and an additional deconvolution layer. Experiments on the ICDAR2013 benchmark demonstrate that our method is faster than other SSD-based methods while achieving comparable results.
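As a rough illustration of the two building blocks named in the abstract, the sketch below tracks only tensor shapes: an inception-style block concatenates parallel convolution branches along the channel axis, and a transposed-convolution (deconvolution) layer upsamples the result. The abstract does not give the layer configuration, so the branch widths, kernel sizes, strides, and the 19 x 19 input size are all hypothetical; this is not the authors' network.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FeatureMap:
    channels: int
    height: int
    width: int


def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def inception_block(x: FeatureMap, branch_channels: Dict[int, int]) -> FeatureMap:
    """Parallel k x k branches, concatenated along the channel axis.

    branch_channels maps an odd kernel size to that branch's output
    channels. The values used below are assumptions for illustration.
    """
    sizes = set()
    for k in branch_channels:
        pad = k // 2  # "same" padding for odd kernels at stride 1
        sizes.add((conv_out(x.height, k, 1, pad), conv_out(x.width, k, 1, pad)))
    # Concatenation is only valid if every branch has the same spatial size.
    assert len(sizes) == 1, "branches must agree spatially"
    h, w = sizes.pop()
    return FeatureMap(sum(branch_channels.values()), h, w)


def deconv_layer(x: FeatureMap, out_channels: int,
                 kernel: int = 2, stride: int = 2) -> FeatureMap:
    """Upsample with a transposed convolution (hypothetical 2x2, stride 2)."""
    return FeatureMap(out_channels,
                      deconv_out(x.height, kernel, stride),
                      deconv_out(x.width, kernel, stride))


# Hypothetical 256-channel, 19 x 19 feature map from an earlier stage.
x = FeatureMap(256, 19, 19)
y = inception_block(x, {1: 64, 3: 128, 5: 32})
z = deconv_layer(y, out_channels=128)
print(y)
print(z)
```

With these assumed settings the inception block keeps the spatial size and changes only the channel count, while the deconvolution layer doubles the height and width. A higher-resolution map of this kind is one common way to help a detector with small objects such as distant text.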


Scene text detection; Deep network



Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • Qing Fang (1)
  • Yanping Yang (1)
  • Yali Chen (1)
  • Xiaoyu Yao (1)
  1. School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China
