Detecting Text in Natural Image with Connectionist Text Proposal Network

  • Zhi Tian
  • Weilin Huang
  • Tong He
  • Pan He
  • Yu Qiao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9912)

Abstract

We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images. The CTPN detects a text line as a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to explore rich context information in the image, making it powerful at detecting extremely ambiguous text. The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous bottom-up methods that require multi-step post-filtering. It achieves 0.88 and 0.61 F-measure on the ICDAR 2013 and 2015 benchmarks, surpassing recent results [8, 35] by a large margin. The CTPN is computationally efficient, running at 0.14 s/image with the very deep VGG16 model [27]. An online demo is available at http://textdet.com/.
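
The vertical anchor idea in the abstract (proposals of one fixed width that differ only in vertical extent) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `vertical_anchors`, the 16-pixel width and stride, and the three anchor heights are assumptions chosen for illustration, since this page does not state the values used in the paper.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in image pixels
Box = Tuple[float, float, float, float]


def vertical_anchors(
    feat_w: int,
    feat_h: int,
    stride: int = 16,                          # assumed stride; also used as the fixed proposal width
    heights: Tuple[int, ...] = (16, 32, 64),   # illustrative heights, not taken from the paper
) -> List[Box]:
    """Build fixed-width anchors that vary only in height, one set per feature-map cell."""
    anchors: List[Box] = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Horizontal extent is fixed by the cell position and the stride.
            x_min = col * stride
            x_max = x_min + stride
            # Anchors share the vertical center of the cell and differ in height.
            y_center = row * stride + stride / 2.0
            for h in heights:
                anchors.append((x_min, y_center - h / 2.0, x_max, y_center + h / 2.0))
    return anchors


if __name__ == "__main__":
    boxes = vertical_anchors(feat_w=4, feat_h=2)
    print(len(boxes))  # one anchor per (cell, height) pair
    print(boxes[0])
```

Because every anchor has the same width, a detector built this way only has to regress the vertical position and height of each proposal, which is the simplification the abstract credits for improved localization.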

Keywords

Scene text detection, Convolutional network, Recurrent neural network, Anchor mechanism

Notes

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61503367), the Science and Technology Planning Project of Guangdong Province (2015A030310289, 2014B050505017, 2015B010129013), the Shenzhen Research Program (KQCX2015033117354153, JSGG20150925164740726, CXZZ20150930104115529, JCYJ20150925163005055), and the External Cooperation Program of BIC, Chinese Academy of Sciences (172644KYSB20150019).

References

  1. Busta, M., Neumann, L., Matas, J.: FasText: efficient unconstrained scene text detector. In: IEEE International Conference on Computer Vision (ICCV) (2015)
  2. Cheng, M., Zhang, Z., Lin, W., Torr, P.: BING: binarized normed gradients for objectness estimation at 300 fps. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2014)
  3. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2010)
  4. Everingham, M., Gool, L.V., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. (IJCV) 88(2), 303–338 (2010)
  5. Girshick, R.: Fast R-CNN. In: IEEE International Conference on Computer Vision (ICCV) (2015)
  6. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2014)
  7. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5), 602–610 (2005)
  8. Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  9. He, P., Huang, W., Qiao, Y., Loy, C.C., Tang, X.: Reading scene text in deep convolutional sequences. In: The 30th AAAI Conference on Artificial Intelligence (AAAI-16) (2016)
  10. He, T., Huang, W., Qiao, Y., Yao, J.: Accurate text localization in natural image with cascaded convolutional text network (2016). arXiv:1603.09423
  11. He, T., Huang, W., Qiao, Y., Yao, J.: Text-attentional convolutional neural networks for scene text detection. IEEE Trans. Image Process. (TIP) 25, 2529–2541 (2016)
  12. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  13. Huang, W., Lin, Z., Yang, J., Wang, J.: Text localization in natural images using stroke feature transform and text covariance descriptors. In: IEEE International Conference on Computer Vision (ICCV) (2013)
  14. Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolutional neural networks induced MSER trees. In: European Conference on Computer Vision (ECCV) (2014)
  15. Jaderberg, M., Simonyan, K., Vedaldi, A., Zisserman, A.: Reading text in the wild with convolutional neural networks. Int. J. Comput. Vis. (IJCV) 116(1), 1–20 (2016)
  16. Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part IV. LNCS, vol. 8692, pp. 512–528. Springer, Heidelberg (2014)
  17. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia (ACM MM) (2014)
  18. Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., Shafait, F., Uchida, S., Valveny, E.: ICDAR 2015 competition on robust reading. In: International Conference on Document Analysis and Recognition (ICDAR) (2015)
  19. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., de las Heras, L.P.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition (ICDAR) (2013)
  20. Mao, J., Li, H., Zhou, W., Yan, S., Tian, Q.: Scale based region growing for scene text detection. In: ACM International Conference on Multimedia (ACM MM) (2013)
  21. Minetto, R., Thome, N., Cord, M., Fabrizio, J., Marcotegui, B.: Snoopertext: a multiresolution system for text detection in complex visual scenes. In: IEEE International Conference on Image Processing (ICIP) (2010)
  22. Neumann, L., Matas, J.: Efficient scene text localization and recognition with local character refinement. In: International Conference on Document Analysis and Recognition (ICDAR) (2015)
  23. Neumann, L., Matas, J.: Real-time lexicon-free scene text localization and recognition. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2015)
  24. Pan, Y., Hou, X., Liu, C.: Hybrid approach to detect and localize texts in natural scene images. IEEE Trans. Image Process. (TIP) 20, 800–813 (2011)
  25. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Neural Information Processing Systems (NIPS) (2015)
  26. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Li, F.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015)
  27. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (ICLR) (2015)
  28. Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: IEEE International Conference on Computer Vision (ICCV) (2015)
  29. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: IEEE International Conference on Computer Vision (ICCV) (2011)
  30. Wolf, C., Jolion, J.: Object count / area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. 8, 280–296 (2006)
  31. Yao, C., Bai, X., Liu, W.: A unified framework for multioriented text detection and recognition. IEEE Trans. Image Process. (TIP) 23(11), 4737–4749 (2014)
  32. Yin, X.C., Pei, W.Y., Zhang, J., Hao, H.W.: Multi-orientation scene text detection with adaptive clustering. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 37(9), 1930–1937 (2015)
  33. Yin, X.C., Yin, X., Huang, K., Hao, H.W.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 36(4), 970–983 (2014)
  34. Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: IEEE Computer Vision and Pattern Recognition (CVPR) (2015)
  35. Zhang, Z., Zhang, C., Shen, W., Yao, C., Liu, W., Bai, X.: Multi-oriented text detection with fully convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Zhi Tian (1)
  • Weilin Huang (1, 2)
  • Tong He (1)
  • Pan He (1)
  • Yu Qiao (1, 3)

  1. Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. University of Oxford, Oxford, UK
  3. The Chinese University of Hong Kong, Sha Tin, Hong Kong