Natural Scene Text Detection Based on Deep Supervised Fully Convolutional Network

  • Conference paper
Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11166)

Abstract

In the past few years, text detection in natural scenes has attracted increasing attention because of its many real-world applications. Most existing methods detect only horizontal or nearly horizontal text and rely on complicated multi-step pipelines. When a neural network is used to detect text in an image, ambiguous and small words are easily missed because of the many pooling operations. This paper therefore proposes an end-to-end trainable neural network for detecting multi-oriented text lines or words in natural scene images. The network fuses multi-level features and is guided by deep supervision during training, so that richer hierarchical representations can be learned automatically. The network makes two kinds of predictions, text/non-text classification and location regression, so multi-oriented words or text lines can be located directly without unnecessary intermediate steps. Experimental results on the ICDAR 2015 and MSRA-TD500 datasets show that the proposed method outperforms state-of-the-art methods by a noticeable margin in F-score.
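
The idea of deep supervision mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss function, the number of supervised stages, or their weights, so everything below is an assumption made for illustration: a soft Dice loss for the text/non-text score map, nearest-neighbour upsampling of each side output, and an equal weight for every supervised output. The function names are hypothetical, and the location regression branch is omitted.

```python
import numpy as np


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a score map in [0, 1] and a binary text mask."""
    intersection = np.sum(pred * target)
    total = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def upsample_nearest(score_map, factor):
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    return np.repeat(np.repeat(score_map, factor, axis=0), factor, axis=1)


def deeply_supervised_loss(side_outputs, fused_output, target, side_weight=1.0):
    """Sum the loss of the fused prediction and of every side output.

    side_outputs: list of (score_map, stride) pairs, one per network stage.
    fused_output: full-resolution score map built from the fused features.
    target:       full-resolution binary text mask.
    side_weight:  assumed common weight for all side outputs.
    """
    loss = dice_loss(fused_output, target)
    for score_map, stride in side_outputs:
        # Bring the coarse side output back to the target resolution
        # so that it can be compared with the same ground-truth mask.
        restored = upsample_nearest(score_map, stride)
        loss += side_weight * dice_loss(restored, target)
    return loss


# Toy 8x8 ground-truth mask with one 4x4 text region.
target = np.zeros((8, 8))
target[4:8, 0:4] = 1.0

# Hypothetical perfect predictions at strides 2 and 4.
perfect_sides = [(target[::2, ::2], 2), (target[::4, ::4], 4)]
perfect_loss = deeply_supervised_loss(perfect_sides, target, target)

# Hypothetical all-background predictions at the same strides.
empty_sides = [(np.zeros((4, 4)), 2), (np.zeros((2, 2)), 4)]
empty_loss = deeply_supervised_loss(empty_sides, np.zeros((8, 8)), target)
```

In this toy setting the loss is close to zero when every supervised output matches the mask, and it approaches 3 (one unit per supervised output) when every output predicts background only. Because each intermediate stage receives its own training signal, a stage cannot rely on later layers to compensate for a poor score map.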

Author information

Corresponding author

Correspondence to Xiaoning Jin.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, N., Jin, X., Li, X. (2018). Natural Scene Text Detection Based on Deep Supervised Fully Convolutional Network. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11166. Springer, Cham. https://doi.org/10.1007/978-3-030-00764-5_40

  • DOI: https://doi.org/10.1007/978-3-030-00764-5_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00763-8

  • Online ISBN: 978-3-030-00764-5

  • eBook Packages: Computer Science, Computer Science (R0)
