Natural Scene Text Detection Based on Deep Supervised Fully Convolutional Network

Zhang, Nan; Jin, Xiaoning; Li, Xiaowei

doi:10.1007/978-3-030-00764-5_40

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11166)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

In the past few years, text detection in natural scenes has attracted increasing attention because of its many real-world applications. Most existing methods detect only horizontal or nearly horizontal text and rely on complicated pipelines. When a neural network is used to detect text in an image, ambiguous and small words are easily missed because of the many pooling operations. This paper therefore proposes an end-to-end trainable neural network for detecting multi-oriented text lines or words in natural scene images. The network fuses multi-level features and is guided by deep supervision during training, so that richer hierarchical representations can be learned automatically. The network makes two kinds of predictions, text/non-text classification and location regression, so multi-oriented words or text lines can be located directly, without unnecessary intermediate steps. Experimental results on the ICDAR 2015 and MSRA-TD500 datasets show that the proposed method outperforms state-of-the-art methods by a noticeable margin in F-score.
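The abstract does not give the loss formulation, so the following is only a minimal sketch of what deep supervision over fused multi-level outputs might look like for the text/non-text branch. It assumes each side output is a flat list of per-pixel probabilities at a common resolution, that fusion is a weighted average, and that binary cross-entropy is the per-output loss. The function names `bce`, `fuse`, and `deeply_supervised_loss` and the `side_loss_weight` parameter are hypothetical and not taken from the paper.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def fuse(side_outputs, weights):
    """Weighted average of side-output score maps (assumed fusion rule)."""
    norm = sum(weights)
    n_pixels = len(side_outputs[0])
    return [
        sum(w * s[i] for w, s in zip(weights, side_outputs)) / norm
        for i in range(n_pixels)
    ]


def deeply_supervised_loss(side_outputs, target, fuse_weights, side_loss_weight=1.0):
    """Loss on the fused map plus a loss on every side output.

    Supervising each side output directly is what makes the training
    "deeply supervised"; the relative weighting here is an assumption.
    """
    fused = fuse(side_outputs, fuse_weights)
    side_term = sum(bce(s, target) for s in side_outputs)
    return bce(fused, target) + side_loss_weight * side_term


# Toy example: three side outputs over four pixels.
target = [1, 0, 1, 0]
sides = [
    [0.9, 0.2, 0.6, 0.1],
    [0.7, 0.4, 0.8, 0.3],
    [0.6, 0.5, 0.7, 0.4],
]
loss = deeply_supervised_loss(sides, target, fuse_weights=[1.0, 1.0, 1.0])
```

A location-regression term could be added to the same sum in an analogous way; it is left out here because the abstract does not say how the geometry is parameterized.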


Author information

Authors and Affiliations

Beijing Advanced Innovation Center for Future Internet Technology, Beijing University of Technology, Beijing, China
Nan Zhang, Xiaoning Jin & Xiaowei Li

Corresponding author

Correspondence to Xiaoning Jin.

Editor information

Editors and Affiliations

Hefei University of Technology, Hefei, China
Richang Hong
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
University of Tokyo, Tokyo, Japan
Toshihiko Yamasaki
Hefei University of Technology, Hefei, China
Meng Wang
City University of Hong Kong, Hong Kong, Hong Kong
Chong-Wah Ngo

About this paper

Cite this paper

Zhang, N., Jin, X., Li, X. (2018). Natural Scene Text Detection Based on Deep Supervised Fully Convolutional Network. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11166. Springer, Cham. https://doi.org/10.1007/978-3-030-00764-5_40

DOI: https://doi.org/10.1007/978-3-030-00764-5_40
Published: 18 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00763-8
Online ISBN: 978-3-030-00764-5
eBook Packages: Computer Science, Computer Science (R0)
