Irregular scene text detection via attention guided border labeling

  • Research Paper
  • Published in: Science China Information Sciences

Abstract

Scene text detection plays an important role in many computer vision applications. With the help of recent deep learning techniques, multi-oriented text detection, once considered quite challenging, has been solved to some extent. However, most existing methods still perform poorly on curved text, mainly because of the limitations of their text representations (e.g., horizontal boxes, rotated rectangles, or quadrangles). To address this problem, we propose a novel method that detects irregular scene text based on instance-aware segmentation. The key idea is to design an attention guided semantic segmentation model that precisely labels the weighted borders of text regions. Experiments on several widely used benchmarks demonstrate that our method achieves superior results on curved text datasets (F-scores of 80.1% on CTW1500 and 78.8% on Total-Text) and obtains performance comparable to state-of-the-art approaches on multi-oriented text datasets.
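The abstract states the idea of border labeling only at a high level. As an illustration under our own assumptions, the sketch below shows why labeling borders as a separate class makes segmentation instance-aware: a per-pixel map with three classes (background, text interior, text border) lets border pixels split text regions that touch, so each connected component of interior pixels becomes one text instance. The class codes, the 4-connectivity choice, and the names `make_border_labels` and `extract_instances` are hypothetical. This is not the authors' implementation: the attention module and the border weighting mentioned in the abstract are not modelled, and plain Python lists stand in for a network's predicted label map.

```python
from collections import deque
from typing import List

# Hypothetical class codes for a three-class label map (assumption, not from the paper).
BACKGROUND, TEXT, BORDER = 0, 1, 2

Grid = List[List[int]]
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumption)


def make_border_labels(instance_mask: Grid) -> Grid:
    """Convert an instance-id mask (0 = background, k > 0 = text instance k)
    into a three-class map. A text pixel becomes BORDER when any in-image
    4-neighbour belongs to a different instance or to the background."""
    h, w = len(instance_mask), len(instance_mask[0])
    labels = [[BACKGROUND] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k = instance_mask[y][x]
            if k == 0:
                continue
            labels[y][x] = TEXT
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and instance_mask[ny][nx] != k:
                    labels[y][x] = BORDER
                    break
    return labels


def extract_instances(labels: Grid) -> Grid:
    """Group TEXT pixels into 4-connected components via breadth-first search.
    BORDER pixels are never traversed, so they separate touching regions.
    Returns an instance-id map where 0 means 'not part of any instance'."""
    h, w = len(labels), len(labels[0])
    ids = [[0] * w for _ in range(h)]
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != TEXT or ids[sy][sx]:
                continue
            next_id += 1
            ids[sy][sx] = next_id
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] == TEXT and ids[ny][nx] == 0):
                        ids[ny][nx] = next_id
                        queue.append((ny, nx))
    return ids


if __name__ == "__main__":
    # Two 3x3 text regions that touch horizontally (toy data, not from the paper).
    mask = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 2, 2, 2, 0],
        [0, 1, 1, 1, 2, 2, 2, 0],
        [0, 1, 1, 1, 2, 2, 2, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
    with_borders = extract_instances(make_border_labels(mask))
    print(max(max(row) for row in with_borders))  # 2: the regions stay separate

    # A plain text/non-text map without a border class merges them.
    binary = [[TEXT if v else BACKGROUND for v in row] for row in mask]
    print(max(max(row) for row in extract_instances(binary)))  # 1
```

In a full pipeline one would typically grow each recovered instance back over its neighbouring border pixels before fitting a polygon to it; that step is omitted here for brevity.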

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61672056, 61672043) and the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).

Author information

Corresponding author

Correspondence to Zhouhui Lian.

About this article

Cite this article

Chen, J., Lian, Z., Wang, Y. et al. Irregular scene text detection via attention guided border labeling. Sci. China Inf. Sci. 62, 220103 (2019). https://doi.org/10.1007/s11432-019-2673-8

  • DOI: https://doi.org/10.1007/s11432-019-2673-8
