Abstract
Scene text detection plays an important role in many computer vision applications. With the help of recent deep learning techniques, multi-oriented text detection, once considered quite challenging, has been solved to some extent. However, most existing methods still perform poorly on curved text, mainly because of the limitations of their text representations (e.g., horizontal boxes, rotated rectangles, or quadrangles). To solve this problem, we propose a novel method for detecting irregular scene text based on instance-aware segmentation. The key idea is to design an attention-guided semantic segmentation model that precisely labels the weighted borders of text regions. Experiments conducted on several widely used benchmarks demonstrate that our method achieves superior results on curved text datasets (F-scores of 80.1% and 78.8% on CTW1500 and Total-Text, respectively) and performance comparable to state-of-the-art approaches on multi-oriented text datasets.
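For readers unfamiliar with the metric reported above, the F-score used in text detection benchmarks is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that standard formula only; the function name `f_score` and the input values are illustrative assumptions, not taken from the paper or its evaluation code.

```python
def f_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score).

    Both inputs are expected in the range [0, 1]. When both are zero,
    the harmonic mean is undefined, so 0.0 is returned by convention.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision/recall values for illustration only;
# these are NOT results reported in the paper.
example = f_score(0.80, 0.75)
```

Because the harmonic mean is pulled toward the smaller of the two values, a detector cannot reach a high F-score by trading away recall for precision or the reverse.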
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61672056, 61672043) and the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).
About this article
Cite this article
Chen, J., Lian, Z., Wang, Y. et al. Irregular scene text detection via attention guided border labeling. Sci. China Inf. Sci. 62, 220103 (2019). https://doi.org/10.1007/s11432-019-2673-8
DOI: https://doi.org/10.1007/s11432-019-2673-8