Abstract
Scene text detection has advanced rapidly in recent years. However, two limitations remain: (1) boundary information is processed together with color and texture information inside a single deep CNN, which may not be ideal because these cues carry different types of information relevant to discriminating adjacent text instances; (2) previous methods do not preserve text structure, which prevents the network from localizing text accurately when the receptive field is enlarged. In this paper, we propose two modules, the Gate Convolution Module (GCM) and the Tree Filter Module (TFM). GCM is a separate processing branch that leverages text shape information to split text instances that lie close together. TFM models long-range dependencies while preserving text details by exploiting the structural property of the minimum spanning tree. With these two modules, our method effectively separates adjacent text instances while preserving detailed text structure. Extensive experiments on four standard text benchmarks (ICDAR2015, MSRA-TD500, CTW1500 and Total-Text) demonstrate that our method achieves excellent performance.
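To make the spanning-tree idea concrete, the following is a minimal sketch and not the authors' Tree Filter Module. It assumes a 4-connected pixel grid, a single-channel feature map, and absolute feature difference as the edge weight; the function names `grid_edges` and `minimum_spanning_tree` are hypothetical. It builds the tree with Kruskal's algorithm and shows that two flat regions end up joined by a single high-cost edge.

```python
# Illustrative sketch only: NOT the authors' Tree Filter Module.
# Assumptions: 4-connected grid, scalar features, |difference| as edge weight.

def grid_edges(feat):
    """Yield (weight, u, v) for every 4-connected neighbor pair in a 2D list."""
    h, w = len(feat), len(feat[0])
    for r in range(h):
        for c in range(w):
            u = r * w + c
            if c + 1 < w:  # right neighbor
                yield abs(feat[r][c] - feat[r][c + 1]), u, u + 1
            if r + 1 < h:  # bottom neighbor
                yield abs(feat[r][c] - feat[r + 1][c]), u, u + w


def minimum_spanning_tree(feat):
    """Kruskal's algorithm with union-find; returns the list of tree edges."""
    h, w = len(feat), len(feat[0])
    parent = list(range(h * w))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(grid_edges(feat)):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so keep it
            parent[ru] = rv
            tree.append((weight, u, v))
    return tree


# Toy feature map: two flat regions (values 0 and 9) side by side.
toy = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
mst = minimum_spanning_tree(toy)
crossing = [edge for edge in mst if edge[0] > 0]
```

In this toy map, `crossing` holds the only tree edge that spans the region boundary, so any aggregation along tree paths between the two regions must pass through that one dissimilar edge. This is the kind of structure-preserving behavior the abstract attributes to the spanning tree.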
Acknowledgements
This work is supported by the Natural Science Foundation of Shandong Province (ZR2019MF050) and by the Shandong Province colleges and universities youth innovation technology plan innovation team project (Grant No. 2020KJN011).
Cite this article
Cheng, Q., Wang, G. Shape awareness and structure-preserving network for arbitrary shape text detection. Multimed Tools Appl 80, 10761–10775 (2021). https://doi.org/10.1007/s11042-020-10039-9
DOI: https://doi.org/10.1007/s11042-020-10039-9