Shape awareness and structure-preserving network for arbitrary shape text detection

Cheng, Qi; Wang, Guodong

doi:10.1007/s11042-020-10039-9


Published: 02 January 2021

Volume 80, pages 10761–10775 (2021)

Multimedia Tools and Applications

Qi Cheng¹ &
Guodong Wang¹


Abstract

Scene text detection has advanced rapidly in recent years. However, two limitations remain: (1) boundary information is processed together with color and texture information inside a deep CNN, which may not be ideal because these cues carry different types of information relevant to discriminating adjacent text instances; (2) previous methods lack text structure preservation, which prevents the network from localizing text accurately when the receptive field is enlarged. In this paper, we propose two modules, the Gate Convolution Module (GCM) and the Tree Filter Module (TFM). GCM is a separate processing branch that leverages text shape information to split close text instances. TFM models long-range dependencies while preserving text details by exploiting the structural property of the minimal spanning tree. Benefiting from these two modules, our method effectively separates text instances that are close to each other while preserving detailed text structure. Extensive experiments on four standard text benchmarks (ICDAR2015, MSRA-TD500, CTW1500 and Total-Text) demonstrate that our method achieves excellent performance.
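
The abstract does not give the details of TFM, so the following is only a minimal sketch of the general idea behind tree filtering on a minimal spanning tree, not the authors' implementation. It assumes a 4-connected pixel grid, a scalar feature per pixel, edge weights equal to the absolute feature difference, and an affinity of exp(-distance) along the tree path. All function names and the toy feature map are hypothetical.

```python
import math
from collections import defaultdict, deque


def grid_edges(feat):
    """Yield (weight, u, v) for 4-connected neighbours of a 2-D feature grid.

    Assumption: the edge weight is the absolute difference of scalar features.
    """
    h, w = len(feat), len(feat[0])
    for r in range(h):
        for c in range(w):
            u = r * w + c
            if c + 1 < w:
                yield abs(feat[r][c] - feat[r][c + 1]), u, u + 1
            if r + 1 < h:
                yield abs(feat[r][c] - feat[r + 1][c]), u, u + w


def minimum_spanning_tree(n_nodes, edges):
    """Kruskal's algorithm with union-find; returns the tree as an adjacency list."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = defaultdict(list)
    for wgt, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # keep the edge only if it joins two components
            parent[ru] = rv
            tree[u].append((v, wgt))
            tree[v].append((u, wgt))
    return tree


def tree_distances(tree, source, n_nodes):
    """Sum of edge weights along the unique tree path from source to every node."""
    dist = [None] * n_nodes
    dist[source] = 0.0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, wgt in tree[u]:
            if dist[v] is None:
                dist[v] = dist[u] + wgt
                queue.append(v)
    return dist


# Toy 3x4 feature map: the two left columns form one region, the two right another.
feat = [
    [0.0, 0.1, 0.9, 1.0],
    [0.0, 0.1, 0.9, 1.0],
    [0.0, 0.1, 0.9, 1.0],
]
n = len(feat) * len(feat[0])
tree = minimum_spanning_tree(n, grid_edges(feat))
dist = tree_distances(tree, source=0, n_nodes=n)

# Assumed affinity: pixels reachable through low-weight tree paths get high weight.
affinity = [math.exp(-d) for d in dist]
```

In this toy case the pixel at the bottom of the first column is far from the source on the grid but has tree distance zero, while pixels across the sharp feature boundary receive a lower affinity. This illustrates how a spanning-tree path can carry long-range context without smoothing over a structural edge.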



Acknowledgements

This work is supported by the Natural Science Foundation of Shandong Province (ZR2019MF050) and by the Shandong Province colleges and universities youth innovation technology plan innovation team project (Grant No. 2020KJN011).

Author information

Authors and Affiliations

College of Computer Science and Technology, Qingdao University, Qingdao, China
Qi Cheng & Guodong Wang


Corresponding author

Correspondence to Guodong Wang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Cheng, Q., Wang, G. Shape awareness and structure-preserving network for arbitrary shape text detection. Multimed Tools Appl 80, 10761–10775 (2021). https://doi.org/10.1007/s11042-020-10039-9


Received: 23 July 2020
Revised: 20 September 2020
Accepted: 06 October 2020
Published: 02 January 2021
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11042-020-10039-9
