Shape awareness and structure-preserving network for arbitrary shape text detection

Multimedia Tools and Applications

Abstract

Scene text detection has advanced rapidly in recent years. However, two limitations remain: (1) boundary information is processed together with color and texture information inside a single deep CNN, which may not be ideal because these cues carry different types of information relevant to discriminating adjacent text instances; (2) previous methods lack text structure preservation, which prevents the network from localizing text accurately when its receptive field is enlarged. In this paper, we propose two modules, the Gate Convolution Module (GCM) and the Tree Filter Module (TFM). GCM is a separate processing branch that leverages text shape information to split text instances that lie close to each other. TFM models long-range dependencies while preserving text details by exploiting the structural property of the minimum spanning tree. Benefiting from the two modules, our method effectively separates adjacent text instances while preserving detailed text structure. Extensive experiments on four standard text benchmarks (ICDAR2015, MSRA-TD500, CTW1500 and Total-Text) demonstrate that our method achieves excellent performance.
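The abstract does not give the GCM's formula. As a rough illustration of what "a separate processing branch" gated by shape information can look like, here is a minimal NumPy sketch assuming a Gated-SCNN-style gate (Takikawa et al., 2019): a 1x1 convolution over the concatenated shape-branch and backbone features produces a single-channel sigmoid map that re-weights the shape features pixel by pixel. The function names, tensor layouts and the exact gating rule are hypothetical assumptions, not the paper's GCM.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1x1(x, weight, bias):
    """1x1 convolution on a (C_in, H, W) map; weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def gated_shape_features(shape_feat, backbone_feat, weight, bias):
    """Hypothetical gating step for a separate shape branch (assumption, not the paper's GCM).

    shape_feat:    (C_s, H, W) features carried by the shape/boundary branch
    backbone_feat: (C_b, H, W) color/texture features from the main CNN
    weight, bias:  1x1 conv parameters mapping C_s + C_b channels to one gate channel
    """
    fused = np.concatenate([shape_feat, backbone_feat], axis=0)
    gate = sigmoid(conv1x1(fused, weight, bias))  # (1, H, W), each value in (0, 1)
    # Broadcast the single-channel gate over all shape channels: locations with a
    # gate near 0 are suppressed, locations with a gate near 1 pass almost unchanged.
    return shape_feat * gate, gate


# Usage with random stand-in features (4 shape channels, 16 backbone channels).
rng = np.random.default_rng(0)
s = rng.standard_normal((4, 8, 8))
b = rng.standard_normal((16, 8, 8))
w = 0.1 * rng.standard_normal((1, 20))
out, gate = gated_shape_features(s, b, w, np.zeros(1))
print(out.shape, gate.shape)  # (4, 8, 8) (1, 8, 8)
```

Keeping the gate in its own branch means boundary cues are never mixed into the color/texture channels; they only decide how strongly each location's shape response is kept.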

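To show how a minimum spanning tree lets information travel far while respecting structure, here is a sketch of the classic non-learnable tree filter in the style of Yang (2015, stereo matching), with a normalization added. Pixels form a 4-connected grid whose edge weights are feature distances; Kruskal's algorithm extracts the MST; each output pixel is y_i = sum_j S_ij x_j / sum_j S_ij with S_ij = exp(-D_ij / sigma), where D_ij is the sum of edge weights on the tree path from i to j. Two linear-time passes (leaves to root, then root to leaves) compute every sum. The grid connectivity, Euclidean distance, sigma, normalization and all function names are assumptions for illustration, not the paper's TFM.

```python
from collections import deque

import numpy as np


def grid_edges(feat):
    """Edges of a 4-connected pixel grid, weighted by Euclidean feature distance.

    feat: (H, W, C) guidance features; returns a list of (weight, i, j) with flat indices.
    """
    H, W, _ = feat.shape
    edges = []
    for y in range(H):
        for x in range(W):
            i = y * W + x
            if x + 1 < W:
                edges.append((float(np.linalg.norm(feat[y, x] - feat[y, x + 1])), i, i + 1))
            if y + 1 < H:
                edges.append((float(np.linalg.norm(feat[y, x] - feat[y + 1, x])), i, i + W))
    return edges


def minimum_spanning_tree(n, edges):
    """Kruskal's algorithm with union-find; returns tree adjacency lists [(nbr, weight), ...]."""
    root = list(range(n))

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]  # path halving
            a = root[a]
        return a

    adj = [[] for _ in range(n)]
    for w, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two components, so it belongs to the MST
            root[ra] = rb
            adj[a].append((b, w))
            adj[b].append((a, w))
    return adj


def tree_filter(x, feat, sigma=1.0):
    """Normalized tree filter over the MST built from `feat`; x is the (H, W, C) signal."""
    H, W, C = x.shape
    n = H * W
    adj = minimum_spanning_tree(n, grid_edges(feat))

    # Root the tree at pixel 0 with a BFS; parents always precede children in `order`.
    parent = np.full(n, -1)
    sim = np.zeros(n)  # sim[v] = exp(-w(v, parent[v]) / sigma)
    seen = np.zeros(n, dtype=bool)
    seen[0] = True
    order, queue = [], deque([0])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u, w in adj[v]:
            if not seen[u]:
                seen[u] = True
                parent[u] = v
                sim[u] = np.exp(-w / sigma)
                queue.append(u)

    # Append a channel of ones so the normalizer sum_j S_ij is aggregated alongside x.
    vals = np.concatenate([x.reshape(n, C), np.ones((n, 1))], axis=1)

    # Pass 1 (leaves to root): up[v] aggregates the subtree rooted at v.
    up = vals.copy()
    for v in reversed(order[1:]):
        up[parent[v]] += sim[v] * up[v]

    # Pass 2 (root to leaves): add the contributions from outside each subtree.
    agg = up.copy()
    for v in order[1:]:
        agg[v] = sim[v] * agg[parent[v]] + (1.0 - sim[v] ** 2) * up[v]

    return (agg[:, :C] / agg[:, C:]).reshape(H, W, C)
```

Because the tree path between two pixels on opposite sides of a strong boundary must cross a high-weight edge, their mutual weight is small; the filter therefore reaches the whole image (a global receptive field) without averaging across text boundaries.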
Acknowledgements

This work is supported by the Natural Science Foundation of Shandong Province (ZR2019MF050) and by the Shandong Province Colleges and Universities Youth Innovation Technology Plan Innovation Team Project (Grant No. 2020KJN011).

Author information

Corresponding author

Correspondence to Guodong Wang.

About this article

Cite this article

Cheng, Q., Wang, G. Shape awareness and structure-preserving network for arbitrary shape text detection. Multimed Tools Appl 80, 10761–10775 (2021). https://doi.org/10.1007/s11042-020-10039-9

  • DOI: https://doi.org/10.1007/s11042-020-10039-9