Arbitrary-shaped scene text detection with keypoint-based shape representation

Qin, Shuxin; Chen, Lin

doi:10.1007/s10032-022-00396-6

Original Paper
Published: 25 March 2022

Volume 25, pages 115–127 (2022)

International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Scene text detection has recently become an active research topic. Arbitrary-shaped text detection is more challenging because of the irregular geometry of such text, for example long curved shapes. Most existing works address the problem either with bottom-up methods followed by heuristic post-processing or with top-down methods based on boundary regression. Building on an analysis and comparison of these two approaches, we present an efficient framework that detects arbitrary-shaped text by fusing bottom-up and top-down methods. Specifically, we use a segmentation method as the bottom-up detector to regress the text areas, and an anchor-free method as the top-down detector to represent and distinguish individual text instances based on the output of the bottom-up detector. To detect text with arbitrary shapes, we propose a keypoint-based shape representation that treats a text instance as several linked keypoints, which are then regressed by the top-down detector. With this representation, the detected text can be easily rectified by a Thin Plate Spline (TPS) transformation, and the framework can be directly extended to support end-to-end text spotting. Extensive experiments on several public benchmarks, covering both regular-shaped and arbitrary-shaped scene text in natural images, demonstrate that our method achieves state-of-the-art performance.
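The abstract describes a text instance as "several keypoints linked together" but does not say how the keypoints are placed or ordered. The Python sketch below is a rough illustration under stated assumptions and is not the authors' implementation. It assumes that a text boundary is split into a top and a bottom polyline, both running from the start of the text to its end. It also assumes that the same number of keypoints is sampled on each polyline at uniform arc-length spacing. Finally, it treats "linking" the keypoints as joining them into a closed polygon: top keypoints in reading order, then bottom keypoints in reverse. All function names are hypothetical. The regression of keypoints by the top-down detector and the TPS rectification itself are not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_polyline(points: List[Point], n: int) -> List[Point]:
    """Return n points spaced uniformly by arc length along an open polyline."""
    if n < 2 or len(points) < 2:
        raise ValueError("need n >= 2 and at least two polyline vertices")
    # Cumulative arc length at every vertex of the polyline.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out: List[Point] = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment whose arc-length interval contains the target.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        t = min(max(t, 0.0), 1.0)  # guard against floating-point overshoot
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def boundary_to_keypoints(top: List[Point], bottom: List[Point], k: int):
    """Hypothetical helper: k keypoints on each of the top and bottom boundaries.

    Assumption: both polylines run from the start of the text to its end.
    """
    return resample_polyline(top, k), resample_polyline(bottom, k)


def keypoints_to_polygon(top_kps: List[Point], bottom_kps: List[Point]) -> List[Point]:
    """Link keypoints into a closed polygon: top in reading order, bottom reversed."""
    return list(top_kps) + list(reversed(bottom_kps))


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula, used here only to sanity-check the linked polygon."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


if __name__ == "__main__":
    # A gently curved word: top and bottom boundaries given as polylines.
    top = [(0.0, 10.0), (20.0, 4.0), (40.0, 0.0), (60.0, 4.0), (80.0, 10.0)]
    bottom = [(0.0, 20.0), (20.0, 14.0), (40.0, 10.0), (60.0, 14.0), (80.0, 20.0)]
    top_kps, bottom_kps = boundary_to_keypoints(top, bottom, k=7)
    poly = keypoints_to_polygon(top_kps, bottom_kps)
    print(len(poly), round(polygon_area(poly), 2))
```

Sampling the same number of points on both boundaries yields matched top/bottom pairs. Such pairs could plausibly serve as source control points for a TPS warp onto a straight rectangle. That is one natural reading of why a keypoint representation makes rectification easy, but it is an assumption rather than something the abstract specifies.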

Author information

Authors and Affiliations

Purple Mountain Laboratories, No.9 Mozhou East Road, Jiangning District, Nanjing, China
Shuxin Qin
Institute of Automation, Chinese Academy of Sciences, No.95 Zhongguancun East Road, 100190, Beijing, China
Lin Chen

Corresponding author

Correspondence to Shuxin Qin.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qin, S., Chen, L. Arbitrary-shaped scene text detection with keypoint-based shape representation. IJDAR 25, 115–127 (2022). https://doi.org/10.1007/s10032-022-00396-6

Received: 28 May 2021
Revised: 19 February 2022
Accepted: 19 February 2022
Published: 25 March 2022
Issue Date: June 2022
DOI: https://doi.org/10.1007/s10032-022-00396-6