Arbitrary-shaped scene text detection with keypoint-based shape representation

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Scene text detection has recently become an active research topic. Detecting arbitrary-shaped text is especially challenging because of its irregular geometry, such as long curved shapes. Most existing works address the problem with either bottom-up methods followed by heuristic post-processing or top-down methods with boundary regression. Through analysis and comparison, we present an efficient framework that detects arbitrary-shaped text by fusing bottom-up and top-down methods. Specifically, we use a segmentation method as the bottom-up detector to regress the text areas, and an anchor-free method as the top-down detector to represent and distinguish each text instance based on the results of the bottom-up detector. To detect text with arbitrary shapes, we propose a keypoint-based shape representation that treats a text instance as several keypoints linked together; the keypoints are then regressed by the top-down detector. With this representation, the detected text can be easily rectified by a Thin Plate Spline (TPS) transformation, and the framework can be directly extended to support end-to-end text spotting. Extensive experiments on several public benchmarks, covering both regular-shaped and arbitrary-shaped scene text in natural images, demonstrate that our method achieves state-of-the-art performance.
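
To make the idea of a keypoint-based shape representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that a text instance is described by the same number of ordered keypoints on its top and bottom boundaries; the abstract does not specify the layout. The function names (`keypoints_to_polygon`, `polygon_area`, `rectification_targets`) are hypothetical. The sketch links the keypoints into a closed polygon and computes the target control points on a flat rectangle that a TPS transformation could map the keypoints onto.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def keypoints_to_polygon(top: List[Point], bottom: List[Point]) -> List[Point]:
    """Link paired boundary keypoints into one closed polygon.

    Assumption: `top` and `bottom` are ordered left to right along the
    reading direction and have the same length.
    """
    if len(top) != len(bottom) or len(top) < 2:
        raise ValueError("need at least two paired keypoints")
    # Walk along the top boundary, then back along the bottom boundary.
    return list(top) + list(reversed(bottom))


def polygon_area(poly: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def rectification_targets(
    top: List[Point], bottom: List[Point], out_height: float
) -> Tuple[List[Point], List[Point]]:
    """Target control points on a flat rectangle for a TPS-style warp.

    Each keypoint pair is placed at a horizontal position equal to the
    arc length travelled along the midline between top and bottom.
    """
    mids = [((tx + bx) / 2.0, (ty + by) / 2.0)
            for (tx, ty), (bx, by) in zip(top, bottom)]
    arc = [0.0]
    for a, b in zip(mids, mids[1:]):
        arc.append(arc[-1] + math.dist(a, b))
    top_targets = [(s, 0.0) for s in arc]
    bottom_targets = [(s, out_height) for s in arc]
    return top_targets, bottom_targets


if __name__ == "__main__":
    top = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
    bottom = [(0.0, 2.0), (3.0, 2.0), (6.0, 2.0)]
    polygon = keypoints_to_polygon(top, bottom)
    print(polygon, polygon_area(polygon))
    print(rectification_targets(top, bottom, out_height=32.0))
```

The source keypoints and the returned targets form the control-point pairs that a thin plate spline fit would consume; the fit itself is omitted here.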

Author information

Corresponding author

Correspondence to Shuxin Qin.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qin, S., Chen, L. Arbitrary-shaped scene text detection with keypoint-based shape representation. IJDAR 25, 115–127 (2022). https://doi.org/10.1007/s10032-022-00396-6

  • DOI: https://doi.org/10.1007/s10032-022-00396-6
