Abstract
A number of scene text in natural images have irregular shapes which often cause significant difficulties for a text detector. In this paper, we propose a robust scene text detection method based on a parameterized shape modeling and regression scheme for text with arbitrary shapes. The shape model geometrically depicts a text region with a polynomial centerline and a series of width cues to capture global shape characteristics (e.g.. smoothness) and local shapes of the text respectively for accurate text localization, which differs from previous text region modeling schemes based on discrete boundary points or pixels. We further propose a text detection network PolyPRNet equipped with an iterative regression module for text’s shape parameters, which effectively enhances the detection accuracy of arbitrary-shaped text. Our method achieves state-of-the-art text detection results on several standard benchmarks.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR, pp. 2963–2970 (2010)
Yin, X., Yin, X., Huang, K., Hao, H.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. 36, 970–983 (2014)
Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: ICCV, pp. 4651–4659 (2015)
Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: CVPR, pp. 2642–2651 (2017)
Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: TextSnake: a flexible representation for detecting text of arbitrary shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 19–35. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_2
Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: CVPR, pp. 9336–9345 (2019)
Zhang, C., et al.: Look more than once: an accurate detector for text of arbitrary shapes. In: CVPR, pp. 10544–10553 (2019)
Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: CVPR, pp. 9365–9374 (2019)
Wang, X., Jiang, Y., Luo, Z., Liu, C., Choi, H., Kim, S.: Arbitrary shape scene text detection with adaptive text region representation. In: CVPR, pp. 6449–6458 (2019)
Sun, L., Huo, Q., Jia, W., Chen, K.: A robust approach for text detection from natural scene images. Pattern Recogn. 48, 2906–2920 (2015)
Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: CVPR, pp. 2558–2567 (2015)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: CVPR, pp. 3431–3440 (2015)
Wu, Y., Natarajan, P.: Self-organized text detection with minimal post-processing via border learning. In: ICCV, pp. 5010–5019 (2017)
Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS, pp. 91–99 (2015)
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: a fast text detector with a single deep neural network. In: AAAI, pp. 4161–4167 (2017)
He, W., Zhang, X., Yin, F., Liu, C.: Multi-oriented and multi-lingual scene text detection with direct regression. IEEE Trans. Image Process. 27, 5406–5419 (2018)
Xie, E., Zang, Y., Shao, S., Yu, G., Yao, C., Li, G.: Scene text detection with supervised pyramid context network. In: AAAI, pp. 9038–9045 (2019)
Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: ICCV, pp. 8439–8448 (2019)
Zhan, F., Lu, S.: ESIR: end-to-end scene text recognition via iterative image rectification. In: CVPR, pp. 2054–2063 (2019)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Lin, T., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: CVPR, pp. 936–944 (2017)
He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: ICCV, pp. 2980–2988 (2017)
Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: CVPR, pp. 6154–6162 (2018)
Girshick, R.B., Iandola, F.N., Darrell, T., Malik, J.: Deformable part models are convolutional neural networks. In: CVPR, pp. 437–446 (2015)
Chng, C.K., Chan, C.S.: Total-text: a comprehensive dataset for scene text detection and recognition. In: ICDAR, pp. 935–942 (2017)
Liu, Y., Jin, L., Zhang, S., Zhang, S.: Detecting curve text in the wild: new dataset and new solution. CoRR abs/1712.02170 (2017)
Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: ICDAR, pp. 1156–1160 (2015)
Nayef, N., et al.: ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification - RRC-MLT. In: ICDAR, pp. 1454–1459 (2017)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NeurIPS, pp. 1106–1114 (2012)
Shi, B., Bai, X., Belongie, S.J.: Detecting oriented text in natural images by linking segments. In: CVPR, pp. 3482–3490 (2017)
Feng, W., He, W., Yin, F., Zhang, X., Liu, C.: TextDragon: an end-to-end framework for arbitrary shaped text spotting. In: ICCV, pp. 9075–9084 (2019)
Liu, Z., Lin, G., Yang, S., Feng, J., Lin, W., Goh, W.L.: Learning Markov clustering networks for scene text detection. In: CVPR, pp. 6936–6944 (2018)
Lyu, P., Yao, C., Wu, W., Yan, S., Bai, X.: Multi-oriented scene text detection via corner localization and region segmentation. In: CVPR, pp. 7553–7563 (2018)
Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: detecting scene text via instance segmentation. In: AAAI, pp. 6773–6780 (2018)
Yang, Q., Cheng, M., Zhou, W., Chen, Y., Qiu, M., Lin, W.: IncepText: a new inception-text module with deformable PSROI pooling for multi-oriented scene text detection. In: IJCAI, pp. 1071–1077 (2018)
Liao, M., Zhu, Z., Shi, B., Xia, G.s., Bai, X.: Rotation-sensitive regression for oriented scene text detection. In: CVPR, pp. 5909–5918 (2018)
Bušta, M., Patel, Y., Matas, J.: E2E-MLT - an unconstrained end-to-end method for multi-language scene text. In: Carneiro, G., You, S. (eds.) ACCV 2018. LNCS, vol. 11367, pp. 127–143. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21074-8_11
Zhong, Z., Sun, L., Huo, Q.: An anchor-free region proposal network for Faster R-CNN-based text detection approaches. Int. J. Doc. Anal. Recogn. (IJDAR) 22(3), 315–327 (2019). https://doi.org/10.1007/s10032-019-00335-y
Acknowledgments
The research was supported by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20171345 and the National Natural Science Foundation of China under Grant Nos. 61003113, 61321491, and 61672273.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Shi, J., Chen, L., Su, F. (2021). Accurate Arbitrary-Shaped Scene Text Detection via Iterative Polynomial Parameter Regression. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12624. Springer, Cham. https://doi.org/10.1007/978-3-030-69535-4_15
Download citation
DOI: https://doi.org/10.1007/978-3-030-69535-4_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69534-7
Online ISBN: 978-3-030-69535-4
eBook Packages: Computer ScienceComputer Science (R0)