Abstract
Text detection in natural scenes has evolved considerably in recent years. Segmentation-based methods are widely used because they detect text of arbitrary shape robustly. However, most previous works focus on word-level detection and neglect the regions between adjacent words, which are helpful when text instances lie very close together. In this paper, we propose a novel image feature, the affinity area, which exploits the area between two adjacent text instances to improve detection. Because no open dataset provides affinity annotations, we design an affinity module that generates them from existing word-level annotations. By optimizing this module, our segmentation-based network TDAE predicts both text regions and affinity regions, from which the final detection results are obtained. Inspired by the evolutionary strategy (ES), our network also adds a novel fine-tuning step that updates the parameters by applying adaptive but random perturbations, which differs substantially from the traditional gradient descent approach. Competitive results on the ICDAR (2013, 2015, 2017), CTW-1500, and SynthText benchmarks further demonstrate the effectiveness of TDAE.
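To make the idea of updating parameters through random perturbations concrete, the following is a minimal sketch of a generic evolution-strategies step with Gaussian noise and standardised scores. It is not the TDAE fine-tuning procedure: the function names `es_step` and `toy_fitness`, the fixed noise scale `sigma`, the learning rate, the population size, and the quadratic toy objective are all assumptions made for illustration, and the adaptive perturbation scheme mentioned in the abstract is not reproduced here.

```python
import numpy as np


def es_step(theta, fitness, rng, sigma=0.1, lr=0.02, population=50):
    """One generic ES update: perturb, score, move toward better samples."""
    # Draw Gaussian perturbations, one row per population member.
    noise = rng.standard_normal((population, theta.size))
    # Score every perturbed parameter vector (higher fitness is better).
    scores = np.array([fitness(theta + sigma * eps) for eps in noise])
    # Standardise scores so the step does not depend on the fitness scale.
    std = scores.std()
    if std == 0.0:
        return theta  # all candidates tied; no direction to follow
    weights = (scores - scores.mean()) / std
    # The score-weighted average of the perturbations estimates an ascent direction.
    direction = noise.T @ weights / (population * sigma)
    return theta + lr * direction


def toy_fitness(theta):
    """Hypothetical stand-in for a detection metric; peaks at `target`."""
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)


rng = np.random.default_rng(0)
theta = np.zeros(3)
start_fitness = toy_fitness(theta)
for _ in range(200):
    theta = es_step(theta, toy_fitness, rng)
final_fitness = toy_fitness(theta)
```

No gradient of `toy_fitness` is ever computed: the update only needs fitness values, which is why this family of methods can be applied to objectives that are not differentiable. In a detector, `fitness` would presumably be replaced by a validation metric, but that substitution is an assumption here.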
This work was supported by the National Natural Science Foundation of China (Grant No. 92270201).
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, K., Luo, Y., Huang, Z., Chen, K., Guo, J., Qiu, W. (2023). TDAE: Text Detection with Affinity Areas and Evolution Strategies. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14192. Springer, Cham. https://doi.org/10.1007/978-3-031-41731-3_2
DOI: https://doi.org/10.1007/978-3-031-41731-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41730-6
Online ISBN: 978-3-031-41731-3
eBook Packages: Computer Science, Computer Science (R0)