TDAE: Text Detection with Affinity Areas and Evolution Strategies

Ma, Kefan; Luo, Yuchen; Huang, Zheng; Chen, Kai; Guo, Jie; Qiu, Weidong

doi:10.1007/978-3-031-41731-3_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14192)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Text detection in natural scenes has advanced considerably in recent years. Segmentation-based methods are widely used because they can robustly detect text of arbitrary shape. However, most previous work focuses on word-level detection and neglects the regions between adjacent words, which are helpful when text instances lie very close together. In this paper, we propose a novel image feature, the affinity area, which exploits the region between two adjacent text instances to improve detection. Because no open dataset provides such annotations, we design an affinity module that generates them from existing word-level annotations. By optimizing this module, our segmentation-based network, TDAE, predicts both text regions and affinity regions, from which the final detection results are obtained. Inspired by evolution strategies (ES), our network also adds a novel fine-tuning step that updates the parameters by applying adaptive random perturbations, in contrast to traditional gradient descent. Competitive results on the ICDAR (2013, 2015, 2017), CTW-1500, and SynthText benchmarks demonstrate the effectiveness of TDAE.
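
The abstract does not spell out the update rule behind the ES-inspired fine-tuning step, so the following is only a minimal sketch of the general idea: parameters are perturbed with random noise, the loss is evaluated at the perturbed points, and the update is built from those evaluations instead of from backpropagated gradients. It uses a generic antithetic evolution-strategies estimator on a plain list of floats. The function names, the accept-or-reject rule, and the way the noise scale `sigma` is adapted are all assumptions made for illustration and are not taken from the paper.

```python
import random


def es_finetune_step(params, loss_fn, sigma, lr, population, rng):
    """One generic ES update (illustrative; not the paper's exact rule)."""
    n = len(params)
    grad_est = [0.0] * n
    for _ in range(population):
        # Antithetic sampling: evaluate the loss at +eps and -eps.
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        plus = loss_fn([p + sigma * e for p, e in zip(params, eps)])
        minus = loss_fn([p - sigma * e for p, e in zip(params, eps)])
        scale = (plus - minus) / (2.0 * sigma * population)
        for i in range(n):
            grad_est[i] += scale * eps[i]
    # Step against the estimated gradient; no backpropagation is involved.
    return [p - lr * g for p, g in zip(params, grad_est)]


def es_finetune(params, loss_fn, steps=200, sigma=0.1, lr=0.05,
                population=20, seed=0):
    """Repeat ES steps, keeping only candidates that lower the loss."""
    rng = random.Random(seed)
    best = loss_fn(params)
    for _ in range(steps):
        candidate = es_finetune_step(params, loss_fn, sigma, lr,
                                     population, rng)
        cand_loss = loss_fn(candidate)
        if cand_loss < best:
            params, best = candidate, cand_loss
            sigma = min(sigma * 1.05, 0.5)   # assumed rule: widen on success
        else:
            sigma = max(sigma * 0.7, 1e-3)   # assumed rule: narrow on failure
    return params, best
```

Called with a toy loss such as `lambda w: sum((a - b) ** 2 for a, b in zip(w, [1.0, -2.0]))` and a starting point of `[0.0, 0.0]`, `es_finetune` moves the parameters toward the target using only loss evaluations. In a detection network the parameter list would be the model weights and `loss_fn` the segmentation loss, which makes each step far more expensive than in this toy setting.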

This work was supported by the National Natural Science Foundation of China (Grant No. 92270201).

Author information

Authors and Affiliations

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Kefan Ma, Yuchen Luo, Zheng Huang, Kai Chen, Jie Guo & Weidong Qiu

Corresponding author

Correspondence to Zheng Huang.

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MD, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

About this paper

Cite this paper

Ma, K., Luo, Y., Huang, Z., Chen, K., Guo, J., Qiu, W. (2023). TDAE: Text Detection with Affinity Areas and Evolution Strategies. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14192. Springer, Cham. https://doi.org/10.1007/978-3-031-41731-3_2

DOI: https://doi.org/10.1007/978-3-031-41731-3_2
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41730-6
Online ISBN: 978-3-031-41731-3
eBook Packages: Computer Science, Computer Science (R0)
