End-to-End Scene Text Recognition Network with Adaptable Text Rectification

Zhang, Yi; Li, Zhiwen; Guo, Lei; Rao, Wenbi

doi:10.1007/978-3-030-80472-5_15

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 83)

Included in the following conference series:

International Conference on Computer Science, Engineering and Education Applications

Abstract

Scene text recognition has attracted wide attention in academia because the irregular shapes of scene text make it difficult to recognize. Owing to variations in viewing angle, shape, and lighting, recognizing perspective and curved text remains challenging. This paper presents an end-to-end scene text recognition network (SRATR). SRATR consists of a rectification network based on the spatial transformer network and an attention-based sequence recognition network. The rectification network rectifies irregular text, which plays a significant role in the subsequent recognition. Training requires only scene text images and word-level annotations. The recognition network uses an encoder-decoder mechanism to extract a feature sequence from the rectified text and then translates the feature sequence into an output character sequence. In the decoder, we propose a fractional pickup method, which suppresses the interference of noise from the text, helps the decoder generate a correct region of focus, and improves recognition accuracy. Experiments on several public datasets show that SRATR achieves outstanding performance in recognizing irregular text.
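
As a rough illustration of the sampling step behind a spatial-transformer-style rectifier, the following minimal sketch warps a small grayscale image with a 2x3 affine matrix and bilinear interpolation. It is not the SRATR implementation: the abstract does not state which transformation family the rectification network uses, so the affine form, the function names (`affine_grid`, `bilinear_sample`, `rectify`), and the fixed `theta` are assumptions made for illustration. In a trained network, `theta` would be predicted from the input image and the sampling would be differentiable.

```python
import math


def affine_grid(theta, out_h, out_w):
    """For each output pixel, compute source coordinates in [-1, 1].

    theta is a 2x3 affine matrix (assumed here; a learned rectifier
    would predict it from the input image).
    """
    grid = []
    for i in range(out_h):
        y = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def _pixel(image, r, c):
    """Read a pixel, treating everything outside the image as 0."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0.0


def bilinear_sample(image, grid):
    """Sample the image at the grid locations with bilinear weights."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Normalized [-1, 1] coordinates -> pixel coordinates.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            dx, dy = px - x0, py - y0
            value = (
                _pixel(image, y0, x0) * (1 - dx) * (1 - dy)
                + _pixel(image, y0, x0 + 1) * dx * (1 - dy)
                + _pixel(image, y0 + 1, x0) * (1 - dx) * dy
                + _pixel(image, y0 + 1, x0 + 1) * dx * dy
            )
            out_row.append(value)
        out.append(out_row)
    return out


def rectify(image, theta, out_h, out_w):
    """Warp the image: build the sampling grid, then sample."""
    return bilinear_sample(image, affine_grid(theta, out_h, out_w))


if __name__ == "__main__":
    img = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    mirror = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(rectify(img, identity, 2, 3))  # leaves the image unchanged
    print(rectify(img, mirror, 2, 3))    # flips each row left to right
```

The identity matrix reproduces the input, and negating the first entry mirrors it horizontally; richer transformations for curved text would need more parameters than the six used here.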

Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 61703316) and the Funding Project of Postgraduate Joint Training Base of WHUT and CSEPDI.

Author information

Authors and Affiliations

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 430070, China
Yi Zhang & Wenbi Rao
PipeChina West East Gas Pipeline Company, Wuhan, 430074, China
Zhiwen Li & Lei Guo

Corresponding author

Correspondence to Wenbi Rao.

Editor information

Editors and Affiliations

School of Educational Information Technology, Central China Normal University, Wuhan, China
Zhengbing Hu
Mechanical Engineering Research Institute of the Russian Academy of Sciences, Moscow, Russia
Sergey Petoukhov
Faculty of Applied Mathematics, National Technical University of Ukraine “Igor Sikorsky Kiev Polytechnic Institute”, Kiev, Ukraine
Ivan Dychka
Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Plantation, FL, USA
Matthew He

About this paper

Cite this paper

Zhang, Y., Li, Z., Guo, L., Rao, W. (2021). End-to-End Scene Text Recognition Network with Adaptable Text Rectification. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education IV. ICCSEEA 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-030-80472-5_15

DOI: https://doi.org/10.1007/978-3-030-80472-5_15
Published: 21 July 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80471-8
Online ISBN: 978-3-030-80472-5
eBook Packages: Intelligent Technologies and Robotics (R0)