Abstract
Scene text recognition has attracted wide attention in the academic community because the irregular shapes of scene text make recognition difficult. Owing to variations in viewing angle, shape, and lighting, recognizing perspective and curved text remains challenging. This paper presents an end-to-end scene text recognition network (SRATR). SRATR consists of a rectification network based on the spatial transformer network and an attention-based sequence recognition network. The rectification network rectifies irregular text before recognition, which plays a significant role in the performance of the recognition network. Training requires only scene text images and word-level annotations. The recognition network uses an encoder-decoder architecture: the encoder extracts a feature sequence from the rectified text image, and the decoder translates that feature sequence into the output character sequence. In the decoder, we propose a fractional pickup method that suppresses the interference of noise in the text, helps the decoder attend to the correct region, and improves recognition accuracy. Experiments on several public datasets show that SRATR performs strongly on irregular text.
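The abstract does not spell out how fractional pickup works, so the following is only a minimal sketch under an assumption: that it follows the formulation commonly used in attention-based text recognizers, where during training one pair of adjacent attention weights is blended with a random ratio so that the decoder's focus is slightly perturbed. The function name `fractional_pickup` and the example weights are hypothetical and are not taken from the paper.

```python
import random


def fractional_pickup(weights, rng=None):
    """Blend one randomly chosen pair of adjacent attention weights.

    Assumed formulation (not confirmed by the abstract):
        a_k'     = beta * a_k + (1 - beta) * a_{k+1}
        a_{k+1}' = (1 - beta) * a_k + beta * a_{k+1}
    with k and beta drawn at random for each decoding step.
    """
    if rng is None:
        rng = random
    mixed = list(weights)
    if len(mixed) < 2:
        return mixed  # nothing to blend
    k = rng.randrange(len(mixed) - 1)  # left index of the adjacent pair
    beta = rng.random()                # mixing ratio in [0, 1)
    left, right = weights[k], weights[k + 1]
    mixed[k] = beta * left + (1.0 - beta) * right
    mixed[k + 1] = (1.0 - beta) * left + beta * right
    return mixed


# Hypothetical attention weights over four encoder positions.
attention = [0.05, 0.70, 0.20, 0.05]
perturbed = fractional_pickup(attention, random.Random(0))
```

Because the two blended weights always sum to the same value as before, the perturbed vector remains a valid attention distribution; only the split between two neighbouring positions changes.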
Acknowledgment
This work was supported by the National Natural Science Foundation of China (No. 61703316) and the Funding Project of the Postgraduate Joint Training Base of WHUT and CSEPDI.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Li, Z., Guo, L., Rao, W. (2021). End-to-End Scene Text Recognition Network with Adaptable Text Rectification. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education IV. ICCSEEA 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-030-80472-5_15
DOI: https://doi.org/10.1007/978-3-030-80472-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80471-8
Online ISBN: 978-3-030-80472-5
eBook Packages: Intelligent Technologies and Robotics (R0)