End-to-End Scene Text Recognition Network with Adaptable Text Rectification

  • Conference paper
Advances in Computer Science for Engineering and Education IV (ICCSEEA 2021)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 83)

Abstract

Scene text recognition has attracted wide attention in academia because the irregular shapes of scene text make recognition difficult. Owing to variations in viewing angle, shape, and lighting, recognizing perspective and curved text remains challenging. This paper presents SRATR, an end-to-end scene text recognition network. SRATR consists of a rectification network based on the spatial transformer network and an attention-based sequence recognition network. The rectification network rectifies irregular text, which plays a significant role in the subsequent recognition step. Training requires only scene text images and word-level annotations. The recognition network uses an encoder-decoder architecture: the encoder extracts a feature sequence from the rectified text, and the decoder translates that feature sequence into the output character sequence. In the decoder, we propose a fractional pickup method that eliminates the interference of noise in the text, helps the decoder generate the correct region of focus, and improves recognition accuracy. Experiments on several public datasets show that SRATR achieves outstanding performance in recognizing irregular text.
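To make the rectification idea concrete, the following is a minimal NumPy sketch of the sampling step used by spatial-transformer-style rectifiers: a transform maps each output pixel to a source location, and the output is filled by bilinear interpolation. It assumes a plain 2x3 affine transform for simplicity; the abstract does not state which transform family SRATR predicts, and in a real network the parameters would come from a learned localization module. The function names `affine_grid`, `bilinear_sample`, and `rectify` are illustrative, not taken from the paper.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map every output pixel to source coordinates in [-1, 1].

    theta is a 2x3 affine matrix (an assumption; not the paper's transform).
    Returns an array of shape (out_h, out_w, 2) holding (x_src, y_src).
    """
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, out_h),
                         np.linspace(-1.0, 1.0, out_w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (H, W, 3)
    return coords @ theta.T                                  # (H, W, 2)

def bilinear_sample(image, grid):
    """Sample a 2-D grayscale image at normalized grid locations.

    Coordinates that fall outside the image are clamped to the border.
    """
    h, w = image.shape
    # Convert normalized [-1, 1] coordinates to pixel coordinates.
    x = (grid[..., 0] + 1.0) * (w - 1) / 2.0
    y = (grid[..., 1] + 1.0) * (h - 1) / 2.0
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    # Interpolation weights, clamped so out-of-range points use edge pixels.
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def rectify(image, theta):
    """Resample `image` under `theta`, keeping the input size."""
    grid = affine_grid(np.asarray(theta, dtype=float), *image.shape)
    return bilinear_sample(image.astype(float), grid)

# Usage: the identity transform leaves the image unchanged,
# and negating the x scale mirrors it horizontally.
image = np.arange(24, dtype=float).reshape(4, 6)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mirror = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
same = rectify(image, identity)
flipped = rectify(image, mirror)
```

Because every step is a differentiable function of `theta`, a recognition loss can in principle be backpropagated through the sampler to the module that predicts the transform, which is consistent with the abstract's point that training needs only text images and word-level annotations.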

Acknowledgment

This work was supported by the National Natural Science Foundation of China (No. 61703316) and the Funding Project of Postgraduate Joint Training Base of WHUT and CSEPDI.

Author information

Corresponding author

Correspondence to Wenbi Rao.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, Y., Li, Z., Guo, L., Rao, W. (2021). End-to-End Scene Text Recognition Network with Adaptable Text Rectification. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education IV. ICCSEEA 2021. Lecture Notes on Data Engineering and Communications Technologies, vol 83. Springer, Cham. https://doi.org/10.1007/978-3-030-80472-5_15
