
An end-to-end network for irregular printed Mongolian recognition

  • Original Paper
  • Published:
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Mongolian is a language spoken in Inner Mongolia, China. During recognition, the shooting angle and other factors can deform the image and text, which makes recognition difficult. This paper proposes a triplet attention Mogrifier network (TAMN) for printed Mongolian text recognition. The network uses a spatial transformer network to rectify deformed Mongolian images, then extracts features from the rectified images with gated recurrent convolution layers (GRCL) combined with a triplet attention module. A Mogrifier long short-term memory (LSTM) network captures the contextual sequence information in the features, and an attention-based LSTM decoder produces the final prediction. Experimental results show that the spatial transformer network can effectively handle deformed Mongolian images, reaching a recognition accuracy of 90.30%. Compared with current mainstream text recognition networks, the proposed network achieves good performance on Mongolian text recognition. The dataset is publicly available at https://github.com/ShaoDonCui/Mongolian-recognition.
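The Mogrifier LSTM named in the abstract differs from a standard LSTM only in a preprocessing step: before the cell runs, the input and the previous hidden state repeatedly gate each other. Below is a minimal NumPy sketch of that gating step, following the general Mogrifier formulation; the matrices `Q` and `R`, the dimension, and the round count are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h, Q, R, rounds=5):
    """Mogrifier gating: input x and previous hidden state h mutually
    gate each other for a fixed number of rounds before the LSTM cell.
    Odd rounds rescale x using h; even rounds rescale h using x."""
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            x = 2.0 * sigmoid(Q @ h) * x   # modulate input by hidden state
        else:
            h = 2.0 * sigmoid(R @ x) * h   # modulate hidden state by input
    return x, h

# Toy example with illustrative sizes and random projections.
rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal(d)        # current input vector
h = rng.standard_normal(d)        # previous hidden state
Q = rng.standard_normal((d, d))   # gating projection h -> x
R = rng.standard_normal((d, d))   # gating projection x -> h
x_mog, h_mog = mogrify(x, h, Q, R)
```

The modulated pair `(x_mog, h_mog)` would then be fed into an ordinary LSTM cell; with `rounds=0` the scheme reduces to a plain LSTM.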


Notes

  1. China Mongolian News Network—http://www.mgyxw.net.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61966027 and 61966028), the Inner Mongolia Autonomous Region Science and Technology Plan (Grant Nos. 2021GG0329 and 2021GG0140), and the National Natural Science Foundation of Inner Mongolia (Grant No. 2021MS06028). The authors wish to thank all the editors and anonymous reviewers for their constructive advice.

Author information


Corresponding author

Correspondence to YiLa Su.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Cui, S., Su, Y., Qing dao er ji, R. et al. An end-to-end network for irregular printed Mongolian recognition. IJDAR 25, 41–50 (2022). https://doi.org/10.1007/s10032-021-00388-y

