Searching from the Prediction of Visual and Language Model for Handwritten Chinese Text Recognition

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12823)

Abstract

In this paper, we build deep neural networks for offline handwritten Chinese text recognition (HCTR) using only convolutional layers and Connectionist Temporal Classification (CTC), one of the mainstream learning methods. We iteratively explore different network architecture configurations with residual and squeeze-and-excitation structures. We ease serious overfitting by applying a high dropout rate of 0.9 at the input of the last classification layer, and we synthesize new text samples from isolated characters by reusing the character-level bounding boxes from the CASIA-HWDB train set. These empirical and intuitive tricks help us achieve a character error rate (CER) of 6.38% on the ICDAR2013 competition set. To further improve performance, we propose a novel context beam search (CBS) algorithm which, at each step of CTC decoding, searches simultaneously over the predictions of the basic visual model and a customized transformer-based language model. The final CER is reduced to 2.49%. Code will be available online at https://github.com/intel/handwritten-chinese-ocr.
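
The CER figures quoted above can be read with the following minimal sketch. It assumes CER is the total Levenshtein (edit) distance between recognized and reference text lines divided by the total number of reference characters. The abstract does not spell out the exact evaluation protocol, so this definition and the function names `edit_distance` and `character_error_rate` are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed definition: summed edit distance over summed reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical text lines: one wrong character out of six.
    print(character_error_rate(["手写中文识别"], ["手写中文识利"]))
```

Under this definition, a CER of 2.49% corresponds to roughly 2.5 character edits per 100 reference characters, aggregated over the whole test set rather than averaged per line.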

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, B., Sun, W., Kang, W., Xu, X. (2021). Searching from the Prediction of Visual and Language Model for Handwritten Chinese Text Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_18

  • DOI: https://doi.org/10.1007/978-3-030-86334-0_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86333-3

  • Online ISBN: 978-3-030-86334-0

  • eBook Packages: Computer Science (R0)
