Searching from the Prediction of Visual and Language Model for Handwritten Chinese Text Recognition

Liu, Brian; Sun, Weicong; Kang, Wenjing; Xu, Xianchao

doi:10.1007/978-3-030-86334-0_18

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12823)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

In this paper, we build deep neural networks for offline handwritten Chinese text recognition (HCTR) that use only convolutional layers and are trained with Connectionist Temporal Classification (CTC), one of the mainstream learning methods for sequence recognition. We iteratively explore different network configurations built from residual and squeeze-and-excitation structures. We ease the serious overfitting issue by applying a high dropout rate of 0.9 at the input of the last classification layer, and we synthesize new text samples from isolated characters by reusing the character-level bounding boxes of the CASIA-HWDB training set. These empirical and intuitive tricks help us achieve a character error rate (CER) of 6.38% on the ICDAR2013 competition set. To further improve performance, we propose a novel context beam search (CBS) algorithm that, at each step of CTC decoding, searches over the predictions of both the base visual model and a customized transformer-based language model simultaneously. This reduces the final CER to 2.49%. Code will be available online at https://github.com/intel/handwritten-chinese-ocr.
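The abstract describes CBS only at a high level: at every CTC decoding step, the search consults both the visual model's per-frame predictions and a language model. As a rough illustration of that general idea, the following is a minimal sketch of standard CTC prefix beam search with shallow language-model fusion (the scheme popularized by Hannun et al. for speech recognition), where each newly appended character is scored by its visual log-probability plus a weighted LM log-probability. It is not the authors' CBS, whose exact scoring rule is not given here. The names `ctc_lm_beam_search`, `lm_logprob`, `lm_weight` and `beam_size` are hypothetical; `lm_logprob(prefix, ch)` merely stands in for the paper's transformer-based language model, and index 0 of each frame is assumed to be the CTC blank.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_lm_beam_search(
    log_probs: Sequence[Sequence[float]],
    alphabet: str,
    lm_logprob: Callable[[str, str], float],
    lm_weight: float = 0.5,
    beam_size: int = 8,
) -> str:
    """CTC prefix beam search with shallow LM fusion (illustrative sketch, not the paper's CBS).

    log_probs:  T frames, each holding len(alphabet) + 1 log-probabilities
                from the visual model; index 0 is assumed to be the CTC blank.
    lm_logprob: hypothetical LM hook returning log P(ch | prefix).
    """
    # Each prefix keeps (log score of paths ending in blank,
    #                    log score of paths ending in a non-blank symbol).
    beams: Dict[str, Tuple[float, float]] = {"": (0.0, NEG_INF)}
    for frame in log_probs:
        next_beams: Dict[str, List[float]] = defaultdict(lambda: [NEG_INF, NEG_INF])
        for prefix, (p_b, p_nb) in beams.items():
            p_total = logsumexp(p_b, p_nb)
            # Emitting blank keeps the prefix unchanged; the path now ends in blank.
            stay = next_beams[prefix]
            stay[0] = logsumexp(stay[0], p_total + frame[0])
            for k, ch in enumerate(alphabet, start=1):
                p = frame[k]
                # Visual score plus weighted LM score for appending ch to prefix.
                ext_score = p + lm_weight * lm_logprob(prefix, ch)
                ext = next_beams[prefix + ch]
                if prefix and prefix[-1] == ch:
                    # CTC collapses a repeat that directly follows the same symbol,
                    # so only blank-ending paths can extend the prefix with ch ...
                    ext[1] = logsumexp(ext[1], p_b + ext_score)
                    # ... while the collapsed repeat keeps the prefix unchanged.
                    stay[1] = logsumexp(stay[1], p_nb + p)
                else:
                    ext[1] = logsumexp(ext[1], p_total + ext_score)
        # Prune to the beam_size highest-scoring prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = {pre: (s[0], s[1]) for pre, s in ranked[:beam_size]}
    return max(beams.items(), key=lambda kv: logsumexp(*kv[1]))[0]
```

For example, with a single frame that gives `a` probability 0.5 and `b` probability 0.4, the search returns `a` when the LM hook is neutral (always returns 0.0) but `b` when the LM assigns `b` a probability of 0.9 and `a` 0.1 with `lm_weight=1.0`. Keeping blank-ending and non-blank-ending scores separate is what lets a doubled character such as `aa` survive CTC's collapse rule only when a blank frame separates the two emissions.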

Author information

Authors and Affiliations

Intel Flex, Beijing, China
Brian Liu, Weicong Sun, Wenjing Kang & Xianchao Xu

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Liu, B., Sun, W., Kang, W., Xu, X. (2021). Searching from the Prediction of Visual and Language Model for Handwritten Chinese Text Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_18

DOI: https://doi.org/10.1007/978-3-030-86334-0_18
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition