Abstract
In this paper, we build deep neural networks for offline handwritten Chinese text recognition (HCTR) using only convolutional layers and Connectionist Temporal Classification (CTC), one of the mainstream learning methods. We iteratively explore different network architecture configurations with residual and squeeze-and-excitation structures. To ease serious overfitting, we apply a high dropout rate of 0.9 at the input of the last classification layer and synthesize new text samples from isolated characters by reusing the character-level bounding boxes of the CASIA-HWDB training set. These empirical and intuitive tricks yield a character error rate (CER) of 6.38% on the ICDAR2013 competition set. To further improve performance, we propose a novel context beam search (CBS) algorithm that, at each step of CTC decoding, searches over the predictions of both the basic visual model and a customized transformer-based language model simultaneously. This reduces the final CER to 2.49%. Code will be available online at https://github.com/intel/handwritten-chinese-ocr.
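The idea of decoding from a visual model and a language model at the same time can be illustrated with a generic CTC prefix beam search that adds a weighted language-model score whenever a prefix is extended by a new character. The sketch below is not the CBS algorithm of the paper, whose details are not given in the abstract. The function names, the `lm_log_prob` callback, the weight `alpha`, and the toy probabilities are all assumptions made for illustration.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_beam_search_with_lm(log_probs, lm_log_prob, beam_width=4,
                            alpha=0.5, blank=0):
    """Generic CTC prefix beam search with shallow language-model fusion.

    log_probs: list of T frames, each a list of V log-probabilities
        produced by the visual model (assumed input format).
    lm_log_prob(prefix, c): hypothetical callback returning
        log P_LM(c | prefix) for a tuple prefix and a class index c.
    alpha: assumed weight applied to the language-model score.
    """
    # prefix -> (log prob of paths ending in blank, ending in non-blank)
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for c, lp in enumerate(frame):
                if c == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (c,)
                lm = alpha * lm_log_prob(prefix, c)
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == c:
                    # A repeated character extends the prefix only after
                    # a blank; otherwise it collapses into the same prefix.
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp + lm))
                    b2, nb2 = nxt[prefix]
                    nxt[prefix] = (b2, logsumexp(nb2, p_nb + lp))
                else:
                    nxt[new_prefix] = (
                        b, logsumexp(nb, p_b + lp + lm, p_nb + lp + lm))
        # Keep only the highest-scoring prefixes.
        ranked = sorted(nxt.items(),
                        key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return list(best[0])


def uniform_lm(prefix, c):
    """Toy language model with no preference (adds nothing to the score)."""
    return 0.0


def biased_lm(prefix, c):
    """Toy language model that strongly prefers class 2."""
    return math.log(0.9) if c == 2 else math.log(0.1)


# One ambiguous frame: blank=0.1, class 1=0.5, class 2=0.4.
toy_frame = [[math.log(0.1), math.log(0.5), math.log(0.4)]]
visual_only = ctc_beam_search_with_lm(toy_frame, uniform_lm)
with_lm = ctc_beam_search_with_lm(toy_frame, biased_lm, alpha=1.0)
```

In this toy case the visual model alone prefers class 1, while the biased language model shifts the decision to class 2. In a real system the callback would query a trained language model, and both `alpha` and the beam width would need tuning on held-out data.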
© 2021 Springer Nature Switzerland AG
Cite this paper
Liu, B., Sun, W., Kang, W., Xu, X. (2021). Searching from the Prediction of Visual and Language Model for Handwritten Chinese Text Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_18
DOI: https://doi.org/10.1007/978-3-030-86334-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science (R0)