An Encoder-Decoder Approach to Handwritten Mathematical Expression Recognition with Multi-head Attention and Stacked Decoder

Ding, Haisong; Chen, Kai; Huo, Qiang

doi:10.1007/978-3-030-86331-9_39

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12822)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

The encoder-decoder framework with an attention mechanism has become a mainstream solution to handwritten mathematical expression recognition (HMER) since the “watch, attend and parse” (WAP) approach was proposed in 2017, in which a convolutional neural network serves as the encoder and a gated recurrent unit with attention serves as the decoder. Inspired by the recent success of the Transformer in many applications, in this paper we adopt its multi-head attention and stacked decoder designs to improve the decoder of the WAP framework for HMER. Experimental results on CROHME tasks show that multi-head attention boosts the expression recognition rate (ExpRate) of WAP from 54.32%/58.05% to 56.76%/59.72%, and that the stacked decoder further improves ExpRate to 57.72%/61.38%, on the CROHME 2016/2019 test sets.
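To make the multi-head attention idea concrete, the following is a minimal sketch of one decoder step attending over a flattened grid of encoder features. It assumes Transformer-style scaled dot-product attention; the abstract does not specify the paper's exact attention formulation, so the function name `multi_head_attention`, the projection matrices `w_q`, `w_k`, `w_v`, `w_o`, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(query, features, w_q, w_k, w_v, w_o, num_heads):
    """One decoding step of multi-head attention (illustrative only).

    query:    (d_model,) decoder state for the current step
    features: (L, d_model) encoder feature map flattened to L positions
    Returns the context vector (d_model,) and per-head weights (num_heads, L).
    """
    num_positions, d_model = features.shape
    d_head = d_model // num_heads

    # Project, then split the model dimension into independent heads.
    q = (query @ w_q).reshape(num_heads, d_head)
    k = (features @ w_k).reshape(num_positions, num_heads, d_head)
    v = (features @ w_v).reshape(num_positions, num_heads, d_head)

    # Each head scores every encoder position against its own query slice.
    scores = np.einsum("hd,lhd->hl", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of values per head, concatenated and projected back.
    context = np.einsum("hl,lhd->hd", weights, v).reshape(d_model)
    return context @ w_o, weights


# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
d_model, num_heads, num_positions = 16, 4, 6
features = rng.normal(size=(num_positions, d_model))
query = rng.normal(size=d_model)
w_q, w_k, w_v, w_o = (0.1 * rng.normal(size=(d_model, d_model)) for _ in range(4))

context, weights = multi_head_attention(
    query, features, w_q, w_k, w_v, w_o, num_heads
)
print(context.shape, weights.shape)
```

Each head produces its own distribution over the encoder positions, so different heads can attend to different regions of the expression image at the same step. A stacked decoder, in the same spirit, would feed the output of one such attention-equipped layer into the next.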

Author information

Authors and Affiliations

Microsoft Research Asia, Beijing, China
Haisong Ding, Kai Chen & Qiang Huo

Corresponding author

Correspondence to Kai Chen.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Ding, H., Chen, K., Huo, Q. (2021). An Encoder-Decoder Approach to Handwritten Mathematical Expression Recognition with Multi-head Attention and Stacked Decoder. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12822. Springer, Cham. https://doi.org/10.1007/978-3-030-86331-9_39

DOI: https://doi.org/10.1007/978-3-030-86331-9_39
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86330-2
Online ISBN: 978-3-030-86331-9
eBook Packages: Computer Science, Computer Science (R0)
