An Encoder-Decoder Approach to Handwritten Mathematical Expression Recognition with Multi-head Attention and Stacked Decoder

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Abstract

The encoder-decoder framework with an attention mechanism has become a mainstream solution to handwritten mathematical expression recognition (HMER) since the “watch, attend and parse” (WAP) approach was proposed in 2017, in which a convolutional neural network serves as the encoder and a gated recurrent unit with attention serves as the decoder. Inspired by the recent success of the Transformer in many applications, in this paper we adopt its multi-head attention and stacked decoder designs to improve the decoder part of the WAP framework for HMER. Experimental results on CROHME tasks show that multi-head attention boosts the expression recognition rate (ExpRate) of WAP from 54.32%/58.05% to 56.76%/59.72%, and that a stacked decoder further improves ExpRate to 57.72%/61.38%, on the CROHME 2016/2019 test sets.
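
As a rough illustration of the multi-head idea mentioned in the abstract, the sketch below lets a single decoder query attend over a small set of encoder feature vectors with several heads. It is not the authors' implementation: it assumes Transformer-style scaled dot-product scoring, splits the vectors into heads by plain slicing instead of learned projections, and uses made-up toy values. The names `multi_head_attention`, `query`, and `features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(query, features, num_heads):
    """Attend from one decoder query to L encoder feature vectors.

    query:    list of d floats (assumed decoder state).
    features: list of L vectors, each of d floats (assumed CNN feature grid,
              flattened over spatial positions).
    Returns the concatenated context vector and per-head attention weights.
    Simplification: heads are slices of the vectors, with no learned projections.
    """
    d = len(query)
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = d // num_heads
    context, all_weights = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        # Scaled dot-product score between the query slice and each position.
        scores = [
            sum(qi * fi for qi, fi in zip(q, f[lo:hi])) / math.sqrt(head_dim)
            for f in features
        ]
        weights = softmax(scores)
        # Context for this head: attention-weighted sum of the feature slices.
        head_ctx = [
            sum(w * f[lo + j] for w, f in zip(weights, features))
            for j in range(head_dim)
        ]
        context.extend(head_ctx)
        all_weights.append(weights)
    return context, all_weights


# Toy example: three positions, four feature channels, two heads.
features = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
query = [2.0, 0.0, 0.0, 2.0]
context, weights = multi_head_attention(query, features, num_heads=2)
```

In this toy input the first head weights positions 0 and 2 equally and above position 1, while the second head favours position 0 alone, which shows how separate heads can focus on different image regions for the same decoding step.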

Author information

Corresponding author

Correspondence to Kai Chen.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Ding, H., Chen, K., Huo, Q. (2021). An Encoder-Decoder Approach to Handwritten Mathematical Expression Recognition with Multi-head Attention and Stacked Decoder. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12822. Springer, Cham. https://doi.org/10.1007/978-3-030-86331-9_39

  • DOI: https://doi.org/10.1007/978-3-030-86331-9_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86330-2

  • Online ISBN: 978-3-030-86331-9

  • eBook Packages: Computer Science, Computer Science (R0)
