
SATD: syntax-aware handwritten mathematical expression recognition based on tree-structured transformer decoder

  • Research
  • Published in: The Visual Computer

Abstract

The complex two-dimensional structure of mathematical expressions poses significant challenges for handwritten mathematical expression recognition (HMER). Many researchers convert the LaTeX sequence into a tree structure and then design RNN-based tree decoders to address this issue. However, RNNs struggle with long-term dependencies due to their sequential structure. Although Transformers solve the long-term dependency problem, Transformer-based tree decoders are rarely used for HMER because attention coverage becomes markedly insufficient when the distance between parent and child nodes in the tree is large. In this paper, we propose SATD, a novel offline HMER model incorporating a Transformer-based tree decoder that learns the implicit structural relationships in LaTeX strings. Moreover, to address the issue of distant parent–child nodes, we introduce a multi-scale attention aggregation module that refines attention weights using contextual information from different receptive fields. Experiments on the CROHME 2014/2016/2019 and HME100K datasets demonstrate performance improvements, with accuracy rates of 63.45%/60.42%/61.05% on the CROHME 2014/2016/2019 test sets. The source code for this work will be publicly available at https://github.com/EnderXiao/SATD/.
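To illustrate the kind of preprocessing the abstract describes, the following is a minimal, hypothetical sketch of converting a tokenized LaTeX string into (child, parent, relation) triples, the form of supervision tree decoders are typically trained on. The toy grammar here (only `right`, `sup`, and `sub` relations, single-character tokens) is an assumption for illustration, not the paper's actual tree-construction algorithm.

```python
def latex_to_tree(tokens):
    """Return (child, parent, relation) triples for a flat token list.

    Toy grammar: 'right' (horizontal concatenation), 'sup' (^{...}),
    and 'sub' (_{...}). Real HMER tree builders handle many more
    spatial relations (fractions, radicals, etc.).
    """
    triples = []

    def parse(pos, parent, relation):
        prev, rel = parent, relation
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "}":          # end of the current group
                return pos + 1
            if tok in ("^", "_"):   # script attaches to the previous symbol
                assert tokens[pos + 1] == "{", "expected group after ^/_"
                pos = parse(pos + 2, prev, "sup" if tok == "^" else "sub")
                continue
            triples.append((tok, prev, rel))
            prev, rel = tok, "right"
            pos += 1
        return pos

    parse(0, "<root>", "start")
    return triples

# x^{2}+y_{i}: '2' hangs off 'x' as a superscript, 'i' off 'y' as a subscript
print(latex_to_tree(list("x^{2}+y_{i}")))
```

A tree decoder then predicts each child symbol conditioned on its parent node and relation rather than on the flat left-to-right sequence.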
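The multi-scale attention aggregation idea can also be sketched in a few lines: combine smoothed versions of a 2-D attention map computed at several receptive-field sizes, then renormalize. The box-filter kernels, the fixed kernel sizes, and the uniform fusion below are all assumptions chosen for a self-contained illustration; the paper's module operates on learned convolutional features.

```python
import numpy as np

def box_filter(attn, k):
    """Average each position over a k x k neighborhood (zero padding)."""
    h, w = attn.shape
    pad = k // 2
    padded = np.pad(attn, pad)
    out = np.empty_like(attn)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def multi_scale_aggregate(attn, kernel_sizes=(3, 5, 7)):
    """Fuse several receptive fields, then renormalize to a distribution."""
    fused = sum(box_filter(attn, k) for k in kernel_sizes) / len(kernel_sizes)
    fused = np.exp(fused) / np.exp(fused).sum()   # softmax renormalization
    return fused

attn = np.zeros((8, 8))
attn[2, 2] = 1.0                       # a single sharp attention peak
refined = multi_scale_aggregate(attn)
assert np.isclose(refined.sum(), 1.0)  # still a valid distribution
```

The intuition matches the abstract: larger receptive fields let attention mass spread toward structurally related but spatially distant regions, which helps when parent and child nodes lie far apart in the image.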


Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.


Author information


Corresponding author

Correspondence to Huirong Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fu, P., Xiao, G. & Yang, H. SATD: syntax-aware handwritten mathematical expression recognition based on tree-structured transformer decoder. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03372-9


  • DOI: https://doi.org/10.1007/s00371-024-03372-9
