Abstract
The complex two-dimensional structure of mathematical notation poses substantial challenges for handwritten mathematical expression recognition (HMER). Many researchers convert the LaTeX sequence into a tree structure and then design RNN-based tree decoders to address this issue. However, RNNs struggle with long-term dependencies due to their structural characteristics. Although Transformers solve the long-term dependency problem, Transformer-based tree decoders are rarely used for HMER because attention coverage becomes significantly insufficient when the distance between parent and child nodes in the tree is large. In this paper, we propose SATD, a novel offline HMER model incorporating a Transformer-based tree decoder that learns the implicit structural relationships in LaTeX strings. Moreover, to address the issue of distant parent–child nodes, we introduce a multi-scale attention aggregation module that refines attention weights using contextual information from different receptive fields. Experiments on the CROHME 2014/2016/2019 and HME100K datasets demonstrate consistent performance improvements, with accuracy rates of 63.45%/60.42%/61.05% on the CROHME 2014/2016/2019 test sets. The source code for this work will be publicly available at https://github.com/EnderXiao/SATD/.
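The multi-scale aggregation idea described above can be sketched as follows: smooth the decoder's spatial attention map with filters of several receptive-field sizes, combine the results, and renormalize into a distribution. Below is a minimal NumPy sketch of that idea, in which simple box filters stand in for the module's learned convolutions; the function names and the additive combination are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def box_filter(att, k):
    """Smooth a 2-D attention map with a k x k box filter ('same' output size)."""
    pad = k // 2
    p = np.pad(att, pad, mode="edge")  # replicate borders so edges keep mass
    H, W = att.shape
    out = np.zeros_like(att)
    for i in range(H):
        for j in range(W):
            out[i, j] = p[i:i + k, j:j + k].mean()
    return out


def multi_scale_aggregate(att, kernel_sizes=(3, 5)):
    """Refine an attention map by combining context at several receptive fields.

    Each kernel size provides a differently sized neighborhood; the smoothed
    maps are added to the original and renormalized with a softmax so the
    result is again a valid attention distribution. (Illustrative sketch.)
    """
    combined = att + sum(box_filter(att, k) for k in kernel_sizes)
    e = np.exp(combined - combined.max())  # numerically stable softmax
    return e / e.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    att = rng.random((4, 6))
    att /= att.sum()  # start from a normalized attention map
    refined = multi_scale_aggregate(att)
    print(refined.shape, float(refined.sum()))
```

In a real model the box filters would be replaced by learned convolutions and the aggregation applied per decoding step, but the sketch shows the core mechanism: neighboring context at several scales reshapes the attention weights before they are renormalized.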
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Fu, P., Xiao, G. & Yang, H. SATD: syntax-aware handwritten mathematical expression recognition based on tree-structured transformer decoder. Vis Comput (2024). https://doi.org/10.1007/s00371-024-03372-9