When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

Li, Bohan; Yuan, Ye; Liang, Dingkang; Liu, Xiao; Ji, Zhilong; Bai, Jinfeng; Liu, Wenyu; Bai, Xiang

doi:10.1007/978-3-031-19815-1_12

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13688)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Most recent handwritten mathematical expression recognition (HMER) methods adopt encoder-decoder networks, which directly predict markup sequences from formula images using an attention mechanism. However, such methods may fail to accurately read formulas with complicated structures or to generate long markup sequences, because the attention results are often inaccurate due to the large variance in writing styles and spatial layouts. To alleviate this problem, we propose an unconventional network for HMER named Counting-Aware Network (CAN), which jointly optimizes two tasks: HMER and symbol counting. Specifically, we design a weakly-supervised counting module that predicts the number of each symbol class without symbol-level position annotations, and then plug it into a typical attention-based encoder-decoder model for HMER. Experiments on benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models, and that CAN consistently outperforms the state-of-the-art methods. Moreover, compared with an encoder-decoder model for HMER, the extra time cost caused by the proposed counting module is marginal. The source code is available at https://github.com/LBH1024/CAN.
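
As an illustration of the weak supervision described in the abstract, the following minimal Python sketch shows one way per-class symbol counts could be derived from a markup sequence alone, with no position annotations, and how a counting term could be added to a recognition loss. The vocabulary `VOCAB`, the ignored structural tokens `IGNORED`, the function names, and the choice of a mean absolute error for the counting term are all assumptions made for illustration; they are not taken from the paper or its released code.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical toy vocabulary; a real HMER vocabulary is much larger.
VOCAB = ["x", "y", "2", "+", "=", "\\frac", "^", "{", "}"]

# Assumption: structural tokens have no visible glyph, so they are not counted.
IGNORED = {"^", "_", "{", "}"}


def counting_label(markup: str) -> List[int]:
    """Return one count per vocabulary entry for a space-separated markup string."""
    counts = Counter(tok for tok in markup.split() if tok not in IGNORED)
    return [counts.get(sym, 0) for sym in VOCAB]


def joint_loss(recognition_loss: float,
               predicted_counts: Sequence[float],
               target_counts: Sequence[float]) -> float:
    """Add a counting term (here: mean absolute error) to the recognition loss."""
    counting_loss = sum(
        abs(p - t) for p, t in zip(predicted_counts, target_counts)
    ) / len(target_counts)
    return recognition_loss + counting_loss


if __name__ == "__main__":
    target = counting_label("x ^ { 2 } + y = \\frac { x } { 2 }")
    print(target)
```

The only supervision the counting target needs is the markup sequence itself, which is what makes the counting task weakly supervised in this sketch: no symbol positions are required.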

B. Li and Y. Yuan contributed equally.

Acknowledgements

This work was done while Bohan Li was an intern at Tomorrow Advancing Life, and was supported in part by the National Natural Science Foundation of China (Grant No. 61733007) and the National Key R&D Program of China under Grant No. 2020AAA0104500.

Author information

Authors and Affiliations

Tomorrow Advancing Life, Beijing, China
Bohan Li, Ye Yuan, Xiao Liu, Zhilong Ji & Jinfeng Bai
Huazhong University of Science and Technology, Wuhan, China
Bohan Li, Dingkang Liang, Wenyu Liu & Xiang Bai

Corresponding author

Correspondence to Xiang Bai.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

About this paper

Cite this paper

Li, B. et al. (2022). When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13688. Springer, Cham. https://doi.org/10.1007/978-3-031-19815-1_12

DOI: https://doi.org/10.1007/978-3-031-19815-1_12
Published: 20 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19814-4
Online ISBN: 978-3-031-19815-1
eBook Packages: Computer Science, Computer Science (R0)
