Skip to main content

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13688))

Included in the following conference series:

Abstract

Recently, most handwritten mathematical expression recognition (HMER) methods adopt the encoder-decoder networks, which directly predict the markup sequences from formula images with the attention mechanism. However, such methods may fail to accurately read formulas with complicated structure or generate long markup sequences, as the attention results are often inaccurate due to the large variance of writing styles or spatial layouts. To alleviate this problem, we propose an unconventional network for HMER named Counting-Aware Network (CAN), which jointly optimizes two tasks: HMER and symbol counting. Specifically, we design a weakly-supervised counting module that can predict the number of each symbol class without the symbol-level position annotations, and then plug it into a typical attention-based encoder-decoder model for HMER. Experiments on the benchmark datasets for HMER validate that both joint optimization and counting results are beneficial for correcting the prediction errors of encoder-decoder models, and CAN consistently outperforms the state-of-the-art methods. In particular, compared with an encoder-decoder model for HMER, the extra time cost caused by the proposed counting module is marginal. The source code is available at https://github.com/LBH1024/CAN.

B. Li and Y. Yuan—Contribute equally.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Bian, X., Qin, B., Xin, X., Li, J., Su, X., Wang, Y.: Handwritten mathematical expression recognition via attention aggregation based bi-directional mutual learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 113–121 (2022)

    Google Scholar 

  2. Chan, K.F., Yeung, D.Y.: Elastic structural matching for online handwritten alphanumeric character recognition. In: Proceedings of International Conference on Pattern Recognition, vol. 2, pp. 1508–1511 (1998)

    Google Scholar 

  3. Chan, K.F., Yeung, D.Y.: Error detection, error correction and performance evaluation in on-line mathematical expression recognition. Pattern Recogn. 34(8), 1671–1684 (2001)

    Article  Google Scholar 

  4. Cho, K., van Merrienboer, B., Gulcehre, C., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. In: Conference on Empirical Methods in Natural Language Processing (2014)

    Google Scholar 

  5. Deng, Y., Kanervisto, A., Ling, J., Rush, A.M.: Image-to-markup generation with coarse-to-fine attention. In: Proceedings of International Conference on Machine Learning, pp. 980–989 (2017)

    Google Scholar 

  6. Ding, H., Chen, K., Huo, Q.: An encoder-decoder approach to handwritten mathematical expression recognition with multi-head attention and stacked decoder. In: Lladós, J., Lopresti, D., Uchida, S. (eds.) ICDAR 2021. LNCS, vol. 12822, pp. 602–616. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86331-9_39

    Chapter  Google Scholar 

  7. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)

    Google Scholar 

  8. Hu, L., Zanibbi, R.: HMM-based recognition of online handwritten mathematical symbols using segmental k-means initialization and a modified pen-up/down feature. In: Proceedings of International Conference on Document Analysis and Recognition, pp. 457–462 (2011)

    Google Scholar 

  9. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)

    Google Scholar 

  10. Keshari, B., Watt, S.: Hybrid mathematical symbol recognition using support vector machines. In: Proceedings of International Conference on Document Analysis and Recognition, vol. 2, pp. 859–863 (2007)

    Google Scholar 

  11. Kosmala, A., Rigoll, G., Lavirotte, S., Pottier, L.: On-line handwritten formula recognition using hidden Markov models and context dependent graph grammars. In: Proceedings of International Conference on Document Analysis and Recognition, pp. 107–110 (1999)

    Google Scholar 

  12. Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Where are the blobs: counting by localization with point supervision. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 560–576. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_34

    Chapter  Google Scholar 

  13. Lavirotte, S., Pottier, L.: Mathematical formula recognition using graph grammar. In: Document Recognition V, vol. 3305, pp. 44–52 (1998)

    Google Scholar 

  14. Le, A.D.: Recognizing handwritten mathematical expressions via paired dual loss attention network and printed mathematical expressions. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition Workshops, pp. 566–567 (2020)

    Google Scholar 

  15. Li, Y., Zhang, X., Chen, D.: CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (2018)

    Google Scholar 

  16. Li, Z., Jin, L., Lai, S., Zhu, Y.: Improving attention-based handwritten mathematical expression recognition with scale augmentation and drop attention. In: Proceedings of International Conference on Frontiers in Handwriting Recognition, pp. 175–180 (2020)

    Google Scholar 

  17. Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

    Chapter  Google Scholar 

  18. Mouchere, H., Viard-Gaudin, C., Zanibbi, R., Garain, U.: ICFHR 2014 competition on recognition of on-line handwritten mathematical expressions (CROHME 2014). In: Proceedings of International Conference on Frontiers in Handwriting Recognition, pp. 791–796 (2014)

    Google Scholar 

  19. Mouchère, H., Viard-Gaudin, C., Zanibbi, R., Garain, U.: ICFHR 2016 CROHME: competition on recognition of online handwritten mathematical expressions. In: Proceedings of International Conference on Frontiers in Handwriting Recognition, pp. 607–612 (2016)

    Google Scholar 

  20. Parmar, N., et al.: Image transformer. In: Proceedings of International Conference on Machine Learning, pp. 4055–4064 (2018)

    Google Scholar 

  21. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of Advances in Neural Information Processing Systems 28 (2015)

    Google Scholar 

  22. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(06), 1137–1149 (2017)

    Article  Google Scholar 

  23. Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., Bai, X.: ASTER: an attentional scene text recognizer with flexible rectification. IEEE Trans. Pattern Anal. Mach. Intell. 41(9), 2035–2048 (2018)

    Article  Google Scholar 

  24. Truong, T.N., Nguyen, C.T., Phan, K.M., Nakagawa, M.: Improvement of end-to-end offline handwritten mathematical expression recognition by weakly supervised learning. In: Proceedings of International Conference on Frontiers in Handwriting Recognition, pp. 181–186 (2020)

    Google Scholar 

  25. Tu, Z., Lu, Z., Liu, Y., Liu, X., Li, H.: Modeling coverage for neural machine translation. In: Proceedings of the Association for Computational Linguistics, pp. 76–85 (2016)

    Google Scholar 

  26. Vuong, B.Q., He, Y., Hui, S.C.: Towards a web-based progressive handwriting recognition environment for mathematical problem solving. Expert Syst. Appl. 37(1), 886–893 (2010)

    Article  Google Scholar 

  27. Wang, C., Zhang, H., Yang, L., Liu, S., Cao, X.: Deep people counting in extremely dense crowds. In: Proceedings of ACM Multimedia, pp. 1299–1302 (2015)

    Google Scholar 

  28. Wang, J., Du, J., Zhang, J., Wang, Z.R.: Multi-modal attention network for handwritten mathematical expression recognition. In: Proceedings of International Conference on Document Analysis and Recognition, pp. 1181–1186 (2019)

    Google Scholar 

  29. Winkler, H.J.: HMM-based handwritten symbol recognition using on-line and off-line features. In: IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, vol. 6, pp. 3438–3441 (1996)

    Google Scholar 

  30. Wu, J.-W., Yin, F., Zhang, Y.-M., Zhang, X.-Y., Liu, C.-L.: Image-to-markup generation via paired adversarial learning. In: Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11051, pp. 18–34. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10925-7_2

    Chapter  Google Scholar 

  31. Wu, J.W., Yin, F., Zhang, Y.M., Zhang, X.Y., Liu, C.L.: Handwritten mathematical expression recognition via paired adversarial learning. Int. J. Comput. Vis. 128(10), 2386–2401 (2020)

    Article  MathSciNet  Google Scholar 

  32. Xie, Z., Huang, Y., Zhu, Y., Jin, L., Liu, Y., Xie, L.: Aggregation cross-entropy for sequence recognition. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 6538–6547 (2019)

    Google Scholar 

  33. Xu, C., Liang, D., Xu, Y., Bai, S., Zhan, W., Bai, X., Tomizuka, M.: AutoScale: learning to scale for crowd counting. Int. J. Comput. Vis. 130, 405–434 (2021). https://doi.org/10.1007/s11263-021-01542-z

    Article  Google Scholar 

  34. Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of International Conference on Machine Learning, pp. 2048–2057 (2015)

    Google Scholar 

  35. Yan, Z., et al.: Perspective-guided convolution networks for crowd counting. In: Proceedings of IEEE International Conference Computer Vision (2019)

    Google Scholar 

  36. Yang, Y., Li, G., Wu, Z., Su, L., Huang, Q., Sebe, N.: Weakly-supervised crowd counting learns from sorting rather than locations. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12353, pp. 1–17. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58598-3_1

    Chapter  Google Scholar 

  37. Yuan, Y., et al.: Syntax-aware network for handwritten mathematical expression recognition. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 4553–4562 (2022)

    Google Scholar 

  38. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)

  39. Zhang, A., et al.: Attentional neural fields for crowd counting. In: Proceedings of IEEE International Conference on Computer Vision (2019)

    Google Scholar 

  40. Zhang, J., Du, J., Dai, L.: Multi-scale attention with dense encoder for handwritten mathematical expression recognition. In: Proceedings of International Conference on Pattern Recognition, pp. 2245–2250 (2018)

    Google Scholar 

  41. Zhang, J., Du, J., Dai, L.: Track, attend, and parse (TAP): an end-to-end framework for online handwritten mathematical expression recognition. IEEE Trans. Multimedia 21(1), 221–233 (2018)

    Article  Google Scholar 

  42. Zhang, J., Du, J., Yang, Y., Song, Y.Z., Wei, S., Dai, L.: A tree-structured decoder for image-to-markup generation. In: Proceedings of International Conference on Machine Learning, pp. 11076–11085 (2020)

    Google Scholar 

  43. Zhang, J., et al.: Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recogn. 71, 196–206 (2017)

    Article  Google Scholar 

  44. Zhang, Y., Zhou, D., Chen, S., Gao, S., Ma, Y.: Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition (2016)

    Google Scholar 

  45. Zhang, Z., He, T., Zhang, H., Zhang, Z., Xie, J., Li, M.: Bag of freebies for training object detection neural networks. arXiv preprint arXiv:1902.04103 (2019)

  46. Zhao, W., Gao, L., Yan, Z., Peng, S., Du, L., Zhang, Z.: Handwritten mathematical expression recognition with bidirectionally trained transformer. In: Proceedings of International Conference on Document Analysis and Recognition, pp. 570–584 (2021)

    Google Scholar 

Download references

Acknowledgements

This work was done when Bohan Li was an intern at Tomorrow Advancing Life, and was supported in part by the National Natural Science Foundation of China 61733007 and the National Key R &D Program of China under Grant No. 2020AAA0104500.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiang Bai .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Li, B. et al. (2022). When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13688. Springer, Cham. https://doi.org/10.1007/978-3-031-19815-1_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19815-1_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19814-4

  • Online ISBN: 978-3-031-19815-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics