Abstract
Automatic math word problem (MWP) solving plays an important role in AI tutoring: given the text of an MWP, the goal is to generate the corresponding math expression and answer. To improve the practical applicability of MWP solvers, we optimize two aspects. First, to address the weak linguistic representation of RNN encoders, which limits the accuracy of MWP solving models, we propose to use Bidirectional Encoder Representations from Transformers (BERT) as the encoder and combine it with a Transformer decoder. The resulting model reaches 82.6% accuracy on the Math23K dataset, about 8% higher than GTS. However, pre-trained models tend to be large, which hinders deployment on a web server. We therefore propose a knowledge distillation strategy that integrates the teacher model's evaluation: by letting the student patiently learn from and imitate the teacher through multi-layer distillation, the BERT-based model is compressed into a shallow student model. The student achieves 76.3% accuracy on Math23K, while its size is only 0.61 times that of the teacher and its prediction speed is 1.71 times faster.
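The multi-layer "patient" distillation objective described above can be sketched as a combination of three terms: cross-entropy on gold labels, temperature-scaled soft-label distillation from the teacher, and a matching loss between intermediate hidden states of the student and mapped teacher layers. This is a minimal NumPy sketch under assumed weightings (`alpha`, `beta`) and an assumed layer mapping; the actual loss terms and hyperparameters in the paper may differ.

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def patient_kd_loss(student_logits, teacher_logits, hard_labels,
                    student_hidden, teacher_hidden,
                    T=2.0, alpha=0.5, beta=0.1):
    """Combined distillation objective (illustrative, not the paper's exact loss).

    student_hidden / teacher_hidden: lists of (batch, dim) arrays for the
    student layers and the teacher layers they are mapped to (a hypothetical
    mapping, e.g. every other teacher layer).
    """
    # Hard-label cross-entropy on the gold answers.
    p_s = softmax(student_logits)
    ce = -np.mean(np.log(p_s[np.arange(len(hard_labels)), hard_labels] + 1e-12))

    # Soft-label term: cross-entropy of the temperature-scaled teacher
    # distribution under the student, scaled by T^2 as in standard KD.
    p_t = softmax(teacher_logits, T)
    log_ps = np.log(softmax(student_logits, T) + 1e-12)
    kd = -np.mean(np.sum(p_t * log_ps, axis=-1)) * T * T

    # "Patient" term: MSE between length-normalized intermediate hidden states.
    def normalize(h):
        return h / (np.linalg.norm(h, axis=-1, keepdims=True) + 1e-12)
    pt = np.mean([np.mean((normalize(s) - normalize(t)) ** 2)
                  for s, t in zip(student_hidden, teacher_hidden)])

    return (1 - alpha) * ce + alpha * kd + beta * pt
```

During training, only the student's parameters receive gradients; the teacher's logits and hidden states are treated as fixed targets.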
This work is partially supported by the National Natural Science Foundation of China (Project No. 62177015).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Fan, W., Xiao, J., Cao, Y. (2023). A Framework for Math Word Problem Solving Based on Pre-training Models and Spatial Optimization Strategies. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2022. Communications in Computer and Information Science, vol 1682. Springer, Singapore. https://doi.org/10.1007/978-981-99-2385-4_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2384-7
Online ISBN: 978-981-99-2385-4