Automatic Chinese Reading Comprehension Grading by LSTM with Knowledge Adaptation

  • Yuwei Huang
  • Xi Yang
  • Fuzhen Zhuang
  • Lishan Zhang
  • Shengquan Yu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10937)


Owing to the subjectivity of graders and the complexity of assessment standard, grading is a tough problem in the field of education. This paper presents an algorithm for automatic grading of open-ended Chinese reading comprehension questions. Due to the high complexity of feature engineering and the lack of consideration for word order in frequency based word embedding models, we utilize long-short term memory recurrent neural network to extract semantic feature in student answers automatically. In addition, we also try to impose the knowledge adaptation from web corpus to student answers, and represent the students’ responses to vectors which are fed into the memory network. Along this line, the workload of teacher and the subjectivity in reading comprehension grading can both be reduced obviously. What’s more, the automatic grading methods for Chinese reading comprehension will be more thorough. The experimental results on five Chinese and two English data sets demonstrate the superior performance over compared baselines.


Automatic grading Knowledge adaptation Reading comprehension LSTM Text classification 



This work is supported by the National Natural Science Foundation of China (No. 61773361, 61473273), the Youth Innovation Promotion Association CAS 2017146, the China Postdoctoral Science Foundation (No. 2017M610054).


  1. 1.
    Bailey, S., Meurers, D.: Diagnosing meaning errors in short answers to reading comprehension questions. In: Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications, pp. 107–115. Association for Computational Linguistics (2008)Google Scholar
  2. 2.
    Basu, S., Jacobs, C., Vanderwende, L.: Powergrading: a clustering approach to amplify human effort for short answer grading. Trans. Assoc. Comput. Linguist. 1, 391–402 (2013)Google Scholar
  3. 3.
    Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3(6), 1137–1155 (2003)zbMATHGoogle Scholar
  4. 4.
    Boer, P.T.D., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. Ann. Oper. Res. 134(1), 19–67 (2005)MathSciNetCrossRefGoogle Scholar
  5. 5.
    Chollet, F., et al.: Keras (2015).
  6. 6.
    Dunne, R.A., Campbell, N.A.: On the pairing of the softmax activation and cross-entropy penalty functions and the derivation of the softmax activation function. In: Proceedings of the 8th Australian Conference on the Neural Networks, Melbourne, vol. 185, p. 181 (1997)Google Scholar
  7. 7.
    Dzikovska, M.O., Nielsen, R.D., Brew, C.: Towards effective tutorial feedback for explanation questions: a dataset and baselines. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 200–210. Association for Computational Linguistics (2012)Google Scholar
  8. 8.
    Graves, A., Mohamed, A.r., Hinton, G.: Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6645–6649. IEEE (2013)Google Scholar
  9. 9.
    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRefGoogle Scholar
  10. 10.
    Horbach, A., Palmer, A., Pinkal, M.: Using the text to evaluate short answers for reading comprehension exercises. In: * SEM@ NAACL-HLT, pp. 286–295 (2013)Google Scholar
  11. 11.
    Hou, W.-J., Tsao, J.-H., Li, S.-Y., Chen, L.: Automatic assessment of students’ free-text answers with support vector machines. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds.) IEA/AIE 2010. LNCS (LNAI), vol. 6096, pp. 235–243. Springer, Heidelberg (2010). Scholar
  12. 12.
    Jimenez, S., Becerra, C.J., Gelbukh, A.F., Bátiz, A.J.D., Mendizábal, A.: Softcardinality: hierarchical text overlap for student response analysis. In: SemEval@ NAACL-HLT, pp. 280–284 (2013)Google Scholar
  13. 13.
    Jones, K.S.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 60(5), 493–502 (2013)CrossRefGoogle Scholar
  14. 14.
    Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  15. 15.
    Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. AAAI 333, 2267–2273 (2015)Google Scholar
  16. 16.
    Lui, A.K.-F., Lee, L.-K., Lau, H.-W.: Automated grading of short literal comprehension questions. In: Lam, J., Ng, K.K., Cheung, S.K.S., Wong, T.L., Li, K.C., Wang, F.L. (eds.) ICTE 2015. CCIS, vol. 559, pp. 251–262. Springer, Heidelberg (2015). Scholar
  17. 17.
    Madnani, N., Burstein, J., Sabatini, J., O’Reilly, T.: Automated scoring of a summary-writing task designed to measure reading comprehension. In: BEA@ NAACL-HLT, pp. 163–168 (2013)Google Scholar
  18. 18.
    Meurers, D., Ziai, R., Ott, N., Bailey, S.M.: Integrating parallel analysis modules to evaluate the meaning of answers to reading comprehension questions. Int. J. Contin. Eng. Educ. Life Long Learn. 21(4), 355–369 (2011)CrossRefGoogle Scholar
  19. 19.
    Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
  20. 20.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)Google Scholar
  21. 21.
    Mnih, A., Hinton, G.: Three new graphical models for statistical language modelling. In: Proceedings of the 24th International Conference on Machine Learning, pp. 641–648. ACM (2007)Google Scholar
  22. 22.
    Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50. ELRA, Valletta, Malta, May 2010Google Scholar
  23. 23.
    Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for sentiment classification. In: EMNLP, pp. 1422–1432 (2015)Google Scholar
  24. 24.
    Wang, H.C., Chang, C.Y., Li, T.Y.: Assessing creative problem-solving with automated text grading. Comput. Educ. 51(4), 1450–1466 (2008)CrossRefGoogle Scholar
  25. 25.
    Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)Google Scholar
  26. 26.
    Zesch, T., Levy, O., Gurevych, I., Dagan, I.: UKP-BIU: similarity and entailment metrics for student response analysis, Atlanta, Georgia, USA, p. 285 (2013)Google Scholar
  27. 27.
    Zhang, Y., Shah, R., Chi, M.: Deep learning + student modeling + clustering: a recipe for effective automatic short answer grading. In: EDM, pp. 562–567 (2016)Google Scholar
  28. 28.
    Zhila, A., Yih, W.t., Meek, C., Zweig, G., Mikolov, T.: Combining heterogeneous models for measuring relational similarity. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1000–1009 (2013)Google Scholar

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Beijing University of Chemical TechnologyBeijingChina
  2. 2.Beijing Advanced Innovation Center for Future EducationBeijing Normal UniversityBeijingChina
  3. 3.Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS)Institute of Computing Technology, CASBeijingChina
  4. 4.University of Chinese Academy of SciencesBeijingChina

Personalised recommendations