Deep Learning Techniques for Automatic Short Answer Grading: Predicting Scores for English and German Answers

Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 104)


We investigate and compare state-of-the-art deep learning techniques for Automatic Short Answer Grading. Our experiments demonstrate that systems based on Bidirectional Encoder Representations from Transformers (BERT) [1] perform best for both English and German. Our system achieves a Pearson correlation coefficient of 0.73 and a Mean Absolute Error of 0.4 points on the Short Answer Grading data set of the University of North Texas [2]. On our German data set, we report a Pearson correlation coefficient of 0.78 and a Mean Absolute Error of 1.2 points. Our approach has the potential to greatly simplify the work of proofreaders and to be used in learning systems that prepare students for exams: 31% of the student answers are graded correctly, and in 40% of cases the system deviates on average by only 1 point on questions worth 6, 8, or 10 points.
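
The two evaluation metrics named in the abstract, the Pearson correlation coefficient and the Mean Absolute Error, can be illustrated with a minimal Python sketch. The score lists are made-up toy values, not data from the paper, and the function names are hypothetical; the authors' own evaluation code is not shown on this page.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(xs, ys):
    """Average absolute difference between human and predicted scores, in points."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def share_within(xs, ys, tolerance):
    """Fraction of answers whose predicted score is within `tolerance` points."""
    return sum(abs(x - y) <= tolerance for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumption): human grades and model predictions for five answers.
human = [5.0, 3.0, 4.0, 1.0, 2.0]
predicted = [4.0, 3.0, 5.0, 1.0, 3.0]

print("Pearson r:", round(pearson(human, predicted), 2))
print("MAE:", mean_absolute_error(human, predicted))
print("Exactly matched:", share_within(human, predicted, 0))
print("Within 1 point:", share_within(human, predicted, 1))
```

A high correlation and a low error measure different things: the correlation shows whether predicted scores rise and fall with the human scores, while the Mean Absolute Error shows how many points a prediction is off on average.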


  • Automatic short answer grading
  • Artificial intelligence in education
  • Natural language processing
  • Deep learning

  • DOI: 10.1007/978-981-16-7527-0_5
  • Chapter length: 11 pages

  2. Questions and answers were modified for the German data set due to confidentiality.

  1. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, pp 4171–4186

  2. Mohler M, Bunescu R, Mihalcea R (2011) Learning to Grade Short Answer Questions Using Semantic Similarity Measures and Dependency Graph Alignments. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Portland, Oregon, USA, pp 752–762

  3. Libbrecht P, Declerck T, Schlippe T, Mandl T, Schiffner D (2020) NLP for Student and Teacher: Concept for an AI based Information Literacy Tutoring System. In: The 29th ACM International Conference on Information and Knowledge Management (CIKM2020), Galway, Ireland

  4. Burrows S, Gurevych I, Stein B (2015) The Eras and Trends of Automatic Short Answer Grading. Int J Artif Intell Educ 25(1):60–117

  5. Süzen N, Gorban AN, Levesley J, Mirkes EM (2020) Automatic Short Answer Grading and Feedback Using Text Mining Methods. Procedia Comput Sci 169:726–743

  6. Zehner F (2016) Automatic Processing of Text Responses in Large-scale Assessments. Ph.D. thesis, TU München

  7. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient Estimation of Word Representations in Vector Space. In: 1st International Conference on Learning Representations, ICLR 2013, Workshop Track Proceedings, Scottsdale, Arizona, USA

  8. Gomaa WH, Fahmy AA (2019) Ans2vec: A Scoring System for Short Answers. In: The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2019). Springer International Publishing, Cham, pp 586–595

  9. Kiros R, Zhu Y, Salakhutdinov R, Zemel RS, Torralba A, Urtasun R, Fidler S (2015) Skip-Thought Vectors. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15, vol 2. MIT Press, Cambridge, MA, USA, pp 3294–3302

  10. Dzikovska M, Nielsen R, Brew C, Leacock C, Giampiccolo D, Bentivogli L, Clark P, Dagan I, Dang HT (2013) SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), vol 2, pp 263–274. Association for Computational Linguistics, Atlanta, Georgia, USA

  11. Krishnamurthy S, Gayakwad E, Kailasanathan N (2019) Deep Learning for Short Answer Scoring. Int J Recent Technol Eng 7:1712–1715

  12. Sung C, Dhamecha T, Mukhi N (2019) Improving Short Answer Grading using Transformer-Based Pre-Training. Artif Intell Educ 469–481

  13. Camus L, Filighera A (2020) Investigating Transformers for Automatic Short Answer Grading. Artif Intell Educ 12164:43–48

  14. Meurers D, Ziai R, Ott N, Kopp J (2011) Evaluating Answers to Reading Comprehension Questions in Context: Results for German and the Role of Information Structure. In: Proceedings of the TextInfer 2011 Workshop on Textual Entailment. Association for Computational Linguistics, Edinburgh, Scotland, UK, pp 1–9

  15. Pado U, Kiefer C (2015) Short Answer Grading: When Sorting Helps and when It Doesn’t. In: Proceedings of the 4th Workshop on NLP for Computer Assisted Language Learning, NODALIDA 2015, Linköping Electronic Conference Proceedings. LiU Electronic Press and ACL Anthology, Vilnius, pp 42–50

  16. Pearson K (1895) Note on Regression and Inheritance in the Case of Two Parents. Proc R Soc Lond 58:240–242

  17. Yang Y, Cer D, Ahmad A, Guo M, Law J, Constant N, Abrego GH, Yuan S, Tar C, Sung Y-H, Strope B, Kurzweil R (2019) Multilingual Universal Sentence Encoder for Semantic Retrieval. arXiv:1907.04307

  18. Evgeniou T, Pontil M (2001) Support Vector Machines: Theory and Applications. Mach Learn Appl: Adv Lect 2049:249–257

  19. Wölfel M (2021) Towards the Automatic Generation of Pedagogical Conversational Agents from Lecture Slides. In: 3rd EAI International Conference on Multimedia Technology and Enhanced Learning (EAI ICMTEL 2021). Cyberspace

  20. Schlippe T, Sawatzki J (2021) AI-based Multilingual Interactive Exam Preparation. In: The Learning Ideas Conference 2021 (14th annual conference). ALICE—Special Conference Track on Adaptive Learning via Interactive, Collaborative and Emotional Approaches. New York, New York, USA

Corresponding author

Correspondence to Tim Schlippe.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sawatzki, J., Schlippe, T., Benner-Wickner, M. (2022). Deep Learning Techniques for Automatic Short Answer Grading: Predicting Scores for English and German Answers. In: Cheng, E.C.K., Koul, R.B., Wang, T., Yu, X. (eds) Artificial Intelligence in Education: Emerging Technologies, Models and Applications. Lecture Notes on Data Engineering and Communications Technologies, vol 104. Springer, Singapore.

  • DOI:

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-7526-3

  • Online ISBN: 978-981-16-7527-0

  • eBook Packages: Engineering, Engineering (R0)