Abstract
Automated grading of free-text exam responses is a challenging task, owing to factors such as the scarcity of training data and bias in the graders' ground-truth labels. In this paper, we focus on the automated grading of free-text responses. We formulate the problem as binary classification with two class labels: low-grade and high-grade. We present a benchmark of four machine learning methods using three experiment protocols on two real-world datasets: one from Cyber-crime exams in Arabic and one from Data Mining exams in English, the latter presented for the first time in this work. By reporting various metrics for both binary classification and answer ranking, we illustrate the benefits and drawbacks of the benchmarked methods. Our results suggest that standard models with individual word representations can, in some cases, achieve predictive performance competitive with deep neural language models that use context-based representations, on both the binary classification and answer-ranking formulations of free-text response grading. Lastly, we discuss the pedagogical implications of our findings by identifying potential pitfalls and challenges in building predictive models for such tasks.
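As an illustration of the formulation above, the sketch below binarises numeric exam scores into low- and high-grade labels at a pass threshold and evaluates predicted scores with one classification metric and one ranking metric. The synthetic scores, the threshold of 5.0, and the choice of F1 and ROC-AUC are assumptions for illustration only; the paper benchmarks its own models, protocols, and metric suite.

```python
# Illustrative sketch (not the paper's exact protocol): binarising
# numeric exam scores into low-/high-grade labels and evaluating a
# scorer on both a classification and an answer-ranking metric.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical ground-truth scores on a 0-10 scale, plus noisy predictions.
true_scores = rng.uniform(0, 10, size=100)
pred_scores = true_scores + rng.normal(0, 2, size=100)

# Binary formulation: high-grade (1) if the score reaches the pass mark.
threshold = 5.0  # assumed pass mark, for illustration
y_true = (true_scores >= threshold).astype(int)
y_pred = (pred_scores >= threshold).astype(int)

print("F1 (binary classification):", round(f1_score(y_true, y_pred), 3))
# Ranking quality: how well raw scores order high- above low-grade answers.
print("ROC-AUC (answer ranking):", round(roc_auc_score(y_true, pred_scores), 3))
```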
Supported by the AutoGrade project of Stockholm University.
Notes
1. We use TfidfVectorizer for feature extraction, with all parameters left at their default values.
3. The size of the flattened representation restricts us from running the models for more repetitions.
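To make note 1 concrete, below is a minimal sketch of such a pipeline in scikit-learn, with TfidfVectorizer left entirely at its defaults. The toy responses, the labels, and the choice of LogisticRegression as the downstream classifier are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a TF-IDF baseline for binary (low-/high-grade)
# free-text response classification. Data and classifier are
# illustrative assumptions, not the paper's exact configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical free-text responses with binary grade labels
# (1 = high-grade, 0 = low-grade).
responses = [
    "Phishing is a social engineering attack that steals credentials.",
    "idk something about email",
    "K-means partitions data into k clusters by minimising variance.",
    "clusters are groups",
]
labels = [1, 0, 1, 0]

# TfidfVectorizer with all parameters at their defaults, as in note 1.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, responses, labels, cv=2, scoring="f1")
print(f"Mean F1 across folds: {scores.mean():.3f}")
```

With default parameters, TfidfVectorizer lowercases the input and builds unigram tf-idf features with l2 normalisation, which is the configuration referred to in note 1.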
Acknowledgements
This work was supported by the AutoGrade project (https://datascience.dsv.su.se/projects/autograding.html) of the Dept. of Computer and Systems Sciences at Stockholm University.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ljungman, J., et al. (2021). Automated Grading of Exam Responses: An Extensive Classification Benchmark. In: Soares, C., Torgo, L. (eds.) Discovery Science. DS 2021. Lecture Notes in Computer Science, vol. 12986. Springer, Cham. https://doi.org/10.1007/978-3-030-88942-5_1
DOI: https://doi.org/10.1007/978-3-030-88942-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88941-8
Online ISBN: 978-3-030-88942-5
eBook Packages: Computer Science, Computer Science (R0)