Abstract
Short answer scoring (SAS) is the task of grading short texts written by learners. In recent years, deep-learning-based approaches have substantially improved the performance of SAS models, but how to guarantee high-quality predictions remains a critical issue when applying such models in the education field. Towards guaranteeing high-quality predictions, we present the first study exploring a human-in-the-loop framework that minimizes grading cost while guaranteeing grading quality by allowing a SAS model to share the grading task with a human grader. Specifically, by introducing a confidence estimation method that indicates the reliability of model predictions, one can guarantee scoring quality by using only high-reliability predictions as scoring results and deferring low-reliability predictions to human graders. In our experiments, we investigate the feasibility of the proposed framework using multiple confidence estimation methods and multiple SAS datasets. We find that our human-in-the-loop framework allows automatic scoring models and human graders to achieve the target scoring quality.
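The routing idea described in the abstract can be summarized as a simple confidence-threshold rule. The following is a minimal sketch of that idea, not the authors' implementation; the use of the maximum softmax posterior as the confidence score, the threshold value, and all names (`Prediction`, `route_predictions`) are illustrative assumptions.

```python
# Minimal sketch of confidence-based routing between a SAS model and human graders.
# Assumption: the model's softmax posterior for the predicted score serves as the
# confidence estimate, and `threshold` is chosen (e.g., on a development set) so that
# automatically accepted answers meet the acceptable scoring error.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    answer_id: str
    predicted_score: int
    confidence: float  # e.g., maximum softmax probability of the SAS model


def route_predictions(preds: List[Prediction],
                      threshold: float) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into automatically accepted ones and ones deferred to humans."""
    auto_scored = [p for p in preds if p.confidence >= threshold]
    needs_human = [p for p in preds if p.confidence < threshold]
    return auto_scored, needs_human


# Example: only high-confidence predictions are used as final scores;
# the rest are sent to human graders, trading grading cost for scoring quality.
preds = [
    Prediction("ans1", predicted_score=3, confidence=0.97),
    Prediction("ans2", predicted_score=1, confidence=0.62),
]
auto_scored, needs_human = route_predictions(preds, threshold=0.9)
```

Under this rule, raising the threshold shifts more answers to human graders (higher cost, higher guaranteed quality), while lowering it does the opposite.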
Notes
- 2. We assume that the acceptable scoring error is determined by test administrators.
- 6. We used pretrained BERT models from https://huggingface.co/bert-base-uncased for English and https://github.com/cl-tohoku/bert-japanese for Japanese.
- 8. We use the posterior probability because there is no significant difference in performance among the three methods, and the posterior is the most widely used confidence estimate (see the sketch following these notes).
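As a concrete illustration of notes 6 and 8, the sketch below loads the English pretrained BERT mentioned above via the Hugging Face `transformers` library and uses the maximum softmax posterior of a sequence-classification head as the confidence score. The classification head (randomly initialized here), the number of score labels, and the example answer are illustrative assumptions, not the authors' setup.

```python
# Sketch: posterior-probability confidence from a BERT-based scorer (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels is hypothetical: one class per possible score (e.g., 0-3 points).
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=4)
model.eval()

inputs = tokenizer("an example short answer", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, num_labels)
probs = torch.softmax(logits, dim=-1)

predicted_score = int(probs.argmax(dim=-1))  # the score the model would assign
confidence = float(probs.max())              # posterior of the predicted score,
                                             # used as the confidence estimate
```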
Acknowledgments
This work was supported by JSPS KAKENHI Grant Numbers JP22H00524, JP19K12112, and JP21H04901.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Funayama, H., Sato, T., Matsubayashi, Y., Mizumoto, T., Suzuki, J., Inui, K. (2022). Balancing Cost and Quality: An Exploration of Human-in-the-Loop Frameworks for Automated Short Answer Scoring. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_38
DOI: https://doi.org/10.1007/978-3-031-11644-5_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11643-8
Online ISBN: 978-3-031-11644-5
eBook Packages: Computer Science, Computer Science (R0)