
Balancing Cost and Quality: An Exploration of Human-in-the-Loop Frameworks for Automated Short Answer Scoring

  • Conference paper
  • In: Artificial Intelligence in Education (AIED 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13355)

Abstract

Short answer scoring (SAS) is the task of grading short text written by a learner. In recent years, deep-learning-based approaches have substantially improved the performance of SAS models, but how to guarantee high-quality predictions remains a critical issue when applying such models in the education field. Toward guaranteeing high-quality predictions, we present the first study exploring the use of a human-in-the-loop framework for minimizing the grading cost while guaranteeing the grading quality by allowing a SAS model to share the grading task with human graders. Specifically, by introducing a confidence estimation method that indicates the reliability of the model's predictions, one can guarantee scoring quality by using only predictions with high reliability as scoring results and deferring predictions with low reliability to human graders. In our experiments, we investigate the feasibility of the proposed framework using multiple confidence estimation methods and multiple SAS datasets. We find that our human-in-the-loop framework allows automatic scoring models and human graders to achieve the target scoring quality.
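
As a reading aid, here is a minimal sketch of the routing logic the abstract describes: accept a model prediction when its estimated confidence clears a threshold, otherwise defer the answer to a human grader. The names `grade`, `sas_model`, `human_grade`, and `threshold` are hypothetical, not from the paper; the authors' actual implementation is linked in the Notes below.

```python
# A minimal sketch of confidence-based routing between a SAS model and
# a human grader. All names here are illustrative assumptions.

def grade(answers, sas_model, human_grade, threshold=0.9):
    """Route each answer to the model or a human grader by confidence."""
    results = []
    for answer in answers:
        score, confidence = sas_model(answer)  # prediction + estimated reliability
        if confidence >= threshold:
            results.append(score)                # trust the high-confidence prediction
        else:
            results.append(human_grade(answer))  # defer to a human grader
    return results
```

Raising the threshold trades grading cost for quality: more answers are deferred to humans, but the retained automatic scores are more reliable.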


Notes

  1. https://github.com/hiro819/HITL_framework_for_ASAS.

  2. We assume that the acceptable scoring error is determined by test administrators.

  3. https://github.com/hiro819/HITL_framework_for_ASAS.

  4. https://www.kaggle.com/c/asap-sas.

  5. https://aip-nlu.gitlab.io/resources/sas-japanese.

  6. We used pretrained BERTs from https://huggingface.co/bert-base-uncased for English and https://github.com/cl-tohoku/bert-japanese for Japanese.

  7. The quadratic weighted kappa (QWK) of our model is 0.722 on the ASAP-SAS dataset, which is comparable to previous studies [12, 17].

  8. We use the posterior because there is no significant difference in performance among the three methods and the posterior is the most widely used confidence estimate; a minimal sketch follows these notes.
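
Notes 6-8 together suggest a concrete recipe: score with a pretrained BERT classifier, take the posterior (here read as the maximum softmax probability) as the confidence score, and evaluate agreement with human graders via QWK. Below is a minimal sketch under those assumptions; the classification head, the 6-point score scale, and all variable names are illustrative, not taken from the paper's released code.

```python
# Minimal sketch combining Notes 6-8: a BERT classifier's softmax
# posterior as the confidence score, plus QWK for evaluation.
# num_labels=6 (one label per score point) is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import cohen_kappa_score  # QWK via weights="quadratic"

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=6)

def predict_with_confidence(answer: str):
    """Return (predicted score, posterior confidence) for one answer."""
    inputs = tokenizer(answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    conf, score = probs.max(dim=-1)  # posterior = max softmax probability
    return score.item(), conf.item()

# Agreement with human graders, as reported in Note 7:
# qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
```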

References

  1. Attali, Y., Burstein, J.: Automated essay scoring with e-rater®v.2. J. Technol. Learn. Assess. 4(3) (2006)


  2. Burrows, S., Gurevych, I., Stein, B.: The eras and trends of automatic short answer grading. Int. J. Artif. Intell. Educ. 25(1), 60–117 (2015)


  3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT, pp. 4171–4186, June 2019. https://doi.org/10.18653/v1/N19-1423

  4. Ding, Y., Riordan, B., Horbach, A., Cahill, A., Zesch, T.: Don't take "nswvtnvakgxpm" for an answer - the surprising vulnerability of automatic content scoring systems to adversarial input. In: COLING, pp. 882–892. International Committee on Computational Linguistics, December 2020. https://doi.org/10.18653/v1/2020.coling-main.76

  5. Funayama, H., et al.: Preventing critical scoring errors in short answer scoring with confidence estimation. In: ACL-SRW, pp. 237–243. Association for Computational Linguistics, July 2020. https://doi.org/10.18653/v1/2020.acl-srw.32

  6. Gardner, J.R., Pleiss, G., Bindel, D., Weinberger, K.Q., Wilson, A.G.: GPyTorch: blackbox matrix-matrix Gaussian process inference with GPU acceleration (2021)


  7. Herwanto, G., Sari, Y., Prastowo, B., Riasetiawan, M., Bustoni, I.A., Hidayatulloh, I.: UKARA: a fast and simple automatic short answer scoring system for Bahasa Indonesia. In: ICEAP Proceeding Book, vol. 2, pp. 48–53, December 2018


  8. Horbach, A., Palmer, A.: Investigating active learning for short-answer scoring. In: BEA. Association for Computational Linguistics, San Diego, June 2016


  9. Jang, E.S., Kang, S., Noh, E.H., Kim, M.H., Sung, K.H., Seong, T.J.: KASS: Korean automatic scoring system for short-answer questions. In: CSEDU 2014, vol. 2, pp. 226–230 (2014)


  10. Jiang, H., Kim, B., Guan, M.Y., Gupta, M.R.: To trust or not to trust a classifier. In: NIPS, pp. 5546–5557 (2018)


  11. Johan Berggren, S., Rama, T., Øvrelid, L.: Regression or classification? Automated essay scoring for Norwegian. In: BEA, pp. 92–102. Association for Computational Linguistics, Florence, August 2019. https://doi.org/10.18653/v1/W19-4409

  12. Krishnamurthy, S., Gayakwad, E., Kailasanathan, N.: Deep learning for short answer scoring. Int. J. Recent Technol. Eng. 7, 1712–1715 (2019)


  13. Kumar, Y., Aggarwal, S., Mahata, D., Shah, R.R., Kumaraguru, P., Zimmermann, R.: Get it scored using AutoSAS - an automated system for scoring short answers. In: AAAI/IAAI/EAAI. AAAI Press (2019). https://doi.org/10.1609/aaai.v33i01.33019662

  14. Leacock, C., Chodorow, M.: C-rater: automated scoring of short-answer questions. Comput. Humanit. 37(4), 389–405 (2003). https://doi.org/10.1023/A:1025779619903

  15. Mizumoto, T., et al.: Analytic score prediction and justification identification in automated short answer scoring. In: BEA, pp. 316–325 (2019). https://doi.org/10.18653/v1/W19-4433

  16. Rasmussen, C.E.: Gaussian processes in machine learning. In: Bousquet, O., von Luxburg, U., Rätsch, G. (eds.) ML 2003. LNCS (LNAI), vol. 3176, pp. 63–71. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28650-9_4


  17. Riordan, B., Horbach, A., Cahill, A., Zesch, T., Lee, C.M.: Investigating neural architectures for short answer scoring. In: BEA, pp. 159–168 (2017). https://doi.org/10.18653/v1/W17-5017

  18. Sychev, O., Anikin, A., Prokudin, A.: Automatic grading and hinting in open-ended text questions. Cogn. Syst. Res. 59, 264–272 (2020)


  19. Wang, T., Inoue, N., Ouchi, H., Mizumoto, T., Inui, K.: Inject rubrics into short answer grading system. In: DeepLo, pp. 175–182, November 2019. https://doi.org/10.18653/v1/D19-6119

  20. Williamson, D.M., Xi, X., Breyer, F.J.: A framework for evaluation and use of automated scoring. Educ. Meas. Issues Pract. 31(1), 2–13 (2012). https://doi.org/10.1111/j.1745-3992.2011.00223.x


  21. Woods, B., Adamson, D., Miel, S., Mayfield, E.: Formative essay feedback using predictive scoring models. In: KDD 2017, pp. 2071–2080. Association for Computing Machinery (2017). https://doi.org/10.1145/3097983.3098160


Acknowledgments

This work was supported by JSPS KAKENHI Grant Numbers JP22H00524, JP19K12112, and JP21H04901.

Author information

Correspondence to Hiroaki Funayama.


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Funayama, H., Sato, T., Matsubayashi, Y., Mizumoto, T., Suzuki, J., Inui, K. (2022). Balancing Cost and Quality: An Exploration of Human-in-the-Loop Frameworks for Automated Short Answer Scoring. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2022. Lecture Notes in Computer Science, vol 13355. Springer, Cham. https://doi.org/10.1007/978-3-031-11644-5_38


  • DOI: https://doi.org/10.1007/978-3-031-11644-5_38


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11643-8

  • Online ISBN: 978-3-031-11644-5

  • eBook Packages: Computer Science; Computer Science (R0)
