Abstract

Massive open online courses and other online study opportunities give more and more people around the world easier access to education. To cope with the large number of exams that must be assessed in these courses, AI-driven automatic short answer grading can recommend point scores to teaching staff who evaluate free-text answers, leading to faster and fairer grading. But what is the best way to work with the AI? In this paper, we investigate and evaluate different methods for explainability in automatic short answer grading. Our survey of over 70 professors, lecturers and teachers with grading experience showed that displaying the predicted points together with the matches between the student answer and the model answer is rated better than the other tested explainable AI (XAI) methods with respect to trust, informative content, speed, consistency and fairness, fun, comprehensibility, applicability, use in exam preparation, and overall.
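As a rough illustration of what "matches between student answer and model answer" could look like, the following minimal sketch lists the word sequences that two answers share. It is not the authors' implementation: the function name `matching_spans`, the whitespace tokenization, and the use of Python's standard `difflib.SequenceMatcher` are assumptions made purely for illustration, and the abstract does not say how the matches in the surveyed XAI method were computed.

```python
from difflib import SequenceMatcher


def matching_spans(student_answer: str, model_answer: str) -> list[str]:
    """Return the word sequences shared by both answers, in student-answer order.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split, so punctuation is not handled.
    """
    student_tokens = student_answer.lower().split()
    model_tokens = model_answer.lower().split()
    # autojunk=False disables the popularity heuristic, which is only
    # relevant for long sequences but is switched off here for predictability.
    matcher = SequenceMatcher(a=student_tokens, b=model_tokens, autojunk=False)
    return [
        " ".join(student_tokens[block.a : block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size > 0  # the final block is a zero-length sentinel
    ]


if __name__ == "__main__":
    spans = matching_spans(
        "a stack is last in first out",
        "a stack follows last in first out order",
    )
    print(spans)  # ['a stack', 'last in first out']
```

A grading interface could highlight these spans next to the predicted points; a production system would more likely match on semantic similarity rather than exact word overlap.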

Notes

  1. Also called sample answer or sample response in the literature.

  2. https://docs.google.com/forms

Author information

Corresponding author

Correspondence to Tim Schlippe.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Schlippe, T., Stierstorfer, Q., Koppel, M.t., Libbrecht, P. (2023). Explainability in Automatic Short Answer Grading. In: Cheng, E.C.K., Wang, T., Schlippe, T., Beligiannis, G.N. (eds) Artificial Intelligence in Education Technologies: New Development and Innovative Practices. AIET 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 154. Springer, Singapore. https://doi.org/10.1007/978-981-19-8040-4_5

  • DOI: https://doi.org/10.1007/978-981-19-8040-4_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8039-8

  • Online ISBN: 978-981-19-8040-4

  • eBook Packages: Engineering, Engineering (R0)
