
Automated Grading of Short Text Answers: Preliminary Results in a Course of Health Informatics

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11841)

Abstract

Students learning Health Informatics in the degree course in Medicine and Surgery at the University of L’Aquila (Italy) are required, in order to pass the exam, to submit solutions to assignments concerning the execution and interpretation of statistical analyses. This paper presents a tool for the automated grading of such solutions, in which the statistical analyses consist of R commands and their outputs, and the interpretations are short text answers. The tool performs a static analysis of the R commands and their respective outputs, and uses Natural Language Processing techniques for the short text answers. The paper summarises the solution for the R commands and outputs, and delves into the method used for the automated classification of the short text answers and the results obtained. In particular, we show that with FastText sentence embeddings and a tuned Support Vector Machine classifier, we obtained an accuracy of 0.89, a Cohen’s K of 0.76, and an F1 score of 0.91 on a binary classification task (i.e. pass or fail). Other experiments that included additional linguistically motivated features, intended to capture lexical differences between the students’ answers and the gold-standard sentences, did not yield any significant improvement. The paper ends with a discussion of the findings and the next steps in our research.
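As an illustration only, the following R sketch shows the kind of pipeline the abstract describes: a grid-tuned RBF-kernel Support Vector Machine, here built with the e1071 package, classifying sentence embeddings as pass or fail. It is not the authors' implementation; the randomly generated matrix X and labels y are placeholders for real FastText sentence embeddings and human-assigned grades.

    # Hypothetical sketch: pass/fail classification of embedded short answers
    # with a tuned RBF-kernel SVM (e1071). Data below are random stand-ins.
    library(e1071)

    set.seed(1)
    n <- 200   # number of graded answers (toy value)
    d <- 300   # typical FastText embedding dimension
    X <- matrix(rnorm(n * d), n, d)                  # stand-in embeddings
    y <- factor(sample(c("fail", "pass"), n, TRUE))  # stand-in grades

    train <- sample(n, 150)                          # hold out a test set

    # Grid search over the RBF kernel's gamma and cost hyper-parameters;
    # this corresponds to the "tuned" SVM mentioned in the abstract.
    tuned <- tune.svm(X[train, ], y[train],
                      kernel = "radial",
                      gamma  = 10^(-3:0),
                      cost   = 10^(0:2))

    # Evaluate the best model on the held-out answers.
    pred <- predict(tuned$best.model, X[-train, ])
    tab  <- table(predicted = pred, actual = y[-train])

    sum(diag(tab)) / sum(tab)     # accuracy
    classAgreement(tab)$kappa     # Cohen's kappa

From the same confusion matrix one can also derive the F1 score reported in the abstract; on real data, rather than the random placeholders above, the grid search selects the cost/gamma pair with the best cross-validated accuracy.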

Keywords

Automated grading · Short text answers · NLP · SVM


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. DISIM, University of L’Aquila, L’Aquila, Italy
  2. FBK-DH, Povo, Italy
  3. MESVA, University of L’Aquila, L’Aquila, Italy
