
Paraphrase Generation and Supervised Learning for Improved Automatic Short Answer Grading

  • ARTICLE
  • Published in: International Journal of Artificial Intelligence in Education

Abstract

We consider the reference-based approach to Automatic Short Answer Grading (ASAG), in which a student's textual constructed response is scored by comparing it to a teacher-provided reference answer. A single reference answer cannot cover the variety of student answers, as it contains only specific examples of correct responses. Considering other language variants of the reference answer can handle this variability and improve scoring accuracy. Alternative reference answers are possible, but creating them manually is expensive and time-consuming. In this paper, we address two issues. First, we need to automatically generate varied reference answers that can handle the diversity of student answers. Second, we should provide an accurate grading model that improves sentence-similarity computation using multiple reference answers. Our proposed approach therefore comprises two components. First, we provide a sequence-to-sequence deep learning model that generates plausible paraphrased reference answers conditioned on the provided reference answer. Second, we propose a supervised grading model based on sentence-embedding features; the grading model enriches its features with the multiple reference answers to improve accuracy. Experiments are conducted in both Arabic and English. They show that the paraphrase generator produces accurate paraphrases. Using multiple reference answers, the proposed grading model achieves a Root Mean Square Error (RMSE) of 0.6955 and a Pearson correlation of 88.92% on the Arabic dataset, and an RMSE of 0.7790 and a Pearson correlation of 73.50% on the English dataset. While fine-tuning pre-trained transformers on the English dataset provided state-of-the-art performance (RMSE: 0.7620), our approach yields comparable results.
Because it is simple to construct, load, and embed into a question engine, and has low computational complexity, the proposed approach can be easily integrated into a Learning Management System to support the assessment of short answers.
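The grading pipeline sketched in the abstract — comparing a student answer to several reference variants via sentence embeddings, then evaluating with RMSE and Pearson correlation — can be illustrated as follows. This is a minimal sketch, not the authors' implementation: it assumes answers have already been mapped to embedding vectors, and the function names (`similarity_features`, `rmse`, `pearson`) and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_features(student_vec, reference_vecs):
    """Compare a student answer to every reference variant (the original
    teacher answer plus generated paraphrases); the max and mean
    similarities can serve as features of a supervised grading model."""
    sims = [cosine(student_vec, r) for r in reference_vecs]
    return {"max_sim": max(sims), "mean_sim": sum(sims) / len(sims)}

def rmse(y_true, y_pred):
    """Root Mean Square Error between gold and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson(x, y):
    """Pearson correlation between gold and predicted scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy usage with 3-dimensional stand-in embeddings:
# one original reference answer plus one generated paraphrase.
references = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
student = [0.9, 0.4, 0.2]
feats = similarity_features(student, references)
```

A grader trained on such features would then be evaluated exactly as in the abstract, by computing `rmse(gold, predicted)` and `pearson(gold, predicted)` over the test set.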



Data Availability

The datasets used during the current study are available at:

  • Al-Raisi Arabic Dataset: http://www.cs.cmu.edu/~fraisi/arabic/arparallel/
  • The Quora English Dataset: https://github.com/jakartaresearch/quora-question-pairs
  • AR-ASAG Dataset (2020): https://data.mendeley.com/datasets/dj95jh332j/1
  • Mohler et al. (2011) Dataset: https://web.eecs.umich.edu/~mihalcea/downloads.html

Notes

  1. https://huggingface.co/gpt2

  2. https://huggingface.co/docs/transformers/model_doc/bart

  3. https://github.com/google-research/text-to-text-transfer-transformer

  4. http://vectors.nlpl.eu/repository/

  5. http://www.cs.cmu.edu/~fraisi/arabic/arparallel/

  6. https://github.com/jakartaresearch/quora-question-pairs

  7. https://fasttext.cc/docs/en/english-vectors.html

  8. https://github.com/Grader-ASAG

  9. https://data.mendeley.com/datasets/dj95jh332j/1

  10. https://web.eecs.umich.edu/~mihalcea/downloads.html

References

  • Ab Aziz, M. J., Ahmad, F. D., Ghani, A. A. A., & Mahmod, R. (2009). Automated marking system for short answer examination (AMS-SAE). 2009 IEEE Symposium on Industrial Electronics & Applications (ISIEA), 1, 47–51. https://doi.org/10.1109/ISIEA.2009.5356500


  • Adams, O., Roy, S., & Krishnapuram, R. (2016). Distributed vector representations for unsupervised automatic short answer grading. In Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016) (pp. 20–29). https://aclanthology.org/W16-4904. Accessed 22 Feb 2022.

  • Agarwal, R., Khurana, V., Grover, K., Mohania, M., & Goyal, V. (2022). Multi-Relational Graph Transformer for Automatic Short Answer Grading. NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 2001–2012. https://doi.org/10.18653/v1/2022.naacl-main.146

  • Alkhatib, M., & Shaalan, K. (2018). Paraphrasing Arabic metaphor with neural machine translation. Procedia Computer Science, 142, 308–314. https://doi.org/10.1016/j.procs.2018.10.493


  • Al-Raisi, F., Bourai, A., & Lin, W. (2018a). Neural symbolic arabic paraphrasing with automatic evaluation. Computer Science & Information Technology, 01–13. https://doi.org/10.5121/CSIT.2018.80601

  • Al-Raisi, F., Lin, W., & Bourai, A. (2018b). A monolingual parallel corpus of Arabic. Procedia Computer Science, 142, 334–338. https://doi.org/10.1016/J.PROCS.2018.10.487


  • Ashton, H. S., Beevers, C. E., Milligan, C. D., Schofield, D. K., Thomas, R. C., & Youngson, M. A. (2005). Moving beyond objective testing in online assessment. In Online Assessment and Measurement: Case Studies from Higher Education, K-12 and Corporate (pp. 116–128). IGI Global. https://doi.org/10.4018/978-1-59140-497-2.ch008

  • Azad, S., Chen, B., Fowler, M., West, M., & Zilles, C. (2020). Strategies for deploying unreliable AI graders in high-transparency high-stakes exams. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 12163 LNAI, 16–28. https://doi.org/10.1007/978-3-030-52237-7_2

  • Babych, B. (2014). Automated MT evaluation metrics and their limitations. Tradumàtica: Tecnologies de La Traducció, 12, 464. https://doi.org/10.5565/rev/tradumatica.70


  • Bahdanau, D., Cho, K. H., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. https://arxiv.org/abs/1409.0473v7. Accessed 24 Feb 2022.

  • Beckman, K., Apps, T., Bennett, S., Dalgarno, B., Kennedy, G., & Lockyer, L. (2019). Self-regulation in open-ended online assignment tasks: The importance of initial task interpretation and goal setting. Studies in Higher Education. https://doi.org/10.1080/03075079.2019.1654450


  • Bloom, B. S. (1984). Taxonomy of educational objectives book 1: Cognitive domain. In nancybroz.com. http://nancybroz.com/nancybroz/Literacy_I_files/BloomIntro.doc. Accessed 31 Aug 2021.

  • Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146. https://doi.org/10.1162/tacl_a_00051


  • Brown, S., & Glasner, A. (Eds.). (1999). Assessment matters in higher education: Choosing and using diverse approaches. https://eric.ed.gov/?id=ED434545. Accessed 24 Feb 2021.

  • Burrows, S., Gurevych, I., & Stein, B. (2015). The eras and trends of automatic short answer grading. In International Journal of Artificial Intelligence in Education (Vol. 25, Issue 1, pp. 60–117). Springer New York LLC. https://doi.org/10.1007/s40593-014-0026-8

  • Cahuantzi, R., Chen, X., & Güttel, S. (2021). A comparison of LSTM and GRU networks for learning symbolic sequences. http://eprints.maths.manchester.ac.uk/. Accessed 25 May 2023.

  • Carbonell, J., & Goldstein, J. (1998). Use of MMR, diversity-based reranking for reordering documents and producing summaries. SIGIR Forum (ACM Special Interest Group on Information Retrieval), 335–336. https://doi.org/10.1145/290941.291025

  • Carneiro, T., Da Nobrega, R. V. M., Nepomuceno, T., Bian, G. B., De Albuquerque, V. H. C., & Filho, P. P. R. (2018). Performance analysis of google colaboratory as a tool for accelerating deep learning applications. IEEE Access, 6, 61677–61685. https://doi.org/10.1109/ACCESS.2018.2874767


  • Chaganty, A. T., Mussmann, S., & Liang, P. (2018). The price of debiasing automatic metrics in natural language evaluation. ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), 1, 643–653. https://doi.org/10.48550/arxiv.1807.02202

  • Chen, M., Tang, Q., Wiseman, S., & Gimpel, K. (2020). Controllable paraphrase generation with a syntactic exemplar. ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 5972–5984. https://doi.org/10.18653/v1/p19-1599

  • Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 1724–1734. https://doi.org/10.3115/v1/d14-1179

  • Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. https://arxiv.org/abs/1412.3555v1. Accessed 20 Dec 2022.

  • Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference (vol. 1, pp. 4171–4186). https://github.com/tensorflow/tensor2tensor. Accessed 27 Sept 2022.

  • Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology, 26(3), 297–302. https://doi.org/10.2307/1932409


  • Dzikovska, M., Steinhauser, N., Farrow, E., Moore, J., & Campbell, G. (2014). BEETLE II: Deep natural language understanding and automatic feedback generation for intelligent tutoring in basic electricity and electronics. International Journal of Artificial Intelligence in Education, 24(3), 284–332. https://doi.org/10.1007/s40593-014-0017-9


  • Gaddipati, S. K., Nair, D., & Plöger, P. G. (2020). Comparative evaluation of pretrained transfer learning models on automatic short answer grading. https://arxiv.org/abs/2009.01303v1. Accessed 27 May 2023.

  • Gomaa, W. H., & Fahmy, A. A. (2020). Ans2vec: A scoring system for short answers. Advances in Intelligent Systems and Computing, 921, 586–595. https://doi.org/10.1007/978-3-030-14118-9_59


  • Goyal, T., & Durrett, G. (2020). Neural Syntactic Preordering for Controlled Paraphrase Generation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 238–252. https://doi.org/10.18653/v1/2020.acl-main.22

  • Gupta, A., Agarwal, A., Singh, P., & Rai, P. (2018). A deep generative framework for paraphrase generation. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 5149–5156. https://doi.org/10.5555/3504035.3504666

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/NECO.1997.9.8.1735


  • Hsu, S., Wentin, T., Zhang, Z., & Fowler, M. (2021). Attitudes surrounding an imperfect AI autograder. Conference on Human Factors in Computing Systems - Proceedings. https://doi.org/10.1145/3411764.3445424


  • Huang, S., Wu, Y., Wei, F., & Luan, Z. (2019). Dictionary-guided editing networks for paraphrase generation. 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, 6546–6553. https://doi.org/10.1609/AAAI.V33I01.33016546

  • Huang, X., Bidart, R., Khetan, A., & Karnin, Z. (2022). Pyramid-BERT: Reducing complexity via successive core-set based token selection. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 1, 8798–8817. https://doi.org/10.18653/v1/2022.acl-long.602


  • Islam, A., & Inkpen, D. (2008). Semantic text similarity using corpus-based word similarity and string similarity. ACM Transactions on Knowledge Discovery from Data, 2(2), 1–25. https://doi.org/10.1145/1376815.1376819


  • Jayashankar, S., & Sridaran, R. (2017). Superlative model using word cloud for short answers evaluation in eLearning. Education and Information Technologies, 22(5), 2383–2402. https://doi.org/10.1007/s10639-016-9547-0


  • Jordan, S. (2013). E-assessment: Past, present and future. New Directions, 9(1), 87–106. https://doi.org/10.11120/ndir.2013.00009


  • Jordan, S., & Butcher, P. (2013). Does the Sun orbit the Earth? Challenges in using short free-text computer-marked questions. In HEA STEM Annual Learning and Teaching Conference 2013: Where Practice and Pedagogy Meet. http://www.heacademy.ac.uk/events/detail/2012/17_18_Apr_HEA_STEM_2013_Conf_Bham. Accessed 1 June 2021.

  • Kazemnejad, A., Salehi, M., & Soleymani Baghshah, M. (2020). Paraphrase generation by learning how to edit from samples. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, (pp. 6010–6021). https://doi.org/10.18653/v1/2020.acl-main.535

  • Khan, S., & Khan, R. A. (2019). Online assessments: Exploring perspectives of university students. Education and Information Technologies, 24(1), 661–677. https://doi.org/10.1007/s10639-018-9797-0


  • Kingma, D. P., & Ba, J. L. (2015). Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings. https://doi.org/10.48550/arxiv.1412.6980. Accessed 20 Feb 2022.

  • Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. 2nd International Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings. https://arxiv.org/abs/1312.6114v10

  • Kreutzer, J., Caswell, I., Wang, L., Wahab, A., Van Esch, D., Ulzii-Orshikh, N., Tapo, A., Subramani, N., Sokolov, A., Sikasote, C., Setyawan, M., Sarin, S., Samb, S., Sagot, B., Rivera, C., Rios, A., Papadimitriou, I., Osei, S., Suarez, P. O., … Adeyemi, M. (2022). Quality at a glance: An audit of web-crawled multilingual datasets. Transactions of the Association for Computational Linguistics, 10, 50–72. https://doi.org/10.1162/tacl_a_00447

  • Kumar, S., Chakrabarti, S., & Roy, S. (2017). Earth mover’s distance pooling over siamese LSTMs for Automatic short answer grading. IJCAI International Joint Conference on Artificial Intelligence, 0, 2046–2052. https://doi.org/10.24963/ijcai.2017/284


  • Kumar, A., Ahuja, K., Vadapalli, R., & Talukdar, P. (2020). Syntax-guided controlled generation of paraphrases. Transactions of the Association for Computational Linguistics, 8, 330–345. https://doi.org/10.1162/tacl_a_00318


  • Kumaran, V. S., & Sankar, A. (2015). Towards an automated system for short-answer assessment using ontology mapping. International Arab Journal of E-Technology, 4(1), 17–24. http://www.iajet.org/documents/vol.4/no.1/3.pdf. Accessed 17 Feb 2022.

  • Lai, H., Mao, J., Toral, A., & Nissim, M. (2022). Human Judgement as a Compass to Navigate Automatic Metrics for Formality Transfer. HumEval 2022 - 2nd Workshop on Human Evaluation of NLP Systems, Proceedings of the Workshop, 102–115. https://doi.org/10.18653/v1/2022.humeval-1.9

  • Lavie, A. (2010). Evaluating the output of machine translation systems. In AMTA 2010 - 9th Conference of the Association for Machine Translation in the Americas. https://www.cs.cmu.edu/~alavie/Presentations/MT-Evaluation-MT-Summit-Tutorial-19Sep11.pdf. Accessed 3 Mar 2022.

  • Lavie, A., & Agarwal, A. (2007). METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, June (pp. 228–231). https://aclanthology.org/W07-0734/. Accessed 20 Feb 2022.

  • Leacock, C., & Chodorow, M. (2003). C-rater: Automated scoring of short-answer questions. Computers and the Humanities, 37(4), 389–405. https://doi.org/10.1023/A:1025779619903


  • Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2020). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7871–7880. https://doi.org/10.18653/v1/2020.acl-main.703

  • Marvaniya, S., Foltz, P., Saha, S., Sindhgatta, R., Dhamecha, T. I., & Sengupta, B. (2018). Creating scoring rubric from representative student answers for improved short answer grading. International Conference on Information and Knowledge Management, Proceedings, 993–1002. https://doi.org/10.1145/3269206.3271755

  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems. https://arxiv.org/abs/1310.4546v1

  • Mohler, M., & Mihalcea, R. (2009). Text-to-text semantic similarity for automatic short answer grading. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics on - EACL ’09, 567–575. https://doi.org/10.3115/1609067.1609130

  • Mohler, M., Bunescu, R., & Mihalcea, R. (2011). Learning to grade short answer questions using semantic similarity measures and dependency graph alignments. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 752–762.

  • Moodle. (2011). Regular expression short-Answer question type. https://docs.moodle.org/310/en/Regular_Expression_Short-Answer_question_type. Accessed 27 Dec 2020.

  • Nagoudi, E. M. B., Elmadany, A., & Abdul-Mageed, M. (2022). AraT5: Text-to-text transformers for arabic language generation. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 628–647. https://doi.org/10.18653/v1/2022.acl-long.47

  • Napoles, C., Sakaguchi, K., Post, M., & Tetreault, J. (2015). Ground Truth for Grammaticality Correction Metrics. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 588–593. https://doi.org/10.3115/v1/p15-2097

  • Noorbehbahani, F., & Kardan, A. A. (2011). The automatic assessment of free text answers using a modified BLEU algorithm. Computers and Education, 56(2), 337–345. https://doi.org/10.1016/j.compedu.2010.07.013


  • Omran, A. M. B., & Ab Aziz, M. J. (2013). Automatic essay grading system for short answers in English language. Journal of Computer Science, 9(10), 1369–1382. https://doi.org/10.3844/jcssp.2013.1369.1382


  • Ott, N., Ziai, R., & Meurers, D. (2012). Creation and analysis of a reading comprehension exercise corpus (pp. 47–69). John Benjamins Publishing Company. https://doi.org/10.1075/hsm.14.05ott


  • Ouahrani, L., & Bennouar, D. (2020). AR-ASAG an Arabic dataset for automatic short answer grading evaluation. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 2634–2643). https://aclanthology.org/2020.lrec-1.321. Accessed 13 Dec 2021.

  • Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, (pp. 311–318). https://doi.org/10.3115/1073083.1073135

  • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, (vol. 1, pp. 2227–2237). https://doi.org/10.18653/v1/n18-1202

  • Prakash, A., Hasan, S. A., Lee, K., Datla, V., Qadir, A., Liu, J., & Farri, O. (2016). Neural paraphrase generation with stacked residual LSTM networks - ACL anthology. In Proceedings of {COLING} 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 2923–2934). https://aclanthology.org/C16-1275/. Accessed 19 Feb 2022.

  • Pribadi, F. S., Permanasari, A. E., & Adji, T. B. (2018). Short answer scoring system using automatic reference answer generation and geometric average normalized-longest common subsequence (GAN-LCS). Education and Information Technologies, 23(6), 2855–2866. https://doi.org/10.1007/S10639-018-9745-Z


  • Qiu, R. G. (2019). A systemic approach to leveraging student engagement in collaborative learning to improve online engineering education. International Journal of Technology Enhanced Learning, 11(1), 1–19. https://dl.acm.org/doi/10.5555/3302810.3302811. Accessed 19 Feb 2022.


  • Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI Technical Report. https://www.bibsonomy.org/bibtex/273ced32c0d4588eb95b6986dc2c8147c/jonaskaiser. Accessed 30 May 2023.


  • Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9. https://github.com/codelucas/newspaper. Accessed 30 May 2023.

  • Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2020). Language models are unsupervised multitask learners. OpenAI Blog, 1(May), 1–7. https://github.com/codelucas/newspaper. Accessed 30 May 2023.

  • Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21, 1–67. https://doi.org/10.48550/arxiv.1910.10683


  • Ramachandran, L., & Foltz, P. (2015). Generating reference texts for short answer scoring using graph-based summarization. 10th Workshop on Innovative Use of NLP for Building Educational Applications, BEA 2015 at the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2015, 207–212. https://doi.org/10.3115/v1/w15-0624

  • Ramachandran, L., Cheng, J., & Foltz, P. (2015). Identifying patterns for short answer scoring using graph-based lexico-semantic text matching. Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, 97–106. https://doi.org/10.3115/v1/W15-0612

  • Real, R., & Vargas, J. M. (1996). The probabilistic basis of Jaccard’s index of similarity. In Systematic Biology (Vol. 45, Issue 3, pp. 380–385). Taylor and Francis Inc. https://doi.org/10.1093/sysbio/45.3.380

  • Rocchio, J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART Retrieval System: Experiments in Automatic Document Processing (pp. 313–323). Prentice-Hall. https://www.bibsonomy.org/bibtex/1c18d843e34fe4f8bd1d2438227857225/bsmyth

  • Saha, S., Dhamecha, T. I., Marvaniya, S., Sindhgatta, R., & Sengupta, B. (2018). Sentence level or token level features for automatic short answer grading?: use both. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10947 LNAI, 503–517. https://doi.org/10.1007/978-3-319-93843-1_37

  • Sakaguchi, K., Heilman, M., & Madnani, N. (2015). Effective feature integration for automated short answer scoring. NAACL HLT 2015 - 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference.

  • Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513–523. https://doi.org/10.1016/0306-4573(88)90021-0


  • Schneider, J., Richner, R., & Riser, M. (2023). Towards trustworthy AutoGrading of short, multi-lingual, multi-type answers. International Journal of Artificial Intelligence in Education, 33(1), 88–118. https://doi.org/10.1007/s40593-022-00289-z


  • Scikit-learn. (2019). scikit-learn: Machine learning in Python (v0.21.0). https://scikit-learn.org/stable/

  • Shermis, M. D. (2015). Contrasting state-of-the-art in the machine scoring of short-form constructed responses. Educational Assessment, 20(1), 46–65. https://doi.org/10.1080/10627197.2015.997617


  • Sukkarieh, J. Z., & Blackmore, J. (2009). c-rater: Automatic Content Scoring for Short Constructed Responses. In Proceedings of the 22nd International FLAIRS Conference. Association for the Advancement of Artificial Intelligence (pp. 290–295). https://www.ets.org/research/policy_research_reports/publications/chapter/2009/imsb. Accessed 26 Mar 2022

  • Sultan, M. A., Salazar, C., & Sumner, T. (2016). Fast and easy short answer grading with high accuracy. 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference, 1070–1075. https://doi.org/10.18653/v1/n16-1123

  • Sun, J., Ma, X., & Peng, N. (2021). AESOP: Paraphrase generation with adaptive syntactic control. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 5176–5189. https://doi.org/10.18653/v1/2021.emnlp-main.420

  • Sychev, O., Anikin, A., & Prokudin, A. (2020). Automatic grading and hinting in open-ended text questions. Cognitive Systems Research, 59, 264–272. https://doi.org/10.1016/j.cogsys.2019.09.025


  • Toutanova, K., Klein, D., Manning, C. D., & Singer, Y. (2003). Feature-rich part-of-speech tagging with a cyclic dependency network. Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003, 252–259. https://doi.org/10.3115/1073445.1073478

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5999–6009.

  • Vijayakumar, A. K., Cogswell, M., Selvaraju, R. R., Sun, Q., Lee, S., Crandall, D., & Batra, D. (2018). Diverse beam search: Decoding diverse solutions from neural sequence models. The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 7371–7379.

  • Whitelock, D., & Bektik, D. (2018). Progress and Challenges for Automated Scoring and Feedback Systems for Large-Scale Assessments (pp. 1–18). https://doi.org/10.1007/978-3-319-53803-7_39-1

  • Wubben, S., van den Bosch, A., & Krahmer, E. (2010). Paraphrase generation as monolingual translation: Data and evaluation. In Belgian/Netherlands Artificial Intelligence Conference. http://ilk.uvt.nl/. Accessed 22 Feb 2022.

  • Xu, P., Kumar, D., Yang, W., Zi, W., Tang, K., Huang, C., Cheung, J. C. K., Prince, S. J. D., & Cao, Y. (2021). Optimizing deeper transformers on small datasets. ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2089–2102. https://doi.org/10.18653/v1/2021.acl-long.163

  • Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., & Raffel, C. (2021). mT5: A massively multilingual pre-trained text-to-text transformer. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 483–498. https://doi.org/10.18653/v1/2021.naacl-main.41

  • Yang, Q., Huo, Z., Shen, D., Cheng, Y., Wang, W., Wang, G., & Carin, L. (2020). An end-to-end generative architecture for paraphrase generation. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 3132–3142. https://doi.org/10.18653/v1/d19-1309

  • Zahran, M. A., Magooda, A., Mahgoub, A. Y., Raafat, H., Rashwan, M., & Atyia, A. (2015). Word representations in vector space and their applications for Arabic. In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing: 16th International Conference, CICLing 2015 (Vol. 9041, pp. 430–443). Springer International Publishing. https://doi.org/10.1007/978-3-319-18111-0_32

  • Zeng, D., Zhang, H., Xiang, L., Wang, J., & Ji, G. (2019). User-oriented paraphrase generation with keywords controlled network. IEEE Access, 7, 80542–80551. https://doi.org/10.1109/ACCESS.2019.2923057


  • Zhao, J., Zhu, T., & Lan, M. (2014). ECNU: One stone two birds: Ensemble of heterogenous measures for semantic relatedness and textual entailment. Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 271–277. https://doi.org/10.3115/v1/s14-2044

  • Ziai, R., Ott, N., & Meurers, D. (2012). Short Answer Assessment : Establishing Links Between Research Strands. Proceedings of the Seventh Workshop on Building Educational Applications Using NLP. Association for Computational Linguistics, 2(2005), 190–200.


Acknowledgements

The authors would like to thank Selena LAMARI, Oussama HAMEL, Ahmed Hadjersi, and Oussama Benguergoura for their technical help during the experimentation phase.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

Authors

Contributions

All authors worked collaboratively to formulate research questions, conduct the search, select data, and perform analysis, experiments, and discussion. The corresponding author worked on writing the initial draft. All authors reviewed, read, and approved the final manuscript.

Corresponding author

Correspondence to Leila Ouahrani.

Ethics declarations

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Ouahrani, L., Bennouar, D. Paraphrase Generation and Supervised Learning for Improved Automatic Short Answer Grading. Int J Artif Intell Educ (2024). https://doi.org/10.1007/s40593-023-00391-w

