Abstract
We present an extrinsic evaluation of a clustering-based approach to computer-assisted scoring of short constructed response items, as encountered in educational assessment. Because they are open-ended, constructed response items must be graded by human readers, which makes the overall testing process costly and time-consuming. In this paper we investigate the prospects for streamlining the grading task by grouping similar responses for scoring. The efficiency of scoring clustered responses is compared with two baselines: the traditional mode of grading individual test-takers’ sheets, and by-item scoring of non-clustered responses. The three grading modes are evaluated during real-life language proficiency tests of German as a Foreign Language. We show that a system based on basic clustering techniques and shallow features shows a promising trend toward reduced grading time and performs as well as a system that displays test-taker sheets for scoring.
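To make the idea of grouping similar responses concrete, the following is a minimal sketch, not the system evaluated in the chapter. The abstract does not name the features or the clustering algorithm, so everything here is an assumption chosen for illustration: token sets as the shallow feature, Jaccard overlap as the similarity measure, a greedy single-pass clustering, the threshold of 0.5, the function names, and the example responses.

```python
from typing import List, Set


def token_set(response: str) -> Set[str]:
    """Shallow feature (assumed): lowercased word tokens with punctuation removed."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in response)
    return set(cleaned.split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_responses(responses: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Greedy single-pass clustering (assumed, for illustration only).

    Each response joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns clusters as lists of indices into `responses`.
    """
    features = [token_set(r) for r in responses]
    clusters: List[List[int]] = []
    for i, feats in enumerate(features):
        for cluster in clusters:
            if jaccard(feats, features[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters


# Hypothetical responses to a single test item.
responses = [
    "Er geht ins Kino.",
    "er geht ins kino",
    "Sie bleibt zu Hause.",
    "Er geht in das Kino.",
    "sie bleibt zuhause",
]
clusters = cluster_responses(responses)
for members in clusters:
    print([responses[i] for i in members])
```

In a grading interface built on such a grouping, a reader would see the members of one cluster together and could score them in one pass, which is the source of the potential time saving the abstract describes.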
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wolska, M., Horbach, A., Palmer, A. (2014). Computer-Assisted Scoring of Short Responses: The Efficiency of a Clustering-Based Approach in a Real-Life Task. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_31
DOI: https://doi.org/10.1007/978-3-319-10888-9_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10887-2
Online ISBN: 978-3-319-10888-9
eBook Packages: Computer Science, Computer Science (R0)