Abstract
Scoring reliability of constructed-response items is a key concern in high-stakes testing. Constructed-response items, often used for their authenticity, potentially allow a multitude of acceptable answers that were neither intended nor anticipated, and can therefore be problematic to score reliably. This chapter examines a specially developed marker support system for the Austrian EFL school-leaving exam, which uses such items without centralized marking and therefore risks inconsistent scoring that could affect 40,000 students annually. The study investigates the impact of three scoring guide conditions on test taker results in four constructed-response listening tasks at CEFR B2 level. The first scoring condition (A) is exact scoring based on the scoring guide developed by the item writing team before the task was field-tested. The second scoring condition (B) is based on an extended scoring guide that was improved in a centrally run scoring session after the items were piloted. The third scoring condition (C) is based on a highly comprehensive scoring guide that was further enhanced during the scoring of the national live exam through a marker support system in the form of an online helpdesk and a telephone hotline. The statistical analyses show an overall improvement in the reliability of the test from scoring condition A to scoring condition C. The findings therefore suggest that the practice of improving and refining the scoring guides through the implemented marker support system increases the comparability, reliability, and fairness of test taker scores.
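The abstract refers to comparing test reliability across scoring conditions. As a rough illustration only, the sketch below computes Cronbach's alpha, one common internal-consistency estimate, for dichotomously scored items; the function name and the invented score matrix are hypothetical, and the chapter's actual statistical analyses may use different indices.

```python
# Hypothetical sketch: estimating internal-consistency reliability
# (Cronbach's alpha) for a set of dichotomously scored items.
# The data below are invented for illustration, not from the study.

def cronbach_alpha(scores):
    """scores: list of test takers, each a list of per-item scores (0/1)."""
    k = len(scores[0])  # number of items

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# One could compute alpha once per scoring condition (A, B, C) on the
# same responses rescored under each guide and compare the estimates.
condition_a = [[1, 1, 1], [0, 0, 0], [1, 0, 1], [1, 1, 0]]
print(round(cronbach_alpha(condition_a), 3))
```

A higher alpha under condition C than under condition A would be consistent with the improvement in reliability the abstract reports, though the study's own analyses should be consulted for the indices actually used.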
© 2021 Springer Nature Singapore Pte Ltd.
Leitner, K., Kremmel, B. (2021). Avoiding Scoring Malpractice: Supporting Reliable Scoring of Constructed-Response Items in High-Stakes Exams. In: Lanteigne, B., Coombe, C., Brown, J.D. (eds) Challenges in Language Testing Around the World. Springer, Singapore. https://doi.org/10.1007/978-981-33-4232-3_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4231-6
Online ISBN: 978-981-33-4232-3