Abstract
Assessing texts is difficult and time-consuming. Even after considerable effort, the chance that independent raters agree on their ratings is small, which undermines the reliability of the assessment. The literature describes several assessment methods and their merits, among them the use of rubrics and the use of comparative judgement (CJ). In this study we investigate which of the two methods obtains reliable outcomes more efficiently when used for assessing texts. The same 12 texts were assessed by the same 6 raters in both a rubric and a CJ condition. With the same amount of time invested in each method, the rubric condition yielded an inter-rater reliability of .30 and the CJ condition an inter-rater reliability of .84. We therefore conclude that CJ is far more efficient at obtaining high reliabilities when used to assess texts. Suggestions for further research are also made.
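The inter-rater reliabilities reported in the abstract can be illustrated with a minimal sketch. The exact reliability estimator used in the study is not stated in this excerpt; the intraclass correlation for consistency with a single rater, computed below from a two-way decomposition of the score matrix, is one common choice, and the example data are invented for illustration only.

```python
import numpy as np

def icc_consistency(scores):
    """ICC for consistency, single rater (two-way model).

    scores: array of shape (n_texts, n_raters), one rubric score
    per text per rater.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-text means
    col_means = scores.mean(axis=0)   # per-rater means

    # Two-way decomposition of total variation.
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between texts
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between raters
    ss_total = ((scores - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols            # residual

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Raters who rank texts identically (up to a constant offset)
# yield an ICC of 1; disagreement drives the value down.
consistent = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
print(icc_consistency(consistent))  # → 1.0
```

Because the coefficient discounts a constant per-rater severity offset, it measures agreement on the rank ordering of texts, which is the quantity CJ targets directly.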
Acknowledgements
Jan 't Sas, Elies Ghysebrechts, Jolien Polus and Tine Van Reeth.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Goossens, M., De Maeyer, S. (2018). How to Obtain Efficient High Reliabilities in Assessing Texts: Rubrics vs Comparative Judgement. In: Ras, E., Guerrero Roldán, A. (eds) Technology Enhanced Assessment. TEA 2017. Communications in Computer and Information Science, vol 829. Springer, Cham. https://doi.org/10.1007/978-3-319-97807-9_2
DOI: https://doi.org/10.1007/978-3-319-97807-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97806-2
Online ISBN: 978-3-319-97807-9
eBook Packages: Computer Science, Computer Science (R0)