
How to Obtain Efficient High Reliabilities in Assessing Texts: Rubrics vs Comparative Judgement

  • Conference paper
Technology Enhanced Assessment (TEA 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 829)


Abstract

Assessing texts is difficult and time-consuming. Even after great effort, the chance that independent raters agree on their ratings is small, which undermines the reliability of the assessment. Several assessment methods and their merits are described in the literature, among them the use of rubrics and the use of comparative judgement (CJ). In this study we investigate which of the two methods is more efficient in obtaining reliable outcomes when used for assessing texts. The same 12 texts were assessed by the same 6 raters in both a rubric condition and a CJ condition. Results show an inter-rater reliability of .30 for the rubric condition and of .84 for the CJ condition after the same amount of time invested in each method. We therefore conclude that CJ is far more efficient in obtaining high reliabilities when used to assess texts. Suggestions for further research are also made.
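The abstract reports two reliability coefficients without detailing how they are computed. As a rough illustration only (not the authors' code), the sketch below shows the two standard routes: an intraclass correlation over a texts-by-raters score matrix for a rubric condition, and a Bradley-Terry (Thurstone-style) model over pairwise judgements for a CJ condition, summarised by a scale separation reliability. The simulated data, the choice of the ICC(3,1) variant, and all function names are assumptions made for illustration.

```python
# A minimal sketch, not the authors' code. Simulated data; names illustrative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_texts, n_raters = 12, 6                 # mirrors the study's design
quality = rng.normal(size=n_texts)        # latent text quality

# --- Rubric condition: texts x raters score matrix, then an ICC ---
scores = quality[:, None] + rng.normal(scale=2.0, size=(n_texts, n_raters))

def icc_consistency(x):
    """ICC(3,1)-style consistency from a two-way ANOVA decomposition."""
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between texts
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# --- CJ condition: pairwise judgements, Bradley-Terry fit ---
pairs, a_won = [], []
for a in range(n_texts):
    for b in range(a + 1, n_texts):
        for _ in range(3):                # a few judgements per pair
            p = 1 / (1 + np.exp(-(quality[a] - quality[b])))
            pairs.append((a, b))
            a_won.append(rng.random() < p)

def neg_log_lik(theta):
    """Bradley-Terry: P(a beats b) = sigmoid(theta_a - theta_b)."""
    nll = 0.0
    for (a, b), won in zip(pairs, a_won):
        p = 1 / (1 + np.exp(-(theta[a] - theta[b])))
        nll -= np.log(p if won else 1 - p)
    return nll

theta = minimize(neg_log_lik, np.zeros(n_texts)).x
theta -= theta.mean()                     # fix the scale's origin

# Scale separation reliability (SSR): the share of observed score variance
# not attributable to estimation error, analogous to Cronbach's alpha.
# Standard errors approximated from the diagonal of the Fisher information.
info = np.zeros(n_texts)
for a, b in pairs:
    p = 1 / (1 + np.exp(-(theta[a] - theta[b])))
    info[a] += p * (1 - p)
    info[b] += p * (1 - p)
se2 = 1 / info
ssr = (theta.var(ddof=1) - se2.mean()) / theta.var(ddof=1)

print(f"rubric ICC ~ {icc_consistency(scores):.2f}")
print(f"CJ rank order: {np.argsort(-theta)}")
print(f"CJ SSR ~ {ssr:.2f}")
```

In the CJ literature the reported coefficient is usually this scale separation reliability; which ICC variant the authors used for the rubric condition is not stated in the abstract, so both choices above should be read as placeholders for the study's actual analysis.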




Acknowledgements

Jan T'Sas, Elies Ghysebrechts, Jolien Polus and Tine Van Reeth.

Author information

Correspondence to Maarten Goossens.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Goossens, M., De Maeyer, S. (2018). How to Obtain Efficient High Reliabilities in Assessing Texts: Rubrics vs Comparative Judgement. In: Ras, E., Guerrero Roldán, A. (eds) Technology Enhanced Assessment. TEA 2017. Communications in Computer and Information Science, vol 829. Springer, Cham. https://doi.org/10.1007/978-3-319-97807-9_2


  • DOI: https://doi.org/10.1007/978-3-319-97807-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97806-2

  • Online ISBN: 978-3-319-97807-9

  • eBook Packages: Computer Science (R0)
