Abstract

Historically speaking, students were judged long before they were marked. The tradition of marking, or scoring, the pieces of work that students offer for assessment is little more than two centuries old. It was introduced mainly to cope with specific problems arising from the growth in the numbers graduating from universities as the industrial revolution progressed. This paper describes the principles behind the method of Comparative Judgement and, in particular, Adaptive Comparative Judgement, a technique borrowed from psychophysics that can generate extremely reliable results for educational assessment. The method is based on the kind of holistic evaluation that we assume was the basis for judgement in pre-marking days, and that the users of assessment results expect our assessment schemes to capture.

Notes

  1. This description is based mostly on the description of the late nineteenth century Qing system by Miyazaki (1963).

References

  • Adams, R. M. (1995). Analysing the results of cross-moderation studies. Paper presented at a seminar on comparability, held jointly by the SRAC of the GCE boards and the IGRC of the GCSE groups, London, October.

  • Adams, R. (2007). Cross-moderation methods. In P. Newton, J. Baird, H. Patrick, H. Goldstein, P. Timms, & A. Wood (Eds.), Techniques for monitoring the comparability of examination standards. London: QCA. Available (26/09/2011) at: http://www.ofqual.gov.uk/files/2007-comparability-exam-standards-h-chapter6.pdf.

  • Anastasi, A. (1986). Evolving concepts of test validation. Annual Review of Psychology, 37, 1–15.

  • Andrich, D. (1978). Relationships between the Thurstone and Rasch approaches to item scaling. Applied Psychological Measurement, 2, 451–462.

  • Baker, E. L., Ayers, P., O’Neill, H. F., Choi, K., Sawyer, W., Sylvester, R. M., & Carroll, B. (2008). KS3 English test marker study in Australia. Final report to the National Assessment Agency of England. London: QCA.

  • Bramley, T. (2007). Paired comparison methods. In P. Newton, J. Baird, H. Patrick, H. Goldstein, P. Timms, & A. Wood (Eds.), Techniques for monitoring the comparability of examination standards. London: QCA. Available (26/09/2011) at: http://www.ofqual.gov.uk/files/2007-comparability-exam-standards-i-chapter7.pdf.

  • Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.

  • D’Arcy, J. (Ed.). (1997). Comparability studies between modular and non-modular syllabuses in GCE Advanced level biology, English literature and mathematics in the 1996 summer examinations. Standing Committee on Research on behalf of the Joint Forum for the GCSE and GCE.

  • Haley, C., & Wothers, P. (2005). In M. D. Archer, & C. D. Haley (Eds.), The 1702 chair of chemistry at Cambridge. Cambridge: CUP.

  • Kelly, G. A. (1955). The psychology of personal constructs (Vol. I and II). New York: Norton.

  • Kimbell, R., Wheeler, T., Stables, K., Sheppard, T., Martin, F., Davies, D., Pollitt, A., & Whitehouse, G. (2009). e-scape portfolio assessment: phase 3 report. London: Technology Education Research Unit, Goldsmiths, UL. http://www.gold.ac.uk/teru/projectinfo/projecttitle,5882,en.php.

  • Linacre, J. M. (1994). Many-facet Rasch measurement (2nd ed.). Chicago: MESA Press. http://www.rasch.org/books.htm.

  • Miyazaki, I. (1963). China’s examination hell: the civil service examinations of imperial China (C. Schirokauer (1976), Trans). New York: Weatherhill.

  • Pollitt, A., & Elliott, G. (2003). Monitoring and investigating comparability: A proper role for human judgement. Invited paper, QCA comparability seminar, Newport Pagnell. London: Qualifications and Curriculum Authority. Available at: http://www.camexam.co.uk/.

  • Pollitt, A., & Murray, N. L. (1993). What raters really pay attention to. Paper presented at the Language Testing Research Colloquium, Cambridge (Reprinted in M. Milanovic, & N. Saville (Eds.), 1996, Studies in language testing 3: Performance testing, cognition and assessment. Cambridge: Cambridge University Press).

  • QCDA. (2011). Importance of design and technology key stage 3. http://www.education.gov.uk/schools/teachingandlearning/curriculum/secondary/b00199489/dt/programme. Accessed: 9 Dec 2011.

  • Rasch, G. (1960/1980). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research. Reprinted as 2nd ed., 1980, Chicago: University of Chicago Press.

  • Shavelson, R., & Webb, N. (2000). Generalizability theory. In J. L. Green, G. Camilli, & P. B. Elmore (Eds.), Handbook of complementary methods in education research, Chapter 18. London: Lawrence Erlbaum Associates.

  • Stray, C. (2001). The shift from oral to written examinations: Cambridge and Oxford 1700–1900. Assessment in Education, 8, 33–50.

  • Thurstone, L. L. (1927a). Psychophysical analysis. The American Journal of Psychology, 38, 368–389.

  • Thurstone, L. L. (1927b). A law of comparative judgment. Psychological Review, 34, 273–286 (Reprinted as Chapter 3 from Thurstone, L. L. (1959). The measurement of values. Chicago, IL: University of Chicago Press).

  • Wainer, H. (Ed.). (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Watson, R. (1818). Anecdotes of the life of Richard Watson … written by himself at different intervals, and revised in 1814. Published by his son, Richard Watson, L. L. B., prebendary of Landaff and Wells. London: T. Cadell and W. Davies.

  • Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.

  • Wordsworth, C. (1877). Scholae academicae. London: Frank Cass.

  • Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press. http://www.rasch.org/books.htm.

Author information

Corresponding author

Correspondence to Alastair Pollitt.

About this article

Cite this article

Pollitt, A. Comparative judgement for assessment. Int J Technol Des Educ 22, 157–170 (2012). https://doi.org/10.1007/s10798-011-9189-x

  • DOI: https://doi.org/10.1007/s10798-011-9189-x
