Comparative judgement for assessment

  • Alastair Pollitt


Historically speaking, students were judged long before they were marked. The tradition of marking, or scoring, the pieces of work students offer for assessment is little more than two centuries old. It was introduced mainly to cope with specific problems arising from the growth in the numbers graduating from universities as the industrial revolution progressed. This paper describes the principles behind the method of Comparative Judgement, and in particular Adaptive Comparative Judgement (ACJ). The technique, borrowed from psychophysics, can generate extremely reliable results for educational assessment. It is based on the kind of holistic evaluation that we assume was the basis for judgement in pre-marking days, and that the users of assessment results expect our assessment schemes to capture.
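To make the idea of turning holistic "which of these two is better?" decisions into a measurement scale concrete, here is a minimal sketch. It assumes the Bradley–Terry model for paired comparisons (the logistic, Rasch-type counterpart of Thurstone's normal-ogive law of comparative judgement), fitted with Hunter's MM algorithm, and a simple adaptive rule that pairs scripts whose current estimates are closest. The function names `fit_bradley_terry` and `next_pair`, the pairing rule, and the toy judgements are all hypothetical illustrations, not the author's implementation of ACJ.

```python
import math
from itertools import combinations


def fit_bradley_terry(n_items, comparisons, iterations=200):
    """Estimate script locations (logits) from (winner, loser) index pairs.

    Model assumed: P(i beats j) = exp(v_i - v_j) / (1 + exp(v_i - v_j)).
    Uses Hunter's MM update. Finite estimates need every script to win and
    lose at least once, with no group of scripts beating all the rest.
    """
    wins = [0] * n_items
    pair_counts = {}  # unordered pair (i, j), i < j -> times compared
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = [1.0] * n_items  # exp(v) for each script
    for _ in range(iterations):
        denom = [0.0] * n_items
        for (i, j), n_ij in pair_counts.items():
            share = n_ij / (strength[i] + strength[j])
            denom[i] += share
            denom[j] += share
        strength = [wins[k] / denom[k] for k in range(n_items)]
        # Fix the origin: rescale so the mean logit is zero.
        log_mean = sum(math.log(s) for s in strength) / n_items
        strength = [s / math.exp(log_mean) for s in strength]
    return [math.log(s) for s in strength]


def next_pair(estimates, compared):
    """Hypothetical adaptive rule: choose the uncompared pair (i < j) whose
    current estimates are closest, where a judgement is most informative
    (p * (1 - p) is largest when p = 0.5)."""
    best = None
    for i, j in combinations(range(len(estimates)), 2):
        if (i, j) in compared:
            continue
        gap = abs(estimates[i] - estimates[j])
        if best is None or gap < best[0]:
            best = (gap, (i, j))
    return None if best is None else best[1]


# Toy data: each tuple records (winner, loser) for one judge's decision.
judgements = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3),
              (2, 3), (1, 0), (2, 1), (3, 2), (3, 0)]
estimates = fit_bradley_terry(4, judgements)
print([round(v, 3) for v in estimates])  # → [0.347, 0.347, -0.347, -0.347]
print(next_pair([0.0, 1.0, 0.1, 2.0], set()))  # → (0, 2)
```

In this toy example scripts 0 and 1 each win three of their five comparisons and scripts 2 and 3 win two, so the fitted scale places the first pair ln 2 ≈ 0.69 logits above the second. Choosing close pairs mirrors the logic of computerized adaptive testing: a comparison whose outcome is nearly a coin toss tells us the most about where the two scripts sit.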


Keywords: ACJ, Assessment, Judgement, Reliability, Adaptive Comparative Judgement



Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  1. Cambridge Exam Research, Cambridge, UK
