Comparative judgement for assessment
Historically, students were judged long before they were marked. The tradition of marking, or scoring, the work students submit for assessment is little more than two centuries old; it was introduced mainly to cope with specific problems arising from the growth in the number of students graduating from universities as the industrial revolution progressed. This paper describes the principles behind the method of Comparative Judgement, and in particular Adaptive Comparative Judgement, a technique borrowed from psychophysics that can generate extremely reliable results for educational assessment. The method rests on the kind of holistic evaluation that we assume was the basis for judgement before marking existed, and that users of assessment results expect our assessment schemes to capture.
Keywords: ACJ, Assessment, Judgement, Reliability, Adaptive Comparative Judgement
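The abstract does not spell out how a set of judgements becomes a result. As a minimal sketch, assuming the Rasch (Bradley-Terry) model for paired comparisons, the usual way of analysing Thurstone-style comparative judgement data, the Python below turns "script A was judged better than script B" decisions into one measurement scale, then applies a simple adaptive rule to choose the next pair to judge. The names `fit_scale` and `next_pair`, the ridge penalty, the gradient-ascent fitting and the closest-pair rule are all illustrative assumptions, not the procedure used in this paper.

```python
import math
from itertools import combinations


def fit_scale(n_items, judgements, iterations=2000, ridge=0.01):
    """Estimate one scale value per item from (winner, loser) pairs.

    Assumes the Rasch / Bradley-Terry pairwise model:
        P(i beats j) = 1 / (1 + exp(-(v[i] - v[j])))
    Fitted by plain gradient ascent on the log-likelihood. The small
    ridge penalty (an assumption) keeps estimates finite when an item
    has won, or lost, every comparison it appeared in.
    """
    counts = [0] * n_items
    for winner, loser in judgements:
        counts[winner] += 1
        counts[loser] += 1
    # Step size scaled to the most-compared item so the ascent stays stable.
    step = 1.0 / max(1, max(counts))
    v = [0.0] * n_items
    for _ in range(iterations):
        grad = [-ridge * x for x in v]
        for winner, loser in judgements:
            # Model probability that the judged loser would have won instead;
            # this is the derivative of log P(observed outcome).
            q = 1.0 / (1.0 + math.exp(v[winner] - v[loser]))
            grad[winner] += q
            grad[loser] -= q
        v = [x + step * g for x, g in zip(v, grad)]
    # Only differences between items are identified, so centre the scale at zero.
    mean = sum(v) / n_items
    return [x - mean for x in v]


def next_pair(v, judged):
    """Return the unjudged pair (i, j), i < j, with the closest estimates.

    A near-equal pair has a predicted outcome near 0.5, so its result is
    the most informative. This greedy rule is a hypothetical, simplified
    stand-in for an adaptive pairing scheme.
    """
    candidates = [
        (abs(v[i] - v[j]), i, j)
        for i, j in combinations(range(len(v)), 2)
        if (i, j) not in judged
    ]
    if not candidates:
        return None
    _, i, j = min(candidates)
    return (i, j)


# Hypothetical judgements on five scripts, recorded as (winner, loser).
judgements = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (2, 4)]
# Store judged pairs unordered (smaller index first) to match combinations().
judged = {tuple(sorted(pair)) for pair in judgements}
scale = fit_scale(5, judgements)
print([round(x, 2) for x in scale])
print(next_pair(scale, judged))  # prints (1, 3)
```

Pairing scripts whose current estimates are close keeps each new judgement informative, which is the intuition behind making the comparisons adaptive. With as few judgements as in this toy example the estimates depend noticeably on the ridge penalty; a working system would also need more careful pair selection and a stopping rule.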