ASSESSING MATHEMATICAL PROBLEM SOLVING USING COMPARATIVE JUDGEMENT

International Journal of Science and Mathematics Education

Abstract

Employers and universities increasingly expect school leavers to be able to apply their mathematical knowledge to problem solving in varied and unfamiliar contexts. These skills are, however, neglected in most mathematics examinations and, consequently, in classroom teaching. One barrier to including mathematical problem solving in assessment is that the skills involved are difficult to define and to assess objectively. We present two studies that test a method called comparative judgement (CJ), which might be well suited to assessing mathematical problem solving. CJ is an alternative to traditional scoring that is based on collective expert judgements of students’ work rather than item-by-item scoring schemes. In study 1, we used CJ to assess traditional mathematics tests and found that it performed validly and reliably. In study 2, we used CJ to assess mathematical problem-solving tasks and again found that it performed validly and reliably. We discuss the implications of the results for further research and the implications of CJ for the design of mathematical problem-solving tasks.
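To make the idea of turning pairwise judgements into a scale concrete, the following is a minimal sketch, not the authors' procedure. It assumes a Bradley-Terry model fitted with a simple iterative update; the abstract does not state which statistical model or software the studies used. The function name `fit_bradley_terry`, the script indices, and the judgement data are all hypothetical and chosen only for illustration.

```python
import math


def fit_bradley_terry(n_scripts, judgements, iterations=200):
    """Estimate a log-scale score per script from (winner, loser) pairs.

    Assumption: a Bradley-Terry model, where the chance that script i is
    judged better than script j is s_i / (s_i + s_j).
    """
    wins = [0] * n_scripts
    losses = [0] * n_scripts
    pair_counts = {}
    for winner, loser in judgements:
        wins[winner] += 1
        losses[loser] += 1
        pair = (min(winner, loser), max(winner, loser))
        pair_counts[pair] = pair_counts.get(pair, 0) + 1

    # Simple guard: a script that never wins or never loses has no finite estimate.
    if min(wins) == 0 or min(losses) == 0:
        raise ValueError("each script needs at least one win and one loss")

    strengths = [1.0] * n_scripts
    for _ in range(iterations):
        denominators = [0.0] * n_scripts
        for (i, j), count in pair_counts.items():
            share = count / (strengths[i] + strengths[j])
            denominators[i] += share
            denominators[j] += share
        strengths = [wins[i] / denominators[i] for i in range(n_scripts)]
        # Rescale so the strengths stay on a fixed overall scale.
        total = sum(strengths)
        strengths = [s * n_scripts / total for s in strengths]

    # Report centred log-strengths so that the scores sum to zero.
    logs = [math.log(s) for s in strengths]
    mean_log = sum(logs) / n_scripts
    return [value - mean_log for value in logs]


# Hypothetical data: four scripts, each pair judged twice by (imagined) experts.
judgements = [
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
    (1, 0), (2, 1), (3, 2), (0, 3), (0, 2), (1, 3),
]
scores = fit_bradley_terry(4, judgements)
ranking = sorted(range(4), key=lambda i: scores[i], reverse=True)
print("scores:", [round(s, 2) for s in scores])
print("rank order (best first):", ranking)
```

The output is a scaled rank order: judges only ever decide which of two scripts is better, and the model places all scripts on one scale without any item-by-item mark scheme.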

Author information

Corresponding author

Correspondence to Ian Jones.

Appendix

Table 2 Full details of scaled rank order data

About this article

Cite this article

Jones, I., Swan, M. & Pollitt, A. ASSESSING MATHEMATICAL PROBLEM SOLVING USING COMPARATIVE JUDGEMENT. Int J of Sci and Math Educ 13, 151–177 (2015). https://doi.org/10.1007/s10763-013-9497-6

  • DOI: https://doi.org/10.1007/s10763-013-9497-6
