Using the method of pairwise comparison to obtain reliable teacher assessments

The Australian Educational Researcher

Abstract

Demands for accountability have seen the implementation of large-scale testing programs in Australia and internationally. There is, however, a growing body of evidence that externally imposed testing programs do not have a sustained impact on student achievement. It has been argued that teacher assessment is more effective in raising student achievement levels. However, it is also often argued that teacher assessments are less reliable than the results of testing programs. This paper presents a study in which teachers judged writing scripts using the process of pairwise comparison to generate a scale. The analysis showed high internal consistency of the teacher judgements. The scale locations from pairwise comparisons were highly correlated with scale estimates for the same students from a large-scale testing program. The results demonstrate that it is possible to efficiently obtain highly reliable and valid teacher judgements using the process of pairwise comparison. Reliability indices are also provided for a series of small-scale assessments that used the same methodology in a range of other domains. These results support the findings of the main study. The article discusses the benefits of using the method to supplement and validate results from large-scale testing programs.



Cite this article

Heldsinger, S., Humphry, S. Using the method of pairwise comparison to obtain reliable teacher assessments. Aust. Educ. Res. 37, 1–19 (2010). https://doi.org/10.1007/BF03216919
